Corpus
A corpus is a principled collection of authentic electronic texts, usually compiled for further language investigation [Bennett 2010, p. 12]. A corpus should therefore meet three requirements:
- be principled;
- include authentic texts only;
- be stored electronically.
There are many types of corpora, such as:
- a written corpus contains texts that have been produced or published in written format (e.g. traditional books, novels, textbooks, newspapers, magazines, or unpublished letters and diaries);
- a spoken corpus consists entirely of transcribed speech (e.g. spontaneous informal conversations, meetings, debates, classroom situations);
- a static corpus is intended to be of a particular size: once that target is reached, no more texts are included in it;
- a dynamic corpus grows continually over time, as opposed to a static corpus, which does not change in size once it has been built;
- a specialized corpus has been designed for a particular research project;
- a raw corpus has not been processed in any way and contains no annotation;
- a pedagogic corpus is used for language teaching and consists of all of the language to which a learner has been exposed in the classroom, for example the texts and exercises that the teacher has used;
- a learner corpus is a special corpus type consisting of language output produced by learners of a language;
- a parallel corpus consists of two or more corpora that have been sampled in the same way from different languages;
- a national corpus is a large corpus that attempts to represent the range of language used in a particular national language community;
- a dialect corpus is a specialized spoken corpus compiled in order to carry out studies of regional variation;
- a diachronic corpus has been carefully built to be representative of a language or language variety over a particular period of time, so that researchers can track linguistic changes within it;
- a balanced corpus contains texts from a wide range of genres and text domains, and the relative sizes of its subsections have been chosen with the aim of adequately representing the range of language that exists in the population of texts being sampled.
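The idea of a balanced corpus, where the relative sizes of the subsections matter, can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: the genre labels, the sample sentences, and the helper name `genre_shares` are hypothetical and are not taken from any real corpus or tool. It computes each subsection's share of the total word count, which a compiler could then compare against the intended design of the corpus.

```python
from collections import Counter


def genre_shares(texts):
    """Return each genre's share of the total word count.

    `texts` is a list of (genre, text) pairs. Words are split on
    whitespace, which is a simplification of real tokenization.
    """
    counts = Counter()
    for genre, text in texts:
        counts[genre] += len(text.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genre: n / total for genre, n in counts.items()}


# Hypothetical toy corpus: invented sentences in three invented genres.
toy_corpus = [
    ("newspaper", "The council approved the new budget on Monday"),
    ("fiction", "She closed the door and listened to the rain"),
    ("conversation", "well I mean it was fine really"),
    ("newspaper", "Prices rose again last month"),
]

shares = genre_shares(toy_corpus)
for genre, share in sorted(shares.items()):
    print(f"{genre}: {share:.2f}")
```

In a real project the shares would be checked against target proportions chosen in advance; here they only show that the subsections of this toy corpus are not equal in size.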
Zhukovska, V.V. (2013). Вступ до корпусної лінгвістики: навчальний посібник [Introduction to corpus linguistics: a textbook]. Zhytomyr: Ivan Franko Zhytomyr State University Press. 142 p.
Bennett, G.R. (2010). Using corpora in the language learning classroom: corpus linguistics for teachers. Michigan ELT. Retrieved from: https://www.press.umich.edu/371534/using_corpora_in_the_language_learning_classroom