Електронний багатомовний

термінологічний словник

Electronic Multilingual Terminological Dictionary


Linguistics

Frequency word list

Frequency word lists are a convenient representation of how actively words function in a language as a whole or in a particular text. Their parameters and properties are a prime focus of NLP experts because they are used in numerous practical applications, such as authorship attribution and automatic text clustering and classification.
The most specific statistical measures of frequency word lists are [Sherstinova, p. 367]:
 Number of units (tokens);
 Number of units (types);
 The number of words used only once (hapax legomena, lexical richness);
 Lexical density;
 Lexical diversity coefficient;
 Standardized diversity index.
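The first few measures in this list can be illustrated with a minimal Python sketch. The tokenizer, the function name `basic_measures`, and the sample sentence are assumptions made for illustration. The type-token ratio is used here only as a simple stand-in for a diversity measure, since the entry does not define the lexical diversity coefficient.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"\w+", text.lower())


def basic_measures(text):
    """Return token, type, and hapax counts plus the type-token ratio."""
    tokens = tokenize(text)
    freq = Counter(tokens)
    # Hapax legomena: types that occur exactly once in the text.
    hapax = [word for word, count in freq.items() if count == 1]
    return {
        "tokens": len(tokens),
        "types": len(freq),
        "hapax": len(hapax),
        # Assumption: type-token ratio as an illustrative diversity measure.
        "type_token_ratio": len(freq) / len(tokens) if tokens else 0.0,
    }


sample = "the cat sat on the mat and the dog sat too"
measures = basic_measures(sample)
print(measures["tokens"], measures["types"], measures["hapax"])  # 11 8 6
```

Real corpora usually need a more careful tokenizer, and the type-token ratio depends strongly on text length, which is one reason standardized indices exist.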
There can be frequency lists of different kinds:
 Nouns, adjectives, verbs, adverbs, and other parts of speech
 Words beginning with, ending with, or containing particular characters
 Word forms, tags, lemmas, and other attributes
 A combination of all three options above.
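As a rough sketch of the character-based option in the list above, a frequency list can be restricted to words that begin with, end with, or contain given characters. The function `filtered_wordlist` and its parameters are hypothetical names chosen for this example and do not reproduce any particular tool's interface.

```python
from collections import Counter


def filtered_wordlist(tokens, starts="", ends="", contains=""):
    """Count only tokens that match all of the given character filters.

    An empty string means "no restriction" for that filter.
    """
    freq = Counter(
        token
        for token in tokens
        if token.startswith(starts) and token.endswith(ends) and contains in token
    )
    # most_common() sorts by descending frequency.
    return freq.most_common()


tokens = "running runner ran run sun sunny running".split()
print(filtered_wordlist(tokens, starts="run"))
# [('running', 2), ('runner', 1), ('run', 1)]
print(filtered_wordlist(tokens, ends="ing"))
# [('running', 2)]
```

Filtering by part of speech or by lemma would work the same way, provided the tokens carry those attributes from a tagger.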
Three different frequency measures can be used in the wordlist: frequency, frequency per million, and ARF (average reduced frequency).
The wordlist typically works on the token level. It can also be limited by setting minimum and maximum frequency limits [Sketch Engine].
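The measures named above can be sketched as follows. Frequency per million normalizes a raw count by corpus size. ARF (average reduced frequency) is computed here with the commonly cited definition: the corpus is treated as a cycle, v is the corpus size divided by the word's frequency, and each distance between consecutive occurrences is capped at v. The function names and the toy positions are assumptions for illustration, not any tool's actual implementation.

```python
def per_million(count, corpus_size):
    """Relative frequency: occurrences per one million tokens."""
    return count * 1_000_000 / corpus_size


def arf(positions, corpus_size):
    """Average reduced frequency from the token positions of one word.

    ARF = (1 / v) * sum(min(d_i, v)), where v = corpus_size / frequency
    and d_i are distances between consecutive occurrences (cyclic).
    """
    f = len(positions)
    if f == 0:
        return 0.0
    v = corpus_size / f
    ordered = sorted(positions)
    # Distances between neighbouring occurrences.
    distances = [ordered[i] - ordered[i - 1] for i in range(1, f)]
    # Wrap-around distance from the last occurrence back to the first.
    distances.append(ordered[0] + corpus_size - ordered[-1])
    return sum(min(d, v) for d in distances) / v


print(per_million(4, 100))        # 40000.0
print(arf([0, 25, 50, 75], 100))  # 4.0  (evenly spread: ARF equals the frequency)
print(arf([0, 1, 2, 3], 100))     # 1.12 (clustered: ARF drops toward 1)
```

The two ARF calls use the same raw frequency of 4, which shows why ARF is useful: a word concentrated in one part of the corpus scores much lower than one spread evenly.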
The frequency word list counts words and lemmas. Frequency lists count distinct orthographic words, including inflected and some capitalized forms (e.g., the verb “to be” can be represented by ‘is,’ ‘are,’ ‘were,’ etc.) [Wiktionary].
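The difference between counting orthographic word forms and counting lemmas can be sketched as follows. The `LEMMAS` lookup table is a hypothetical toy dictionary that covers only a few forms of "to be". A real system would use a lemmatizer or a tagged corpus instead.

```python
from collections import Counter

# Hypothetical toy lookup: maps a few inflected forms to their lemma.
LEMMAS = {"am": "be", "is": "be", "are": "be", "was": "be", "were": "be"}


def form_and_lemma_counts(tokens):
    """Return two frequency lists: one by word form, one by lemma."""
    forms = Counter(tokens)
    # Forms missing from the lookup are treated as their own lemma.
    lemmas = Counter(LEMMAS.get(token, token) for token in tokens)
    return forms, lemmas


tokens = "they were here and she is here and we are here".split()
forms, lemmas = form_and_lemma_counts(tokens)
print(forms["is"], forms["are"], forms["were"])  # 1 1 1
print(lemmas["be"])                              # 3
```

In the word-form list, the three forms appear as separate entries with a frequency of 1 each, while the lemma list merges them into a single entry with a frequency of 3.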

Sources:

Frequency word list. Wiktionary. Retrieved from: https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists#External_links.

Tatiana Sherstinova, Alexander Grebennikov, Tatiana Skrebtsova, Anna Guseva, Mary Gukasian, Irina Egoshina, Maria Turygina. (2020). Frequency Word Lists and Their Variability (the Case of Russian Fiction in 1900-1930). Proceedings of the 27th Conference o

Wordlist — frequency lists and linguistic databases. Sketch Engine. Retrieved from: https://www.sketchengine.eu/guide/wordlist-frequency-lists/#toggle-id-2.

Part of speech: noun
Countable/uncountable: countable
Type: abstract
Gender: neutral
Case: nominative