Vol. 1 No. 1 (2022): COMPUTATIONAL LINGUISTICS: PROBLEMS, SOLUTIONS, PROSPECTS
Articles

THE EXPLOITATION OF CORPORA IN NATURAL LANGUAGE PROCESSING

Published 2022-05-19

Keywords

  • language analysis,
  • human intuition,
  • annotation,
  • disambiguation

Abstract

One of the first things required for natural language processing
(NLP) tasks is a corpus. In linguistics and NLP, a corpus (Latin for "body")
is a collection of texts. Such a collection may consist of texts in a single
language or may span multiple languages; multilingual corpora (corpora is the
plural of corpus) can be useful for a number of reasons. Corpora may also
consist of themed texts (historical, Biblical, etc.). Corpora are generally
used for statistical linguistic analysis and hypothesis testing.
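
As a rough illustration of the statistical analysis mentioned above, the following minimal sketch counts word frequencies over a toy corpus. The two sample texts, the regex-based tokenizer, and the names `corpus`, `tokenize`, and `word_frequencies` are assumptions made for this example and do not come from the article; a real study would use a much larger corpus and a proper tokenizer.

```python
from collections import Counter
import re

# Hypothetical toy corpus: two tiny "documents" invented for illustration.
corpus = {
    "doc1": "The cat sat on the mat.",
    "doc2": "The dog sat on the log.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(texts):
    """Count how often each token occurs across all texts in the corpus."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


freqs = word_frequencies(corpus.values())
total = sum(freqs.values())

# Print the three most frequent tokens with absolute and relative frequency.
for word, count in freqs.most_common(3):
    print(f"{word}\t{count}\t{count / total:.2f}")
```

Relative frequencies of this kind are a common starting point for corpus-based hypothesis testing, for example when comparing how often a word occurs in two sub-corpora.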