Vol. 1 No. 1 (2024): COMPUTER LINGUISTICS: PROBLEMS, SOLUTIONS, PROSPECTS
Articles

CORPUS PROCESSING TOOLS IN BUILDING MULTILINGUAL CORPORA

Published 2024-05-22

Keywords

  • corpora, concordance, aligner, metadata, lemmatizer

Abstract

Corpus annotation is a pivotal phase in the development of multilingual corpora and encompasses processes such as lemmatization, tagging, and parsing. These procedures add metainformation to the corpus, so that textual elements such as words can be organized into categories like part of speech, which in turn supports analysis of their relationships and structures. Without this sorting and an understanding of the interrelations, corpus data cannot be used to its full potential. This article briefly discusses these matters.
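
As a rough illustration of what adding metainformation can look like in practice, the following minimal Python sketch attaches a lemma and a part-of-speech tag to each token and then groups the lemmas by category. It is not taken from the article: the lexicon `TOY_LEXICON`, the tag names, and the functions `annotate` and `group_by_pos` are hypothetical, and a real annotation pipeline would rely on a trained lemmatizer and tagger rather than a lookup table.

```python
from collections import defaultdict

# Hypothetical toy lexicon (an assumption for illustration only):
# surface form -> (lemma, part-of-speech tag).
TOY_LEXICON = {
    "corpora": ("corpus", "NOUN"),
    "were": ("be", "VERB"),
    "aligned": ("align", "VERB"),
    "quickly": ("quickly", "ADV"),
}


def annotate(tokens):
    """Attach lemma and POS metadata to each token.

    Words missing from the toy lexicon keep their lowercased form
    as the lemma and receive the placeholder tag "X".
    """
    annotated = []
    for token in tokens:
        lemma, pos = TOY_LEXICON.get(token.lower(), (token.lower(), "X"))
        annotated.append({"form": token, "lemma": lemma, "pos": pos})
    return annotated


def group_by_pos(annotated):
    """Sort the lemmas of annotated tokens into part-of-speech categories."""
    groups = defaultdict(list)
    for entry in annotated:
        groups[entry["pos"]].append(entry["lemma"])
    return dict(groups)


annotated = annotate("Corpora were aligned quickly".split())
groups = group_by_pos(annotated)
print(groups)
```

Once every token carries its lemma and tag, queries such as "all verbs" or "all forms of a given lemma" become simple lookups over the annotations, which is the kind of organization the abstract describes.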