CORPUS PROCESSING TOOLS IN BUILDING MULTILINGUAL CORPORA

Ma’rufjon Amirqulov

Vol. 1 No. 1 (2024): COMPUTER LINGUISTICS: PROBLEMS, SOLUTIONS, PROSPECTS

Published 2024-05-22

Keywords

corpora, concordance, aligner, metadata, lemmatizer

Abstract

Corpus annotation is a pivotal phase in the development of multilingual corpora and encompasses processes such as lemmatization, tagging, and parsing. These procedures add meta-information to the corpus, assigning textual elements such as words to categories like part of speech, which makes it possible to analyze their relationships and structures. Without this sorting and an understanding of how the elements interrelate, corpus data cannot be used to its full potential. This article briefly discusses these matters.
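
To make the idea of adding meta-information concrete, the following is a minimal sketch of token-level annotation, not the article's own tooling. It assumes a hypothetical toy lexicon (`TOY_LEXICON`) that maps word forms to a lemma and a part-of-speech tag. The tag names (`NOUN`, `VERB`, `X`) and the helper functions `annotate` and `group_by_pos` are illustrative choices made for this example.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon (an assumption for illustration only):
# surface form -> (lemma, part-of-speech tag).
TOY_LEXICON = {
    "corpora": ("corpus", "NOUN"),
    "are": ("be", "VERB"),
    "annotated": ("annotate", "VERB"),
    "texts": ("text", "NOUN"),
}


@dataclass
class Token:
    form: str   # word as it appears in the text
    lemma: str  # dictionary form
    pos: str    # part-of-speech category


def annotate(sentence: str) -> list[Token]:
    """Attach a lemma and a POS tag to each whitespace-separated word."""
    tokens = []
    for form in sentence.split():
        key = form.lower()
        # Unknown words keep their lowercased form and get the tag "X".
        lemma, pos = TOY_LEXICON.get(key, (key, "X"))
        tokens.append(Token(form, lemma, pos))
    return tokens


def group_by_pos(tokens: list[Token]) -> dict[str, list[str]]:
    """Sort lemmas into part-of-speech categories."""
    groups: dict[str, list[str]] = {}
    for token in tokens:
        groups.setdefault(token.pos, []).append(token.lemma)
    return groups


if __name__ == "__main__":
    for token in annotate("Corpora are annotated texts"):
        print(f"{token.form}\t{token.lemma}\t{token.pos}")
```

Once every token carries a lemma and a category, the corpus can be queried by category rather than by surface form, which is the kind of sorting the abstract describes. A real pipeline would replace the toy lexicon with a trained lemmatizer and tagger for each language in the corpus.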
