Vol. 1 No. 1 (2022): COMPUTER LINGUISTICS: PROBLEMS, SOLUTIONS, PROSPECTS
Articles

UZBEK TEXT ANALYSIS USING ZIPF DISTRIBUTION

Published 2022-05-19

Keywords

  • Uzbek text
  • Zipf’s law
  • mathematical statistics
  • word frequency

Abstract

The frequency distribution of words has been a key object of study
in statistical linguistics. This article presents a word-frequency analysis of three works in the
Uzbek language based on a mathematical-statistical law. The word frequency
distribution of each document is calculated on the basis of Zipf’s law. The per-document
results are compared with each other and presented in graphs.
The article shows that human language has a highly complex, reliable structure in
its frequency distribution. Some empirical phenomena related to word frequencies are
then reviewed. These facts are chosen to be informative about the mechanisms
giving rise to Zipf’s law and are then used to evaluate many of the theoretical
explanations of Zipf’s law in language.
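
As an illustration of the kind of calculation the abstract describes, the sketch below ranks the words of a text by frequency and estimates the exponent s in the usual statement of Zipf’s law, f(r) ∝ 1/r^s, where f(r) is the frequency of the word at rank r. It is a minimal sketch and not the article’s own procedure: the function names rank_frequency and zipf_exponent, the rough regular-expression tokenizer, and the plain least-squares fit on the log-log values are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Rough tokenizer (an assumption, not the article's method): runs of
    # letters, optionally joined or followed by an apostrophe, so that
    # spellings such as o'zbek or tog' stay in one piece.
    words = re.findall(r"[^\W\d_]+(?:['’][^\W\d_]*)*", text.lower())
    counts = Counter(words)
    # Sort by descending count; ties are broken alphabetically so the
    # ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_exponent(table):
    """Estimate s in f(r) ~ 1 / r**s by least squares on log-log values."""
    if len(table) < 2:
        raise ValueError("need at least two distinct words to fit a slope")
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(count) for _, _, count in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope of log(count) against log(rank) is -s.
    return -sxy / sxx
```

Applying rank_frequency to each of the three texts and comparing the values returned by zipf_exponent would give one simple way to compare documents; an estimate close to 1 corresponds to the classical form of the law. A real analysis of Uzbek text would likely need a more careful tokenizer and a more robust fitting method than this sketch uses.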