Journal «Современная Наука» ("Modern Science")

TOKENIZATION METHODS FOR TAJIK TEXT USING PYTHON

Istamuqlov Hasanjon (PhD Student, Khujand State University named after academician B. Gafurov, Khujand, Tajikistan)

Muzafarov Dilshod (Dean of the Faculty of Mathematics, Khujand State University named after academician B. Gafurov, Khujand, Tajikistan)

This article examines methods for tokenizing Tajik text with the Python programming language. The authors analyze the characteristics of the Tajik alphabet and grammar and the typical tokenization problems that arise from them. The article surveys the main Python libraries and packages for text processing and describes tokenization approaches using examples from other languages. It presents the results of experiments with morphological, statistical, and neural network approaches to tokenization, and suggests directions for future research in this field.

Keywords: tokenization, Tajik language, Python programming language, morphological approach, statistical approach, neural networks, deep learning, natural language processing, alphabet, grammar
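
As a rough illustration of the kind of alphabet-related problem the abstract mentions, the following is a minimal rule-based sketch and not the authors' method. It assumes a simple regular-expression tokenizer built only on the Python standard library; the names TOKEN_RE and tokenize and the token pattern itself are hypothetical choices made for this example. The Tajik Cyrillic letters ӣ and ӯ can be encoded either as single code points or as a base letter plus a combining macron, so the sketch normalizes the input to NFC before matching.

```python
import re
import unicodedata

# Hypothetical token pattern (an assumption for illustration, not from the article):
#   1. words made of Unicode letters, optionally joined by hyphens
#   2. numbers with optional decimal or thousands separators
#   3. any single punctuation or symbol character
TOKEN_RE = re.compile(
    r"[^\W\d_]+(?:-[^\W\d_]+)*"
    r"|\d+(?:[.,]\d+)*"
    r"|[^\w\s]|_"
)


def tokenize(text):
    """Split text into word, number, and punctuation tokens."""
    # NFC composes "у" + combining macron (U+0304) into the single letter "ӯ",
    # so that Tajik-specific letters are not split away from their diacritics.
    normalized = unicodedata.normalize("NFC", text)
    return TOKEN_RE.findall(normalized)


if __name__ == "__main__":
    print(tokenize("Ман ба Хуҷанд меравам."))
    # ['Ман', 'ба', 'Хуҷанд', 'меравам', '.']
```

A rule-based splitter like this only covers surface segmentation. The morphological, statistical, and neural approaches named in the abstract would need language resources or training data that this sketch does not assume.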

 


Citation link:
Istamuqlov H., Muzafarov D. TOKENIZATION METHODS FOR TAJIK TEXT USING PYTHON // Современная наука: актуальные проблемы теории и практики. Серия: Естественные и Технические Науки. 2023. No. 06/2. P. 78-82. DOI 10.37882/2223-2966.2023.6-2.16
LEGAL INFORMATION:
Reproduction of materials is permitted only for non-commercial purposes with reference to the original publication. Protected by the laws of the Russian Federation. Any violations of the law are prosecuted.
© ООО "Научные технологии"