Wednesday, 25 September 2019

ARPIT: Ambiguity Resolver for POS Tagging of Telugu, an Indian Language

Volume 7 Issue 1 March - May 2019

Research Paper

Suneetha Eluri*, Sumalatha Lingamgunta**
* Research Scholar and Assistant Professor, Department of Computer Science and Engineering, Jawaharlal Nehru Technological University Kakinada, Andhra Pradesh, India.
** Professor, Department of Computer Science and Engineering, Jawaharlal Nehru Technological University Kakinada, Andhra Pradesh, India.
Eluri, S., & Lingamgunta, S. (2019). ARPIT: Ambiguity Resolver for POS Tagging of Telugu, an Indian Language. i-manager's Journal on Computer Science, 7(1), 25-35. https://doi.org/10.26634/jcom.7.1.15372

Abstract

Part-of-speech (POS) tagging is an essential preliminary task in Natural Language Processing (NLP). Its aim is to assign a part-of-speech tag to each word in a corpus. The basic POS tags include noun, pronoun, verb, adjective, and adverb. POS tags are needed for speech analysis and recognition, machine translation, lexical analysis tasks such as word sense disambiguation, named entity recognition, and information retrieval; they also help uncover the sentiment of a given text in opinion mining. Nevertheless, many Indian languages lack POS taggers because research on building basic resources such as corpora and morphological analyzers is still in its infancy. This paper therefore proposes a POS tagger for Telugu, a South Indian language. In this model, lexemes are tagged with POS tags using a pre-tagged corpus; however, a word may receive more than one candidate tag. This ambiguity in tag assignment is resolved with a stochastic machine learning technique, a Hidden Markov Model (HMM) bigram tagger, which uses probabilistic information derived from context, that is, from word-tag sequences. The authors developed a pre-tagged corpus of 11,000 words with a standard common tag set for Telugu and used it for both training and testing the model. The model was tested on input text whose words carry differing numbers of candidate POS tags, and it achieved an average accuracy of 91.27% in resolving the ambiguity.
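To make the disambiguation step concrete, here is a minimal sketch of a bigram HMM tagger with Viterbi decoding. It is not the authors' implementation. The toy corpus is an English stand-in rather than the Telugu corpus, the tag names are illustrative, and add-one smoothing is an assumption, since the abstract does not say how unseen events are handled. The names `train`, `log_prob`, `viterbi`, and `tag` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus (English stand-in, NOT the authors' Telugu data).
# "book" is ambiguous: noun in sentence 1, verb in sentence 2.
TAGGED = [
    [("the", "DET"), ("book", "NN"), ("fell", "VB")],
    [("they", "PRP"), ("book", "VB"), ("the", "DET"), ("room", "NN")],
    [("the", "DET"), ("room", "NN"), ("fell", "VB")],
]

def train(sentences):
    """Count tag bigrams (transitions) and tag-word pairs (emissions)."""
    trans = defaultdict(lambda: defaultdict(int))  # prev_tag -> tag -> count
    emit = defaultdict(lambda: defaultdict(int))   # tag -> word -> count
    for sent in sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    tags = sorted(emit)
    vocab = {w for sent in sentences for w, _ in sent}
    return trans, emit, tags, vocab

def log_prob(table, given, outcome, n_outcomes):
    """Add-one smoothed log P(outcome | given); smoothing is an assumption."""
    total = sum(table[given].values())
    return math.log((table[given][outcome] + 1) / (total + n_outcomes))

def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence under the bigram HMM."""
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unseen words
    best = {t: log_prob(trans, "<s>", t, n_tags)
               + log_prob(emit, t, words[0], n_words) for t in tags}
    back = []  # one dict of backpointers per position after the first
    for word in words[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + log_prob(trans, p, t, n_tags))
            new_best[t] = (best[prev] + log_prob(trans, prev, t, n_tags)
                           + log_prob(emit, t, word, n_words))
            pointers[t] = prev
        best = new_best
        back.append(pointers)
    # Follow backpointers from the best final tag.
    path = [max(tags, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

MODEL = train(TAGGED)

def tag(words):
    return viterbi(words, *MODEL)

print(tag(["the", "book", "fell"]))          # ['DET', 'NN', 'VB']
print(tag(["they", "book", "the", "room"]))  # ['PRP', 'VB', 'DET', 'NN']
```

In this sketch the ambiguous word receives a different tag in each sentence because the transition probabilities from the preceding tag outweigh its equal emission counts, which is the kind of contextual evidence the abstract describes.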
