Language Identification System for Indian Languages
Author(s):
Dattesh Birappa Naik, Computer Department, Sinhgad Academy of Engineering, Pune; Pravin Prakash Maske, Computer Department, Sinhgad Academy of Engineering, Pune; Jeevan Rajendra Patil, Computer Department, Sinhgad Academy of Engineering, Pune
Keywords:
Text Classification, Language, Identification, Natural Language Processing, Indian Languages, Language Detection Methods, Naive Bayes Classifier, Markov Model, Feature of Classification, Devanagari Script, Hindi, Marathi, Konkani
Abstract
In text classification, identifying the language of a text is a challenging and important problem in Natural Language Processing. For Indian languages the complexity increases manyfold, because many languages share a single script and a single language may be written in multiple scripts. The language of a text must be known before the text is passed on for further processing. The proposed language identification system is based on a study of the language detection methods currently in practice, e.g. Naïve Bayes Classifier, Random Forest Classifier, Artificial Neural Network and Support Vector Machine, and of recent work in this area. The findings of this study will serve as features for classification in a system that identifies the language of the input text. The system is intended to be generic, so that it can be adapted to identify multiple Indian languages depending on the training it is given. The initial emphasis, however, is on a few languages, especially those written in the Devanagari script, e.g. Hindi, Marathi and Konkani.
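As an illustration of one of the methods named in the abstract, the following is a minimal sketch of a Naïve Bayes classifier over character bigrams, applied to two languages that share the Devanagari script. It is not the authors' system: the class name `NgramNaiveBayes`, the helper `char_ngrams`, the choice of bigrams with add-one smoothing, and the six toy training sentences are all assumptions made for this example, and a usable model would need far more training text.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams, with one space of padding at each edge."""
    padded = " " + " ".join(text.split()) + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over character n-grams."""

    def __init__(self, n=2):
        self.n = n
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        self.docs = Counter()               # language -> number of training texts
        self.vocab = set()                  # all n-grams seen during training

    def train(self, samples):
        for text, lang in samples:
            grams = char_ngrams(text, self.n)
            self.counts[lang].update(grams)
            self.docs[lang] += 1
            self.vocab.update(grams)

    def scores(self, text):
        """Log prior plus summed log likelihoods, with add-one smoothing."""
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        result = {}
        for lang, counts in self.counts.items():
            total = sum(counts.values())
            score = math.log(self.docs[lang] / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            result[lang] = score
        return result

    def predict(self, text):
        s = self.scores(text)
        return max(s, key=s.get)


# Toy training data (illustrative only): "hi" = Hindi, "mr" = Marathi.
TRAINING = [
    ("मैं घर जा रहा हूँ", "hi"),
    ("यह किताब बहुत अच्छी है", "hi"),
    ("तुम क्या कर रहे हो", "hi"),
    ("मी घरी जात आहे", "mr"),
    ("हे पुस्तक खूप चांगले आहे", "mr"),
    ("तू काय करत आहेस", "mr"),
]

clf = NgramNaiveBayes(n=2)
clf.train(TRAINING)

print(clf.predict("वह स्कूल जा रहा है"))
print(clf.predict("ती शाळेत जात आहे"))
```

Character n-grams are used here because Hindi and Marathi cannot be told apart by script alone. Frequent endings and function words, such as the Hindi "है" and the Marathi "आहे", carry most of the signal. Adding a language such as Konkani would only require further labelled sentences in `TRAINING`.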
Other Details
Paper ID: IJSRDV4I10142; Published in: Volume 4, Issue 1; Publication Date: 01/04/2016; Page(s): 72-74