Language Identification System for Indian Languages

Author(s):

Dattesh Birappa Naik, Computer Department, Sinhgad Academy of Engineering, Pune; Pravin Prakash Maske, Computer Department, Sinhgad Academy of Engineering, Pune; Jeevan Rajendra Patil, Computer Department, Sinhgad Academy of Engineering, Pune

Keywords:

Text Classification, Language Identification, Natural Language Processing, Indian Languages, Language Detection Methods, Naive Bayes Classifier, Markov Model, Feature of Classification, Devanagari Script, Hindi, Marathi, Konkani

Abstract

In text classification, identifying the language of a text is a major challenge and an important problem in Natural Language Processing. For Indian languages the complexity increases manyfold, because many languages share a single script and a single language may be written in multiple scripts. The language of a text must be known before the text is passed on for further processing. The language identification system is aimed at the study of language detection methods in practice, e.g. Naïve Bayes Classifier, Random Forest Classifier, Artificial Neural Network, and Support Vector Machine, and of recent work in this area. The findings of this study will be used as features for classification in a system that identifies the language of the input text. The system is intended to be generic, so that it can be adapted to identify multiple Indian languages depending on the training given to it. However, the initial emphasis will be on a few languages, especially those written in the Devanagari script, e.g. Hindi, Marathi, and Konkani.
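
To make the classification idea concrete, the following is a minimal sketch of a character-bigram Naive Bayes language identifier for Devanagari text. It is not the authors' implementation: the class name NgramNaiveBayes, the helper char_ngrams, the choice of bigrams with add-one smoothing, and the four toy training sentences are all assumptions made for illustration, and a usable system would need a far larger training corpus.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of a space-padded string."""
    padded = " " + text.strip() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over character n-grams."""

    def __init__(self, n=2):
        self.n = n
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        self.docs = Counter()               # language -> number of samples
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, samples):
        for text, lang in samples:
            grams = char_ngrams(text, self.n)
            self.counts[lang].update(grams)
            self.docs[lang] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_lang, best_score = None, float("-inf")
        for lang in self.docs:
            total = sum(self.counts[lang].values())
            # Log prior plus log likelihood with add-one (Laplace) smoothing.
            score = math.log(self.docs[lang] / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((self.counts[lang][gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang


# Toy training data (illustrative only, not from the paper).
training = [
    ("यह एक किताब है", "hindi"),
    ("मैं घर जा रहा हूँ", "hindi"),
    ("हे एक पुस्तक आहे", "marathi"),
    ("मी घरी जात आहे", "marathi"),
]

model = NgramNaiveBayes(n=2).fit(training)
print(model.predict("है"))
print(model.predict("आहे"))
```

Character n-grams are used here because Hindi, Marathi, and Konkani share the Devanagari script, so the script alone cannot separate them; short character sequences such as the copulas "है" and "आहे" carry the distinguishing signal in this toy setting.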

Other Details

Paper ID: IJSRDV4I10142
Published in: Volume 4, Issue 1
Publication Date: 01/04/2016
Page(s): 72-74
