A Comparitive Analysis Multilingual Word Tagging In Transliterated Content |
Author(s): |
| Yagnesh Jani, L.D.R.P. I.T.R.; Sandip Modha, L.D.R.P. I.T.R. |
Keywords: |
| transliteration, Devanagari script, Roman script |
Abstract |
|
In this report we consider the problem of labeling the language of each word in mixed-language documents and resolving ambiguity. For many languages that use non-Roman indigenous scripts (e.g., Arabic, Greek and Indic languages), one can often find a large amount of user-generated transliterated content on the Web in the Roman script. This content creates a monolingual or multilingual space spanning more than one script, which we refer to as the Transliterated-Script space. This report describes a method to tag words in multilingual transliterated content and resolve ambiguity. The technique is based on a heuristic approach, and its results are compared with those of different methods. Tagging words and detecting ambiguity in such content is a non-trivial task. |
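The abstract does not spell out the heuristic itself. The following is only a minimal sketch of one common word-level tagging heuristic, not the authors' method: each word is looked up in per-language word lists, and words that are ambiguous or unknown take the tag of the nearest confidently tagged neighbour. The word lists (`HINDI_ROMAN`, `ENGLISH`), the tag names, and the function names are all hypothetical and chosen for illustration.

```python
import re

# Hypothetical toy lexicons (assumption); a real system would load large word lists.
HINDI_ROMAN = {"kya", "hai", "nahi", "mera", "tum", "kaise", "me"}
ENGLISH = {"what", "is", "my", "you", "love", "how", "the", "name", "me"}

# Any character from the Unicode Devanagari block marks a native-script word.
DEVANAGARI = re.compile(r"[\u0900-\u097F]")
PUNCT = ".,!?;:\"'()"


def tag_word(word):
    """Tag one word as 'hi', 'en', 'ambiguous', or 'unknown'."""
    w = word.lower()
    if DEVANAGARI.search(w):
        return "hi"
    in_hi = w in HINDI_ROMAN
    in_en = w in ENGLISH
    if in_hi and in_en:
        return "ambiguous"
    if in_hi:
        return "hi"
    if in_en:
        return "en"
    return "unknown"


def tag_sentence(text):
    """Tag every word, resolving unclear words from the nearest clear neighbour."""
    tokens = [t.strip(PUNCT) for t in text.split()]
    tokens = [t for t in tokens if t]
    tags = [tag_word(t) for t in tokens]
    resolved = list(tags)
    for i, tag in enumerate(tags):
        if tag not in ("ambiguous", "unknown"):
            continue
        # Search outward by distance, checking the left neighbour before the right.
        for d in range(1, len(tags)):
            for j in (i - d, i + d):
                if 0 <= j < len(tags) and tags[j] in ("hi", "en"):
                    resolved[i] = tags[j]
                    break
            else:
                continue
            break
    return list(zip(tokens, resolved))


if __name__ == "__main__":
    print(tag_sentence("mera name kya hai"))
    print(tag_sentence("tum me"))
```

In this sketch the word "me" appears in both lists, so on its own it is ambiguous; in "tum me" it inherits the Hindi tag from its neighbour, while in "love me" it inherits the English tag. A word with no confidently tagged neighbour keeps its "ambiguous" or "unknown" label.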
Other Details |
|
Paper ID: IJSRDV3I21274 Published in: Volume : 3, Issue : 2 Publication Date: 01/05/2015 Page(s): 2187-2188 |
