A Comparative Analysis of Multilingual Word Tagging In Transliterated Content

Author(s):

Yagnesh Jani, L.D.R.P. I.T.R.; Sandip Modha, L.D.R.P. I.T.R.

Keywords:

transliteration, Devanagari script, Roman script

Abstract

In this report we consider the problem of labeling the language of each word in mixed-language documents and resolving ambiguity. For many languages that use non-Roman indigenous scripts (e.g., Arabic, Greek, and Indic languages), one can often find a large amount of user-generated transliterated content on the Web in the Roman script. This content creates a monolingual or multilingual space with more than one script, which we refer to as the Transliterated-Script space. This report describes a method to tag the words in multilingual transliterated content and resolve ambiguity. The technique is based on a heuristic approach, and its results are compared with those of different methods. Tagging the words and finding ambiguities in such content is a non-trivial task.

Other Details

Paper ID: IJSRDV3I21274
Published in: Volume : 3, Issue : 2
Publication Date: 01/05/2015
Page(s): 2187-2188
