Automated Phrase Mining and Phrasal Segmentation from Massive Corpora |
Author(s): |
| Saranya U, Cochin College of Engineering and Technology; Uma E. S, Cochin College of Engineering and Technology |
Keywords: |
| Quality Phrases, AutoPhrase, POS Tagger |
Abstract |
|
As one of the fundamental tasks in text analysis, phrase mining aims at extracting quality phrases from a text corpus. Recently, a few data-driven methods have been developed successfully for extracting phrases from massive domain-specific text. However, none of the state-of-the-art models is fully automated, because they require human experts to design rules or label phrases. In this paper, we propose a novel framework for automated phrase mining, AutoPhrase, which can achieve high performance with minimal human effort. In addition, we develop a POS-guided phrasal segmentation model, which incorporates the shallow syntactic information in part-of-speech (POS) tags to further enhance performance when a POS tagger is available. Note that AutoPhrase can support any language as long as a general knowledge base (e.g., Wikipedia) in that language is available, while benefiting from, but not requiring, a POS tagger. Compared to the state-of-the-art methods, the new method has shown significant improvements in effectiveness on five real-world datasets in different domains. |
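To make the role of a general knowledge base more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that frequent n-grams matching knowledge-base entry titles can be treated as likely quality phrases without human labeling, while the remaining frequent n-grams stay unlabeled. The function names, the `min_count` and `max_len` parameters, and the whitespace tokenization are all hypothetical choices made for this example.

```python
from collections import Counter


def candidate_ngrams(tokens, max_len=3):
    """Yield every contiguous n-gram (as a tuple) of up to max_len tokens."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def label_candidates(sentences, kb_titles, min_count=2, max_len=3):
    """Split frequent n-grams into knowledge-base matches and the rest.

    Assumption for illustration: an n-gram that matches a knowledge-base
    title (e.g., a Wikipedia article title) is a likely quality phrase.
    """
    counts = Counter()
    for sentence in sentences:
        # Naive lowercase whitespace tokenization, chosen only for brevity.
        counts.update(candidate_ngrams(sentence.lower().split(), max_len))

    kb = {tuple(title.lower().split()) for title in kb_titles}
    frequent = {gram for gram, c in counts.items() if c >= min_count}

    positives = sorted(" ".join(g) for g in frequent if g in kb)
    unlabeled = sorted(" ".join(g) for g in frequent if g not in kb)
    return positives, unlabeled


if __name__ == "__main__":
    corpus = [
        "support vector machine is a model",
        "a support vector machine learns a margin",
    ]
    pos, rest = label_candidates(corpus, ["Support vector machine"])
    print(pos)
    print(rest)
```

In this sketch no human labels any phrase: the knowledge-base titles supply the positive examples, which is one plausible way to read the "minimal human effort" claim in the abstract.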
Other Details |
|
Paper ID: IJSRDV7I10632; Published in: Volume 7, Issue 1; Publication Date: 01/04/2019; Page(s): 825-828 |
