
Automated Phrase Mining and Phrasal Segmentation from Massive Corpora

Author(s):

Saranya U, Cochin College of Engineering and Technology; Uma E. S, Cochin College of Engineering and Technology

Keywords:

Quality Phrases, AutoPhrase, POS Tagger

Abstract

As one of the fundamental tasks in text analysis, phrase mining aims at extracting quality phrases from a text corpus. Recently, a few data-driven methods have been developed successfully for extracting phrases from massive domain-specific text. However, none of the state-of-the-art models is fully automated, because they require human experts to design rules or label phrases. In this paper, we propose a novel framework for automated phrase mining, AutoPhrase, which can achieve high performance with minimal human effort. In addition, we develop a POS-guided phrasal segmentation model, which incorporates the shallow syntactic information in part-of-speech (POS) tags to further enhance performance when a POS tagger is available. Note that AutoPhrase can support any language as long as a general knowledge base (e.g., Wikipedia) in that language is available, while benefiting from, but not requiring, a POS tagger. Compared to the state-of-the-art methods, the new method has shown significant improvements in effectiveness on five real-world datasets in different domains.
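
To make the role of the knowledge base more concrete, the following is a minimal Python sketch and not the AutoPhrase implementation. The abstract does not say how the knowledge base is used, so the sketch assumes one plausible reading: frequent multi-word n-grams are collected as candidates, and those matching a knowledge-base title are treated as likely quality phrases in place of human labels. The function names, the thresholds `min_count` and `max_len`, and the toy data are all hypothetical.

```python
from collections import Counter


def candidate_ngrams(tokens, max_len=3):
    """Yield every contiguous n-gram (as a tuple) of up to max_len tokens."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def mine_candidates(docs, min_count=2, max_len=3):
    """Count n-grams over whitespace-tokenized, lowercased documents and
    keep the multi-word ones that occur at least min_count times."""
    counts = Counter()
    for doc in docs:
        counts.update(candidate_ngrams(doc.lower().split(), max_len))
    return {
        " ".join(gram): count
        for gram, count in counts.items()
        if len(gram) > 1 and count >= min_count
    }


def label_with_knowledge_base(candidates, kb_titles):
    """Split candidates into those that match a knowledge-base title
    (assumed here to stand in for human labels) and those that do not."""
    kb = {title.lower() for title in kb_titles}
    matched = {phrase for phrase in candidates if phrase in kb}
    unmatched = set(candidates) - matched
    return matched, unmatched


# Toy corpus and a toy stand-in for knowledge-base titles (both hypothetical).
docs = [
    "support vector machine is a model",
    "a support vector machine works",
    "the model works",
]
kb_titles = ["Support vector machine"]

candidates = mine_candidates(docs)
matched, unmatched = label_with_knowledge_base(candidates, kb_titles)
```

In this toy run, `matched` holds the candidate that also appears as a knowledge-base title, while `unmatched` holds frequent fragments such as "vector machine" that a later quality-estimation or segmentation step would have to judge. A real system would need proper tokenization and, where a tagger is available, POS tags.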

Other Details

Paper ID: IJSRDV7I10632
Published in: Volume: 7, Issue: 1
Publication Date: 01/04/2019
Page(s): 825-828
