FRECCA based Sentence Level Text Clustering

Author(s):

Nikita Gangshetty, RNS Institute of Technology, Bangalore; Prof. Mr. Manoj Kumar H, RNS Institute of Technology

Keywords:

Sentence Clustering, PageRank, Graph-based Centrality.

Abstract

Sentence clustering plays an important role in many text processing activities. In contrast to hard clustering methods, in which a pattern belongs to a single cluster, fuzzy clustering algorithms allow patterns to belong to all clusters with differing degrees of membership. This is important in domains such as sentence clustering, since a sentence is likely to be related to more than one theme or topic present within a document or set of documents. For example, consider web mining, where the specific objective might be to discover novel information from a set of documents initially retrieved in response to a query. By clustering the sentences of those documents, one would intuitively expect at least one of the clusters to be closely related to the concepts described by the query terms. However, because most sentence similarity measures do not represent sentences in a common metric space, conventional fuzzy clustering approaches based on prototypes or mixtures of Gaussians are generally not applicable to sentence clustering. This paper presents a novel fuzzy clustering algorithm that operates on relational input data, i.e., data in the form of a square matrix of pairwise similarities between data objects. The algorithm uses a graph representation of the data and operates in an Expectation-Maximization framework in which the graph centrality of an object in the graph is interpreted as a likelihood. Results of applying the algorithm to sentence clustering tasks demonstrate that it is capable of identifying overlapping clusters of semantically related sentences, and that it is therefore of potential use in a variety of text mining tasks. The paper also includes results of applying the algorithm to benchmark data sets in several other domains.
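
The idea described in the abstract (a pairwise similarity matrix treated as a graph, with per-cluster centrality scores used as likelihoods inside an Expectation-Maximization loop) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `pagerank` and `fuzzy_relational_clustering`, the damping factor of 0.85, the convergence thresholds, and the toy similarity matrix are all assumptions made for illustration.

```python
import numpy as np

def pagerank(W, d=0.85, iters=100, tol=1e-9):
    """Power-iteration PageRank on a non-negative affinity matrix W (assumed helper)."""
    n = W.shape[0]
    col_sums = W.sum(axis=0)
    safe = np.where(col_sums > 0, col_sums, 1.0)
    # Column-normalize W; columns with no weight fall back to a uniform distribution.
    P = np.where(col_sums > 0, W / safe, 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r_new = (1 - d) / n + d * (P @ r)
        converged = np.abs(r_new - r).sum() < tol
        r = r_new
        if converged:
            break
    return r / r.sum()

def fuzzy_relational_clustering(S, k, max_iter=50, tol=1e-6, seed=0):
    """Sketch of centrality-as-likelihood EM on a similarity matrix S (n x n)."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    # Random initial memberships (each row sums to 1) and equal mixing coefficients.
    M = rng.random((n, k))
    M /= M.sum(axis=1, keepdims=True)
    pi = np.full(k, 1.0 / k)
    for _ in range(max_iter):
        # For each cluster, weight every edge by the memberships of its two
        # endpoints, then use the PageRank score of each object as its likelihood.
        L = np.empty((n, k))
        for c in range(k):
            W = S * np.outer(M[:, c], M[:, c])
            L[:, c] = pagerank(W)
        # New memberships are proportional to mixing coefficient times likelihood.
        M_new = pi * L
        M_new /= M_new.sum(axis=1, keepdims=True)
        # Mixing coefficients are the average membership in each cluster.
        pi = M_new.mean(axis=0)
        done = np.abs(M_new - M).max() < tol
        M = M_new
        if done:
            break
    return M, pi

# Toy similarity matrix (made up): objects 0-2 and 3-5 form two loose groups.
S = np.full((6, 6), 0.1)
S[:3, :3] = 0.9
S[3:, 3:] = 0.9
np.fill_diagonal(S, 1.0)

M, pi = fuzzy_relational_clustering(S, k=2)
print(np.round(M, 3))   # one row per object, one membership degree per cluster
print(np.round(pi, 3))  # mixing coefficients
```

Each row of `M` sums to 1, so an object can belong to several clusters with different degrees of membership, which is the property the abstract highlights for sentences that relate to more than one topic. Because only the matrix `S` is needed, no vector-space representation of the sentences is required.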

Other Details

Paper ID: IJSRDV2I4277
Published in: Volume: 2, Issue: 4
Publication Date: 01/07/2014
Page(s): 453-456
