An Optimized Clustering Method for Document Indexing of Web Pages |
Author(s): |
| Puninder Kaur, ssiet derabassi; Gaganpreet Kaur, ssiet derabassi |
Keywords: |
| Clustering, data mining, graph based search, information retrieval |
Abstract |
|
Clustering is a tool in the field of Information Retrieval (IR) that tries to identify groups of documents that are more similar to one another than to other documents. Some of the literature describes it as a tool for finding patterns in data. Most clustering algorithms depend on external choices such as the similarity measure, the criterion function, the algorithm itself, and the initial conditions. Suffix Tree Clustering (STC) uses a tree-based data structure to represent the characteristics of documents in terms of common phrases and to perform string and query matching. One survey found that half of online users did not find what they were actually looking for on the web with any search engine. Given, for example, a million text files on a server or a personal computer, there is a need to categorize them by content in a very efficient way. As a result, IR tools have been developed that give users more effective ways to categorize relevant data. In this research work, a new, optimized clustering method for document indexing of web pages, based on the document index graph (DIG), is proposed. It generates clusters based on common phrases as well as on single terms. The technique is evaluated in terms of accuracy and the total time consumed by the algorithm. The proposed technique is implemented in both online and offline modes. On the online side, webpage checking and phrase searching in web pages are implemented. On the offline side, downloaded pages from different websites are processed and clustered, and a histogram is produced. The results obtained are satisfactory. |
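To make the idea of combining common phrases with single terms concrete, the following is a minimal Python sketch. It is not the paper's DIG-based method: it substitutes plain word n-gram sets for the document index graph, and the function names, the n-gram lengths, and the `phrase_weight` value are all assumptions chosen for illustration.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrases(tokens, min_len=2, max_len=3):
    """Return the set of word n-grams (tuples) of length min_len..max_len."""
    return {
        tuple(tokens[i:i + n])
        for n in range(min_len, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(doc_a, doc_b, phrase_weight=0.7):
    """Blend shared-phrase overlap with single-term overlap.

    phrase_weight is a hypothetical tuning parameter, not a value
    taken from the paper.
    """
    ta, tb = tokenize(doc_a), tokenize(doc_b)
    phrase_sim = jaccard(phrases(ta), phrases(tb))
    term_sim = jaccard(set(ta), set(tb))
    return phrase_weight * phrase_sim + (1 - phrase_weight) * term_sim
```

Under these assumptions, two documents that share the same words in the same order score higher than two documents that share only the words, which is the intuition behind phrase-based clustering. A pairwise score of this kind could then feed any standard clustering procedure.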
Other Details |
|
Paper ID: IJSRDV5I40902; Published in: Volume 5, Issue 4; Publication Date: 01/07/2017; Page(s): 750-754
