
Crawler with Search Engine based Simple Web Application System for Forum Mining

Author(s):

Parina Shah, L.J. Institute of Engineering and Technology; Ms. Gayatri Pandi (Jain), L.J. Institute of Engineering and Technology

Keywords:

URL elimination, ITF, Web Crawler

Abstract

A Web crawler (also called a bot or an indexer) is a program that visits Web sites to read page content and other information so that it can build an index for a search engine. In this work, the aim of the Web crawler is to crawl relevant content from Web forums with minimal overhead. Forums are open portals for information exchange. The crawler eliminates duplicate URLs and groups page-flipping URLs that share a similar layout. Web forums have similar navigation paths, connected by specific URL types, that lead users from the entry page to thread pages. The last-modified date of each post and the number of threads or posts are also collected to identify updated threads and posts. For entry pages, the precision and recall achieved were 98.03% and 96.02%, respectively. The crawler achieved 98.96% coverage and 98.32% effectiveness by eliminating irrelevant information and URLs.
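The two URL-handling steps named in the abstract (duplicate URL elimination and grouping of page-flipping URLs) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes duplicates differ only in host case, fragments, query-parameter order, or session parameters, and it groups page-flipping URLs by stripping an assumed page-number parameter, a simpler URL-pattern heuristic swapped in for the layout-similarity grouping the paper describes. The parameter sets `IGNORED_PARAMS` and `PAGE_PARAMS` and all function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters assumed to carry session state, not content.
IGNORED_PARAMS = {"sid", "sessionid", "phpsessid"}
# Hypothetical: query parameters assumed to hold the page number of a thread.
PAGE_PARAMS = {"page", "start", "p"}


def normalize(url):
    """Canonicalize a URL so trivially different spellings compare equal."""
    parts = urlsplit(url)
    # Keep non-session parameters, sorted so their order does not matter.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in IGNORED_PARAMS
    )
    # Lowercase scheme and host, default the path, drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", urlencode(query), ""))


def deduplicate(urls):
    """Return normalized URLs in first-seen order, without duplicates."""
    seen, unique = set(), []
    for url in urls:
        canonical = normalize(url)
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique


def page_flip_key(url):
    """Strip the assumed page-number parameter so all pages of a thread share a key."""
    parts = urlsplit(url)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in PAGE_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


def group_page_flipping(urls):
    """Map each thread key to the list of its page-flipping URLs."""
    groups = defaultdict(list)
    for url in urls:
        groups[page_flip_key(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    crawled = [
        "http://Forum.example.com/viewtopic.php?t=42&sid=abc",
        "http://forum.example.com/viewtopic.php?t=42#post5",
        "http://forum.example.com/viewtopic.php?t=42&page=2",
        "http://forum.example.com/viewtopic.php?t=42&page=3",
    ]
    unique = deduplicate(crawled)
    print(unique)
    print(group_page_flipping(unique))
```

In this example the first two URLs collapse to one canonical URL, and the remaining pages of thread `t=42` fall into a single group, so the crawler would need to treat them as one thread rather than three unrelated pages.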

Other Details

Paper ID: IJSRDV3I40687
Published in: Volume : 3, Issue : 4
Publication Date: 01/07/2015
Page(s): 1273-1280
