
Survey on techniques used to crawl Web Forums

Author(s):

PARINA SHAH, L. J. Institute of Engineering & Technology, Ahmedabad; Ms. Gayatri Pandi (Jain), L. J. Institute of Engineering & Technology, Ahmedabad

Keywords:

web crawler, FoCUS, Forum Mining

Abstract

A web crawler is a computer program that browses the World Wide Web in an automated, methodical, and orderly manner. The aim is to crawl relevant forum content from the web with minimal overhead. Forums have become popular almost all over the world because they are open for discussion. Millions of Internet users create innumerable new posts every day on various topics and issues. Forum crawling covers various forum sites, which are crawled according to the user's search query. Forums have similar navigation paths, connected by specific URL types, that lead users from entry pages to thread pages. The crawler reduces the web forum crawling problem to a URL type recognition problem. The paper also shows how to learn accurate and effective regular expression patterns for the various navigation paths from automatically created training sets. URL patterns are extracted, which makes crawling more efficient.
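
As a rough illustration of treating forum crawling as URL type recognition, the sketch below classifies link paths with regular expressions and keeps only those on a navigation path toward thread pages. The patterns, the path layout (/forum/..., /thread/...), and the names URL_PATTERNS, classify_url, and select_links are assumptions made for this example. They are not taken from the paper, and the patterns are hand-written here rather than learned from training sets.

```python
import re

# Hypothetical URL patterns for an imaginary forum. A real crawler would
# learn such patterns from automatically built training sets; they are
# hard-coded here only to keep the sketch short.
URL_PATTERNS = {
    "index": re.compile(r"^/forum/\d+(?:\?page=\d+)?$"),
    "thread": re.compile(r"^/thread/\d+(?:\?page=\d+)?$"),
}


def classify_url(path):
    """Return the URL type of a path, or "other" if no pattern matches."""
    for url_type, pattern in URL_PATTERNS.items():
        if pattern.match(path):
            return url_type
    return "other"


def select_links(paths):
    """Keep only links recognised as index or thread pages."""
    return [p for p in paths if classify_url(p) != "other"]


if __name__ == "__main__":
    links = ["/forum/12", "/thread/345?page=2", "/user/profile/9"]
    for link in links:
        print(link, classify_url(link))
    print(select_links(links))
```

A crawler built this way would follow only the links returned by select_links and skip profile, login, and similar pages, which is one way to reduce overhead.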

Other Details

Paper ID: IJSRDV2I9380
Published in: Volume : 2, Issue : 9
Publication Date: 01/12/2014
Page(s): 730-733
