
Web Crawling by Means of Multithreading and Google Numerical Weighting Technique

Author(s):

Amrita Banjare, Dr. C.V. Raman University, Bilaspur; Rohit Miri, Dr. C.V. Raman University, Bilaspur; Khushboo Sharma, Dr. C.V. Raman University, Bilaspur

Keywords:

WWW, Bot, Spider, PR, BFS

Abstract

A web crawler (also known as a spider or a robot) is a program for the bulk downloading of web pages. Although web crawling may appear to be simply an application of the BFS (Breadth-First Search) technique, in reality it poses numerous challenges, including systems concerns such as organizing very large data structures. Data structures of this size are a substantial challenge for single-process crawlers. Web crawlers serve various purposes; most importantly, they are one of the main components of search engines and of SEO (Search Engine Optimization). Efficient algorithms are therefore in demand, and it has become very important to design an effective crawling procedure that finishes the crawl in a reasonable amount of time. Many web crawling programs exist, but we required a web crawler that allows trouble-free customization. In this paper we propose an effective crawling mechanism that integrates a multithreaded crawler with Google's numerical weighting technique. The numerical weight of a web page is a "vote" by all other pages on the web. Applying PR (PageRank) should surface high-quality documents so that the user gets the required pertinent information within a satisfactory time.
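
The combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the in-memory link graph `TOY_WEB`, the helper names `fetch_links`, `crawl`, and `pagerank`, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration. The crawl proceeds breadth-first, fetching each BFS level in parallel worker threads, and the collected link graph is then ranked by a standard power-iteration form of PageRank.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "web": page -> outgoing links.
# It stands in for real HTTP downloads so the sketch runs offline.
TOY_WEB = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],  # never reached from seed "a" (no inbound links)
}


def fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return TOY_WEB.get(url, [])


def crawl(seed, max_workers=4):
    """Breadth-first crawl that fetches each BFS level with a thread pool."""
    graph = {}
    seen = {seed}
    frontier = [seed]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            # pool.map preserves input order, so results align with frontier.
            results = list(pool.map(fetch_links, frontier))
            next_frontier = []
            for url, links in zip(frontier, results):
                graph[url] = list(links)
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return graph


def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank over the crawled link graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = [v for v in graph[u] if v in graph]
            if targets:
                # Each page splits its "vote" evenly among its out-links.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[u] / n
                for v in nodes:
                    new_rank[v] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = crawl("a")
    ranks = pagerank(graph)
    for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

In this toy graph, page `c` receives links from both `a` and `b`, so it collects the most "votes" and ranks first. A real crawler would replace `fetch_links` with an HTTP download and link extraction step and would also need politeness delays and URL normalization, which this sketch omits.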

Other Details

Paper ID: IJSRDV3I40467
Published in: Volume : 3, Issue : 4
Publication Date: 01/07/2015
Page(s): 746-751
