
Survey on Crawler for Harvesting Deep Web Interfaces

Author(s):

Pooja M. Taide, Gurunanak Institute of Technology, Nagpur; Vijaya Kamble, Gurunanak Institute of Technology, Nagpur

Keywords:

Deep Web, Crawler, Feature Selection, Ranking, Adaptive Learning

Abstract

With heavy use of the internet, a large amount of diverse data is spread across the web, and it is challenging and time-consuming for a search engine to fetch the data most relevant to a user’s need. To reduce the time spent searching for the most relevant data, we propose “Smart Crawler”, a two-stage framework for efficiently harvesting deep-web interfaces, which give access to the information stored in web databases. In the first stage, Smart Crawler performs site-based searching for center pages with the help of search engines, which avoids visiting a large number of pages. To achieve more accurate results for a focused crawl, Smart Crawler ranks websites to prioritize highly relevant ones for a given topic. In the second stage, Smart Crawler achieves fast in-site searching by excavating the most relevant links with adaptive link ranking. To eliminate bias toward visiting only some highly relevant links in hidden web directories, we design a link tree data structure to achieve wider coverage of a website. Our experimental results on a set of representative domains show the agility and accuracy of the proposed crawler framework, which efficiently retrieves deep-web interfaces from large-scale sites and achieves higher harvest rates than other crawlers.
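The two stages described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the topic vocabulary, the keyword-overlap `relevance` score, the grouping of links by first path segment, and all function names are assumptions made for illustration. The sketch only shows the general idea of ranking sites by topic relevance and then selecting in-site links through a link tree, so that one directory does not absorb the whole crawl budget.

```python
import heapq
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical topic vocabulary; the abstract does not specify the features used.
TOPIC_TERMS = {"book", "search", "library"}

def relevance(text, terms=TOPIC_TERMS):
    """Toy score: fraction of topic terms that appear as words in the text."""
    for sep in "/-.":
        text = text.replace(sep, " ")
    words = set(text.lower().split())
    return len(words & terms) / len(terms)

def rank_sites(sites):
    """Stage 1 (sketch): order site URLs from most to least relevant."""
    heap = [(-relevance(desc), url) for url, desc in sites.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

def build_link_tree(links):
    """Group in-site links by their first path segment (one level of the tree)."""
    tree = defaultdict(list)
    for link in links:
        segments = [s for s in urlparse(link).path.split("/") if s]
        tree[segments[0] if segments else ""].append(link)
    return tree

def select_links(links, budget):
    """Stage 2 (sketch): pick up to `budget` links round-robin across
    directories, taking the highest-scoring link of each directory first."""
    tree = build_link_tree(links)
    queues = [sorted(group, key=relevance, reverse=True)
              for _, group in sorted(tree.items())]
    picked = []
    while len(picked) < budget and any(queues):
        for queue in queues:
            if queue and len(picked) < budget:
                picked.append(queue.pop(0))
    return picked
```

With a budget of two links, a site whose `books` directory holds three links and whose `about` directory holds one would contribute one link from each directory, rather than two from `books`. A pure score-ordered queue would not guarantee that spread, which is the kind of bias the link tree is meant to counter.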

Other Details

Paper ID: IJSRDV6I110160
Published in: Volume : 6, Issue : 11
Publication Date: 01/11/2019
Page(s): 271-274
