Automatic Data Extraction from Deep Web Page

Sagar G.R; Rampur Srinath

Author(s):

Sagar G.R, The National Institute of Engineering; Rampur Srinath, The National Institute of Engineering

Keywords:

data annotation, web database, wrapper generation

Abstract

A large volume of information is available on the World Wide Web. This information is contained in structured and unstructured objects known as data records. This paper concentrates on mining data from deep web pages, because most data units returned from the underlying database are encoded into result pages dynamically for human browsing. Approaches to this problem include manual methods, supervised learning, and automatic techniques; the manual method is not suitable for a large number of pages. Retrieving appropriate and useful information from web pages is challenging, and many web retrieval systems, such as web wrappers and web crawlers, have been designed for this purpose. Making the encoded data units machine-processable is essential for many applications, such as deep web data collection. Extracting the content from the original HTML document is complicated by the large amount of less informative and typically unrelated material, such as navigation menus, forms, user comments, and ads. Our method aims to be both fast and accurate and, most importantly, displays the result as a single-page output. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database. Our experiments indicate that the proposed approach is highly effective.
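
To illustrate the kind of noise removal the abstract mentions (navigation menus, forms, and similar unrelated material), here is a minimal sketch in Python using only the standard library. It is not the authors' method: the tag list in SKIP_TAGS, the class name ContentExtractor, and the function extract_content are all assumptions made for this example, and a real result page would need far more robust handling.

```python
from html.parser import HTMLParser

# Assumption of this sketch: text inside these tags is treated as
# boilerplate rather than as data-record content.
SKIP_TAGS = {"nav", "form", "script", "style", "aside", "header", "footer"}


class ContentExtractor(HTMLParser):
    """Collects text that is not nested inside any tag in SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # how many boilerplate tags are currently open
        self.chunks = []      # text fragments kept as candidate data units

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_content(html):
    """Return the non-boilerplate text fragments of an HTML string, in order."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    # Hypothetical result page: a menu, one data record, and a search form.
    page = (
        "<html><body>"
        "<nav><a href='/'>Home</a></nav>"
        "<table><tr><td>Title A</td><td>$10</td></tr></table>"
        "<form><input name='q'>Search</form>"
        "</body></html>"
    )
    print(extract_content(page))
```

The sketch filters by tag name only, which is the simplest possible heuristic; it says nothing about how data units are aligned or annotated, which the abstract attributes to the automatically constructed annotation wrapper.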

Other Details

Paper ID: IJSRDV3I31051
Published in: Volume: 3, Issue: 3
Publication Date: 01/06/2015
Page(s): 2386-2388
