Automatic Data Extraction from Deep Web Page
Author(s):
Sagar G.R, The National Institute of Engineering; Rampur Srinath, The National Institute of Engineering
Keywords:
data annotation, web database, wrapper generation
Abstract
A large volume of information is available on the World Wide Web. This information is contained in structured and unstructured objects known as data records. This paper concentrates on mining data from deep web pages, because most data units returned from a web database are encoded into result pages dynamically for human browsing. Before these encoded data units can be machine-processed, which is essential for many applications such as deep web data collection, they must be extracted and annotated. Retrieving appropriate and useful information from Web pages is challenging, and many web retrieval systems, such as web wrappers and web crawlers, have been designed for the task. Existing approaches to the problem include manual approaches, supervised learning, and automatic techniques; the manual method is not suitable for large numbers of pages. Extracting content from the original HTML document is further complicated by the large amount of less informative and typically unrelated material, such as navigation menus, forms, user comments, and advertisements. In the proposed approach, an annotation wrapper for the search site is constructed automatically and can then be used to annotate new result pages from the same web database. Most importantly, the method displays the result as a single-page output and is both fast and accurate. Our experiments indicate that the proposed approach is highly effective.
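The abstract does not say how the annotation wrapper is built, so the following is only an illustrative sketch of the general idea, not the authors' method. It rests on simplifying assumptions that are ours: each data record is one `<tr>` row of an HTML table, each data unit is one cell, the labels are taken from the header row (`<th>` cells) of one sample result page, and anything inside `nav`, `form`, `script`, `style`, or `aside` elements is treated as noise. The names `RecordParser`, `build_wrapper`, `annotate`, `NOISE_TAGS`, and the toy pages are hypothetical; only `html.parser.HTMLParser` comes from the Python standard library.

```python
from html.parser import HTMLParser

# Hypothetical noise filter (our assumption): text inside these elements is
# treated as navigation menus, forms, ads, etc., and is never extracted.
NOISE_TAGS = {"nav", "form", "script", "style", "aside"}

class RecordParser(HTMLParser):
    """Collects cell texts grouped by <tr>, skipping anything inside NOISE_TAGS.
    Assumption: one data record = one table row, one data unit = one cell."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0          # > 0 while inside a noise element
        self.rows = []                # list of (is_header_row, [cell texts])
        self._row = None              # cells of the row being read
        self._cell = None             # text fragments of the cell being read
        self._row_is_header = False

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif self.noise_depth:
            return
        elif tag == "tr":
            self._row, self._row_is_header = [], False
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
            self._row_is_header = self._row_is_header or tag == "th"

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self.noise_depth = max(0, self.noise_depth - 1)
        elif self.noise_depth:
            return
        elif tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join text fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append((self._row_is_header, self._row))
            self._row, self._cell = None, None

    def handle_data(self, data):
        if not self.noise_depth and self._cell is not None:
            self._cell.append(data)

def parse_rows(html):
    parser = RecordParser()
    parser.feed(html)
    parser.close()
    return parser.rows

def build_wrapper(sample_html):
    """Hypothetical wrapper: the column labels from the sample page's first header row."""
    for is_header, cells in parse_rows(sample_html):
        if is_header:
            return cells
    raise ValueError("sample page has no header row")

def annotate(wrapper, page_html):
    """Label each data unit of a new result page by its column position."""
    records = []
    for is_header, cells in parse_rows(page_html):
        if is_header or len(cells) != len(wrapper):
            continue  # skip header rows and rows that do not fit the record layout
        records.append(dict(zip(wrapper, cells)))
    return records

# Toy result pages from the same (hypothetical) web database.
SAMPLE = """
<nav><table><tr><td>Home</td><td>Help</td><td>Login</td></tr></table></nav>
<table>
  <tr><th>Title</th><th>Author</th><th>Price</th></tr>
  <tr><td>Book A</td><td>Author A</td><td>$30</td></tr>
</table>"""

NEW_PAGE = """
<form><table><tr><td>Search</td><td>Sort by</td><td>Go</td></tr></table></form>
<table>
  <tr><td>Book B</td><td>Author B</td><td>$25</td></tr>
  <tr><td>Book C</td><td>Author C</td><td>$40</td></tr>
</table>"""

wrapper = build_wrapper(SAMPLE)
print(wrapper)  # ['Title', 'Author', 'Price']
for record in annotate(wrapper, NEW_PAGE):
    print(record)
```

Without the noise filter, the three-cell search form in `NEW_PAGE` would be annotated as a data record. Keying the wrapper on column position is the simplest possible choice; a practical system would also have to align data units whose positions vary across records, which this sketch does not attempt.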
Other Details
Paper ID: IJSRDV3I31051
Published in: Volume 3, Issue 3
Publication Date: 01/06/2015
Page(s): 2386-2388
|
|