Testbed for Information Extraction from Deep Web

1. Introduction
Search results generated by searchable databases are served dynamically, and taken together they are far larger than the static documents on the Web. These result pages have been referred to as the Deep Web. We propose a testbed for information extraction from search results. We chose 100 databases at random from 114,540 pages with forms, so the databases cover a good variety. We selected 51 databases whose result pages include URLs and manually identified the target information to be extracted. We also suggest evaluation measures for comparing extraction methods, as well as methods for extending the target data.
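As a rough illustration of how an extraction method could be scored against manually identified target data, the sketch below computes precision, recall, and F1 over sets of extracted URLs. This is an assumption for illustration only: the measures actually suggested for the testbed are described in the evaluation section, and the function name and the example URLs here are hypothetical.

```python
def precision_recall_f1(extracted, target):
    """Score extracted items against manually identified target items.

    Assumption: items are compared by exact string match and duplicates
    are ignored. The testbed's own measures may differ.
    """
    extracted_set = set(extracted)
    target_set = set(target)
    true_positives = len(extracted_set & target_set)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(target_set) if target_set else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: four target URLs on one results page,
# and an extractor that returns two URLs, one of them correct.
target_urls = [
    "http://example.org/item/1",
    "http://example.org/item/2",
    "http://example.org/item/3",
    "http://example.org/item/4",
]
extracted_urls = [
    "http://example.org/item/1",
    "http://example.org/help",
]

precision, recall, f1 = precision_recall_f1(extracted_urls, target_urls)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Comparing sets rather than lists keeps the score independent of the order in which an extractor returns its results, which may or may not be appropriate depending on the measure being used.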

2. Download
3. Files
4. Evaluation measure

A listing of publications about Directory Architecture for Integrated Search Engines


Hirokawa lab.
Mail: daisen(at)matu.cc.kyushu-u.ac.jp
Kyushu University
Hakozaki 6-10-1, Higasi-ku, Fukuoka 812-8581, Japan
Tel: +81-92-642-2296
Fax: +81-92-642-2294