This page gives a summary of the research and development.
The thesis is here. The abstract is also available in Japanese.
Research and development of the next-generation search engine
by dynamic integration of search sites
The amount of information on the WWW continues to grow explosively. How to make use of it is an important problem for the information society, now and in the future. General search engines such as Yahoo! and Google are expected to be a key to this problem, but the quality of their search results remains an issue. To provide high-quality information, many companies and organizations offer their own search facilities on their sites; we call these search sites. In this research and development, we built a system that integrates the search sites suited to the user's purpose.
The system consists of a feature extraction system, a data management system for search sites, and a clustering system. Users can choose search sites according to keywords provided by the system, or traverse the directory to select search sites. The mapping of search sites to the directory is realized by comparing the feature vector of each site with that of each node in the directory. Another core technology of the system is automatic wrapper generation, which is based on an algorithm that extracts tag patterns from HTML files.
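The directory mapping described above can be illustrated with a minimal sketch. The summary does not say how feature vectors are built or compared, so this sketch assumes bag-of-words term counts and cosine similarity. The names `feature_vector`, `cosine_similarity`, and `map_site_to_node`, as well as the sample directory nodes, are hypothetical and are not taken from the system itself.

```python
import math
from collections import Counter


def feature_vector(text):
    """Build a term-count vector (assumed stand-in for the feature extraction system)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (assumed measure)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_site_to_node(site_vector, directory_nodes):
    """Return the name of the directory node whose vector is most similar to the site's."""
    return max(
        directory_nodes,
        key=lambda name: cosine_similarity(site_vector, directory_nodes[name]),
    )


# Hypothetical directory nodes and a hypothetical search site description.
directory_nodes = {
    "Travel": feature_vector("hotel flight travel booking tour"),
    "Medicine": feature_vector("drug disease clinic medicine treatment"),
}
site = feature_vector("search hotel and flight booking")
best_node = map_site_to_node(site, directory_nodes)
print(best_node)
```

In this sketch each search site is assigned to the single best-matching node. A real system might instead keep every node above a similarity threshold, or weight terms rather than use raw counts.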
This system can serve as a basis for efficient communication of high-quality information from the provider side to the consumer side. We expect it to have a large social impact as a next-generation search engine.