Informatik, TU Wien

Towards a Distributed Search Engine


The Distributed Systems Group of the Information Systems Institute invites you to the following talk:

Abstract

In the ocean of Web data, Web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters. Web data, however, is always evolving. The number of Web sites continues to grow rapidly (over 200 million today), and there are currently more than 20 billion indexed pages. At the same time, there are more than one billion Internet users, and hundreds of millions of queries are issued each day. In the near future, centralized systems are likely to become less effective against such a data and query load, which suggests the need for fully distributed search engines. Such engines need to maintain high-quality answers, fast response times, high query throughput, high availability, and scalability, in spite of network latency and scattered data. In this talk we present the main challenges behind the design of a distributed Web retrieval system and our research in all the components of a search engine: crawling, indexing, and query processing.

Biography

Ricardo Baeza-Yates is VP of Yahoo! Research for Europe and Latin America, leading the labs in Barcelona, Spain, and Santiago, Chile. Previously, he was a full professor at the Univ. of Chile and ICREA research professor at UPF in Barcelona. He is co-author of Modern Information Retrieval (Addison-Wesley, 1999), among other books and publications. He is a member of the ACM (Fellow), AMS, IEEE (Senior), SIAM, and SCCC, as well as the Chilean Academy of Sciences. He has received awards from the Organization of American States, the Institute of Engineers of Chile, and COMPAQ. His research interests include algorithms and data structures, information retrieval, web mining, text and multimedia databases, software and database visualization, and user interfaces.