An Introduction to Multisearch

Grid Information Retrieval
Grid Information Retrieval (GIR) combines Grid computing and Information Retrieval. It aims to let users find documents and other media by keyword search in a distributed environment. More information about GIR can be found on GIR-WG.

GIR uses the standards generated by the grid community to create distributed search engines. Most internet users currently rely on monolithic search engines, such as Google. However, distributed searching has benefits of its own: users with the correct credentials can reach private or secure information that typical search engines generally cannot access, and smaller search services can be combined into one.

However, GIR also relies on each server returning its own result list, which can lead to complications. How can one front-end search collect result sets from many different servers, combine them in a meaningful way, and present them to the user in a reasonable amount of time? Research on algorithms and methods for this problem is very important to developing GIR.
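The "reasonable amount of time" part of that question can be illustrated with a minimal Python sketch. It is not Multisearch's own code: the function name `fan_out`, the `backends` mapping, and the two-second default deadline are all assumptions made for illustration. It queries every backend in parallel and keeps only the result lists that arrive before the deadline.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def fan_out(query, backends, timeout_s=2.0):
    """Query every backend in parallel and keep what arrives before the deadline.

    `backends` is a hypothetical mapping from a server name to a callable
    that takes the query string and returns a ranked list of results.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(backends)))
    futures = {pool.submit(search, query): name
               for name, search in backends.items()}

    # Block until every backend has answered or the deadline passes.
    done, _late = wait(futures, timeout=timeout_s)

    # Do not wait for stragglers; their results are simply dropped.
    pool.shutdown(wait=False)

    results = {}
    for future in done:
        # A backend that raised an error contributes nothing.
        if future.exception() is None:
            results[futures[future]] = future.result()
    return results
```

Dropping slow or failing servers is only one possible policy. A front end could just as well return partial results first and refine them as late answers come in.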

Multisearch is a research software package designed to let groups test new algorithms and other ideas.

Multisearch's Goal
Multisearch's goal is to provide users with a software base to test new algorithms for GIR.

In 2006, Multisearch used Axis Web Services. In 2007, it used the standards provided by OGSA-DAI WSI 2.2 to run searches over multiple backends. Now, in 2008, Multisearch uses both Axis Web Services and OGSA-DAI WSI to create backend search services, and it also uses the Hadoop architecture to run searches quickly.

Multisearch was created in the summer of 2006 as a servlet for testing different merge algorithms. A merge algorithm defines how the result sets from all servers are combined into a single set. Later on, Fallen and Newby decided to explore methods to improve the performance of Multisearch on a large scale, so in summer 2007 they also set out to test algorithms that restrict the number of servers to be searched.
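As one concrete illustration of a merge algorithm, here is a minimal Python sketch of round-robin interleaving. It takes the top result from each server, then the second result from each, and so on, skipping duplicates. This is a common textbook baseline and is not necessarily one of the algorithms tested in Multisearch. The function name and the `limit` parameter are assumptions.

```python
from itertools import zip_longest

# Marks positions where a shorter result list has run out.
_MISSING = object()


def round_robin_merge(result_sets, limit=10):
    """Interleave ranked lists: rank 1 from each server, then rank 2, and so on."""
    merged = []
    seen = set()
    for tier in zip_longest(*result_sets, fillvalue=_MISSING):
        for doc in tier:
            # Skip padding and documents already returned by another server.
            if doc is _MISSING or doc in seen:
                continue
            seen.add(doc)
            merged.append(doc)
    return merged[:limit]
```

For example, merging `[["a", "b", "c"], ["b", "d"], ["e"]]` gives `["a", "b", "e", "d", "c"]`. Round-robin ignores relevance scores entirely, which is why comparing it against score-based merges is a natural experiment.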

Currently, Multisearch has server-selection (restriction) algorithms as well as merge algorithms. It was designed to be an architecture on which other groups can build, so it is easy to add new restriction and merge algorithms to be tested in Multisearch.
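The sketch below shows one way such an extensible design could look, assuming that restriction and merge algorithms are plain functions stored in named registries. It is written in Python for brevity and does not reflect Multisearch's actual interfaces. Every name in it (`RESTRICTORS`, `MERGERS`, `register`, `run_search`) is hypothetical.

```python
# Hypothetical registries mapping an algorithm name to its implementation.
RESTRICTORS = {}
MERGERS = {}


def register(registry, name):
    """Decorator that stores a function in the given registry under `name`."""
    def decorator(func):
        registry[name] = func
        return func
    return decorator


@register(RESTRICTORS, "all")
def select_all(query, servers):
    # Baseline restriction: search every known server.
    return list(servers)


@register(RESTRICTORS, "first_two")
def select_first_two(query, servers):
    # Toy restriction: search only the first two servers.
    return list(servers)[:2]


@register(MERGERS, "concatenate")
def concatenate(result_sets):
    # Toy merge: append each server's results in server order.
    return [doc for results in result_sets for doc in results]


def run_search(query, servers, restrictor="all", merger="concatenate"):
    """Pick servers, query each one, then merge the result sets."""
    chosen = RESTRICTORS[restrictor](query, servers)
    result_sets = [servers[name](query) for name in chosen]
    return MERGERS[merger](result_sets)
```

Here `servers` maps a server name to a callable that returns a ranked result list. Under this design, testing a new algorithm means writing one decorated function and passing its name to `run_search`.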

It is hoped that, by providing an architecture and a basis for research, Multisearch can be expanded and used to study many different aspects of GIR.

Arctic Region Supercomputing Center
PO Box 756020, Fairbanks, AK 99775

© Arctic Region Supercomputing Center 2006-2008. This page was last updated on 8 August 2008.
These files are part of a portfolio for Kylie McCormick's online resume. See the disclaimer for more information.