Arctic Region Supercomputing Center

Multisearch

The Basics
Apache Hadoop architecture. It has an additional servlet component from Apache Tomcat, but it does not require Tomcat as middleware.

Multisearch is a front-end that can have backends plugged into it. A backend is any online service that allows for searching of a given index. Generally speaking, Multisearch sends a query to the backend and gets a result set of documents back.
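The plug-in relationship described above can be sketched in a few lines of Java. This is only an illustration: BackendSketch and FrontEndSketch are hypothetical names, not Multisearch classes, documents are represented as plain strings, and simple concatenation stands in for whatever merge algorithm is actually used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: any service that accepts a query and
// returns a result set of documents.
interface BackendSketch {
    List<String> search(String query);
}

// Hypothetical front-end that backends can be plugged into.
class FrontEndSketch {
    private final List<BackendSketch> backends = new ArrayList<>();

    void plugIn(BackendSketch backend) {
        backends.add(backend);
    }

    // Sends the same query to every backend and collects the result sets
    // in plug-in order. Concatenation is an assumption made for this sketch.
    List<String> search(String query) {
        List<String> results = new ArrayList<>();
        for (BackendSketch backend : backends) {
            results.addAll(backend.search(query));
        }
        return results;
    }
}
```

Because the front-end only depends on the interface, a backend can be anything that answers a query with a list of documents.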

Multisearch Front-End
The Multisearch front-end presents the user with an interface and interacts with the backends. Instead of reading a configuration file, Multisearch takes its settings as command-line parameters (or as parameters entered through the servlet).

Commands for Multisearch
java edu.arsc.multisearch.Multisearch -q query terms -i input/folder -r restriction.algorithm.name -m merge.algorithm.name -l #-of-services -o Output.Style.type
 
In short, the parameters are the following:
-q query terms (as many as needed)
OR -f filename.txt to read queries from a file (give one or the other, not both)
-i input/folder/path (default is allfiles/)
-r restriction.class.full.name (default is none)
-l integer, the limit on the number of services
-m merge.algorithm.name (default is Naive Merge)
-o output.style.type (default is TREC)
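As a rough sketch of how these flags and defaults fit together, the following Java class parses an argument array into a map. MultisearchArgsSketch is a hypothetical name and this is not the actual Multisearch parser, which is not shown here. The defaults are copied from the list above, and "Naive Merge" is stored as a label because the real class name is not given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for the flags listed above.
class MultisearchArgsSketch {
    private static final List<String> FLAGS =
            Arrays.asList("-q", "-f", "-i", "-r", "-l", "-m", "-o");

    // Returns a flag -> value map, pre-filled with the documented defaults.
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("-i", "allfiles/");
        options.put("-m", "Naive Merge"); // label only; real class name unknown
        options.put("-o", "TREC");

        String flag = null;
        List<String> words = new ArrayList<>();
        for (String arg : args) {
            if (FLAGS.contains(arg)) {
                store(options, flag, words); // close the previous flag
                flag = arg;
                words = new ArrayList<>();
            } else {
                words.add(arg);              // e.g. several terms after -q
            }
        }
        store(options, flag, words);

        // Exactly one of -q and -f must be present.
        if (options.containsKey("-q") == options.containsKey("-f")) {
            throw new IllegalArgumentException("Give exactly one of -q or -f");
        }
        // -l must be an integer; parseInt throws NumberFormatException otherwise.
        if (options.containsKey("-l")) {
            Integer.parseInt(options.get("-l"));
        }
        return options;
    }

    private static void store(Map<String, String> options, String flag, List<String> words) {
        if (flag == null) {
            if (!words.isEmpty()) {
                throw new IllegalArgumentException("Value without a flag: " + words);
            }
            return;
        }
        if (words.isEmpty()) {
            throw new IllegalArgumentException("Missing value for " + flag);
        }
        options.put(flag, String.join(" ", words));
    }
}
```

In this sketch, everything between -q and the next recognized flag is joined into one query string, which is one way to allow any number of query terms.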

Either a query or a filename is required. The file must contain at least one query, and its format must conform to the following:

#:query
#2:query
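A file in this format could be read with something like the sketch below. It rests on assumptions: QueryFileSketch is a hypothetical name, each line is taken to be an integer, a colon, and the query text, and a literal leading # is tolerated because the notation above can be read either way. The sketch also rejects repeated numbers and empty files.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reader for the query file format; not the Multisearch code.
class QueryFileSketch {

    // Parses lines of the form "<number>:<query>" into a number -> query map.
    static Map<Integer, String> parse(Iterable<String> lines) {
        Map<Integer, String> queries = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int colon = line.indexOf(':');
            if (colon < 1) {
                throw new IllegalArgumentException("Expected number:query but got: " + line);
            }
            String id = line.substring(0, colon).trim();
            if (id.startsWith("#")) {
                id = id.substring(1); // assumption: tolerate a literal '#'
            }
            int number = Integer.parseInt(id); // throws if not an integer
            if (queries.containsKey(number)) {
                throw new IllegalArgumentException("Duplicate query number: " + number);
            }
            queries.put(number, line.substring(colon + 1).trim());
        }
        if (queries.isEmpty()) {
            throw new IllegalArgumentException("The file must contain at least one query");
        }
        return queries;
    }
}
```

The lines could come from java.nio.file.Files.readAllLines applied to whatever path is passed with -f.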

Each number must be a unique integer. Queries do not need to be unique.

To modify the servlet (for example, to add new classes), you need a working knowledge of HTML. Edit MultisearchServlet.java and go to the printForm() method. Each variable has its own section of the form. I will use OutputFormat for this example.

<b>OutputFormat</b>:
<select size="1" name="OutputFormat">
<option value="edu.arsc.multisearch.ResultSetOutputFormat">Normal</option>
<option value="edu.arsc.multisearch.TRECOutputFormat">TREC</option>
</select>

The user sees the text between <option> and </option>. However, the servlet receives the string stored in the value attribute. In this case, those strings are the full class names of the output formats.
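One common way to turn such a class-name string into a working object is Java reflection. The sketch below is an assumption about how that step might look, not the servlet's actual code: OutputFormatSketch, NormalFormatSketch, TrecFormatSketch and FormatLoaderSketch are hypothetical stand-ins for the real edu.arsc.multisearch classes, which are not shown here.

```java
// Hypothetical stand-in for the real output format type.
interface OutputFormatSketch {
    String format(String document);
}

class NormalFormatSketch implements OutputFormatSketch {
    public String format(String document) {
        return document;
    }
}

class TrecFormatSketch implements OutputFormatSketch {
    public String format(String document) {
        return "TREC " + document;
    }
}

class FormatLoaderSketch {

    // Turns a class name, as submitted in the form's value attribute,
    // into an instance of that class.
    static OutputFormatSketch load(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(OutputFormatSketch.class) // reject unrelated classes
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Not a usable output format: " + className, e);
        }
    }
}
```

With this approach, adding a new option to the form only requires that a class with the given name exists and has a no-argument constructor. The asSubclass check keeps a submitted string from instantiating a class that is not an output format.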

Multisearch Backends
Backends are covered in their own section. Because Multisearch can use both OGSA-DAI resources and Axis resources, it is possible to have a heterogeneous group of backends and Multisearch will still run smoothly.

Arctic Region Supercomputing Center
PO Box 756020, Fairbanks, AK 99775

© Arctic Region Supercomputing Center 2006-2008. This page was last updated on 29 May 2008.
These files are part of a portfolio for Kylie McCormick's online resume. See the disclaimer for more information.