
Arctic Region Supercomputing Center

Backends: Information and Installation

Introduction to Backends
Backends are the service side of a web service. Multisearch currently works with OGSA-DAI and Axis services. Most of the backends use Apache Lucene for indexing and searching, although a few use Lemur. Because there are now different kinds of backends, the specifications of each web service depend on its type.

OGSA-DAI Service Backends
Because OGSA-DAI is the main architecture for Multisearch, all the backends for Multisearch are created with OGSA-DAI. This means they are built on Axis, and if you follow their URIs, they will generally say "Here is an Axis Service!" unless a means of invoking the service has also been added.

OGSA-DAI services consist of a Data Resource Accessor and an Activity that can be performed. Presently, Multisearch backends have an Activity that searches Lucene indexes. However, if you are using a non-Lucene backend, need to change the analyzer used when searching a Lucene backend, or want to change the XML of the result set returned to the Multisearch client, you will have to modify the existing Activity and Data Resource Accessor or add your own.

The OGSA-DAI WSI 2.2 User Guide explains how to develop Activities and Data Resource Accessors. It is recommended that you create the Data Resource Accessor before the Activity.

Multisearch exceptions and message bundles are OGSA-DAI compliant so that they can be logged in meaningful ways. If you run into problems or errors with Multisearch, the best way to diagnose them is to turn on OGSA-DAI's logging. More information can be found at How to Enable Logging.

Axis Service backends
Apache Axis is a framework for deploying web services on Tomcat. You write the program first and then add a web service wrapper around it. On Java has a much better description of this method, covering how to wrap the code and deploy it on Tomcat.

Making Files Available for New Services
If you add a new service, even one built on Lucene, you may want to make its files available to the people who use that service. The Multisearch servlet does this by providing a link to where each file is hosted on a server. While it is possible to send all of the text over the wire with the result set in XML, doing so greatly increases bandwidth use and transfer time.

Making files accessible on a server is not difficult. Before you index the files with Lucene (or any other indexer), place them where they can be found online. First, ensure Tomcat is installed and working. Then move the folder with all of the data to $CATALINA_HOME/webapps/someName/. Edit the file $CATALINA_HOME/conf/server.xml and add the following line between the <Host> tags.

<Context path="/someName" debug="0" reloadable="false"></Context>

This allows anyone to load http://server.name:8080/someName and find all of your files. When the XML is returned to the front-end client, the client can use the BaseURI of the server to generate a document location. It is best, however, to pass the entire URI of the file as its location, as this prevents errors in resolving the location. If this is not working for you, look at makeDocs() in SearchThread.java and at Document() in Document.java.
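The two ways of producing a document location can be sketched in Java as follows. `DocumentLocation` and `locate()` are hypothetical names, not code from SearchThread.java or Document.java; the sketch assumes the backend sends either a full URI or a path relative to the server's BaseURI. It also shows one way relative locations go wrong: `java.net.URI.resolve()` drops the last path segment of a base URI that lacks a trailing slash, which is one reason passing the full URI is safer.

```java
import java.net.URI;

/**
 * Hypothetical helper, not part of Multisearch: turns the file location sent
 * by a backend into a link the client can display.
 */
public class DocumentLocation {

    static String locate(String baseUri, String fileLocation) {
        URI file = URI.create(fileLocation);
        if (file.isAbsolute()) {
            // The backend already sent a full URI: use it unchanged (preferred).
            return file.toString();
        }
        // Relative path: resolve it against the server's BaseURI.
        // Without a trailing slash, resolve() would replace "someName" instead of appending to it.
        String base = baseUri.endsWith("/") ? baseUri : baseUri + "/";
        return URI.create(base).resolve(file).toString();
    }

    public static void main(String[] args) {
        String base = "http://server.name:8080/someName";
        // prints http://server.name:8080/someName/reports/a.txt
        System.out.println(locate(base, "reports/a.txt"));
        // prints the full URI unchanged
        System.out.println(locate(base, "http://server.name:8080/someName/reports/a.txt"));
    }
}
```

Note that `URI.create()` rejects unescaped spaces, so file names containing spaces would need to be percent-encoded before being sent.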

If you neither host the files on a server nor pass their contents over the wire in XML, you may want to provide some other way to read them, in case users need to check a file's contents.

Document Standards
While different types of documents might have different representations, Multisearch requires certain fields for every Document. More can be added, but at least these are required to work with the current setup of Multisearch.

Title: The title of the document, or "Untitled"
Rank: The rank of the document
Score: The score of the document
Service Name: The name of the backend where it comes from

Old rank and old score will be added later, but for now every document needs only the fields above.
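A result record holding these four fields might look like the sketch below. It is not Multisearch's Document class: the class name `ResultDocument`, its field names, and the rule that rank 1 goes to the highest score are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical result record with the fields Multisearch needs:
 * title, rank, score and service name.
 */
public class ResultDocument {
    final String title;
    final String serviceName;
    final double score;
    int rank; // filled in by assignRanks()

    ResultDocument(String title, double score, String serviceName) {
        // Use "Untitled" when the source document has no title.
        this.title = (title == null || title.isEmpty()) ? "Untitled" : title;
        this.score = score;
        this.serviceName = serviceName;
    }

    // Assumption: ranks follow descending score, starting at 1.
    static List<ResultDocument> assignRanks(List<ResultDocument> docs) {
        List<ResultDocument> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble((ResultDocument d) -> d.score).reversed());
        for (int i = 0; i < sorted.size(); i++) {
            sorted.get(i).rank = i + 1;
        }
        return sorted;
    }

    @Override
    public String toString() {
        return rank + ". " + title + " (score " + score + ", from " + serviceName + ")";
    }

    public static void main(String[] args) {
        List<ResultDocument> docs = new ArrayList<>();
        docs.add(new ResultDocument("Sea ice report", 0.42, "luceneservice"));
        docs.add(new ResultDocument(null, 0.87, "luceneservice"));
        for (ResultDocument d : assignRanks(docs)) {
            System.out.println(d);
        }
    }
}
```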

Customize Backends: OGSA-DAI
To create your own backend, you must first know what kind of data you're working with, how you plan to index it, and how you plan to search it. You must establish what information you'll need from the client side to run a search. For instance, Multisearch currently sends a query string.
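The three decisions above (what data, how to index it, how to search it given a query string) can be illustrated with a deliberately naive in-memory backend. Everything here is hypothetical: `ToyBackend`, `add()` and `search()` are not Multisearch or OGSA-DAI APIs, and the word-count scoring stands in for what Lucene or Lemur would actually do.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical toy backend: plain-text documents, word-count index, query-string search. */
public class ToyBackend {

    // "Indexing": document title -> counts of each lower-cased word in its text.
    private final Map<String, Map<String, Integer>> index = new LinkedHashMap<>();

    void add(String title, String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        index.put(title, counts);
    }

    // "Searching": the client's query string is the only input.
    // Score = total occurrences of the query words; documents scoring 0 are skipped.
    List<String> search(String query) {
        List<Map.Entry<String, Integer>> hits = new ArrayList<>();
        String[] terms = query.toLowerCase(Locale.ROOT).split("\\W+");
        for (Map.Entry<String, Map<String, Integer>> doc : index.entrySet()) {
            int score = 0;
            for (String term : terms) {
                score += doc.getValue().getOrDefault(term, 0);
            }
            if (score > 0) {
                hits.add(new AbstractMap.SimpleEntry<>(doc.getKey(), score));
            }
        }
        // Highest score first; List.sort is stable, so ties keep insertion order.
        hits.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> titles = new ArrayList<>();
        for (Map.Entry<String, Integer> hit : hits) {
            titles.add(hit.getKey());
        }
        return titles;
    }

    public static void main(String[] args) {
        ToyBackend backend = new ToyBackend();
        backend.add("Sea ice", "Sea ice extent in the Arctic sea");
        backend.add("Permafrost", "Permafrost thaw in Alaska");
        backend.add("Tides", "Tides in the Bering sea");
        System.out.println(backend.search("arctic sea")); // [Sea ice, Tides]
    }
}
```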

To add an OGSA-DAI service, it's important to read up on the following:

  1. Read the Architecture Data for OGSA-DAI, and ensure OGSA-DAI is successfully installed on your server. Running the various Client Toolkit examples is a good way to check this.
     
  2. Read up on, and create, a Data Resource Accessor.
    This will give you a solid understanding of some of the functions you might call in the Activity, as well as how to deal with files. Most of the Data Resource Accessor can probably be taken from the examples provided with OGSA-DAI, so downloading the source code and looking it over can be helpful. You can also download the DRA used for Lucene in Multisearch and see how it works.
     
  3. Put together an Activity (or, if your needs are more advanced, a Configurable Activity).
     
  4. Write a Client-Side Toolkit to work with Multisearch. (See Multisearch 2.0 - Custom Backends for writing client-side toolkits.)
  5. As an example, launching a backend on Snowy for Multisearch 2.0 or 3.0 takes only the following commands, run in the directory /home/mccormic/ogsadai under a username that has set the dai.container property (-Ddai.container) to $CATALINA_HOME:

    ant deployService -Ddai.service.name=service/name
     
    ant deployResource -Ddai.resource.file=dai.prop.file
     
    ant exposeResource -Ddai.service.name=service/name -Ddai.resource.id=id

    Contents of dai.prop.file
    dai.resource.id=id-name
     
    # dai.data.resource.type=Relational
    # dai.data.resource.type=XML
    dai.data.resource.type=Files
    # dai.data.resource.type=MultiResource
     
    dai.product.name=Lucene
    dai.product.vendor=Apache
    dai.product.version=1.4.3
    dai.data.resource.uri=/home/mccormic/merge/luceneservice
     
    dai.credential=
    dai.user.name=
    dai.password=

    Details on deploying resources and services with OGSA-DAI can be found at OGSA-DAI's website. The Adding New Backends section of Multisearch 2.0 - Adding Backends also has other information.
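A small check can catch mistakes in dai.prop.file before running ant deployResource. The sketch below reads the file with java.util.Properties, which understands the same key=value format and skips # comment lines. The class name `DaiPropCheck` and the choice of which keys count as required are assumptions for illustration, not OGSA-DAI rules; the accepted resource types are the four listed in the example file above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-flight check for a resource file such as dai.prop.file. */
public class DaiPropCheck {

    // Assumption: these keys must be non-empty; credentials may be blank, as in the example.
    static final List<String> REQUIRED =
            Arrays.asList("dai.resource.id", "dai.data.resource.type", "dai.data.resource.uri");
    // The four types shown (three commented out) in the example file.
    static final List<String> TYPES =
            Arrays.asList("Relational", "XML", "Files", "MultiResource");

    static List<String> problems(Properties p) {
        List<String> out = new ArrayList<>();
        for (String key : REQUIRED) {
            if (p.getProperty(key, "").trim().isEmpty()) {
                out.add("missing " + key);
            }
        }
        String type = p.getProperty("dai.data.resource.type", "").trim();
        if (!type.isEmpty() && !TYPES.contains(type)) {
            out.add("unknown dai.data.resource.type: " + type);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Properties p = new Properties();
        // In practice: p.load(new java.io.FileReader("dai.prop.file"));
        p.load(new StringReader(
                "dai.resource.id=id-name\n"
                + "# dai.data.resource.type=XML\n"
                + "dai.data.resource.type=Files\n"
                + "dai.data.resource.uri=/home/mccormic/merge/luceneservice\n"
                + "dai.credential=\n"));
        List<String> found = problems(p);
        System.out.println(found.isEmpty() ? "OK" : found.toString());
    }
}
```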

Customize Backends: Axis
Axis backends are much less structured than OGSA-DAI backends. Again, On Java has a great section on how to put together a cohesive Axis web service. To contact an Axis backend, you need a client that can serialize the beans the service expects.

Add a Backend to the Run
Hadoop takes in a large amount of data and splits it into sections for processing. In Multisearch's case, this data is the set of service backends to search. A service is defined in XML as follows:

<service>
  <url> </url>
  <clientclass> </clientclass>
  <name> </name>
</service>

The URL is the location of the backend. The name is a specific (and preferably unique, although uniqueness is not required) name for the backend. The client class must be the fully qualified name of a class that can connect to the service given the URL and the query (e.g., edu.arsc.multisearch.client.Client).

Each service should be in its own file in the input directory.
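Whatever processes one input file (for example, a task in the Hadoop job) has to turn a `<service>` definition back into a URL, a client class name, and a service name. The sketch below does that with the JDK's built-in DOM parser and shows how the client class could then be loaded by its fully qualified name with `Class.forName()`. `ServiceSpec` and its methods are hypothetical, not Multisearch's code, and the Hadoop wiring is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical holder for one <service> definition (the contents of one input file). */
public class ServiceSpec {
    final String url;
    final String clientClass;
    final String name;

    ServiceSpec(String url, String clientClass, String name) {
        this.url = url;
        this.clientClass = clientClass;
        this.name = name;
    }

    static ServiceSpec parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            if (!"service".equals(root.getTagName())) {
                throw new IllegalArgumentException("expected <service>, found <" + root.getTagName() + ">");
            }
            return new ServiceSpec(text(root, "url"), text(root, "clientclass"), text(root, "name"));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse service definition", e);
        }
    }

    // Trimmed text of the first descendant element with this tag; rejects missing or blank values.
    private static String text(Element root, String tag) {
        NodeList nodes = root.getElementsByTagName(tag);
        String value = nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
        if (value.isEmpty()) {
            throw new IllegalArgumentException("<" + tag + "> is missing or empty");
        }
        return value;
    }

    // Loads the client class by name; its jar must be on the classpath.
    Class<?> loadClientClass() throws ClassNotFoundException {
        return Class.forName(clientClass);
    }

    public static void main(String[] args) {
        ServiceSpec spec = parse(
                "<service>\n"
                + "  <url>http://server.name:8080/service/name</url>\n"
                + "  <clientclass>edu.arsc.multisearch.client.Client</clientclass>\n"
                + "  <name>luceneservice</name>\n"
                + "</service>\n");
        System.out.println(spec.name + " -> " + spec.url + " via " + spec.clientClass);
    }
}
```

Rejecting blank elements up front means a half-filled template (such as `<url> </url>`) fails when the file is read rather than when the client tries to connect.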


Arctic Region Supercomputing Center
PO Box 756020, Fairbanks, AK 99775

© Arctic Region Supercomputing Center 2006-2008. This page was last updated on 13 August 2008.
These files are part of a portfolio for Kylie McCormick's online resume. See the disclaimer for more information.