
Arctic Region Supercomputing Center

Multisearch Log

These are the log files from 23 - 27 June 2008. For additional log files, please return to the main log page.

Friday, 27 June
Last night, I managed to get the Multisearch from 2006 running. I've been trying to figure out how I got this to pass before, and it seems to rest on a complicated assortment of bean serializers. I again tried to run the search with a modified piece of the 2006 code and got the error below. Shame.

Now I can get the Axis code to run smoothly, but I cannot get it to pass the result back to the client. I'm going to try a few things (like simplifying the object) before I look into using bean serializers.

Update: I fixed a string error, and now am getting a new set of errors from Axis:

Exception in thread "main" AxisFault
faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
faultSubcode:
faultString: org.xml.sax.SAXParseException: Premature end of file.
faultActor:
faultNode:
faultDetail:
      {http://xml.apache.org/axis/}stackTrace:org.xml.sax.SAXParseException: Premature end of file.

I am fairly certain the best option is to have a serializable Document object that can be passed over the wire. I'm making some important decisions regarding this design now.

edu.arsc.multisearch.Document object
 
String filename :: extended filename
String title :: title of file
int rank :: rank of document in set
double score :: score of document in set
 
int oldrank (for updating needs)
double oldscore (for updating needs)
These last two do not need to be present "over the wire"...
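A minimal sketch of what this might look like as a serializable JavaBean, using the class name and fields from the list above. The no-arg constructor and get/set pairs are what Axis 1.x bean (de)serializers expect; marking the two bookkeeping fields transient is my assumption for keeping them out of default Java serialization (Axis itself would also need them left out of the bean mapping):

```java
import java.io.Serializable;

// Sketch of edu.arsc.multisearch.Document as a JavaBean.
// Axis 1.x bean (de)serializers want a public no-arg constructor
// and a matching getter/setter pair for every wire field.
public class Document implements Serializable {
    private String filename;  // extended filename
    private String title;     // title of file
    private int rank;         // rank of document in set
    private double score;     // score of document in set

    // Bookkeeping only -- not needed "over the wire", so marked
    // transient (my assumption) to skip default Java serialization.
    private transient int oldrank;
    private transient double oldscore;

    public Document() {}

    public String getFilename() { return filename; }
    public void setFilename(String filename) { this.filename = filename; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public int getRank() { return rank; }
    public void setRank(int rank) { this.rank = rank; }
    public double getScore() { return score; }
    public void setScore(double score) { this.score = score; }
}
```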

Since passing information over the wire in an array seems to make everything unhappy, let's wrap it in another object.

edu.arsc.multisearch.ResultSet object
 
Document[] results :: results presented by this result set
String servername :: name of the server the results are from
String info :: any information the server wishes to provide (default: "This is a server.")
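The wrapper can follow the same bean pattern; a sketch (the nested Document here is a minimal stand-in so the example is self-contained, not the real class):

```java
import java.io.Serializable;

// Sketch of edu.arsc.multisearch.ResultSet: wraps the Document array
// in a bean so Axis passes one object instead of a bare array.
public class ResultSet implements Serializable {
    // Minimal stand-in for edu.arsc.multisearch.Document (sketch only).
    public static class Document implements Serializable {
        private String filename;
        public String getFilename() { return filename; }
        public void setFilename(String f) { this.filename = f; }
    }

    private Document[] results;                 // results presented by this result set
    private String servername;                  // name of the server the results are from
    private String info = "This is a server."; // server-provided info (default)

    public ResultSet() {}

    public Document[] getResults() { return results; }
    public void setResults(Document[] results) { this.results = results; }
    public String getServername() { return servername; }
    public void setServername(String servername) { this.servername = servername; }
    public String getInfo() { return info; }
    public void setInfo(String info) { this.info = info; }
}
```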

The good thing about this is that we know the basics of what we're getting over the wire, and we've established a bare minimum that we want from any service attached to Multisearch. The bad news is that other services have to agree! Java XML beans will have to be used, but at least they're a net standard. I've updated to XMLBeans 2.3.0.
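For reference, in Axis 1.x the agreement between client and service is usually declared as bean mappings in the service's deploy.wsdd. A sketch, assuming a namespace of urn:multisearch (the qnames and namespace here are my guesses, not taken from the project):

```xml
<beanMapping qname="ns:Document"
             xmlns:ns="urn:multisearch"
             languageSpecificType="java:edu.arsc.multisearch.Document"/>
<beanMapping qname="ns:ResultSet"
             xmlns:ns="urn:multisearch"
             languageSpecificType="java:edu.arsc.multisearch.ResultSet"/>
```

The client side then registers the same types, so both ends serialize the beans identically.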

Thursday, 26 June
I am presently trying to wrap Lucene in an Axis service. I'm running into an issue: Axis wants to pass an array, and I have a LinkedList. I'll turn it into an array, I suppose.
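The LinkedList-to-array conversion is a one-liner; a sketch (the element type and names here are stand-ins):

```java
import java.util.LinkedList;
import java.util.List;

public class ToArrayDemo {
    public static void main(String[] args) {
        List<String> hits = new LinkedList<String>();
        hits.add("doc-a");
        hits.add("doc-b");

        // Axis wants an array, so convert the LinkedList; passing a
        // zero-length array lets toArray allocate one of the right size.
        String[] asArray = hits.toArray(new String[0]);
        System.out.println(asArray.length); // prints 2
    }
}
```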

Partial success: we now have an Axis URI after deployment! http://snowy.arsc.alaska.edu:8080/axis/services/LuceneSearcher But there seems to be a null pointer exception somewhere.

Update: After restarting Snowy, it's working! Woot!

Exception in thread "main" AxisFault
faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
faultSubcode:
faultString: java.lang.reflect.InvocationTargetException
faultActor:
faultNode:
faultDetail:
    {http://xml.apache.org/axis/}hostname:snowy
 
java.lang.reflect.InvocationTargetException
at org.apache.axis.message.SOAPFaultBuilder.createFault(SOAPFaultBuilder.java:221)
at org.apache.axis.message.SOAPFaultBuilder.endElement(SOAPFaultBuilder.java:128)
at org.apache.axis.encoding.DeserializationContext.endElement(DeserializationContext.java:1087)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:227)
at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:696)
at org.apache.axis.Message.getSOAPEnvelope(Message.java:424)
at org.apache.axis.handlers.soap.MustUnderstandChecker.invoke(MustUnderstandChecker.java:62)
at org.apache.axis.client.AxisClient.invoke(AxisClient.java:206)
at org.apache.axis.client.Call.invokeEngine(Call.java:2765)
at org.apache.axis.client.Call.invoke(Call.java:2748)
at org.apache.axis.client.Call.invoke(Call.java:2424)
at org.apache.axis.client.Call.invoke(Call.java:2347)
at org.apache.axis.client.Call.invoke(Call.java:1804)
at edu.arsc.multisearch.backend.lucene.webservice.LuceneSearcherSoapBindingStub.search(LuceneSearcherSoapBindingStub.java:159)
at edu.arsc.multisearch.backend.LuceneTester.main(LuceneTester.java:14)

I've tried checking for all the errors that obvious mistakes would generate (like a wrong index path and such), but that doesn't seem to be the case. When I run the same strings in the original search ("gov2.dsub.1165/luc" and "government spending") I get 167 results!

Wednesday, 25 June
I've managed to work a nice file system for indexing with both Lucene and Lemur. Unfortunately, I'm having some setbacks with Lemur. I know it has some files installed in /usr/local/share, but I'm not sure where all the bin files are. I've dropped Greg a line about this to see if he installed it somewhere else and I just can't find it.

Update: Greg got back to me, and it turns out that the files installed to /usr/local/bin but not in a /lemur directory, which means IndriBuildIndex et al. are simply free-hanging, which is fine. I tried to index GX030 with Lemur, but the files are being opened/closed with 0 documents found! I think this is because of the tagging.

Update: Turns out, the GX030 files were all HTML files, so it couldn't index them with type trectext declared. I changed it to HTML and it works just fine now, and it's indexing away.
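For reference, that document-type switch lives in the IndriBuildIndex parameter file; a sketch with hypothetical paths (only the change from trectext to html comes from the log):

```xml
<parameters>
  <!-- hypothetical paths, for illustration only -->
  <index>/path/to/indexes/GX030</index>
  <corpus>
    <path>/path/to/GX030</path>
    <!-- was "trectext"; the GX030 files are HTML -->
    <class>html</class>
  </corpus>
</parameters>
```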

At 1pm I met with Chris Fallen to talk about the TREC paper we're writing; I'll be writing about the system in general and the system performance. The paper isn't due until October, and we're not likely to have the query-performance results until much later, since TREC has to hand out the relevance judgments. But I have a part to write in a paper! Exciting.

As of 2:01pm, we also have an Indri Index from Lemur for GX030, with roughly 94,870 files! I need to complete the following:

  1. Wrap up LuceneSearcher in an Axis Service
  2. Write up a LemurSearcher
  3. Wrap up LemurSearcher in the Axis Service

I am running into the following error with my LemurSearcher:

Exception in thread "main" java.lang.UnsatisfiedLinkError: no lemur_jni in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1682)
at java.lang.Runtime.loadLibrary0(Runtime.java:823)
at java.lang.System.loadLibrary(System.java:1030)
at lemurproject.indri.indriJNI.<clinit>(indriJNI.java:97)
at lemurproject.indri.QueryEnvironment.<init>(QueryEnvironment.java:37)
at edu.arsc.multisearch.backend.lemur.LemurSearcher.main(LemurSearcher.java:21)

I checked to see how I fixed this in Lemur last time and realized that there needs to be some connection to the library path, so I edited my .zshrc file to add the following:

export LD_LIBRARY_PATH="/usr/local/lib"

This works, except now I am getting another error from Lemur:

Exception in thread "main" java.lang.UnsatisfiedLinkError: /usr/local/lib/liblemur_jni.so: /usr/local/lib/liblemur_jni.so: wrong ELF class: ELFCLASS64 (Possible cause: architecture word width mismatch)
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1751)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1676)
at java.lang.Runtime.loadLibrary0(Runtime.java:823)
at java.lang.System.loadLibrary(System.java:1030)
at lemurproject.indri.indriJNI.<clinit>(indriJNI.java:97)
at lemurproject.indri.QueryEnvironment.<init>(QueryEnvironment.java:37)
at edu.arsc.multisearch.backend.lemur.LemurSearcher.main(LemurSearcher.java:21)
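Assuming the ELFCLASS64 complaint means a 32-bit JVM is trying to load the 64-bit liblemur_jni.so (or vice versa), the JVM's word width can be checked from Java itself. Note that sun.arch.data.model is a Sun/Oracle-JVM-specific property, not a guaranteed one:

```java
public class ArchCheck {
    public static void main(String[] args) {
        // "32" or "64" on Sun/Oracle JVMs; may be null on other JVMs.
        System.out.println("data model: " + System.getProperty("sun.arch.data.model"));
        // e.g. "i386", "amd64", "x86_64" -- must match the native library's ELF class.
        System.out.println("os.arch:    " + System.getProperty("os.arch"));
    }
}
```

If the JVM reports 32 bits, the fix is either a 64-bit JVM or a 32-bit build of the Lemur JNI library.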

So tomorrow I'll be working out how to fix this issue. I also met with Darren about wrapping C++ in Java, using a good resource, but unfortunately it hasn't worked yet. I'll be looking into that more tonight.

Tuesday, 24 June
I worked today primarily on backends, looking at how they can be secured. As of right now, Greg says super-security isn't needed; as long as we can authenticate, we should be fine. I'll be doing some minor things (like adding username and password fields to the Axis Services) in case people want to try it out later.

The new Lemur package can be found in /usr/local/share/.

I currently have two Lucene backends and the code working for search and indexing, based off some of the files I had from last year's domain searching. I finished indexing them and got search working. Since I am controlling the indexing, it's important to remember that the field names need to correspond between indexing and search: "filename" and "content" are case-sensitive strings, and a mismatch will return null.
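One way to guard against those names drifting apart is to keep them as shared constants that both the indexer and the searcher reference. A sketch (the Fields class is hypothetical, not from the project; a Map stands in for a Lucene Document so the example is self-contained):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: one place for the field names the indexer
// and searcher must agree on ("filename" and "content" are case-sensitive).
public class Fields {
    public static final String FILENAME = "filename";
    public static final String CONTENT  = "content";

    // Tiny demo of the failure mode, with a Map standing in for a
    // stored document: a mismatched field name silently yields null.
    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<String, String>();
        doc.put(FILENAME, "gov2.dsub.1165/luc");

        System.out.println(doc.get(FILENAME));   // the stored value
        System.out.println(doc.get("Filename")); // null -- wrong case
    }
}
```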

Since Multisearch is a package, I want to include indexers and searchers in the packaging this time, which means that now we have a new subset of code.
 
Main Package: edu.arsc.multisearch
Backend Packages: edu.arsc.multisearch.backend
Lucene Package: edu.arsc.multisearch.backend.lucene
Lemur Package: edu.arsc.multisearch.backend.lemur

Hopefully this new layout will make everything easier to package at the end of the process. Since Greg is okay with leaving the OGSA-DAI backends, I might keep the invoke methods for those; we'll see.

Monday, 23 June
As of 10:45am, the last TREC submission (vsmdyn) has been submitted! It ran over the weekend. I'm not sure what has caused the horrific slowdown between last weekend's runs and this one in particular. However, they're all in. Huzzah!

Today, I'm going to do the following:

  1. Update Tomcat on Nimbus
  2. Update Lucene on Snowy
  3. Install Lemur 4.6 on Snowy

I'm fairly certain this will require a lot of troubleshooting on my part, and I'm also fairly certain that almost all of my OGSA-DAI backends will fail, which is why I am putting the new Tomcat on Nimbus first. That way, if it all fails, I'll know what to expect on Snowy.

An important update on Lemur just came through:

Today's Topics:
 
  1. Lemur Toolkit 4.7/Indri 2.7 released (David Fisher)
 
----------------------------------------------------------------------
 
A full listing of the current bug fixes and enhancements are in the release notes.
 
4.7 corrects various issues in the 4.6 distribution package, adds a relevance judgment UI to the Lemur Retrieval UI; a java-based trec_eval alternative; a PageRank (TM http://www.google.com/technology/) application; a performance-enhanced harvestlinks; a Firefox query log toolbar; a SOAP server for indri repositories; and more.
 
Applications compiled with the Lemur Toolkit require the following libraries: z, iberty, pthread, and m on linux, and additionally socket and nsl on solaris. Applications built in Visual Studio require the additional library wsock32.lib. The java jar files were built with Java 5 (jdk 1.5.0). The java UIs require Java 5. We have tested using GCC 3.2 (solaris), 3.2.2(linux), 3.4(linux), 3.4.3(linux x86_64), 4.0.2(linux), VC++ .NET 7.1(Windows XP), and Visual Studio 2005 (Windows XP).

Well, today is a good day to install Lemur, I suppose, since it's recently been updated.

I am presently working on updating to Tomcat 6.0. It looks cleaner and easier to work with than the Tomcat I've had since last year, but it also relies heavily upon Ant and CVS, neither of which I have experience with. I've been looking through information on the process for adding new applications, which seems a bit more complicated than before, although easier to package for other users. I'll be learning some of it, but it seems like it'll take a while before I can properly launch a webapp onto this new Tomcat.

Since I'm not sure how to work with .war files, I'm going to move on to Lemur until Greg replies to my e-mail. Maybe he'll have some stuff for me. If not, I'll work on the .war reading tonight.

Week-Goals:

  1. Upgrade to most recent version of Lucene
  2. Install newest version of Lemur
  3. Create at least one index with Lucene
  4. Create a Lucene searcher function that is callable from Hadoop
  5. Create at least one index with Lemur
  6. Create a Lemur Searcher function that is callable from Hadoop

Update: Greg will install Lemur for me. I have created an Indexer for Lucene and a Searcher as well, but I don't have any files to index! I'm hoping that Chris will get back to me on this one soon.


Arctic Region Supercomputing Center
PO Box 756020, Fairbanks, AK 99775

© Arctic Region Supercomputing Center 2006-2008. This page was last updated on 30 June 2008.
These files are part of a portfolio for Kylie McCormick's online resume. See the disclaimer for more information.