Arctic Region Supercomputing Center

Multisearch Log

These are the log files from 28 July - 1 August. For additional log files, please return to the main log page.

Friday, 1 August
The LuceneDaiClient is still not working properly; it fails with the same error I was getting on Monday.

  1. Important: Test to ensure the input grabbing is completely functional (try converting to a different input stream type)
  2. Add more Clients to deal with additional backends, namely OGSA-DAI backends (the code needed to make this client is included below)
  3. Solve the Lemur indexing issue so that different .key searches can be used
  4. Add an output method for the File option on Hadoop, appending data to the same file.

Update: I am now getting a new error from OGSA-DAI!

uk.org.ogsadai.client.toolkit.exception.ResourceUnknownException: The data service resource 6 is unknown to this data service.
at uk.org.ogsadai.client.toolkit.exception.FaultToException.getResourceUnknownException(FaultToException.java:96)
at uk.org.ogsadai.client.toolkit.wsi.WSIDataService.perform(WSIDataService.java:186)
at edu.arsc.multisearch.client.LuceneDaiClient.call(LuceneDaiClient.java:85)
at edu.arsc.multisearch.XMLTest.main(XMLTest.java:53)

I am no longer getting errors (I am getting a clean execution), but Hadoop is still not getting values from it...

Now that I am getting a clean execution, I think the time delay might be causing the task to be killed by Map/Reduce. I also think that generating the XML file is causing a major slow-down. I am going to try converting it to another kind of input stream, as I did in ServiceRecordReader; a sketch of the idea follows.
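
The conversion I have in mind is just wrapping the generated XML in an in-memory stream instead of writing it to disk first. A minimal sketch, assuming the XML already exists as a String (the helper name is mine, not actual Multisearch code):

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper: hand generated XML to a parser as an in-memory
// stream, skipping the temporary file that seems to cause the slow-down.
public final class XmlStreams {
    private XmlStreams() { }

    public static InputStream toStream(String generatedXml)
            throws UnsupportedEncodingException {
        return new ByteArrayInputStream(generatedXml.getBytes("UTF-8"));
    }
}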

Thursday, 31 July
I'm still testing the ServiceRecordReader's read() function, specifically checking for bugs on input. Right now, if a file has only one service XML entry listed, read() returns nothing; if it has two, it reads only one... A sketch of the pattern I think fixes this is below.
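
That failure pattern fits a reader that only emits a record once it has seen the next opening tag, which drops the final (or only) record. A sketch of the fix, assuming records are <service>...</service> blocks with explicit closing tags (the class name is illustrative, not the actual ServiceRecordReader):

import java.io.BufferedReader;
import java.io.IOException;

/**
 * Sketch of a read() that emits every <service>...</service> block,
 * including the last (or only) one. Reading ahead to the closing tag,
 * rather than waiting for the next opening tag, keeps the final record
 * from being dropped. Assumes explicit closing tags.
 */
public class ServiceBlockReader {
    private final BufferedReader in;

    public ServiceBlockReader(BufferedReader in) {
        this.in = in;
    }

    /** Returns the next service block, or null at end of file. */
    public String read() throws IOException {
        String line;
        // Skip ahead to the next opening tag.
        while ((line = in.readLine()) != null && !line.contains("<service")) {
            // between records
        }
        if (line == null) {
            return null;
        }
        StringBuilder record = new StringBuilder(line);
        // Accumulate until the closing tag, so the last record is complete
        // even when no further <service> entry follows it.
        while (!line.contains("</service>") && (line = in.readLine()) != null) {
            record.append('\n').append(line);
        }
        return record.toString();
    }
}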

Also, the DaiClient still isn't working. I'm trying to figure out what the HTTP issue is.

I've moved the different Client objects (including the Client base class) into a new package: edu.arsc.multisearch.client. I realize that the number of client objects might grow quite large, and it makes sense to put them in a separate sub-package.

To Do for 'clean up'

  1. Go through the various objects and ensure they are needed as-is
  2. Important: Test to ensure the input grabbing is completely functional (try converting to a different input stream type)

To Do for added functionality

  1. Add a variable to the configuration files to select the merge function (or declare it in the command-line, or both)
  2. Add the other merge options (LeapOfFaith, Rank Shuffle)
  3. Test the file-input Multisearch interface to see how effective it is
  4. Add more Clients to deal with additional backends, namely OGSA-DAI backends (the code needed to make this client is included below)
  5. Solve the Lemur Indexing issue so that different .key searches can be used
  6. Ensure all documentation is in proper JavaDoc Format
  7. Enable alternative merge sets to print output.
  8. Add an output method for the File option on Hadoop, appending data to the same file.

I've added some important functionality: a variable selecting the appropriate merge set. I decided not to put this in the configuration file, because more flexibility may be needed than a configuration file allows. Instead, I've added it to the command line:

java edu.arsc.multisearch.Multisearch full.merge.class.name -q query terms

java edu.arsc.multisearch.Multisearch full.merge.class.name -f filename with queries
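
Presumably the class name on the command line is resolved by reflection. A minimal sketch of that step, assuming the merge classes share a common interface and have public no-argument constructors (MergeSet is a stand-in name, not the real type):

/** Stand-in for the shared merge interface; the name is an assumption. */
interface MergeSet {
    // merge(...) omitted in this sketch
}

public class MergeLoader {
    /** Resolves a name such as "edu.arsc.multisearch.merge.NaiveMergeSet". */
    public static MergeSet load(String fullClassName)
            throws ClassNotFoundException, InstantiationException,
                   IllegalAccessException {
        Class<?> c = Class.forName(fullClassName);
        return (MergeSet) c.newInstance(); // Java 5/6-era reflection
    }
}

Loading by name this way is what makes the command-line approach flexible: a new merge class needs no changes to Multisearch itself.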

Unfortunately, as of right now the non-Naive merge classes aren't printing out the appropriate information... Update: now they are fully functional!

I think it might be better to pass separate flags on the command line, instead of a fixed argument order like the one above.

java edu.arsc.multisearch.Multisearch
-m full.merge.class.name
-q query terms
-r restriction algorithms
-f filename with queries

Obviously, -q and -f should not be used at the same time on the command-line.
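
If I go that route, the parsing is simple enough. A sketch (the flag names are the ones above; everything else is illustrative, and it assumes each flag takes a single, quoted value):

// Sketch: parse the flag-style command line proposed above.
public class Args {
    String mergeClass;   // -m full.merge.class.name
    String query;        // -q query terms (quoted if they contain spaces)
    String restriction;  // -r restriction algorithms
    String queryFile;    // -f filename with queries

    static Args parse(String[] argv) {
        Args a = new Args();
        for (int i = 0; i + 1 < argv.length; i += 2) {
            String flag = argv[i];
            String value = argv[i + 1];
            if ("-m".equals(flag))      a.mergeClass = value;
            else if ("-q".equals(flag)) a.query = value;
            else if ("-r".equals(flag)) a.restriction = value;
            else if ("-f".equals(flag)) a.queryFile = value;
        }
        // Enforce the rule stated above.
        if (a.query != null && a.queryFile != null) {
            throw new IllegalArgumentException("-q and -f cannot be combined");
        }
        return a;
    }
}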

Monday, 28 July
For this week, I'm working off of to-do lists, hopefully getting a lot of 'minor' but needed stuff done.

To Do for 'clean up'

  1. Revert FinalSet/NaiveMergeSet to reflect their original methods and ability
  2. Modify ServiceWritable so that readFields() and write() work properly (see the Writable sketch after this list)
  3. Modify DocumentWritable so that readFields() and write() work properly
  4. Since ResultSet() and ResultSetWritable() don't have to be separate objects, merge them
  5. Go through the various objects and ensure they are needed as-is
  6. Test bad input to make sure the Hadoop Map/Reduce is robust (mostly falls to ServiceInputType)
  7. Important: Test to ensure the input grabbing is completely functional (try converting to a different input stream type)
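
On items 2 and 3: the Writable contract is that write() serializes every field and readFields() reads them back in exactly the same order, or the stream desynchronizes. A sketch against Hadoop's Writable interface, using the three fields the new Multisearch keeps (name, client class, URL); the real ServiceWritable may hold more:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

/** Sketch only; field set is assumed from the (name, client, URL) triple. */
public class ServiceWritableSketch implements Writable {
    private String name = "";
    private String clientClass = "";
    private String url = "";

    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeUTF(clientClass);
        out.writeUTF(url);
    }

    public void readFields(DataInput in) throws IOException {
        // Must mirror write() exactly, field for field.
        name = in.readUTF();
        clientClass = in.readUTF();
        url = in.readUTF();
    }
}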

To Do for added functionality

  1. Add TREC output format printing
  2. Add a variable to the configuration files to select the merge function (or declare it in the command-line, or both)
  3. Add the other merge options (LeapOfFaith, Rank Shuffle)
  4. Test the file-input Multisearch interface to see how effective it is
  5. Add more Clients to deal with additional backends, namely OGSA-DAI backends (the code needed to make this client is included below)
  6. Solve the Lemur Indexing issue so that different .key searches can be used
  7. Ensure all documentation is in proper JavaDoc Format

I'm mostly working on OGSA-DAI backends, since that's a major part of functionality. The trouble is that Multisearch 2007 lived off of OGSA-DAI and has tons of objects to represent the resources needed to run an OGSA-DAI Service. I've made a new package (edu.arsc.multisearch.ogsadai) for these objects so that groups who do not wish to use OGSA-DAI can simply remove it and the DaiClient object I am making as well.

A note on JavaDoc
 
Order of Block Tags
Include block tags in the following order:
 
* @param (classes, interfaces, methods and constructors only)
* @return (methods only)
* @exception (@throws is a synonym added in Javadoc 1.2)
* @author (classes and interfaces only, required)
* @version (classes and interfaces only, required. See footnote 1)
* @see
* @since
* @serial (or @serialField or @serialData)
* @deprecated (see How and When To Deprecate APIs)
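
A short example following that order on a hypothetical method (only the method-level tags appear; @author and @version belong on classes and interfaces):

import java.io.IOException;
import java.util.List;

/** Hypothetical client class, used only to illustrate block-tag order. */
public class ExampleClient {

    /**
     * Returns the top documents for the given query.
     *
     * @param  query the query string sent to the backend
     * @param  max   the maximum number of documents to return
     * @return at most max document identifiers, best first
     * @exception IOException if the backend cannot be reached
     * @see    java.net.URL
     * @since  1.0
     */
    public List<String> search(String query, int max) throws IOException {
        throw new UnsupportedOperationException("illustration only");
    }
}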

I've done a lot of work with making the objects as functional as possible today. I'm going to move on to work with OGSA-DAI, which I think will be better in this new environment. There will be fewer objects!

LuceneDaiClient - Object to represent Communication to OGSA-DAI 2.2
 
This object will be a bit faulty, since it is meant to communicate with the Lucene backends behind Multisearch 2007's OGSA-DAI services. However, alternative OGSA-DAI clients can be made later. The following variables will have an assumed default built in to Multisearch 2008:
 
String index
int min
int max
String ResourceID (previously Object ResourceBlock)

These changes are mostly due to the fact that the new Multisearch relies on simplicity (name, client class, URL) to work. The resourceID can be generated from the URL/name provided, and the index will be set to DEFAULT (which is what OGSA-DAI currently does in Multisearch 2007). The good thing about this is that the defaults were used anyway; the bad thing is that this class will have to be deprecated and re-done later.
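
A sketch of how those defaults could be derived from the (name, URL) pair; the constants and the ID rule are my assumptions for illustration, not settled Multisearch 2008 values:

/**
 * Sketch only: assumed defaults for LuceneDaiClient, derived from the
 * simple (name, client class, URL) triple the new Multisearch keeps.
 */
public class LuceneDaiDefaults {
    static final String DEFAULT_INDEX = "DEFAULT"; // matches Multisearch 2007
    static final int DEFAULT_MIN = 0;  // assumed lower result bound
    static final int DEFAULT_MAX = 10; // assumed result cap

    /** Derives a resource ID from the service URL, falling back to name. */
    static String resourceId(String name, String url) {
        String path = url.split("\\?")[0]; // drop "?WSDL"
        int slash = path.lastIndexOf('/');
        // e.g. ".../ogsadai/Gov227?WSDL" -> "Gov227"
        return slash >= 0 ? path.substring(slash + 1) : name;
    }
}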

uk.org.ogsadai.client.toolkit.exception.ServiceCommsException: A problem arose during communication with service http://snowy.arsc.alaska.edu:8080/axis/services/ogsadai/Gov227?WSDL.
at uk.org.ogsadai.client.toolkit.GenericServiceFetcher.findDataService(GenericServiceFetcher.java:212)
at uk.org.ogsadai.client.toolkit.GenericServiceFetcher.getDataService(GenericServiceFetcher.java:71)
at edu.arsc.multisearch.LuceneDaiClient.call(LuceneDaiClient.java:82)
at edu.arsc.multisearch.XMLTest.main(XMLTest.java:52)
Caused by: java.io.IOException: Server returned HTTP response code: 500 for URL: http://snowy.arsc.alaska.edu:8080/axis/services/ogsadai/Gov227?WSDL
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1241)
at java.net.URL.openStream(URL.java:1009)
at uk.org.ogsadai.client.toolkit.GenericServiceFetcher.getWSDL(GenericServiceFetcher.java:303)
at uk.org.ogsadai.client.toolkit.GenericServiceFetcher.findDataService(GenericServiceFetcher.java:208)
... 3 more

Side Note: The query is automatically set to "government spending" for LuceneDaiClient at the moment; I need to change it to something else later.

In short, the URL is not connecting, from what I can tell. I've restarted Tomcat, moved LuceneActivity.jar around (into different common/lib directories), and so on. I'm not sure why it cannot find the URL, as I am certain the service functions (I can visit its page). I'll be looking more into this...
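
One way to narrow this down is to fetch the WSDL URL directly with plain java.net, outside OGSA-DAI, and look at the response code and any Axis error page. A throwaway probe:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/** Throwaway probe: what does the WSDL URL actually return? */
public class WsdlProbe {
    public static void main(String[] args) throws Exception {
        URL wsdl = new URL("http://snowy.arsc.alaska.edu:8080/axis/services/ogsadai/Gov227?WSDL");
        HttpURLConnection conn = (HttpURLConnection) wsdl.openConnection();
        // 200 means Tomcat/Axis served the WSDL fine; 500 points at the
        // deployed service rather than at the network.
        System.out.println("HTTP " + conn.getResponseCode());
        InputStream err = conn.getErrorStream(); // Axis error page, if any
        if (err != null) {
            for (int b; (b = err.read()) != -1; ) {
                System.out.write(b);
            }
            System.out.flush();
            err.close();
        }
    }
}

If the probe also gets a 500, the problem is in the deployed service itself (a broken deployment or a missing jar, say) rather than in the client toolkit.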
