Multisearch Log

These are the log files from 30 June - 3 July, 2008. For additional log files, please return to the main log page.

Thursday, 3 July

I think there might be a better way to do NaiveMerge, one that compensates for negative numbers. I'll drop Chris a line about this one.

Wednesday, 2 July

I am planning on making a Lemur Web Service today as well, which I am very excited about. I've already indexed one index with Indri, and I'm working on another index with the key variety in Lemur. As of right now, I have the following indexes:
I pulled a large stoplist file out to use with Lemur, since it would be cool to have different types of indexes. Now I have Indri, key, and Lucene! I also made a small transform.DataModify Java class. To make a Lemur key index, you need a list of all the files containing data. The GX directories have many files that need to be indexed separately, so there needed to be a fast way to generate the list of files that Lemur could use.
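The actual transform.DataModify code isn't reproduced in this log, but the job it describes — walking the GX directories and emitting one data-file path per line for Lemur's indexer to read — can be sketched like this. The class and method names here are my own illustration, not the real transform.DataModify:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Sketch in the spirit of transform.DataModify: recursively walk a
// GOV2-style directory tree (GX000, GX001, ...) and write each data
// file's absolute path, one per line, as a file list for Lemur.
public class FileListSketch {
    public static void writeFileList(Path root, Path listFile) throws IOException {
        try (Stream<Path> paths = Files.walk(root);
             PrintWriter out = new PrintWriter(Files.newBufferedWriter(listFile))) {
            paths.filter(Files::isRegularFile)   // skip the directories themselves
                 .sorted()                       // stable, reproducible ordering
                 .forEach(p -> out.println(p.toAbsolutePath()));
        }
    }
}
```

The resulting list file is the kind of input a Lemur build-index parameter file can point at, so the same sketch would serve for any future key indexes.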
transform.DataModify - a quick how-to

This can be useful later if I need to make more Lemur key indexes.

I've set up the Pseudo-Distributed Operation on Hadoop using the QuickStart instructions. I'm working on playing with the examples, too, but I wanted to ensure that all my paths were working. Now there is $HADOOP, which points to /home/mccormic/hadoop/ with Hadoop 0.17 installed.

I also remembered that Lemur comes with different types of searching/ranking variables, all of which can be found in RetMethodManager: tfidf, okapi, kl, inquery, cori_cs, cos, inq_struct, and indri.

There seems to be a minor setback with Lemur key indexes. Right now, indexing is not generating a .key file for the index, which makes it very hard to program a searcher, since the .key file is what opens the index. I'm re-running the indexing to see if I can find the problem.

Update: There is now a LuceneSearcher and a LemurIndriSearcher up and running, collecting and returning beans! Hopefully by tomorrow I can get a merge function going on them and get the results merged. I've put in a query about the mysterious indexing issues with Lemur, but I doubt I'll get a response before the end of the holiday weekend.

Tuesday, 1 July

I'm working on fully documenting the design part of Multisearch with Hadoop, although I suspect I'll have to work more on the Map half of the design when I work directly with the code. Hopefully I'll be done with the website this week. The architecture page has new graphics, although they'll need some work. I want to be able to compare the different architectures for Multisearch on this page as well, although it is very bare-bones at the moment. I've also added information to the Contacts page.

Monday, 30 June

The good news about this is that most services provide a client with their work because they know how they want their services to be accessed, which means it should be okay to assume the maker of a service could provide a client side.
The bad news is that it'll have to be dropped in, like the different types of algorithms and such were last year. It will also have to be predicted from the service. (E.g., an Axis service would need to provide a client that can contact it; the same is true of an OGSA-DAI client.) I can do what I did before: use Reflection, so it's easier to add clients on the fly. However, I think I'll also need to look into other information on Axis. I've found a few resources, like this one from JavaBoutique and another OnJava reference.
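The drop-in-clients-via-Reflection idea can be sketched roughly as follows. The SearchClient interface and the class names are hypothetical stand-ins, not code from Multisearch itself; the point is only that a client implementation can be loaded by name at runtime without recompiling anything:

```java
// Hypothetical common interface every drop-in service client would implement.
interface SearchClient {
    String search(String query);
}

// A trivial example client, standing in for an Axis or OGSA-DAI client.
class EchoClient implements SearchClient {
    public String search(String query) { return "echo:" + query; }
}

public class ClientLoader {
    // Load a client by fully qualified class name via Reflection, so new
    // clients can be added on the fly just by putting them on the classpath.
    public static SearchClient load(String className) throws Exception {
        Class<?> clazz = Class.forName(className);
        return (SearchClient) clazz.getDeclaredConstructor().newInstance();
    }
}
```

A configuration file could then list the client class name for each service, and Multisearch would never need to know the concrete client types at compile time.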
Client object (Reflection, extendable, cloneable, etc.)

It might be possible, given the Reflection I plan on using with the Clients, to provide the object being sent over the wire and serialize it. This would mean that groups that have designed a special object would be able to use Multisearch.

I've also been having an error with Lemur:

Exception in thread "main" java.lang.UnsatisfiedLinkError: /usr/local/lib/liblemur_jni.so: /usr/local/lib/liblemur_jni.so: wrong ELF class: ELFCLASS64 (Possible cause: architecture word width mismatch)

I've tried fixing it by updating my library variable to include the linking file, but then I get the same error. I've tried adding -d64 to the command, which should enable it to use 64 bits, but it then tells me: "Running a 64-bit JVM is not supported on this platform." Of course, this isn't true. I've dropped Greg a line about it; Lemur is 64-bit and so is Java -- and Snowy can handle 64-bit Java. I've signed up on the Lemur Toolkit Discussion page to get a username and password, but haven't gotten it yet. Hopefully I can post about this soon.

Update: Post posted! Here's to hoping for fast responses!

Another Update: While technically LuceneSearch and LemurSearch need an index, I'm going to remove this option from being sent over the wire. Chances are, each service will have its own index that we need not specify. Also, it's harder to do anything on the fly when we need to know that much information about them. Current code: LuceneSearcherImpl.java and LuceneIndexer.java. I've managed to launch a LuceneSearcher restricted to gov2.dsub.1165, currently located here as an Axis service.
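That "wrong ELF class: ELFCLASS64" message means a 32-bit JVM tried to load the 64-bit liblemur_jni.so — the native library and the JVM must have the same word width. A small diagnostic sketch (the "sun.arch.data.model" property is Sun/Oracle-JVM specific; "os.arch" is the standard fallback):

```java
// Check the running JVM's word width before loading a native library,
// to catch the ELFCLASS mismatch up front instead of at link time.
public class JniArchCheck {
    public static String jvmWordWidth() {
        // "32" or "64" on Sun/Oracle JVMs; null elsewhere.
        String model = System.getProperty("sun.arch.data.model");
        return (model != null) ? model : System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        System.out.println("JVM word width: " + jvmWordWidth());
        // Only a 64-bit JVM can load the 64-bit library:
        // System.loadLibrary("lemur_jni");
    }
}
```

If this prints 32 on Snowy, the default `java` on the PATH is a 32-bit binary, which would also explain why -d64 is rejected.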
Design for Multisearch Client/Service connection with Hadoop

I'm not entirely certain if this makes logical sense to Hadoop, however. In English, these are the goals (so this design will be modified around them):
I feel fairly good about the OutputReader/Reduce sections, since I am certain of how those will work. I am not certain, however, of the best way to approach the Map/OutputReader section. I am fairly certain that Multisearch will run faster on Hadoop.
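The merge step that keeps coming up in this log (NaiveMerge, and the planned merge of LuceneSearcher and LemurIndriSearcher results) could look roughly like the sketch below. This is my illustration, not the actual NaiveMerge code: it min-max normalizes each engine's scores into [0,1] before combining, which is one simple way to cope with the negative-number problem, since Indri-style log-probability scores are negative while Lucene scores are positive:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical result pair: a document id plus the engine's raw score.
class Scored {
    final String docId;
    final double score;
    Scored(String docId, double score) { this.docId = docId; this.score = score; }
}

public class NaiveMergeSketch {
    // Min-max normalize one engine's scores into [0,1], so negative
    // scores compare fairly with positive ones from another engine.
    static List<Scored> normalize(List<Scored> results) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (Scored s : results) {
            min = Math.min(min, s.score);
            max = Math.max(max, s.score);
        }
        double range = (max - min) == 0 ? 1 : (max - min);
        List<Scored> out = new ArrayList<>();
        for (Scored s : results) {
            out.add(new Scored(s.docId, (s.score - min) / range));
        }
        return out;
    }

    // Merge the normalized per-engine lists and sort by descending score --
    // the kind of work the Reduce side of the Hadoop design would do.
    static List<Scored> merge(List<List<Scored>> perEngine) {
        List<Scored> merged = new ArrayList<>();
        for (List<Scored> results : perEngine) {
            merged.addAll(normalize(results));
        }
        merged.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
        return merged;
    }
}
```

Per-engine normalization is crude (it ignores how score distributions differ between engines), which is presumably why a better NaiveMerge is worth a line to Chris.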
© Arctic Region Supercomputing Center 2006-2008. This page was last updated on 7 July 2008.