Multisearch Log
These are the logs from the week of 11-15 August. To see more, view the main log page.
Friday, 15 August
I am working on running the larger data sets w/queries so I can run the shorter ones remotely. I'm also finishing up this week's to-do list:
- Write up website entries
- Update TREC paper
- Write MS 2008 Paper
Thursday, 14 August
Presentation given on Multisearch [PPT] [PDF]
Wednesday, 13 August
I'm trying to download files from Midnight. Chris made a few generated files with information about runs, which I would like to start using on Multisearch to gauge its speed. I am doing the following runs (hopefully before I leave):
| Merge Algorithm | Restriction Algorithm | Limit (Backends) |
|---|---|---|
| Naive | Random | 10 |
| Naive | Random | 50 |
| Naive | Random | 100 |
| Naive | None | n/a |
| Naive | Matrix | 10 |
| Naive | Matrix | 50 |
| Naive | Matrix | 100 |
| RankShuffle | Random | 10 |
| RankShuffle | Random | 50 |
| RankShuffle | Random | 100 |
| RankShuffle | None | n/a |
| RankShuffle | Matrix | 10 |
| RankShuffle | Matrix | 50 |
| RankShuffle | Matrix | 100 |
| LeapOfFaith | Random | 10 |
| LeapOfFaith | Random | 50 |
| LeapOfFaith | Random | 100 |
| LeapOfFaith | None | n/a |
| LeapOfFaith | Matrix | 10 |
| LeapOfFaith | Matrix | 50 |
| LeapOfFaith | Matrix | 100 |
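The run list above follows a regular pattern: each merge algorithm is paired with Random and Matrix restriction at limits of 10, 50, and 100 backends, plus one unrestricted run, for 21 runs in total. A minimal sketch that enumerates them in the same order is below. The class name `RunMatrix` and the comma-separated label format are assumptions for illustration, not part of Multisearch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the planned runs in the order they appear in the log.
final class RunMatrix {
    static List<String> configurations() {
        String[] merges = {"Naive", "RankShuffle", "LeapOfFaith"};
        int[] limits = {10, 50, 100};
        List<String> runs = new ArrayList<>();
        for (String merge : merges) {
            for (int limit : limits) {
                runs.add(merge + ", Random, " + limit);
            }
            // Unrestricted run: every backend is queried, so no limit applies.
            runs.add(merge + ", None, n/a");
            for (int limit : limits) {
                runs.add(merge + ", Matrix, " + limit);
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        for (String run : configurations()) {
            System.out.println(run);
        }
    }
}
```

Generating the list this way makes it easy to move the three unrestricted runs to the front or the back of the queue.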
Restriction algorithms
- Integrate C++ to Java for MatrixSelect
- Write up website entries
- Update TREC paper
- Write MS 2008 Paper
- MS 2008 Presentation
Perhaps add a few new Lemur backends if some gov2 are missing
All of the backends are defined in allinput/, which is formally located at /home/mccormic/merge/tomcat/webapps/multisearch/WEB-INF/classes/allinput/
I've pruned this directory, removing redundant and/or faulty backends. Now there are about 1000 properly defined backends. Yay! I've also made note of the three longest runs, the ones that will run over all backends. I might run these last (since they take so long) or first... I haven't decided yet.
Tuesday, 12 August
I've tried to run Hadoop/Multisearch with all the backends on Snowy, and the map() pans out fine, but not the merge. I want to try to change the addRanked() function of OrderedList to use binary search! Maybe that will speed things up!
Update: It definitely seems to. I might also try appending quickly (to the end) and sorting once at the finish instead of finding insertion points. Now, back to the list...
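For reference, a minimal sketch of the binary-search idea for addRanked(). The real OrderedList class is not shown in this log, so `RankedList`, its `double` scores, and the descending order are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for OrderedList; only the insertion strategy is the point here.
final class RankedList {
    // Scores kept in descending order (best result first).
    private final List<Double> scores = new ArrayList<>();

    // Find the insertion point by binary search (O(log n) comparisons
    // instead of a linear scan), then insert there.
    void addRanked(double score) {
        int lo = 0;
        int hi = scores.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (scores.get(mid) >= score) {
                lo = mid + 1; // new score goes after mid; ties keep arrival order
            } else {
                hi = mid;
            }
        }
        scores.add(lo, score);
    }

    // Defensive copy so callers cannot break the ordering.
    List<Double> scores() {
        return new ArrayList<>(scores);
    }

    public static void main(String[] args) {
        RankedList list = new RankedList();
        for (double s : new double[] {0.2, 0.9, 0.5, 0.9, 0.1}) {
            list.addRanked(s);
        }
        System.out.println(list.scores()); // [0.9, 0.9, 0.5, 0.2, 0.1]
    }
}
```

Binary search only cuts the number of comparisons. Inserting into the middle of an ArrayList still shifts the elements after the insertion point, which is why appending everything and sorting once at the end may be faster for large merges.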
Full query parsing (simply remove all special characters & lowercase)
Add timing mechanism
- Restriction algorithms
- Fix Lemur indexing issue with .key file
- Write up website entries
Load all backends from Snowy/Pileus
- Perhaps add a few new Lemur backends if some gov2 are missing
The server-selection sections are working, but the cleanup isn't. I need to delete the generated files so the next search can generate its own files. I know that a directory needs to be emptied before it can be deleted, so I am trying some stuff...
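One way to handle the "empty it before deleting it" requirement is a depth-first delete using plain java.io.File calls. This is a sketch only: `TempCleanup` and the temporary directory names are made up for illustration, and the actual location of the generated files is not stated in this log.

```java
import java.io.File;

// Hypothetical cleanup helper: removes a directory and everything under it.
final class TempCleanup {
    // Deletes children first, because File.delete() fails on a non-empty directory.
    // Returns true if the path no longer exists afterwards.
    // Caution: listFiles() follows symbolic links to directories.
    static boolean deleteRecursively(File f) {
        File[] children = f.listFiles(); // null when f is not a directory
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        return f.delete() || !f.exists();
    }

    public static void main(String[] args) {
        // Build a small nested directory under the system temp dir, then remove it.
        File root = new File(System.getProperty("java.io.tmpdir"),
                "multisearch-demo-" + System.nanoTime());
        new File(root, "selection/run1").mkdirs();
        System.out.println("deleted: " + deleteRecursively(root));
    }
}
```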
Monday, 11 August
The home stretch!
- Full query parsing (simply remove all special characters & lowercase)
- Add timing mechanism
- Restriction algorithms
- Fix Lemur indexing issue with .key file
- Write up website entries
Send off TREC paper writing to Chris
- Load all backends from Snowy/Pileus
- Perhaps add a few new Lemur backends if some gov2 are missing
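The "full query parsing" item above (remove all special characters and lowercase) could look roughly like this. It is a sketch under assumptions: `QueryNormalizer` is a made-up name, "special character" is taken to mean anything other than ASCII letters, digits, and whitespace, and such characters are replaced by a space rather than dropped so that neighbouring words do not run together.

```java
import java.util.Locale;

// Hypothetical query cleaner, not the actual Multisearch code.
final class QueryNormalizer {
    static String normalize(String query) {
        return query
                .replaceAll("[^\\p{Alnum}\\s]", " ") // special characters become spaces
                .replaceAll("\\s+", " ")             // collapse runs of whitespace
                .trim()
                .toLowerCase(Locale.ROOT);           // locale-independent lowercasing
    }

    public static void main(String[] args) {
        System.out.println(normalize("Hello, World!")); // hello world
    }
}
```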