Class Summary
ColumnExtractor: Extracts specified columns from a file. Regular args: queries_even.txt.result 4 5 6 7 8 9 10. Lucene args: queries.txt.result 4 5 6 7 8 9 10. Desc run args: Created on November 3, 2004, 8:41 AM.
Defs: Definitions.
FileConverter: Takes a result file from Lucene and a qrel file and combines them into a file that can be used for training input...output. Created on October 17, 2004, 2:39 PM.
MatlabToTREC: Merges output from Matlab with Lucene's output and creates a file that can be evaluated by trec_eval. TREC format: qid iter docno rank sim run_id. Created on November 3, 2004, 11:16 AM.
QueryRelevance: Stores query relevance judgments. Created on October 17, 2004, 2:22 PM.
ResultMerger: Merges results from two files, one by one. The files must have an equal number of docs per query. Created on January 9, 2005, 11:34 AM.
RobustEvalFix: Fixes the robust_eval script so that the queries in the script match those in the run. Created on November 15, 2004, 10:21 AM.
RobustEvalFixTest: JUnit-based test. Created on November 15, 2004, 8:44 PM.
RStat: Represents topicid avep p10 for use by robust_eval.
StatsParser: Takes the output of trec_eval -q as input and produces output in the form: topicid avep p10, for use by robust_eval. Created on October 15, 2004, 2:34 PM.
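
The six-column TREC layout mentioned for MatlabToTREC (qid iter docno rank sim run_id) can be illustrated with a minimal sketch. This is not the actual MatlabToTREC code: the class name TrecLineSketch, the method formatLine, the sample document number, and the run id "lucene" are all hypothetical, and the constant "Q0" in the iter column is only the usual placeholder, assumed here.

```java
import java.util.Locale;

/** Hypothetical helper; not part of com.hrstcs.trec. */
public class TrecLineSketch {

    /** Formats one ranked document as: qid iter docno rank sim run_id. */
    public static String formatLine(String qid, String docno, int rank,
                                    double sim, String runId) {
        // Assumption: the iter column is a fixed placeholder ("Q0").
        // Locale.ROOT keeps the decimal separator a dot on every machine.
        return String.format(Locale.ROOT, "%s Q0 %s %d %.4f %s",
                qid, docno, rank, sim, runId);
    }

    public static void main(String[] args) {
        // prints: 301 Q0 FBIS3-10082 1 12.5000 lucene
        System.out.println(formatLine("301", "FBIS3-10082", 1, 12.5, "lucene"));
    }
}
```

One line like this per retrieved document, grouped by query, is the kind of file trec_eval reads as its results argument.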

TREC Module

Provides various utilities for running and evaluating the performance of Lucene on the test data from the TREC Robust Retrieval Track.


  1. Index Files
    I have modified the org.apache.lucenesandbox.xmlindexingdemo package so that it works with TREC Robust 2004 data files. To index files, execute org.apache.lucenesandbox.xmlindexingdemo.IndexFiles.
  2. Perform Search
    I have modified SearchFiles so that it outputs data in TREC format. For usage details see the javadoc. There is a sample program that can be launched by executing 'demo'.
  3. Evaluation - TREC
    Search results need to be evaluated with TREC programs; TREC's standard program, trec_eval, can be used for this.
    ex. /mnt/hgfs/thesis/trec_eval/trec_eval.7.0beta_linux/trec_eval -q qrels.robust2004.txt $1.result > $1.trec_eval.out
  4. Stats Parsing
    Values of P10 are necessary to run the robust_eval script, but trec_eval does not produce them in the necessary form. The program com.hrstcs.trec.StatsParser parses trec_eval's result file and produces the necessary stats in a format usable by robust_eval.
    ex. /usr/java/jre1.5.0_01/bin/java -Xmx256m -cp /mnt/hgfs/thesis/lucene/aditional_src/bin com.hrstcs.trec.StatsParser $1.trec_eval.out $
  5. Evaluation - TREC Robust
    In some runs not all of the topics are present. The program com.hrstcs.trec.RobustEvalFix adjusts the '' script so that only the topics that are present are evaluated.
    ex. /usr/java/jre1.5.0_01/bin/java -Xmx256m -cp /mnt/hgfs/thesis/lucene/aditional_src/bin com.hrstcs.trec.RobustEvalFix /mnt/hgfs/tdata/original/ $1.result
    Finally, in order to perform robust evaluation:
    ./ $ > $1.robust_eval.out
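
The stats parsing in step 4 can be sketched as follows. This is a minimal illustration, not the actual com.hrstcs.trec.StatsParser: the class name StatsParserSketch and the method parse are hypothetical. It assumes that each line of the per-query trec_eval output has three whitespace-separated fields (measure, topic id, value), and that the measures of interest are named "map" and "P10". Those names are an assumption and vary between trec_eval versions, so check them against the real output file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the real StatsParser. */
public class StatsParserSketch {

    /** Turns per-query trec_eval lines into "topicid avep p10" lines. */
    public static List<String> parse(List<String> lines) {
        // topic id -> {avep, p10}; LinkedHashMap keeps topics in input order
        Map<String, String[]> byTopic = new LinkedHashMap<String, String[]>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // Skip malformed lines and the summary rows for topic "all".
            if (f.length != 3 || f[1].equals("all")) {
                continue;
            }
            int slot;
            if (f[0].equals("map")) {          // assumed name for average precision
                slot = 0;
            } else if (f[0].equals("P10")) {   // assumed name for precision at 10
                slot = 1;
            } else {
                continue;                      // any other measure is ignored
            }
            String[] stats = byTopic.get(f[1]);
            if (stats == null) {
                stats = new String[2];
                byTopic.put(f[1], stats);
            }
            stats[slot] = f[2];
        }
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, String[]> e : byTopic.entrySet()) {
            String[] v = e.getValue();
            // Emit a topic only when both values were found.
            if (v[0] != null && v[1] != null) {
                out.add(e.getKey() + " " + v[0] + " " + v[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "map\t301\t0.2500",
                "P10\t301\t0.4000",
                "map\tall\t0.2500",
                "P10\tall\t0.4000");
        // prints: 301 0.2500 0.4000
        for (String row : parse(sample)) {
            System.out.println(row);
        }
    }
}
```

Values are passed through as strings rather than parsed as numbers, so the precision written by trec_eval is preserved exactly in the output.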

Reports contains utilities that can be used to create reports from various runs.

Neil O. Rouben