Package com.hrstc.trec

TREC Module


Class Summary
ColumnExtractor - Extracts the specified columns from a file (created on November 3, 2004, 8:41 AM). Regular run args: queries_even.txt.result 4 5 6 7 8 9 10. Lucene run args: queries.txt.result 4 5 6 7 8 9 10. Desc run args:
Defs - Definitions.
FileConverter - Takes a result file from Lucene and a qrels file and combines them into a file that can be used as training input...output (created on October 17, 2004, 2:39 PM).
MatlabToTREC - Merges the output from MATLAB with Lucene's output and creates a file that trec_eval can evaluate, in TREC format: qid iter docno rank sim run_id (created on November 3, 2004, 11:16 AM).
QueryRelevance - Stores query relevance judgments (created on October 17, 2004, 2:22 PM).
ResultMerger - Merges results from two files, one by one. Both files must contain the same number of documents per query (created on January 9, 2005, 11:34 AM).
RobustEvalFix - Adjusts the robust_eval script so that the topics it evaluates match the queries present in a run (created on November 15, 2004, 10:21 AM).
RobustEvalFixTest - JUnit-based test (created on November 15, 2004, 8:44 PM).
RStat - Represents a "topicid avep p10" record for use by robust_eval.
StatsParser - Takes the output of trec_eval -q as input and produces lines of the form "topicid avep p10" for use by robust_eval (created on October 15, 2004, 2:34 PM).
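The classes above read and write two TREC file formats: run files (the "qid iter docno rank sim run_id" lines that MatlabToTREC produces) and qrels files (the relevance judgments that QueryRelevance stores). The sketch below illustrates both formats. The class TrecFormats and its methods are hypothetical and are not part of com.hrstc.trec. It also assumes two common TREC conventions: the iter column of a run file holds the literal Q0, and a qrels line has four whitespace-separated fields (topic, iteration, docno, relevance).

```java
import java.util.Locale;

/**
 * Hypothetical helpers (not part of com.hrstc.trec) illustrating the TREC
 * run-file and qrels line formats.
 */
public class TrecFormats {

    /** One qrels judgment; the iteration field is dropped because trec_eval ignores it. */
    public record Judgment(String topic, String docno, int relevance) {}

    /**
     * Formats one run-file line: qid iter docno rank sim run_id.
     * Assumes the conventional literal "Q0" in the iter column.
     * Locale.ROOT keeps '.' as the decimal separator on every platform.
     */
    public static String formatRunLine(String qid, String docno, int rank,
                                       double sim, String runId) {
        return String.format(Locale.ROOT, "%s Q0 %s %d %.6f %s",
                qid, docno, rank, sim, runId);
    }

    /** Parses one qrels line of the form: topic iteration docno relevance. */
    public static Judgment parseQrelsLine(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields in qrels line: " + line);
        }
        // f[1] is the iteration field, which is not used.
        return new Judgment(f[0], f[2], Integer.parseInt(f[3]));
    }
}
```

For example, formatRunLine("301", "FBIS3-10082", 1, 12.5, "lucene") returns "301 Q0 FBIS3-10082 1 12.500000 lucene". The document number in that call is only an illustrative value.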
 

Package com.hrstc.trec Description


Provides utilities for running Lucene on the test data from the TREC Robust Retrieval Track and for evaluating its performance.

Procedure

  1. Index Files
    I have modified the org.apache.lucenesandbox.xmlindexingdemo package so that it works with the TREC Robust 2004 data files. To index the files, run org.apache.lucenesandbox.xmlindexingdemo.IndexFiles.
  2. Perform Search
    I have modified SearchFiles so that it writes its results in TREC format. For usage details, see its Javadoc. A sample program can be launched by executing 'demo'.
  3. Evaluation - TREC
    Search results must be evaluated with the TREC tools. TREC's standard evaluation program, trec_eval, is sufficient for this.
    ex. /mnt/hgfs/thesis/trec_eval/trec_eval.7.0beta_linux/trec_eval -q qrels.robust2004.txt $1.result > $1.trec_eval.out
  4. Stats Parsing
    The robust_eval script needs per-topic P10 values, but trec_eval does not produce them in the required form. The program com.hrstc.trec.StatsParser parses trec_eval's output file and writes the needed statistics (topicid avep p10) in the format that robust_eval accepts.
    ex. /usr/java/jre1.5.0_01/bin/java -Xmx256m -cp /mnt/hgfs/thesis/lucene/aditional_src/bin com.hrstc.trec.StatsParser $1.trec_eval.out $1.robust_eval.in
  5. Evaluation - TREC Robust
    Some runs do not contain all of the topics. The program com.hrstc.trec.RobustEvalFix adjusts the 'robust2004_eval.pl' script so that only the topics present in the run are evaluated.
    ex. /usr/java/jre1.5.0_01/bin/java -Xmx256m -cp /mnt/hgfs/thesis/lucene/aditional_src/bin com.hrstc.trec.RobustEvalFix /mnt/hgfs/tdata/original/robust2004_eval.pl $1.result
    Finally, in order to perform robust evaluation:
    ./robust2004_eval.pl $1.robust_eval.in > $1.robust_eval.out
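The data handling in steps 4 and 5 can be sketched in Java as follows. This is a hypothetical illustration and not the source of StatsParser or RobustEvalFix. The class RobustPrep and its methods are invented for this example. The sketch assumes that trec_eval -q prints whitespace-separated "measure qid value" lines, with average precision reported as map and precision at 10 reported as P10. That is the trec_eval 7.x naming. Later versions may differ; trec_eval 9.x, for instance, uses P_10. For step 5, the sketch covers only collecting the topics present in a run. How RobustEvalFix then edits robust2004_eval.pl is not described here, so that part is not sketched.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of steps 4 and 5; not the StatsParser/RobustEvalFix source. */
public class RobustPrep {

    /**
     * Converts trec_eval -q output into "topicid avep p10" lines.
     * Assumes "measure qid value" lines using the measure names "map" and "P10".
     * Summary rows (qid "all") and all other measures are skipped. A topic that
     * lacks one of the two measures gets NaN in that column.
     */
    public static String toRobustEvalInput(String trecEvalOut) {
        // qid -> {avep, p10}, kept in the order topics first appear
        Map<String, double[]> stats = new LinkedHashMap<>();
        for (String line : trecEvalOut.split("\\R")) {
            String[] f = line.trim().split("\\s+");
            if (f.length != 3 || f[1].equals("all")) {
                continue;
            }
            int col = f[0].equals("map") ? 0 : f[0].equals("P10") ? 1 : -1;
            if (col < 0) {
                continue;
            }
            double[] row = stats.computeIfAbsent(f[1], k -> new double[] {Double.NaN, Double.NaN});
            row[col] = Double.parseDouble(f[2]);
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, double[]> e : stats.entrySet()) {
            double[] v = e.getValue();
            out.append(String.format(Locale.ROOT, "%s %.4f %.4f", e.getKey(), v[0], v[1]))
               .append('\n');
        }
        return out.toString();
    }

    /** Collects the distinct topic ids (first column) present in a TREC run file. */
    public static Set<String> topicsInRun(String runFile) {
        Set<String> topics = new TreeSet<>();
        for (String line : runFile.split("\\R")) {
            String qid = line.trim().split("\\s+")[0];
            if (!qid.isEmpty()) {
                topics.add(qid);
            }
        }
        return topics;
    }
}
```

Both methods take file contents as strings so that they are easy to test. A command-line wrapper could read $1.trec_eval.out or $1.result with java.nio.file.Files.readString and write the converted text to $1.robust_eval.in.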

Reports

The com.hrstc.trec.report package contains utilities for creating reports from various runs.


Author:
Neil O. Rouben