  Neil Rubens

All of the modules are the result of my Master's project. If you are interested in more details, please see my paper on the website. If you use any of my code or ideas, please make sure to cite my paper [bib] (thanks).

LucQE [lucky] Lucene Query Expansion Module

Provides a framework, along with several implementations, for performing Query Expansion (QE) with Apache Lucene.

Query Expansion - Adding search terms to a user's search. Query expansion is the process of a search engine adding search terms to a user's weighted search. The intent is to improve precision and/or recall. The additional terms may be taken from a thesaurus. For example, a search for "car" may be expanded to: car cars auto autos automobile automobiles [foldoc.org].
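To make the definition concrete, here is a minimal sketch of thesaurus-based expansion in plain Java. It does not use Lucene or any LucQE code; the class name `ThesaurusExpander` and the thesaurus contents are hypothetical and exist only to reproduce the "car" example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy thesaurus-based query expansion (illustrative only, not part of LucQE). */
final class ThesaurusExpander {
    private final Map<String, List<String>> thesaurus;

    ThesaurusExpander(Map<String, List<String>> thesaurus) {
        this.thesaurus = thesaurus;
    }

    /** Returns the original terms followed by their related terms, without duplicates. */
    List<String> expand(List<String> queryTerms) {
        // LinkedHashSet keeps insertion order, so the original terms stay first.
        Set<String> expanded = new LinkedHashSet<>(queryTerms);
        for (String term : queryTerms) {
            expanded.addAll(thesaurus.getOrDefault(term, List.of()));
        }
        return new ArrayList<>(expanded);
    }

    public static void main(String[] args) {
        // Hypothetical one-entry thesaurus.
        ThesaurusExpander expander = new ThesaurusExpander(Map.of(
                "car", List.of("cars", "auto", "autos", "automobile", "automobiles")));
        // Prints: car cars auto autos automobile automobiles
        System.out.println(String.join(" ", expander.expand(List.of("car"))));
    }
}
```

A real system would normally also assign weights to the added terms, so that they do not count as much as the terms the user typed.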

The following modules have been implemented:

  • Rocchio Query Expansion (QE) method.
  • gQE [geek] - Provides an implementation of pseudo-feedback QE that uses Google's web API to query the World Wide Web in order to acquire terms for QE.
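As a rough illustration of the Rocchio method listed above, the sketch below computes the textbook update q' = alpha * q + beta * (1/|D|) * (sum of d in D), where D is the set of feedback documents. In pseudo feedback, D is simply the top-ranked results of an initial search. This is not the LucQE implementation: the class name `RocchioSketch`, the sparse term-to-weight maps, the parameter values, and the omission of the negative-feedback term are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of Rocchio expansion without the negative-feedback term (not LucQE code). */
final class RocchioSketch {
    /**
     * Computes alpha * query + beta * centroid(feedbackDocs).
     * Every vector is a sparse map from term to weight.
     */
    static Map<String, Double> expand(Map<String, Double> query,
                                      List<Map<String, Double>> feedbackDocs,
                                      double alpha, double beta) {
        Map<String, Double> expanded = new HashMap<>();
        // Scale the original query terms by alpha.
        for (Map.Entry<String, Double> e : query.entrySet()) {
            expanded.merge(e.getKey(), alpha * e.getValue(), Double::sum);
        }
        if (feedbackDocs.isEmpty()) {
            return expanded; // nothing to learn from
        }
        // Add beta times the average weight of each term over the feedback documents.
        double scale = beta / feedbackDocs.size();
        for (Map<String, Double> doc : feedbackDocs) {
            for (Map.Entry<String, Double> e : doc.entrySet()) {
                expanded.merge(e.getKey(), scale * e.getValue(), Double::sum);
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        // Hypothetical weights; in practice these would be tf-idf style values.
        Map<String, Double> query = Map.of("car", 1.0);
        List<Map<String, Double>> topDocs = List.of(
                Map.of("car", 0.5, "auto", 1.0),
                Map.of("auto", 1.0));
        System.out.println(expand(query, topDocs, 1.0, 0.5));
    }
}
```

In practice only the highest-weighted new terms are usually kept, so that the expanded query stays short.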

Tag             Combined Topic Set
                MAP      P10      %no
Lucene QE       0.2433   0.3936   18.10%
Lucene gQE      0.2332   0.3984   14%
KB-R-FIS gQE    0.2322   0.4076   14%
Lucene          0.2      0.37     15%

Tested on data from NIST TREC Robust Retrieval Track 2004 (trec.nist.gov)

MAP - mean average precision
P10 - average of precision at 10 documents retrieved
%no - percentage of topics with no relevant documents in the top 10 retrieved
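
The three measures can be sketched in a few lines of Java. This is only an illustration of the definitions above, not the evaluation code used to produce the table (official TREC results are normally computed with the trec_eval tool). The class name `RetrievalMetrics` and its method names are hypothetical, and each ranking is assumed to contain no duplicate document ids.

```java
import java.util.List;
import java.util.Set;

/** Illustrative implementations of P10, MAP and %no (not the project's evaluation code). */
final class RetrievalMetrics {
    /** Precision at cutoff k: relevant documents among the top k, divided by k. */
    static double precisionAt(List<String> ranked, Set<String> relevant, int k) {
        int hits = 0;
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            if (relevant.contains(ranked.get(i))) hits++;
        }
        return (double) hits / k;
    }

    /** Average precision for one topic: precision at each relevant hit, averaged over all relevant documents. */
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / relevant.size();
    }

    /** MAP: mean of the per-topic average precision. */
    static double meanAveragePrecision(List<List<String>> rankings, List<Set<String>> relevant) {
        double sum = 0.0;
        for (int t = 0; t < rankings.size(); t++) {
            sum += averagePrecision(rankings.get(t), relevant.get(t));
        }
        return rankings.isEmpty() ? 0.0 : sum / rankings.size();
    }

    /** %no: percentage of topics with no relevant document in the top 10. */
    static double percentNoRelevantInTop10(List<List<String>> rankings, List<Set<String>> relevant) {
        int empty = 0;
        for (int t = 0; t < rankings.size(); t++) {
            if (precisionAt(rankings.get(t), relevant.get(t), 10) == 0.0) empty++;
        }
        return rankings.isEmpty() ? 0.0 : 100.0 * empty / rankings.size();
    }
}
```

For MAP and P10 higher is better, while for %no lower is better.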

Lucene - version 1.4.3 (unmodified)
Lucene QE - Lucene with local query expansion
Lucene gQE - Lucene system that utilizes Rocchio's query expansion along with Google
KB-R-FIS gQE - my Fuzzy Inference System that utilizes Rocchio's query expansion along with Google

TREC Module

Provides various utilities for running Lucene on the test data from the TREC Robust Retrieval Track and evaluating its performance.
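
Evaluating a system on TREC data usually means writing the ranked results to a run file with one line per retrieved document in the form "topic Q0 docno rank score tag", which trec_eval can then score against the relevance judgments. The sketch below formats such lines; it is not taken from the TREC module, and the class name `TrecRunWriter`, the document ids, the scores, and the run tag are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Formats ranked results as TREC run-file lines (illustrative, not the module's code). */
final class TrecRunWriter {
    /**
     * Builds one line per document: topic, the literal "Q0", document id,
     * 1-based rank, score, and run tag. docIds and scores must be in rank order.
     */
    static List<String> formatRun(String topicId, List<String> docIds,
                                  List<Double> scores, String runTag) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < docIds.size(); i++) {
            // Locale.ROOT guarantees a '.' decimal separator regardless of system locale.
            lines.add(String.format(Locale.ROOT, "%s Q0 %s %d %.4f %s",
                    topicId, docIds.get(i), i + 1, scores.get(i), runTag));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical topic, documents and scores.
        List<String> lines = formatRun("301", List.of("DOC-A", "DOC-B"),
                List.of(1.5, 0.75), "demo");
        lines.forEach(System.out::println);
    }
}
```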

Fuzzy Logic Module

Provides various utilities for integrating Lucene with Matlab's Fuzzy Logic Toolbox, and for running and evaluating performance on the data from the TREC Robust Retrieval Track.