All of the modules are the result of my Master's project. If you are interested in more details, please see my paper on the website. If you use any of my code or ideas, please cite my paper [bib] (thanks).
LucQE [lucky] Lucene Query Expansion Module
Provides a framework, along with several implementations, for performing Query Expansion (QE) with Apache Lucene.
Query Expansion - Adding search terms to a user's search. Query expansion is the process of a search engine adding search terms to a user's weighted search. The intent is to improve precision and/or recall. The additional terms may be taken from a thesaurus. For example a search for "car" may be expanded to: car cars auto autos automobile automobiles [foldoc.org].
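To make the definition concrete, here is a minimal sketch of thesaurus-based expansion in plain Python. It is not code from LucQE: the `THESAURUS` dictionary and the `expand_query` function are hypothetical names introduced for illustration, and the only entry is the "car" example quoted above.

```python
# Hypothetical toy thesaurus; the single entry mirrors the FOLDOC "car" example.
THESAURUS = {
    "car": ["cars", "auto", "autos", "automobile", "automobiles"],
}


def expand_query(query, thesaurus=THESAURUS):
    """Return the query with thesaurus terms appended after each original term.

    Terms are lower-cased and duplicates are dropped, keeping first occurrence.
    """
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return " ".join(expanded)


print(expand_query("car"))  # prints: car cars auto autos automobile automobiles
```

Terms without a thesaurus entry pass through unchanged, so the expanded query always contains the user's original terms.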
The following modules have been implemented:
- Rocchio Query Expansion (QE) method.
- gQE [geek] - Provides an implementation of pseudo-feedback QE that uses Google's web API to query the World Wide Web in order to acquire terms for QE.
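As a rough illustration of the Rocchio method listed above, the sketch below applies the textbook Rocchio update in its pseudo-feedback form: the top-ranked documents are assumed relevant and the negative-feedback term is dropped. It is written in plain Python and does not use the Lucene or Google APIs. The function name, the raw term-frequency weighting, and the `alpha`, `beta` and `max_new_terms` defaults are all assumptions made for the example, not the settings used in LucQE or gQE.

```python
from collections import Counter


def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75, max_new_terms=5):
    """Return {term: weight} for an expanded query (pseudo-feedback Rocchio).

    query_terms   -- list of terms in the original query
    feedback_docs -- list of documents (each a list of terms) assumed relevant
    alpha, beta   -- assumed weights for the original query and the feedback centroid
    """
    weights = Counter()

    # alpha * original query vector (each query term counts once per occurrence)
    for term in query_terms:
        weights[term] += alpha

    # beta * centroid of the feedback documents (raw term frequencies, assumed)
    for doc_terms in feedback_docs:
        for term, tf in Counter(doc_terms).items():
            weights[term] += beta * tf / len(feedback_docs)

    # Keep every original term, plus the highest-weighted new terms
    original = set(query_terms)
    new_terms = sorted(
        (t for t in weights if t not in original),
        key=lambda t: (-weights[t], t),  # ties broken alphabetically
    )[:max_new_terms]
    kept = list(dict.fromkeys(query_terms)) + new_terms
    return {t: weights[t] for t in kept}


docs = [["car", "engine", "engine"], ["car", "auto"]]
print(rocchio_expand(["car"], docs, max_new_terms=1))
# prints: {'car': 1.75, 'engine': 0.75}
```

In a local QE setting the feedback documents would come from the top results of the initial search; in a gQE-style setting they would presumably be built from text returned by the web search. Either way, the arithmetic is the same.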
Combined Topic Set
Tag          | MAP    | P10    | %no
Lucene QE    | 0.2433 | 0.3936 | 18.10%
Lucene gQE   | 0.2332 | 0.3984 | 14%
KB-R-FIS gQE | 0.2322 | 0.4076 | 14%
Lucene       | 0.2    | 0.37   | 15%
Tested on data from NIST TREC Robust Retrieval Track 2004 (trec.nist.gov)
MAP - mean average precision
P10 - precision at 10 documents retrieved, averaged over topics
%no - percentage of topics with no relevant documents in the top 10 retrieved
Lucene - version 1.4.3 (unmodified)
Lucene QE - Lucene with local query expansion
Lucene gQE – Lucene system that utilized Rocchio’s query expansion along with Google.
KB-R-FIS gQE – My Fuzzy Inference System that utilized Rocchio’s query expansion along with Google.
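The three measures in the legend above (MAP, P10 and %no) can be sketched in a few lines of Python. This illustrates the definitions only; it is not the evaluation code used to produce the table. The function names and the `runs`/`qrels` dictionary layout are assumptions, and each ranked list is assumed to contain no duplicate documents.

```python
def average_precision(ranked, relevant):
    """Average precision for one topic: mean of precision at each relevant hit."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top k that is relevant (divides by k even if fewer were retrieved)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def evaluate(runs, qrels, k=10):
    """runs: {topic: ranked list of doc ids}; qrels: {topic: set of relevant doc ids}."""
    topics = list(qrels)
    n = len(topics)
    ap = [average_precision(runs.get(t, []), qrels[t]) for t in topics]
    pk = [precision_at_k(runs.get(t, []), qrels[t], k) for t in topics]
    return {
        "MAP": sum(ap) / n,
        "P10": sum(pk) / n,
        # %no: share of topics whose top k contains no relevant document
        "pct_no": 100.0 * sum(1 for p in pk if p == 0) / n,
    }


qrels = {"t1": {"a", "c"}, "t2": {"x"}}
runs = {"t1": ["a", "b", "c"], "t2": ["y", "z"]}
print(evaluate(runs, qrels))
```

For this toy input, topic t1 has average precision (1/1 + 2/3) / 2 and topic t2 has 0, so half of the topics count toward %no.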
TREC Module
Provides various utilities for running and evaluating the performance of Lucene on the test data from the TREC Robust Retrieval Track.
Fuzzy Logic Module
Provides various utilities for integrating Lucene with Matlab's Fuzzy Logic Toolbox and for running and evaluating performance on the data from the TREC Robust Retrieval Track.