java.lang.Object com.hrstc.lucene.queryexpansion.QueryExpansion
public class QueryExpansion
Implements Rocchio's pseudo-relevance feedback query expansion algorithm.
Query expansion: adding search terms to a user's search. Query expansion is the process by which a search engine adds search terms to a user's weighted search. The intent is to improve precision and/or recall. The additional terms may be taken from a thesaurus; for example, a search for "car" may be expanded to: car cars auto autos automobile automobiles [foldoc.org]. For the options that can be configured through the properties file, see the constants listed in the Field Summary section.
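To make the thesaurus example concrete, here is a minimal sketch that expands a query with entries from a hard-coded synonym map. The class name ThesaurusExpansion and the synonym table are hypothetical illustrations; this is not what QueryExpansion itself does, since it derives expansion terms from Rocchio pseudo-relevance feedback rather than from a thesaurus. It needs Java 9+ for Map.of and List.of.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of thesaurus-based query expansion.
// Not part of QueryExpansion, which uses Rocchio pseudo-relevance feedback instead.
class ThesaurusExpansion {
    // Hypothetical thesaurus: each key maps to related terms.
    private static final Map<String, List<String>> THESAURUS = Map.of(
        "car", List.of("cars", "auto", "autos", "automobile", "automobiles"));

    // Returns the original terms followed by their thesaurus entries,
    // keeping first-seen order and dropping duplicates.
    static List<String> expand(List<String> queryTerms) {
        Set<String> expanded = new LinkedHashSet<>();
        for (String term : queryTerms) {
            expanded.add(term);
            expanded.addAll(THESAURUS.getOrDefault(term, List.of()));
        }
        return new ArrayList<>(expanded);
    }

    public static void main(String[] args) {
        // prints [car, cars, auto, autos, automobile, automobiles]
        System.out.println(expand(List.of("car")));
    }
}
```

Terms with no thesaurus entry pass through unchanged, so a query for "bike" stays ["bike"].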
Created on February 23, 2005, 5:29 AM
TODO: Yahoo has started providing an API to query the web; it would be nice to add a Yahoo implementation as well.
Field Summary | |
---|---|
private org.apache.lucene.analysis.Analyzer | analyzer |
static java.lang.String | DECAY_FLD - How much a document's importance decays as its rank gets higher. |
static java.lang.String | DOC_NUM_FLD - Number of documents to use. |
static java.lang.String | DOC_SOURCE_FLD - Indicates which source to use to obtain documents: {google, local, null}. |
static java.lang.String | DOC_SOURCE_GOOGLE - Get documents from Google. |
static java.lang.String | DOC_SOURCE_LOCAL - Get documents from the local repository. |
private java.util.Vector<org.apache.lucene.search.TermQuery> | expandedTerms |
private static java.util.logging.Logger | logger |
static java.lang.String | METHOD_FLD - Indicates which method to use for query expansion. |
private java.util.Properties | prop |
static java.lang.String | ROCCHIO_ALPHA_FLD - Rocchio parameter alpha (weight of the original query). |
static java.lang.String | ROCCHIO_BETA_FLD - Rocchio parameter beta (weight of the relevant-documents vector). |
static java.lang.String | ROCCHIO_METHOD |
private org.apache.lucene.search.Searcher | searcher |
private org.apache.lucene.search.Similarity | similarity |
static java.lang.String | TERM_NUM_FLD - Number of terms to produce. |
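The string constants above name the properties that configure an expansion run. Their actual key strings are not shown on this page, so the following sketch reads the same kinds of settings from a java.util.Properties object using hypothetical keys and hypothetical default values; the class ExpansionSettings is an illustration of how such a configuration might be parsed, not part of this API.

```java
import java.util.Properties;

// Hypothetical settings holder. The key strings and defaults below are
// placeholders, not the actual values of QueryExpansion's *_FLD constants.
final class ExpansionSettings {
    final float alpha;      // weight of the original query
    final float beta;       // weight of the relevant-documents vector
    final float decay;      // how fast a document's weight decays with rank
    final int docNum;       // number of feedback documents to use
    final int termNum;      // number of expansion terms to produce
    final String docSource; // e.g. "google" or "local"; null when unset

    ExpansionSettings(Properties p) {
        alpha = Float.parseFloat(p.getProperty("rocchio.alpha", "1.0"));
        beta = Float.parseFloat(p.getProperty("rocchio.beta", "0.75"));
        decay = Float.parseFloat(p.getProperty("decay", "0.0"));
        docNum = Integer.parseInt(p.getProperty("doc.num", "10"));
        termNum = Integer.parseInt(p.getProperty("term.num", "20"));
        docSource = p.getProperty("doc.source"); // no default: null means "not specified"
    }
}
```

Leaving the document source unset mirrors the documented null option, in which case the caller's own hits would be used.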
Constructor Summary | |
---|---|
QueryExpansion(org.apache.lucene.analysis.Analyzer analyzer, org.apache.lucene.search.Searcher searcher, org.apache.lucene.search.Similarity similarity, java.util.Properties prop) | Creates a new instance of QueryExpansion. |
Method Summary | |
---|---|
org.apache.lucene.search.Query | adjust(java.util.Vector<org.apache.lucene.search.QueryTermVector> docsTermsVector, java.lang.String queryStr, float alpha, float beta, float decay, int docsRelevantCount, int maxExpandedQueryTerms) - Adjusts the term features of the docs with alpha * query and beta, and assigns weights/boosts to terms (tf*idf). |
java.util.Vector<org.apache.lucene.search.TermQuery> | combine(java.util.Vector<org.apache.lucene.search.TermQuery> queryTerms, java.util.Vector<org.apache.lucene.search.TermQuery> docsTerms) - Combines weights according to the expansion formula. |
org.apache.lucene.search.Query | expandQuery(java.lang.String queryStr, org.apache.lucene.search.Hits hits, java.util.Properties prop) - Performs Rocchio's query expansion with pseudo-relevance feedback: qm = alpha * query + ( beta / relevantDocsCount ) * Sum ( rel docs vector ) |
org.apache.lucene.search.Query | expandQuery(java.lang.String queryStr, java.util.Vector<org.apache.lucene.document.Document> hits, java.util.Properties prop) - Performs Rocchio's query expansion with pseudo-relevance feedback: qm = alpha * query + ( beta / relevantDocsCount ) * Sum ( rel docs vector ) |
org.apache.lucene.search.TermQuery | find(org.apache.lucene.search.TermQuery term, java.util.Vector<org.apache.lucene.search.TermQuery> terms) - Finds the term that is equal to the given term. |
private java.util.Vector<org.apache.lucene.document.Document> | getDocs(java.lang.String query, org.apache.lucene.search.Hits hits, java.util.Properties prop) - Gets the documents that will be used in query expansion. |
java.util.Vector<org.apache.lucene.search.QueryTermVector> | getDocsTerms(java.util.Vector<org.apache.lucene.document.Document> hits, int docsRelevantCount, org.apache.lucene.analysis.Analyzer analyzer) - Extracts the terms of the documents and adds them to a vector in the same order. |
java.util.Vector<org.apache.lucene.search.TermQuery> | getExpandedTerms() - Returns the QueryExpansion.TERM_NUM_FLD expanded terms from the most recent query. |
private void | merge(java.util.Vector<org.apache.lucene.search.TermQuery> terms) - Gets rid of duplicates by merging termQueries with equal terms. |
org.apache.lucene.search.Query | mergeQueries(java.util.Vector<org.apache.lucene.search.TermQuery> termQueries, int maxTerms) - Merges termQueries into a single query. |
java.util.Vector<org.apache.lucene.search.TermQuery> | setBoost(org.apache.lucene.search.QueryTermVector termVector, float factor) - Sets the boost of terms. |
java.util.Vector<org.apache.lucene.search.TermQuery> | setBoost(java.util.Vector<org.apache.lucene.search.QueryTermVector> docsTerms, float factor, float decayFactor) - Sets the boost of terms. |
private void | setExpandedTerms(java.util.Vector<org.apache.lucene.search.TermQuery> expandedTerms) |
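Written as an equation, the update quoted in the two expandQuery rows is the following, where $D_r$ (notation introduced here) is the set of top-ranked documents assumed to be relevant, so that $|D_r|$ corresponds to relevantDocsCount. The rank decay controlled by DECAY_FLD additionally scales each document's contribution; the formula as quoted does not show it.

```latex
\vec{q}_m = \alpha\,\vec{q} + \frac{\beta}{|D_r|} \sum_{\vec{d} \in D_r} \vec{d}
```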
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final java.lang.String METHOD_FLD
public static final java.lang.String ROCCHIO_METHOD
public static final java.lang.String DECAY_FLD
public static final java.lang.String DOC_NUM_FLD
public static final java.lang.String TERM_NUM_FLD
public static final java.lang.String DOC_SOURCE_FLD
public static final java.lang.String DOC_SOURCE_LOCAL
public static final java.lang.String DOC_SOURCE_GOOGLE
public static final java.lang.String ROCCHIO_ALPHA_FLD
public static final java.lang.String ROCCHIO_BETA_FLD
private java.util.Properties prop
private org.apache.lucene.analysis.Analyzer analyzer
private org.apache.lucene.search.Searcher searcher
private org.apache.lucene.search.Similarity similarity
private java.util.Vector<org.apache.lucene.search.TermQuery> expandedTerms
private static java.util.logging.Logger logger
Constructor Detail |
---|
public QueryExpansion(org.apache.lucene.analysis.Analyzer analyzer, org.apache.lucene.search.Searcher searcher, org.apache.lucene.search.Similarity similarity, java.util.Properties prop)
Parameters:
similarity -
analyzer - used to parse documents to extract terms
searcher - used to obtain idf
Method Detail |
---|
public org.apache.lucene.search.Query expandQuery(java.lang.String queryStr, org.apache.lucene.search.Hits hits, java.util.Properties prop) throws java.io.IOException, org.apache.lucene.queryParser.ParseException
Parameters:
queryStr - query string that will be expanded
hits - hits from the original query to use for expansion
prop - properties that contain the values needed to perform the query; see the constants for field names and values
Throws:
java.io.IOException
org.apache.lucene.queryParser.ParseException
private java.util.Vector<org.apache.lucene.document.Document> getDocs(java.lang.String query, org.apache.lucene.search.Hits hits, java.util.Properties prop) throws java.io.IOException
Gets the documents that will be used in query expansion: the number of documents set by QueryExpansion.DOC_NUM_FLD, from the source set by QueryExpansion.DOC_SOURCE_FLD.
Parameters:
query - query for which expansion is being performed
hits - hits to use in case QueryExpansion.DOC_SOURCE_FLD is not specified
prop - properties; QueryExpansion.DOC_SOURCE_FLD is used to determine where to get the documents
Returns:
QueryExpansion.DOC_NUM_FLD documents from QueryExpansion.DOC_SOURCE_FLD
Throws:
java.io.IOException
com.google.soap.search.GoogleSearchFault
public org.apache.lucene.search.Query expandQuery(java.lang.String queryStr, java.util.Vector<org.apache.lucene.document.Document> hits, java.util.Properties prop) throws java.io.IOException, org.apache.lucene.queryParser.ParseException
Parameters:
queryStr - query string that will be expanded
hits - documents from the original query to use for expansion
prop - properties that contain the values needed to perform the query; see the constants for field names and values
Throws:
java.io.IOException
org.apache.lucene.queryParser.ParseException
public org.apache.lucene.search.Query adjust(java.util.Vector<org.apache.lucene.search.QueryTermVector> docsTermsVector, java.lang.String queryStr, float alpha, float beta, float decay, int docsRelevantCount, int maxExpandedQueryTerms) throws java.io.IOException, org.apache.lucene.queryParser.ParseException
Parameters:
docsTermsVector - terms of the top docsRelevantCount documents returned by the original query
queryStr - query string that will be expanded
alpha - alpha factor of the equation (weight of the original query)
beta - beta factor of the equation (weight of the relevant-documents vector)
docsRelevantCount - number of top documents to assume to be relevant
maxExpandedQueryTerms - maximum number of terms in the expanded query
Throws:
java.io.IOException
org.apache.lucene.queryParser.ParseException
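The weighting that adjust describes can be sketched without Lucene, using plain term-to-weight maps in place of QueryTermVector and TermQuery. This sketch assumes each input map already holds tf*idf weights, and it assumes a linear rank decay of max(0, 1 - decay * rank) for the document at 0-based rank; how this class actually applies the decay factor is not documented on this page. The class RocchioSketch and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a Rocchio-style combination with rank decay.
final class RocchioSketch {
    // queryWeights: tf*idf weights of the original query terms
    // docWeights:   tf*idf weights of the top documents, in rank order
    // Returns the maxTerms highest-weighted terms, highest first.
    static Map<String, Float> adjust(Map<String, Float> queryWeights,
                                     List<Map<String, Float>> docWeights,
                                     float alpha, float beta, float decay,
                                     int docsRelevantCount, int maxTerms) {
        Map<String, Float> combined = new HashMap<>();
        // alpha * query vector
        queryWeights.forEach((t, w) -> combined.merge(t, alpha * w, Float::sum));
        int n = Math.min(docsRelevantCount, docWeights.size());
        for (int rank = 0; rank < n; rank++) {
            // Assumed linear decay: lower-ranked documents contribute less.
            float rankFactor = Math.max(0f, 1f - decay * rank);
            // (beta / relevantDocsCount) * decayed document vector
            float scale = beta / docsRelevantCount * rankFactor;
            docWeights.get(rank).forEach((t, w) -> combined.merge(t, scale * w, Float::sum));
        }
        // Keep only the maxTerms strongest terms, in descending weight order.
        List<Map.Entry<String, Float>> entries = new ArrayList<>(combined.entrySet());
        entries.sort(Map.Entry.<String, Float>comparingByValue().reversed());
        Map<String, Float> result = new LinkedHashMap<>();
        for (Map.Entry<String, Float> e : entries.subList(0, Math.min(maxTerms, entries.size()))) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

With decay set to 0, every feedback document gets the same weight and the sketch reduces to the plain formula alpha * query + (beta / relevantDocsCount) * Sum(rel docs vector).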
public org.apache.lucene.search.Query mergeQueries(java.util.Vector<org.apache.lucene.search.TermQuery> termQueries, int maxTerms) throws org.apache.lucene.queryParser.ParseException
Merges termQueries into a single query.
In the future this method should probably be in the Query class.
This is an awkward way of doing it, but the only query-merging method available is mergeBooleanQueries, so we actually have to build a string term1^boost1, term2^boost2 and then parse it into a query.
Parameters:
termQueries - queries to merge
Throws:
org.apache.lucene.queryParser.ParseException
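The string-building workaround can be sketched without Lucene as follows. This sketch joins the term^boost pairs with spaces, which a query parser using its default OR operator would read as optional clauses, and it formats boosts with a fixed number of decimals to avoid exponent notation such as 1.0E-4 in the boost. The class BoostedQueryString is hypothetical, and terms containing query-syntax characters would still need escaping before parsing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: builds a "term1^boost1 term2^boost2 ..." string that a
// query parser could then turn into a single boosted query.
final class BoostedQueryString {
    static String build(Map<String, Float> termBoosts, int maxTerms) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(termBoosts.entrySet());
        // Strongest terms first, so truncating to maxTerms keeps the best ones.
        entries.sort(Map.Entry.<String, Float>comparingByValue().reversed());
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(maxTerms, entries.size()); i++) {
            Map.Entry<String, Float> e = entries.get(i);
            if (sb.length() > 0) sb.append(' ');
            // Fixed-point formatting keeps exponents out of the boost value.
            sb.append(e.getKey()).append('^')
              .append(String.format(Locale.ROOT, "%.4f", e.getValue()));
        }
        return sb.toString();
    }
}
```

For example, boosts {car: 2.0, engine: 1.0, auto: 0.5} with maxTerms = 2 yield "car^2.0000 engine^1.0000".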
public java.util.Vector<org.apache.lucene.search.QueryTermVector> getDocsTerms(java.util.Vector<org.apache.lucene.document.Document> hits, int docsRelevantCount, org.apache.lucene.analysis.Analyzer analyzer) throws java.io.IOException
Parameters:
hits - documents from which to extract terms
docsRelevantCount - number of top documents to assume to be relevant
analyzer - analyzer used to extract terms
Throws:
java.io.IOException
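getDocsTerms runs the top documents through the analyzer and collects their terms in document order. As a rough, dependency-free stand-in for an Analyzer plus QueryTermVector, the sketch below lowercases each document, splits on runs of characters that are not letters or digits, and counts term frequencies. Real Lucene analyzers may also remove stop words or stem; the class DocTermsSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for Analyzer + QueryTermVector: one term-frequency
// map per document, in the same order as the input documents.
final class DocTermsSketch {
    static List<Map<String, Integer>> docsTerms(List<String> docs, int docsRelevantCount) {
        List<Map<String, Integer>> result = new ArrayList<>();
        int n = Math.min(docsRelevantCount, docs.size());
        for (int i = 0; i < n; i++) {
            Map<String, Integer> tf = new LinkedHashMap<>();
            // Lowercase, then split on any run of non-letter, non-digit characters.
            for (String token : docs.get(i).toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                // A leading delimiter produces one empty token; skip it.
                if (!token.isEmpty()) tf.merge(token, 1, Integer::sum);
            }
            result.add(tf);
        }
        return result;
    }
}
```

Only the first docsRelevantCount documents are processed, matching the parameter's role as the number of top documents assumed to be relevant.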
public java.util.Vector<org.apache.lucene.search.TermQuery> setBoost(org.apache.lucene.search.QueryTermVector termVector, float factor) throws java.io.IOException
Parameters:
termVector -
factor - adjustment factor (e.g., alpha or beta)
Throws:
java.io.IOException
public java.util.Vector<org.apache.lucene.search.TermQuery> setBoost(java.util.Vector<org.apache.lucene.search.QueryTermVector> docsTerms, float factor, float decayFactor) throws java.io.IOException
Parameters:
docsTerms -
factor - adjustment factor (e.g., alpha or beta)
Throws:
java.io.IOException
private void merge(java.util.Vector<org.apache.lucene.search.TermQuery> terms)
Parameters:
terms -
public java.util.Vector<org.apache.lucene.search.TermQuery> combine(java.util.Vector<org.apache.lucene.search.TermQuery> queryTerms, java.util.Vector<org.apache.lucene.search.TermQuery> docsTerms)
public org.apache.lucene.search.TermQuery find(org.apache.lucene.search.TermQuery term, java.util.Vector<org.apache.lucene.search.TermQuery> terms)
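combine, find, and merge together fold the query terms and the document terms into one list, merging entries whose terms are equal. Below is a minimal sketch of that duplicate merging. It assumes equal terms are merged by summing their boosts (this page does not say how boosts are combined), and it uses a hypothetical WeightedTerm class in place of TermQuery.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of combine/merge: concatenates query terms and document
// terms, then merges entries with equal terms by summing their boosts.
final class TermMergeSketch {
    // Minimal stand-in for a TermQuery: a term plus its boost.
    static final class WeightedTerm {
        final String text;
        final float boost;
        WeightedTerm(String text, float boost) { this.text = text; this.boost = boost; }
    }

    static List<WeightedTerm> combine(List<WeightedTerm> queryTerms, List<WeightedTerm> docsTerms) {
        // LinkedHashMap keeps the first-seen order of each distinct term.
        Map<String, Float> merged = new LinkedHashMap<>();
        for (WeightedTerm t : queryTerms) merged.merge(t.text, t.boost, Float::sum);
        for (WeightedTerm t : docsTerms) merged.merge(t.text, t.boost, Float::sum);
        List<WeightedTerm> result = new ArrayList<>();
        merged.forEach((text, boost) -> result.add(new WeightedTerm(text, boost)));
        return result;
    }
}
```

Keying a hash map by term text replaces a find-style scan of the whole vector for each term, so merging stays linear in the number of terms.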
public java.util.Vector<org.apache.lucene.search.TermQuery> getExpandedTerms()
Returns:
QueryExpansion.TERM_NUM_FLD expanded terms from the most recent query
private void setExpandedTerms(java.util.Vector<org.apache.lucene.search.TermQuery> expandedTerms)