com.hrstc.lucene.queryexpansion
Class GoogleSearcher

java.lang.Object
  extended by com.hrstc.lucene.queryexpansion.GoogleSearcher

public class GoogleSearcher
extends java.lang.Object

Performs query expansion, using Google as the document source.

Author:
Neil O. Rouben

Field Summary
static java.lang.String AUTH_KEY_FLD
          Auth key required to use Google's web API
private  java.io.File cache
          Location where cache is stored
static java.lang.String FILE_CACHE_FLD
          Location where cache is stored
private  java.lang.String key
          Auth key required to use Google's web API
private static java.util.logging.Logger logger
           
private  java.util.Properties prop
          Properties that contain necessary values
 
Constructor Summary
GoogleSearcher(java.util.Properties prop)
           
 
Method Summary
 java.lang.String htmlToTxt(java.io.InputStream inputStream)
          Reads HTML and returns its text contents
 java.lang.String readURL(java.net.URL url, com.google.soap.search.GoogleSearch search)
          Attempts to read the URL directly; if that fails, tries to read it from Google's cache.
 java.util.Vector<org.apache.lucene.document.Document> search(java.lang.String queryTxt)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

AUTH_KEY_FLD

public static final java.lang.String AUTH_KEY_FLD
Auth key required to use Google's web API

See Also:
Constant Field Values

FILE_CACHE_FLD

public static final java.lang.String FILE_CACHE_FLD
Location where cache is stored

See Also:
Constant Field Values

logger

private static java.util.logging.Logger logger

prop

private java.util.Properties prop
Properties that contain necessary values


cache

private java.io.File cache
Location where cache is stored


key

private java.lang.String key
Auth key required to use Google's web API

Constructor Detail

GoogleSearcher

public GoogleSearcher(java.util.Properties prop)
Parameters:
prop - properties that contain necessary values: the auth key required to use Google's web API and the location of the cache where search results returned from Google will be kept

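The following is a minimal sketch of how the Properties object for the constructor might be assembled. It assumes that AUTH_KEY_FLD and FILE_CACHE_FLD are the names of the properties the constructor looks up, which this page does not confirm. The constant values, the class name SearcherConfigSketch and the helper build are hypothetical placeholders, so the sketch compiles without the library.

```java
import java.util.Properties;

public class SearcherConfigSketch {
    // Hypothetical placeholder values. The real values of
    // GoogleSearcher.AUTH_KEY_FLD and GoogleSearcher.FILE_CACHE_FLD
    // are listed under "Constant Field Values".
    static final String AUTH_KEY_FLD = "example.auth.key";
    static final String FILE_CACHE_FLD = "example.file.cache";

    // Assumption: the constructor reads both settings from the
    // Properties object under the names given by the two constants.
    public static Properties build(String authKey, String cachePath) {
        Properties prop = new Properties();
        prop.setProperty(AUTH_KEY_FLD, authKey);
        prop.setProperty(FILE_CACHE_FLD, cachePath);
        return prop;
    }

    public static void main(String[] args) {
        Properties prop = build("YOUR-KEY", "google-cache.dat");
        // With the library on the classpath, prop would be passed on as:
        // GoogleSearcher searcher = new GoogleSearcher(prop);
        System.out.println(prop.getProperty(FILE_CACHE_FLD));
    }
}
```

In real code the two local constants would be replaced by GoogleSearcher.AUTH_KEY_FLD and GoogleSearcher.FILE_CACHE_FLD.
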
Method Detail

search

public java.util.Vector<org.apache.lucene.document.Document> search(java.lang.String queryTxt)
                                                             throws com.google.soap.search.GoogleSearchFault,
                                                                    java.io.IOException
Parameters:
queryTxt - text of the query to search for
Returns:
documents found for the query
Throws:
com.google.soap.search.GoogleSearchFault
java.io.IOException

readURL

public java.lang.String readURL(java.net.URL url,
                                com.google.soap.search.GoogleSearch search)
                         throws com.google.soap.search.GoogleSearchFault,
                                java.io.IOException
Attempts to read the URL directly; if that fails, tries to read it from Google's cache. Strips the HTML markup and returns only the text contents.

Parameters:
url - URL to read
search - Google search used to read the page from Google's cache
Throws:
com.google.soap.search.GoogleSearchFault
java.io.IOException
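The "read directly, otherwise fall back to the cache" behaviour described above can be sketched as follows. This is not the implementation of readURL. PageSource and readWithFallback are hypothetical names that stand in for the direct URL read and the Google cache lookup, so the pattern can be shown without the Google SOAP library.

```java
import java.io.IOException;

public class FallbackReadSketch {
    /** Hypothetical stand-in for any source of page text that may fail. */
    public interface PageSource {
        String read() throws IOException;
    }

    /**
     * Returns the text from the direct source; if that read fails with an
     * IOException, returns the text from the cached source instead.
     */
    public static String readWithFallback(PageSource direct, PageSource cached)
            throws IOException {
        try {
            return direct.read();
        } catch (IOException directFailure) {
            try {
                return cached.read();
            } catch (IOException cacheFailure) {
                // Keep the first failure attached so neither error is lost.
                cacheFailure.addSuppressed(directFailure);
                throw cacheFailure;
            }
        }
    }
}
```

A call such as readWithFallback(() -> fetchLive(url), () -> fetchCached(url)) would use the pattern, where fetchLive and fetchCached are placeholders for the two ways of obtaining the page.
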

htmlToTxt

public java.lang.String htmlToTxt(java.io.InputStream inputStream)
                           throws java.io.IOException
Reads HTML and returns its text contents

Parameters:
inputStream - stream of HTML to read
Returns:
text contents of the HTML
Throws:
java.io.IOException
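
As an illustration of what turning an HTML stream into text involves, here is a deliberately simple sketch. It is not the code behind htmlToTxt, whose parsing approach this page does not describe. It assumes UTF-8 input and uses regular expressions, which is adequate for simple pages only; it does not decode character entities such as &amp;amp;. The class name HtmlTextSketch is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class HtmlTextSketch {
    /** Reads the whole stream as UTF-8 (an assumption) and strips markup. */
    public static String htmlToTxt(InputStream inputStream) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = inputStream.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        String html = new String(buffer.toByteArray(), StandardCharsets.UTF_8);

        // Drop script and style elements together with their contents.
        String text = html.replaceAll("(?is)<(script|style)\\b.*?</\\1\\s*>", " ");
        // Replace every remaining tag with a space so words do not run together.
        text = text.replaceAll("(?s)<[^>]*>", " ");
        // Collapse runs of whitespace and trim the ends.
        return text.replaceAll("\\s+", " ").trim();
    }
}
```

A production version would normally rely on a real HTML parser rather than regular expressions, since markup such as comments or attribute values containing ">" defeats this approach.
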