/QueryExpansion

Query Expansion is a project developed at IIIT Hyderabad as part of course work. We surveyed existing applications, proposed an approach that combines N-grams with a Markov model, implemented it, and tested the results on a data set comprising "The Telegraph - Calcutta" news stories.
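To make the idea concrete, here is a toy sketch of how bigram (2-gram) counts can act as a first-order Markov model that proposes the next word of a query. It is not the project's code: the class name `BigramExpander`, its methods, and the whitespace tokenisation are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy bigram (first-order Markov) model; a hypothetical illustration, not the project's implementation. */
class BigramExpander {
    // counts.get(w1).get(w2) = how often w2 directly followed w1 in the training text
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Adds the bigram counts of one sentence (lower-cased, split on whitespace). */
    void train(String sentence) {
        String[] tokens = sentence.toLowerCase().trim().split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i++) {
            counts.computeIfAbsent(tokens[i], k -> new HashMap<>())
                  .merge(tokens[i + 1], 1, Integer::sum);
        }
    }

    /** Returns up to k expanded queries: the query followed by each of its most frequent next words. */
    List<String> expand(String query, int k) {
        String[] tokens = query.toLowerCase().trim().split("\\s+");
        String last = tokens[tokens.length - 1];
        Map<String, Integer> followers = counts.getOrDefault(last, new HashMap<>());
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(followers.entrySet());
        // Highest count first; ties are broken alphabetically so the order is deterministic.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> expanded = new ArrayList<>();
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            expanded.add(query + " " + ranked.get(i).getKey());
        }
        return expanded;
    }
}
```

Trained on "prime minister visits calcutta", "prime minister addresses parliament" and "prime time news", `expand("prime", 2)` returns "prime minister" followed by "prime time", because "minister" follows "prime" twice and "time" once.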

Primary language: Java

Steps to run

1. Set the path variables in Properties.java according to where the input data set is located
   and where the results should be stored.
   
2. Run Doc2MatInput.java: this converts the data set to the format required by the doc2mat Perl script.

3. Run the doc2mat Perl script: this generates the matrix-file representation of the data set.
	Ex: perl E:\doc2mat-1.0\doc2mat -nostem E:\QueryEpansion_ver1_results\Output_ver1\Doc2MatInput\File E:\QueryEpansion_ver1_results\Output_ver1\Doc2MatInput\File.mat
	General form: perl "location of perl script" -nostem "Inputfile" "Outputfile"
4. Run CLUTO to cluster the documents. The last argument is the number of clusters to form.
	Ex: vcluster.exe -clmethod="rb" E:\QueryEpansion_ver1_results\Output_ver1\Doc2MatInput\File.mat 150
	General form: vcluster.exe -clmethod="rb" "Input matrix file" "No of clusters"

5. Update Properties.java if needed.

6. Run Cluster.java, which puts the documents into clusters.

7. Run IndexClusters.java, which indexes the clusters using Lucene.

8. Run tagclusters.java, which associates tags with each cluster.

9. Run SearchIndex.java: enter your initial query here. The expanded queries are displayed.
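Steps 3 and 4 call external tools, so they can be scripted from Java if that is more convenient than typing the commands by hand. The sketch below is an assumption about how that might look: the class `ExternalSteps` and its method names are hypothetical and not part of the project. It only builds and runs the two command lines shown in the examples above, and it expects `perl` and `vcluster.exe` to be on the PATH.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that wraps the doc2mat and vcluster command lines from steps 3 and 4. */
class ExternalSteps {
    /** Step 3: perl <doc2mat script> -nostem <input file> <output matrix file> */
    static List<String> doc2matCommand(String script, String inputFile, String outputMat) {
        return Arrays.asList("perl", script, "-nostem", inputFile, outputMat);
    }

    /** Step 4: vcluster.exe -clmethod=rb <matrix file> <number of clusters> */
    static List<String> vclusterCommand(String matFile, int numClusters) {
        return Arrays.asList("vcluster.exe", "-clmethod=rb", matFile, String.valueOf(numClusters));
    }

    /** Runs a command, forwards its output to this console, and returns its exit code. */
    static int run(List<String> command) throws java.io.IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            System.err.println("usage: ExternalSteps <doc2mat script> <input file> <output .mat> <clusters>");
            return;
        }
        // Only cluster if the matrix file was generated successfully.
        if (run(doc2matCommand(args[0], args[1], args[2])) == 0) {
            run(vclusterCommand(args[2], Integer.parseInt(args[3])));
        }
    }
}
```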

For Evaluation:


1. Save all the input in a text file. Start with the string "Enter Query", followed by the input query and then the augmented queries on the following lines.

2. Run IndexDocuments.java, which indexes the clusters using Lucene.

3. Run QueryDocuments.java, which outputs an Excel sheet of precision and recall values.
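For reference, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The sketch below shows the standard computation for a single query; the class `PrecisionRecall` and the use of document IDs as strings are assumptions for illustration, and QueryDocuments.java may compute or report these values differently.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper showing the standard precision and recall formulas for one query. */
class PrecisionRecall {
    /** |retrieved ∩ relevant| / |retrieved|; defined as 0 when nothing was retrieved. */
    static double precision(Set<String> retrieved, Set<String> relevant) {
        if (retrieved.isEmpty()) return 0.0;
        return (double) overlap(retrieved, relevant) / retrieved.size();
    }

    /** |retrieved ∩ relevant| / |relevant|; defined as 0 when there are no relevant documents. */
    static double recall(Set<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        return (double) overlap(retrieved, relevant) / relevant.size();
    }

    // Number of documents that are both retrieved and relevant.
    private static int overlap(Set<String> retrieved, Set<String> relevant) {
        Set<String> hits = new HashSet<>(retrieved);
        hits.retainAll(relevant);
        return hits.size();
    }
}
```

For example, if a query retrieves documents d1, d2, d3 and d4, and the relevant documents are d2, d4 and d5, then precision is 2/4 = 0.5 and recall is 2/3.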

Note: Change the Properties.java interface accordingly.
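As a rough guide to what such an interface can look like, here is a hypothetical sketch. The constant names and the `DATASET_DIR` value are placeholders invented for illustration; only the output paths and the cluster count are taken from the example commands above. The real Properties.java in this project may use different names and fields.

```java
/** Hypothetical layout of a constants interface; the names here are assumptions, not the project's. */
interface Properties {
    // Placeholder: folder that holds the raw news stories (input data set).
    String DATASET_DIR = "E:\\dataset";

    // Root folder where results are written (path taken from the example commands).
    String OUTPUT_DIR = "E:\\QueryEpansion_ver1_results\\Output_ver1";

    // File produced by Doc2MatInput.java and consumed by the doc2mat Perl script.
    String DOC2MAT_INPUT_FILE = OUTPUT_DIR + "\\Doc2MatInput\\File";

    // Matrix file produced by doc2mat and consumed by vcluster.
    String MATRIX_FILE = DOC2MAT_INPUT_FILE + ".mat";

    // Number of clusters passed to vcluster as its last argument.
    int NUM_CLUSTERS = 150;
}
```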