
A large-scale, text-based search engine written in Java and indexed with the Apache Lucene API


Text-based Search Engine

(Java, Apache Lucene, API, SVM)

  • Published a large-scale, text-based search engine in Java that uses Apache Lucene to index 33,258 astronomical publications
  • Applied machine-learning techniques, including ranking diversification and learning-to-rank with an SVM classifier, and integrated multiple information-retrieval models to analyze data for classification and regression
  • Constructed a predictive model of document relevance based on Cornell SVM-Rank with 18 features, which improved ranking precision by 20%
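
As a rough illustration of the learning-to-rank step, the sketch below writes one training line in the SVM-light style input format that SVM-rank reads (`<target> qid:<id> <index>:<value> ... # <comment>`). The class name, method name, feature values, and document id are hypothetical; the 18 features used in this project are not listed here, so the vector is a placeholder.

```java
// Minimal sketch, not the project's actual code: formats one feature vector
// as a line of SVM-rank training input.
final class SvmRankLineSketch {

    /**
     * Builds "<relevance> qid:<queryId> 1:<f1> 2:<f2> ... # <docId>".
     * Feature indices start at 1 and increase, as the format requires.
     */
    static String toSvmRankLine(int relevance, int queryId, double[] features, String docId) {
        StringBuilder sb = new StringBuilder();
        sb.append(relevance).append(" qid:").append(queryId);
        for (int i = 0; i < features.length; i++) {
            sb.append(' ').append(i + 1).append(':').append(features[i]);
        }
        // Everything after '#' is a comment; here it carries the document id.
        sb.append(" # ").append(docId);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder vector with 18 slots; the values are made up for illustration.
        double[] features = new double[18];
        features[0] = 0.25;
        features[1] = 0.8;
        System.out.println(toSvmRankLine(1, 42, features, "hypothetical-doc-id"));
    }
}
```

Lines that share a `qid` are compared against each other during training, so each query's candidate documents would be written as one group.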

Retrieval Models:

  1. Unranked/ranked Boolean
  2. BM25
  3. Indri
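
To make the two best-match models above concrete, here is a minimal sketch of their per-term scores. It assumes one common formulation of each: Okapi BM25 with an RSJ-style idf clamped at zero, and Indri-style query likelihood with combined Dirichlet (`mu`) and linear (`lambda`) smoothing. The class, method, and parameter names are hypothetical, and the parameter values are illustrative rather than the ones tuned in this project.

```java
// Minimal sketch under the assumptions stated above; not the project's code.
final class RetrievalScoreSketch {

    /** BM25 contribution of one query term to one document. */
    static double bm25(double tf, double df, double numDocs,
                       double docLen, double avgDocLen,
                       double qtf, double k1, double b, double k3) {
        // RSJ-style idf, clamped so very common terms never score negative.
        double idf = Math.max(0.0, Math.log((numDocs - df + 0.5) / (df + 0.5)));
        // Term-frequency saturation with document-length normalization.
        double tfWeight = tf / (tf + k1 * ((1.0 - b) + b * docLen / avgDocLen));
        // Weight for terms repeated in the query.
        double userWeight = (k3 + 1.0) * qtf / (k3 + qtf);
        return idf * tfWeight * userWeight;
    }

    /** Smoothed probability of one query term given one document. */
    static double indri(double tf, double docLen,
                        double ctf, double collectionLen,
                        double mu, double lambda) {
        // Maximum-likelihood estimate of the term in the whole collection.
        double pCollection = ctf / collectionLen;
        return (1.0 - lambda) * (tf + mu * pCollection) / (docLen + mu)
                + lambda * pCollection;
    }

    public static void main(String[] args) {
        // Illustrative statistics only.
        System.out.println(bm25(3, 10, 1000, 120, 100, 1, 1.2, 0.75, 0));
        System.out.println(indri(3, 120, 50, 1_000_000, 2500, 0.4));
    }
}
```

A document's BM25 score is the sum of these term scores over the query, while an Indri `#and` style score combines the per-term probabilities multiplicatively (often as a geometric mean). Smoothing is why the Indri score stays positive even when the document lacks the term.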

Reference: https://boston.lti.cs.cmu.edu/classes/11-642/ (2019 version)