Scalable-Topic-Modeling

The Search Engine was implemented in Apache Spark and enables users to perform intelligent search over a massive collection of unstructured text documents. At its core lies an efficient and scalable implementation of Latent Semantic Analysis, a method which uses unsupervised learning to distill millions of text documents into a set of most important concepts. The search engine uses this concise representation to compute similarity between a user search query against all indexed documents to find a set of documents that are most relevant to query terms.