Semantic-Search-Model-Experiments

Experiments in semantic search using the BM-25 algorithm, Mean of Word Vectors, LDA topic modelling, and state-of-the-art Transformer-based models, namely USE and SBERT.



Dataset Used For Semantic Search/ Information Retrieval:

CISI Dataset - Kaggle

Experiments:

Experiment-1. Using BM-25 Algorithm and Parameter Tuning For Semantic Search

BM-25 Algorithm variations used:

  • BM25Okapi
  • BM25L
  • BM25Plus

Result:

BEST MODEL: BM25Plus
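A minimal pure-Python sketch of BM25Okapi scoring (the notebooks presumably use the `rank_bm25` library, which provides `BM25Okapi`, `BM25L`, and `BM25Plus` as drop-in variants; the corpus here is a toy stand-in for CISI, and `k1`/`b` are the parameters the experiment tunes):

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each document against the query with BM25Okapi."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    # document frequency of each term
    df = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [d.lower().split() for d in
        ["semantic search with transformers",
         "classic keyword search engine",
         "cooking pasta at home"]]
query = "semantic search".split()
scores = bm25_scores(query, docs)
```

Parameter tuning then amounts to sweeping `k1` (term-frequency saturation) and `b` (length normalisation) against the relevance judgments.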

Experiment-2. Using Mean of Word Vectors (MWV) with Pretrained Embeddings For Semantic Search

Pretrained word embeddings used:

  • word2vec
  • GloVe
  • FastText

Result:

BEST MODEL: word2vec
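The MWV approach represents each document as the mean of its word vectors and ranks by cosine similarity. A minimal sketch with toy vectors (in practice the vectors would be loaded from pretrained word2vec/GloVe/FastText models, e.g. via `gensim`):

```python
import numpy as np

# Toy embeddings standing in for pretrained word2vec/GloVe/FastText vectors
emb = {
    "semantic": np.array([0.9, 0.1, 0.0]),
    "search":   np.array([0.8, 0.2, 0.1]),
    "keyword":  np.array([0.1, 0.9, 0.0]),
    "pasta":    np.array([0.0, 0.1, 0.9]),
}

def doc_vector(tokens):
    """Mean of word vectors; out-of-vocabulary tokens are skipped."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

docs = [["semantic", "search"], ["keyword", "search"], ["pasta"]]
query = ["semantic", "search"]
qv = doc_vector(query)
scores = [cosine(qv, doc_vector(d)) for d in docs]
```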

Experiment-3. Using LDA Topic Modelling For Semantic Search

Result:

Performs worse than BM-25.
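For LDA-based retrieval, documents and queries are mapped to topic distributions and ranked by similarity of those distributions. A sketch using scikit-learn's `LatentDirichletAllocation` (the notebooks may use gensim instead; the corpus and topic count here are illustrative):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "semantic search ranks documents by meaning",
    "transformers encode sentences for search",
    "recipes for pasta and sauce",
    "cooking pasta with tomato sauce",
]
vec = CountVectorizer()
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
doc_topics = lda.transform(X)  # per-document topic mixtures, rows sum to 1

query = ["meaning of semantic search"]
q_topics = lda.transform(vec.transform(query))

# Rank documents by cosine similarity of topic distributions
sims = (doc_topics @ q_topics.T).ravel() / (
    np.linalg.norm(doc_topics, axis=1) * np.linalg.norm(q_topics))
ranking = np.argsort(-sims)
```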

Experiment-4. Using Universal Sentence Encoder (USE) For Semantic Search

USE encoder variants used:

  • Transformer Encoder
  • Deep Averaging Network (DAN) Encoder

Result:

BEST MODEL: USE-Transformer
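With USE, the corpus and query are embedded into fixed-size (512-d) sentence vectors and ranked by cosine similarity. The embedding step needs TensorFlow Hub (shown as a comment, with the published module URLs as an assumption); the ranking step itself is plain NumPy:

```python
import numpy as np

# In the notebooks, embeddings would come from TF Hub, roughly:
#   import tensorflow_hub as hub
#   dan = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")        # DAN encoder
#   trf = hub.load("https://tfhub.dev/google/universal-sentence-encoder-large/5")  # Transformer encoder
#   doc_embs = np.asarray(trf(corpus_sentences))

def rank_by_cosine(query_emb, doc_embs):
    """Return document indices sorted by cosine similarity to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    return np.argsort(-(d @ q))

# Stand-in 512-d embeddings (USE outputs 512-dimensional vectors)
rng = np.random.default_rng(0)
doc_embs = rng.normal(size=(5, 512))
order = rank_by_cosine(doc_embs[2], doc_embs)  # a document is most similar to itself
```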

Experiment-5. Using Pretrained and Finetuned Sentence Transformers (SBERT) For Semantic Search

Result:

BEST MODEL: Finetuned SBERT
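Fine-tuning SBERT on CISI requires turning the relevance judgments into (query, document, label) training pairs. A sketch of that preparation step, with the `sentence-transformers` training call shown as a comment (model name and loss are illustrative assumptions, not taken from the notebooks):

```python
# Hypothetical fine-tuning call via the sentence-transformers API:
#   from torch.utils.data import DataLoader
#   from sentence_transformers import SentenceTransformer, InputExample, losses
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   train = [InputExample(texts=[q, d], label=lbl) for q, d, lbl in pairs]
#   loader = DataLoader(train, batch_size=16, shuffle=True)
#   model.fit(train_objectives=[(loader, losses.CosineSimilarityLoss(model))], epochs=1)

def build_pairs(queries, docs, qrels):
    """Turn (query_id, doc_id, relevance) judgments into (query, doc, label) triples."""
    return [(queries[qid], docs[did], float(rel)) for qid, did, rel in qrels]

queries = {1: "semantic search"}
docs = {10: "search by meaning", 11: "pasta recipe"}
qrels = [(1, 10, 1), (1, 11, 0)]
pairs = build_pairs(queries, docs, qrels)
```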

Final Result:

Overall Best Model: Finetuned SBERT
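Comparing models across experiments requires a common retrieval metric over the CISI relevance judgments. As one illustrative example (not necessarily the metric used in the notebooks), Mean Average Precision can be computed like this:

```python
def average_precision(ranked_ids, relevant_ids):
    """AP for one query: mean of precision@k taken at each relevant hit."""
    hits, precisions = 0, []
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    """MAP over (ranking, relevant-set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

runs = [([3, 1, 2], {1}), ([5, 4], {4, 6})]
print(mean_average_precision(runs))  # → 0.375
```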