Search Reranking Model - POC

This repository contains the source code for a search reranking model that is designed to improve the accuracy and relevance of search results for e-commerce product searches. The model is based on linear regression and uses various features of the search query results and product data to rerank the search results.

It takes in a list of products with their attributes and returns the reordered product IDs and their corresponding predicted scores.

File Structure

The directory structure of the repository is as follows:

search_reranking_model/
├── Readme.md
├── data/
│   ├── bgq_data_search.csv
│   └── cached_dataframe.joblib
├── docs/
│   └── reranking-model-arch.png
├── model/
│   └── search_reranking_model.joblib
├── model_train.py
├── notebooks/
│   ├── Workbook-assumptions_of_LR.ipynb
│   └── search_reranking_model.ipynb
├── out/
│   └── reranked_product_scores.csv
├── requirements.txt
├── rerank_products.py
└── reranking_model.py

The data directory contains the input data for the model, including the search query data (bgq_data_search.csv) and a cached version of the preprocessed data (cached_dataframe.joblib). The model directory contains the trained model (search_reranking_model.joblib).
The docs directory contains documentation related to the model architecture, including a diagram (reranking-model-arch.png) that illustrates the various components and inputs.
The model_train.py file contains the code to train the model on the input data.
The notebooks directory contains Jupyter notebooks that were used for exploratory data analysis and model testing. The Workbook-assumptions_of_LR.ipynb notebook explores the assumptions of logistic regression and the search_reranking_model.ipynb notebook demonstrates the use of the trained model on new data.
The out directory contains the final output of the model, which is a csv file (reranked_product_scores.csv) containing the reranked scores for each product based on the search query.
The requirements.txt file lists the required Python packages and their versions.
The rerank_products.py script is the main script for running the model on new data. It preprocesses the data, applies the model, and generates the reranked product scores.
The reranking_model.py file contains the code for defining and applying the logistic regression model.

Installation

Clone the repository: git clone https://github.com/akhilesh-k/search-reranking-model.git
Install the required packages: pip install -r requirements.txt

Usage

To use the product predictor, follow these steps:

Setup training data, if you want to play with the model. You can use the bgq query to import the dataframe if you have the access.
Prepare your input data as a list of products with their IDs and attributes.

Instantiate the ProductPredictor class by passing the path to the trained machine learning model:

from product_predictor import ProductPredictor
model_path = 'path/to/model.joblib'
predictor = ProductPredictor(model_path)

Add each product to the ProductPredictor instance using the add_product method: product_id = 'SKU-0000-12345' attributes = [4.0, 0, 5.0, 1, 0.020829, 0.552162, 0.005076, 0.010152, 1.0, 0.149546, 0.046763, -0.423838, -0.000822, -0.097556, -0.787021] predictor.add_product(product_id, attributes)
Call the predict_reranked_product_ids method on the ProductPredictor instance to get the predicted reordered product IDs and their corresponding scores:

reranked_product_ids, predicted_scores = predictor.predict_reranked_product_ids()
Save the predicted scores to a CSV file: results_df = pd.DataFrame({'Product ID': reranked_product_ids, 'Score': [predicted_scores[id] for id in reranked_product_ids]}) results_df.to_csv('path/to/output.csv', index=False)

License

This project is licensed under the MIT License. See the LICENSE file for details.

akhilesh-k/search-reranking-model

Search Reranking Model - POC

File Structure

Installation

Usage

License