This repository contains the source code for a search reranking model that is designed to improve the accuracy and relevance of search results for e-commerce product searches. The model is based on linear regression and uses various features of the search query results and product data to rerank the search results.
It takes in a list of products with their attributes and returns the reordered product IDs and their corresponding predicted scores.
The directory structure of the repository is as follows:
search_reranking_model/
├── Readme.md
├── data/
│ ├── bgq_data_search.csv
│ └── cached_dataframe.joblib
├── docs/
│ └── reranking-model-arch.png
├── model/
│ └── search_reranking_model.joblib
├── model_train.py
├── notebooks/
│ ├── Workbook-assumptions_of_LR.ipynb
│ └── search_reranking_model.ipynb
├── out/
│ └── reranked_product_scores.csv
├── requirements.txt
├── rerank_products.py
└── reranking_model.py
-
The
data
directory contains the input data for the model, including the search query data (bgq_data_search.csv
) and a cached version of the preprocessed data (cached_dataframe.joblib
). Themodel
directory contains the trained model (search_reranking_model.joblib
). -
The
docs
directory contains documentation related to the model architecture, including a diagram (reranking-model-arch.png
) that illustrates the various components and inputs. -
The
model_train.py
file contains the code to train the model on the input data. -
The
notebooks
directory contains Jupyter notebooks that were used for exploratory data analysis and model testing. TheWorkbook-assumptions_of_LR.ipynb
notebook explores the assumptions of logistic regression and thesearch_reranking_model.ipynb
notebook demonstrates the use of the trained model on new data. -
The
out
directory contains the final output of the model, which is a csv file (reranked_product_scores.csv
) containing the reranked scores for each product based on the search query. -
The
requirements.txt
file lists the required Python packages and their versions. -
The
rerank_products.py
script is the main script for running the model on new data. It preprocesses the data, applies the model, and generates the reranked product scores. -
The
reranking_model.py
file contains the code for defining and applying the logistic regression model.
- Clone the repository:
git clone https://github.com/akhilesh-k/search-reranking-model.git
- Install the required packages:
pip install -r requirements.txt
To use the product predictor, follow these steps:
-
Setup training data, if you want to play with the model. You can use the bgq query to import the dataframe if you have the access.
-
Prepare your input data as a list of products with their IDs and attributes.
-
Instantiate the ProductPredictor class by passing the path to the trained machine learning model:
from product_predictor import ProductPredictor model_path = 'path/to/model.joblib' predictor = ProductPredictor(model_path)
-
Add each product to the ProductPredictor instance using the
add_product
method:product_id = 'SKU-0000-12345'
attributes = [4.0, 0, 5.0, 1, 0.020829, 0.552162, 0.005076, 0.010152, 1.0, 0.149546, 0.046763, -0.423838, -0.000822, -0.097556, -0.787021]
predictor.add_product(product_id, attributes)
-
Call the predict_reranked_product_ids method on the ProductPredictor instance to get the predicted reordered product IDs and their corresponding scores:
reranked_product_ids, predicted_scores = predictor.predict_reranked_product_ids()
-
Save the predicted scores to a CSV file:
results_df = pd.DataFrame({'Product ID': reranked_product_ids, 'Score': [predicted_scores[id] for id in reranked_product_ids]})
results_df.to_csv('path/to/output.csv', index=False)
This project is licensed under the MIT License. See the LICENSE file for details.