Pre-made datasets and directories (prepare these before running):
datasets/processed/
datasets/raw/product_name.xlsx (from the competition e-mail)
datasets/raw/product_catalog.xlsx (from the competition e-mail)
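Before running the pipeline, it can help to confirm that the pre-made inputs are in place. The sketch below is a hypothetical helper (`missing_inputs` and `REQUIRED_PATHS` are not part of this repository) that checks the paths listed above relative to a project root.

```python
from pathlib import Path

# Paths taken from the pre-made inputs list; the helper itself is hypothetical.
REQUIRED_PATHS = [
    "datasets/processed",
    "datasets/raw/product_name.xlsx",
    "datasets/raw/product_catalog.xlsx",
]


def missing_inputs(root="."):
    """Return the required paths that do not exist under `root`."""
    base = Path(root)
    return [p for p in REQUIRED_PATHS if not (base / p).exists()]


if __name__ == "__main__":
    # Prints an empty list when every pre-made input is present.
    print(missing_inputs())
```

Run it from the project root; any path it prints still needs to be created or copied in.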
How to run:
0. Install dependencies with
   pip install -r requirements.txt
   or, with conda,
   conda env create -f environment.yml
1. python 01_cleaning.py
   --> datasets/processed/product_name.tsv, datasets/processed/product_catalog.tsv (cleaned data)
2. python 02_lev_search.py
   --> datasets/processed/result_lev.tsv
3. python 03_fuzzy_search.py
   --> datasets/processed/result_fuzzy.tsv
4. Run 03_finalize.ipynb
   --> datasets/final_result.tsv
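The steps above can also be chained in one script. This is a sketch, not part of the repository: it assumes it is run from the project root with the same Python interpreter, and it executes the notebook with `jupyter nbconvert --to notebook --execute --inplace`, which assumes Jupyter is installed (the instructions above only say to run the notebook). `STEPS` and `run_pipeline` are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path

# Each step: (command, expected output files), in the order listed above.
STEPS = [
    ([sys.executable, "01_cleaning.py"],
     ["datasets/processed/product_name.tsv",
      "datasets/processed/product_catalog.tsv"]),
    ([sys.executable, "02_lev_search.py"],
     ["datasets/processed/result_lev.tsv"]),
    ([sys.executable, "03_fuzzy_search.py"],
     ["datasets/processed/result_fuzzy.tsv"]),
    # Assumption: the notebook is executed headlessly via nbconvert.
    (["jupyter", "nbconvert", "--to", "notebook", "--execute",
      "--inplace", "03_finalize.ipynb"],
     ["datasets/final_result.tsv"]),
]


def run_pipeline(steps=STEPS):
    """Run each step in order; stop at the first failure."""
    completed = 0
    for cmd, outputs in steps:
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError if the step exits non-zero.
        subprocess.run(cmd, check=True)
        missing = [p for p in outputs if not Path(p).exists()]
        if missing:
            raise FileNotFoundError(f"{cmd[-1]} did not produce {missing}")
        completed += 1
    return completed
```

Calling `run_pipeline()` from the project root runs all four steps and returns 4 if every expected output file was written; checking the outputs after each step makes a silent failure in an early script surface before the later steps consume stale files.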
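The `lev` in `02_lev_search.py` presumably refers to Levenshtein (edit) distance; that is an assumption, since the script's contents are not described here. As an illustration of the technique only, the sketch below computes the distance with the standard dynamic-programming recurrence and picks the closest catalog entry for a product name. The functions `levenshtein` and `best_match` and the sample strings are hypothetical, not taken from the repository or the competition data.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (two-row DP)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string inner
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def best_match(name: str, catalog: list[str]) -> tuple[str, int]:
    """Return the catalog entry closest to `name` and its distance."""
    return min(((entry, levenshtein(name, entry)) for entry in catalog),
               key=lambda pair: pair[1])


# Hypothetical sample data, not from the competition files.
catalog = ["green tea 500ml", "black tea 500ml", "orange juice 1l"]
print(best_match("gren tea 500 ml", catalog))  # ('green tea 500ml', 2)
```

Comparing every name against every catalog entry is quadratic in the number of rows, so a real search over large files would likely need blocking or a faster library.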