Personalization with deep learning in 100 lines of code
This repository contains the companion code to the "Clothes in Space" blog post.
Running the full notebook in the original architecture requires Python 3 with the dependencies installed, plus access to Elasticsearch (or Coveo) and Redis: to get Elastic up and running locally and quickly, the Docker setup below is recommended.
We also support building the env with docker-compose. To run the POC with Docker:

- `docker-compose build` to build the Jupyter container;
- `docker-compose up --force-recreate` to start the stack; a link (e.g. `http://127.0.0.1`) with a temporary token will be printed in the CLI;
- `docker-compose down` to shut down all the containers.
Once you have everything up and running, fill in the variables in the notebook so that the Python clients can successfully connect to the databases (also fill in the catalog-specific variables that depend on your specific input file). Please note that, depending on your exact setup (Redis or in-memory, Elasticsearch or Coveo index, etc.), you may only need specific portions of the notebook: for this reason, different "versions" of key functions are clearly marked in the code.
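As an illustration, connecting the official Python clients may look as follows (a minimal sketch: hosts, ports and variable names here are illustrative, not necessarily the ones used in the notebook):

```python
from elasticsearch import Elasticsearch
import redis

# illustrative connection variables: adjust to your local setup
ES_HOST = 'http://localhost:9200'
REDIS_HOST = 'localhost'
REDIS_PORT = 6379

# Elasticsearch client for the product index
es_client = Elasticsearch([ES_HOST])
# Redis client for fast key-value lookups
redis_client = redis.StrictRedis(host=REDIS_HOST, port=REDIS_PORT, db=0)

# quick sanity checks before running the rest of the notebook
assert es_client.ping(), 'Elasticsearch is not reachable'
assert redis_client.ping(), 'Redis is not reachable'
```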
There are two main embedding examples in the code, product embeddings and word embeddings:
- product embeddings in the text were generated from real session data from e-commerce stores. To create embeddings from your own session data, you will need to provide data from your store. For your convenience, a sample session file is included in the repo: if your session data are formatted in the same way, the repo can be run with no changes (see the training sketch after this list);
- word embeddings in the text were generated with the 1bn word corpus. A copy of the file can be downloaded here; please note that if you use the Docker setup, the file will be downloaded automatically and made available at the `/tmp/corpus.txt` path. The variable `MAX_SENTENCES` can be used to limit the number of sentences used for training, greatly speeding up model building.
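Both examples belong to the word2vec family of models. As a minimal sketch of how product embeddings can be trained with gensim (a sketch only, assuming gensim 4.x and the sample sessions format described below; parameter values and the test SKU are hypothetical):

```python
from gensim.models import Word2Vec

# cap on training sentences, mirroring the MAX_SENTENCES idea above
MAX_SENTENCES = 1_000_000

# read sessions: one session per line, a numerical id followed by
# TAB-separated SKUs (the sample sessions format described below)
sessions = []
with open('/notebooks/data/sessions.txt') as session_file:
    for line in session_file:
        skus = line.strip().split('\t')[1:]  # drop the leading session id
        if skus:
            sessions.append(skus)

# treat each session as a "sentence" and each SKU as a "word", so that
# products viewed in similar contexts end up close in the embedding space
model = Word2Vec(
    sentences=sessions[:MAX_SENTENCES],
    vector_size=48,  # embedding dimension, hypothetical value
    window=5,
    min_count=2,
    workers=4,
)

# products most similar to a (hypothetical) test SKU
print(model.wv.most_similar('SKU_123', topn=5))
```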
To run the notebook end-to-end, you will also need to provide a "catalog" file and some test SKUs (i.e. product identifiers) to visualize analogies and similarities. For your convenience, a sample catalog file is included in the repo: if you put the `catalog` and `sessions` files in the `data` folder, you can then use `/notebooks/data` as your `DATA_FOLDER` in the notebook.
Please note that the sample files (catalog and sessions) are NOT enough to run the notebook end-to-end: those files are just provided as syntactic examples of *real* files you may want to use to reproduce the personalization effect explained in the original blog post.
To use your own `catalog` and `sessions` files, follow the sample files provided in the repo. In particular:

- `catalog.csv` is a CSV file with four columns, `sku`, `name`, `target`, `image`, where `sku` is the product identifier;
- `sessions.txt` is a TAB-separated txt file storing one session per line; each session has a numerical id first, followed by the list of SKUs (matching the content of `catalog.csv`, obviously) that were viewed in that session.
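As a quick sanity check on the two formats above, loading the files may look as follows (a sketch only: file paths assume the `/notebooks/data` folder from the Docker setup, and the CSV is assumed to have a header row):

```python
import csv

DATA_FOLDER = '/notebooks/data'  # as suggested above for the Docker setup

# catalog.csv: four columns (sku, name, target, image), assuming a header row
with open(f'{DATA_FOLDER}/catalog.csv') as catalog_file:
    catalog = {row['sku']: row for row in csv.DictReader(catalog_file)}

# sessions.txt: one session per line, numerical id first, then TAB-separated SKUs
with open(f'{DATA_FOLDER}/sessions.txt') as session_file:
    sessions = [line.strip().split('\t')[1:] for line in session_file if line.strip()]

# every SKU seen in the sessions should map back to a catalog entry
unknown_skus = {sku for session in sessions for sku in session if sku not in catalog}
print(f'{len(catalog)} products, {len(sessions)} sessions, {len(unknown_skus)} unknown SKUs')
```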
Thanks to Luca Bigon for adding `docker-compose` support and for the usual helpful comments; thanks to Francis Turgeon-Boutin for helping with Coveo's setup and general feedback on the project.
All the code is released "as is" under an MIT license.