This repository consists of a Jupyter Notebook that can be shown as a RISE presentation (Reveal.js - Jupyter/IPython Slideshow Extension).
The code is written in Python and uses the redis-py Redis client. You don't need to be a Python expert to run it and understand the concepts. Not a Python developer? Don't worry - these concepts can be applied equally in other programming languages: for example with the node-redis client for Node.js, jedis for Java or NRedisStack for C#.
This presentation is based on the Redis Product Search Demo by Tyler Hutcherson (@tylerhutcherson) and Sam Partee (@Spartee) from Redis.
To run the code locally, you'll need to install and set up a few things:
- Python 3 (if you don't have a recent version of Python, grab one here. We've tested on Python 3.10)
- Poetry (dependency manager for Python - read the installation instructions here)
- Docker Desktop (read the installation instructions here) - we use this to provide you with a Redis Stack instance.
- Git command line tools (the git command). Get these from the Git website if needed.
- RedisInsight - a graphical tool for viewing and managing data in Redis. Download a free copy here or get it from the Apple App Store if you're on a Macintosh.
We'll assume that you've downloaded and installed the prerequisites above, and we'll explain how to configure them as needed in the remainder of this README.
At the terminal, clone the repository to your local machine:
git clone https://github.com/bsbodden/redis-vss-py.git
Then, change directory into the repository folder:
cd redis-vss-py
We assume that your terminal's current directory is this folder for all subsequent commands.
We're using the Poetry tool to manage Python virtual environments and dependencies. Install the dependencies that this workshop uses with the following command:
poetry install --no-root
This installs the dependencies needed for this part of the workshop, and those for the vector similarity search part. Expect to see output similar to this:
Creating virtualenv redis-vss-py-_T_fhuK9-py3.10 in /Users/simon/Library/Caches/pypoetry/virtualenvs
Installing dependencies from lock file
Package operations: 134 installs, 0 updates, 0 removals
• Installing six (1.16.0)
• Installing attrs (23.1.0)
• Installing platformdirs (3.8.0)
... similar output...
The code uses a .env file to store configuration information. We've provided an example file that you should be able to use without needing to make any changes.
Copy this into place:
cp env.example .env
Note that .env files may contain secrets such as Redis passwords, API keys etc. Add them to your .gitignore file and don't commit them to source control! We've done that for you in this repository.
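To make the role of the .env file concrete, here is a minimal, standard-library-only sketch of how KEY=VALUE settings from such a file could be read into a Python process. The function names `load_env_file` and `apply_env` are hypothetical, and the repository's own code may load the file differently (for example through a dedicated library), so treat this as an illustration rather than the project's implementation.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a file and return them as a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # Not a KEY=VALUE line, ignore it.
        # Trim whitespace and any surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def apply_env(values):
    """Copy parsed values into os.environ without overriding existing ones."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
```

Calling `apply_env(load_env_file())` from the repository folder would make the settings available through `os.environ`, while leaving any variables you have already exported in your shell untouched.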
We've provided a Docker Compose file for you to run an instance of Redis Stack locally. This will run on the default Redis port: 6379. If you have another instance of Redis running on this port, be sure to shut it down first.
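If you're not sure whether something is already listening on port 6379, a quick check like the following can help. This is a small sketch using only the Python standard library; the helper name `port_in_use` is hypothetical, and it only tells you that some process accepts TCP connections on the port, not that the process is Redis.

```python
import socket


def port_in_use(host="localhost", port=6379, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError if nothing is listening.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Before starting Redis Stack, `port_in_use()` should return False. Once the container is running, the same call should return True.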
Start Redis Stack like this:
docker-compose up -d
You should see output similar to the following:
...
Starting redis-vss-py ... done
Download the dataset from https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset. Unzipping the archive creates a directory structure like this:
.
├── images
│ ├── 10000.jpg
│ ├── 10001.jpg
│ ├── 10002.jpg
│ ├── 10003.jpg
│ │ ...
│ ├── 9491.jpg
│ ├── 9497.jpg
│ └── 9892.jpg
└── styles.csv
The loader script expects the dataset to be located under the directory pointed to by the environment variable DATASET_BASE. Set this to the parent directory of styles.csv and the images directory shown above.
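Before running the loader, it can save time to confirm that DATASET_BASE points at the right place. The sketch below checks for the styles.csv file and images directory shown in the listing above. The function name `check_dataset_base` is hypothetical and is not part of the repository.

```python
import os
from pathlib import Path


def check_dataset_base(base=None):
    """Return a list of problems with the dataset directory (empty if none)."""
    # Fall back to the DATASET_BASE environment variable if no path is given.
    base = base or os.environ.get("DATASET_BASE")
    if not base:
        return ["DATASET_BASE is not set"]
    root = Path(base)
    problems = []
    if not (root / "styles.csv").is_file():
        problems.append(f"missing file: {root / 'styles.csv'}")
    if not (root / "images").is_dir():
        problems.append(f"missing directory: {root / 'images'}")
    return problems
```

An empty list means the layout matches what is described above; otherwise each entry names what is missing.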
Let's load the Fashion Product Images Dataset into Redis Stack. This step also builds the search indices that you'll use to query the data from RedisInsight and the Python sample application.
Run the data loader like this:
poetry run python loader.py
This process will take several minutes to vectorize all 44k product descriptions and images.
Now that you have data in your Redis Stack instance, go back to RedisInsight and hit the refresh button. You should see 44K+ keys appear. Click on a key to see the JSON document stored at that key. The demo keys are prefixed with fashion:.
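You can also confirm the key count from Python rather than RedisInsight. The sketch below counts keys with the fashion: prefix using redis-py's `scan_iter`, which walks the keyspace incrementally instead of blocking the server the way a single KEYS call can. The function name `count_keys` is hypothetical, and `client` is assumed to be a connected redis-py client.

```python
def count_keys(client, prefix="fashion:"):
    """Count keys that start with prefix using incremental SCAN iteration."""
    # count is only a hint to the server about how much work to do per step.
    return sum(1 for _ in client.scan_iter(match=f"{prefix}*", count=1000))
```

With the Redis Stack instance from this README, `client` would typically be created with `redis.Redis(host="localhost", port=6379)`, and a call such as `client.json().get(key)` retrieves the JSON document stored at a given key.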
To open the notebook, start Jupyter like this:
poetry run jupyter notebook
So that you can re-run the presentation, a small script resets the changes that are made during it. The script:
- Drops the search index
- Removes the vectorized descriptions and images for the documents whose IDs are defined in the DEMO_PRODUCTS environment variable
To run the reset script, use:
poetry run python reset.py
If you need help with this workshop, or just want to chat with us about the concepts and how you plan to use them in your own projects, then we'd love to see you on the Redis Discord. There are channels for everything from help with different programming languages to promoting your own projects and finding a job.