This repository is the official codebase for the arXiv paper search app hosted at https://docsearch.redisventures.com
Through the RediSearch module, vector data types and search indexes can be added to Redis. This turns Redis into a highly performant, in-memory, vector database, which can be used for many types of applications.
Here we showcase Redis vector similarity search (VSS) applied to a document search/retrieval use case. Read more about AI-powered search in our blog post (shout out to our friends at Data Science Dojo).
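To make the idea of vector similarity search concrete, here is a minimal brute-force sketch in plain Python. It is only an illustration: the three-dimensional vectors and the paper:a/paper:b/paper:c keys are made up, cosine similarity is assumed as the metric, and Redis performs the equivalent search inside its own index rather than in application code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    docs: Dict[str, List[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
papers = {
    "paper:a": [1.0, 0.0, 0.0],
    "paper:b": [0.9, 0.1, 0.0],
    "paper:c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], papers, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

A brute-force scan like this grows linearly with the number of documents, which is why a dedicated vector index matters once the collection is large.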
The steps below outline how to get this app up and running on your machine.
Install Docker Desktop.
Pull the arXiv dataset from Kaggle.
Download and extract the zip file, then place the resulting JSON file (arxiv-metadata-oai-snapshot.json) in the data/ directory.
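Once the file is in place, a quick sanity check can confirm it is readable. The sketch below assumes the snapshot stores one JSON object per line and that each record has id and title fields; iter_papers and preview are hypothetical helper names, not part of this repository.

```python
import json
from itertools import islice
from typing import Dict, Iterator, List

# Path taken from the setup step above.
DATA_PATH = "data/arxiv-metadata-oai-snapshot.json"


def iter_papers(path: str) -> Iterator[Dict]:
    """Yield one record at a time so the whole file never sits in memory.

    Assumption: the file is line-delimited JSON (one object per line).
    """
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def preview(path: str, n: int = 3) -> List[Dict]:
    """Return the id and title of the first n records (assumed field names)."""
    return [
        {"id": paper.get("id"), "title": paper.get("title")}
        for paper in islice(iter_papers(path), n)
    ]
```

Calling preview(DATA_PATH) from the repository root should return a short list of records if the file was extracted to the right place.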
1. Set up the Python environment:
- If you use conda, take advantage of the Makefile included here: make env
- Otherwise, set up your virtual environment however you wish and install the Python dependencies in requirements.txt
2. Use the notebook:
- Run through the arxiv-embeddings.ipynb notebook to generate some sample embeddings.
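Before text is passed to an embedding model, it usually helps to normalize it. The notebook's actual preprocessing is not reproduced here; the following is a hypothetical helper that assumes each record has title and abstract fields and simply joins them while collapsing stray whitespace and line breaks.

```python
import re
from typing import Dict


def paper_to_text(paper: Dict[str, str]) -> str:
    """Join title and abstract into one whitespace-normalized string.

    Assumption: records use "title" and "abstract" keys; missing keys
    are treated as empty strings.
    """
    title = paper.get("title") or ""
    abstract = paper.get("abstract") or ""
    combined = f"{title} {abstract}"
    # Collapse newlines, tabs, and repeated spaces into single spaces.
    return re.sub(r"\s+", " ", combined).strip()


# Made-up record for illustration only.
sample = {
    "title": "A Sample\n  Title",
    "abstract": "  First line of the abstract\ncontinues here. ",
}
text = paper_to_text(sample)
print(text)
```

The resulting string is what would be handed to whichever embedding model the notebook loads.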
This app was built as a Single Page Application (SPA) with the following components:
- Redis Stack: Vector database + JSON storage
- FastAPI (Python 3.8)
- Pydantic for schema and validation
- React (with TypeScript)
- Redis OM for ORM
- Docker Compose for development
- MaterialUI for some UI elements/components
- React-Bootstrap for some UI elements
- Hugging Face Tokenizers + Models for vector embedding creation
Some inspiration was taken from this Cookiecutter project, which was adapted into an SPA instead of using a separate front-end server.
To launch the app:
- Run docker compose up from the same directory as docker-compose.yml
- Navigate to http://localhost:8888 in a browser
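The containers can take a little while to start, so it can be handy to poll the app before opening the browser. This is a minimal standard-library sketch; is_reachable and wait_until_reachable are hypothetical helpers, and the only assumption about the app is that it answers HTTP on the URL given above.

```python
import time
import urllib.error
import urllib.request

# URL from the launch steps above.
APP_URL = "http://localhost:8888"

# The target is local, so bypass any system-wide HTTP proxy settings.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at url, whatever the status code."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded with an error status, so it is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or host not resolvable.
        return False


def wait_until_reachable(url: str, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll url up to `attempts` times, sleeping `delay` seconds in between."""
    for attempt in range(attempts):
        if is_reachable(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, wait_until_reachable(APP_URL) returns True as soon as the app responds and False if it never does within the allotted attempts.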
Building the containers manually:
The first time you run docker compose up, it will automatically build your Docker images based on the Dockerfile. When you need to rebuild later, run docker compose up --build to force a new build.
It's typically easier to manipulate front end code in an interactive environment (outside of Docker) where one can test out code changes in real time. In order to use this approach:
- Follow the steps from the previous section with Docker Compose to deploy the backend API.
- cd into the gui/ directory and use yarn to install packages: yarn install --no-optional (you may need to use npm to install yarn).
- Use yarn to serve the application from your machine: yarn start
- Navigate to http://localhost:3000 in a browser.
- Make front-end changes in real time.
- Issues with Docker? Run docker system prune, restart Docker Desktop, and try again.
- Open an issue here on GitHub and we will be as responsive as we can!
This is a new project. Comment on an open issue or create a new one. We can triage it from there.