This is a demo of a movie search engine. This project is inspired by Andrej Karpathy's weekend hack.
This project supports three types of search over movies: keyword-based (BM25), semantic, and hybrid. It can also retrieve movies similar to a selected one.
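To give a feel for how a hybrid search can blend keyword and semantic signals, here is a minimal, self-contained sketch in Python. It is an illustration only and does not claim to reproduce how this project or Weaviate ranks results: the helper names `min_max_normalize` and `hybrid_scores`, the `alpha` weighting, and the sample scores are all assumptions.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to the range [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_scores(
    keyword: Dict[str, float],
    vector: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Blend normalized keyword and vector scores.

    alpha = 0.0 uses only the keyword scores, alpha = 1.0 only the vector
    scores. This weighting is an assumption made for illustration.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    kw = min_max_normalize(keyword)
    vec = min_max_normalize(vector)
    # A movie missing from one result list contributes 0 for that signal.
    return {
        movie: (1 - alpha) * kw.get(movie, 0.0) + alpha * vec.get(movie, 0.0)
        for movie in set(kw) | set(vec)
    }


# Hypothetical scores for the same query from the two search types.
keyword_hits = {"Alien": 7.2, "Aliens": 6.1, "Arrival": 1.3}
vector_hits = {"Arrival": 0.91, "Contact": 0.88, "Alien": 0.62}

blended = hybrid_scores(keyword_hits, vector_hits, alpha=0.5)
for movie, score in sorted(blended.items(), key=lambda item: item[1], reverse=True):
    print(f"{movie}: {score:.2f}")
```

Normalizing before blending matters because BM25 scores and vector similarities live on different scales; without it, one signal would dominate regardless of `alpha`.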
- Docker
- Python
- Set the environment variables OPENAI_API_KEY, WEAVIATE_API_KEY, and WEAVIATE_URL. If you are running Weaviate via Docker, WEAVIATE_URL is "http://localhost:8080" and no WEAVIATE_API_KEY is needed.
Follow these steps to reproduce the example:
- Set up a virtual environment
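The environment variable rules above can be checked before running anything else. The following is a rough sketch, not code from this project: the function name `load_settings`, the returned dictionary shape, and the choice to fall back to the local Docker URL when WEAVIATE_URL is unset are all assumptions.

```python
import os
from typing import Mapping, Optional

# URL stated in this README for a Weaviate instance running via Docker.
LOCAL_WEAVIATE_URL = "http://localhost:8080"


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the three variables and enforce the rules described above."""
    if env is None:
        env = os.environ

    openai_key = env.get("OPENAI_API_KEY")
    if not openai_key:
        raise RuntimeError("OPENAI_API_KEY is not set")

    # Assumption: default to the local Docker URL when WEAVIATE_URL is missing.
    url = env.get("WEAVIATE_URL") or LOCAL_WEAVIATE_URL
    weaviate_key = env.get("WEAVIATE_API_KEY") or None

    # A local Docker instance needs no Weaviate API key; a hosted one does.
    if url != LOCAL_WEAVIATE_URL and weaviate_key is None:
        raise RuntimeError("WEAVIATE_API_KEY is required for a non-local WEAVIATE_URL")

    return {
        "openai_api_key": openai_key,
        "weaviate_url": url,
        "weaviate_api_key": weaviate_key,
    }


if __name__ == "__main__":
    try:
        settings = load_settings()
        print("Weaviate URL:", settings["weaviate_url"])
    except RuntimeError as error:
        print("Configuration problem:", error)
```

Passing the environment in as a mapping keeps the check easy to try with different values without touching your real shell environment.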
python -m venv .venv
source .venv/bin/activate
- Set your OPENAI_API_KEY in the docker-compose.yml file, then run the following command to start the Weaviate Docker container
docker compose up -d
- Run the following command in the project directory to install all required dependencies
pip install -r requirements.txt
- Run the following command to add all the data objects. You can change the dataset path at line 115 of add_data.py if necessary, and you can decrease the number of data objects at line 119 so that the import takes less time.
python add_data.py
- After adding the data, run the following command to install all required Node modules
npm install
- After adding the data and installing the modules, run the following command and navigate to http://localhost:3000/ to start searching
npm run start
This project uses OpenAI models. The usage costs for these models are billed to the API key you provide. Costs are incurred primarily during data embedding. The default vectorization engine for this project is Ada v2.
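Because most of the cost comes from embedding the data, a back-of-the-envelope estimate before importing can be useful. The sketch below is only arithmetic: the function name `estimate_embedding_cost`, the average token count, and the price are placeholders, not figures from this project or from OpenAI. Look up the current price for your embedding model and measure your own data before relying on the result.

```python
def estimate_embedding_cost(
    num_objects: int,
    avg_tokens_per_object: float,
    usd_per_1k_tokens: float,
) -> float:
    """Estimate the one-time embedding cost in USD for an import."""
    if num_objects < 0 or avg_tokens_per_object < 0 or usd_per_1k_tokens < 0:
        raise ValueError("all inputs must be non-negative")
    total_tokens = num_objects * avg_tokens_per_object
    return total_tokens / 1000 * usd_per_1k_tokens


# Placeholder inputs: 48,000 objects matches the dataset size mentioned in this
# README; the token count and price are hypothetical and must be replaced.
cost = estimate_embedding_cost(
    num_objects=48_000,
    avg_tokens_per_object=300,   # hypothetical average per movie
    usd_per_1k_tokens=0.0001,    # placeholder price, check current pricing
)
print(f"Estimated embedding cost: ${cost:.2f}")
```

Importing fewer data objects scales the estimate down linearly, which is one reason to start with a smaller subset.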
This project is built on three primary components:
- Weaviate Database: You have the option to host on Weaviate Cloud Service (WCS) or run it locally.
- Frontend: HTML, CSS, JavaScript
- Backend: Node.js
- 48,000+ movies dataset (License: CC0: Public Domain) for the columns: 'Id', 'Name', 'PosterLink', 'Genres', 'Actors', 'Director', 'Description', 'DatePublished', and 'Keywords'
- Wikipedia Movie Plots (License: CC BY-SA 4.0), for the column 'Plot'
Your contributions are always welcome! Feel free to contribute ideas, feedback, or create issues and bug reports if you find any!