ahmedrachid/audio-semantic-search

Greenplum Audio Semantic Search

This repository provides instructions and code for setting up an audio/music semantic search engine that uses VMware Greenplum as a vector database.

Getting Started

Follow these steps to set up the audio search system on your local machine.

Prerequisites

Greenplum Database
pgvector extension
Kaggle API Access
Docker
Jupyter Notebook
Python dependencies (requirements.txt)

Step 1: Create Database Tables

Run script.sql to create the tables that store metadata and embeddings in your Greenplum database, replacing your_username and your_database with your own values:
```
$ psql -U your_username -d your_database -a -f script.sql
```

Step 2: Generate Embeddings

Install the required Python packages listed in requirements.txt.
Then run the Audio_Semantic_Search.ipynb notebook to download the dataset, generate the embeddings, and load them into Greenplum.
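
The notebook holds the actual loading code. As a rough illustration of what writing an embedding to a pgvector column can involve, here is a minimal sketch that formats a list of floats in pgvector's text input form (`[x1,x2,...]`) and pairs it with a parameterized INSERT. The table name `audio_embeddings`, the columns `track_id` and `embedding`, and both helper functions are hypothetical; the real names are whatever script.sql defines.

```python
from typing import Sequence, Tuple

# Hypothetical table and column names, used for illustration only.
# The real schema is defined in script.sql.
INSERT_SQL = (
    "INSERT INTO audio_embeddings (track_id, embedding) "
    "VALUES (%s, %s::vector)"
)


def to_pgvector_literal(embedding: Sequence[float]) -> str:
    """Format floats as pgvector's text input, e.g. '[0.5,1.0]'."""
    if len(embedding) == 0:
        raise ValueError("embedding must not be empty")
    # float() also accepts NumPy scalars, so model output can be passed as-is.
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_insert_params(track_id: str, embedding: Sequence[float]) -> Tuple[str, str]:
    """Return the parameter tuple that matches INSERT_SQL."""
    return (track_id, to_pgvector_literal(embedding))
```

With a driver that uses `%s` placeholders, such as psycopg2, the statement could then be run as `cursor.execute(INSERT_SQL, build_insert_params(track_id, embedding))`.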

Step 3: Build Docker Image

From the directory that contains the Dockerfile, build the Docker image for the Greenplum audio search system:
```
$ docker build -t greenplum-audio-search .
```

Step 4: Run Docker Container

Run the container in the background (-d) and publish the web app's port 8501 on the host (-p 8501:8501):
```
$ docker run -d -p 8501:8501 greenplum-audio-search
```

Step 5: Access the Web App

Once the container is running, access the web application by opening a web browser and navigating to:
```
http://localhost:8501
```