
Interview task about vector search and embedding.

Primary language: Python. License: MIT.

Vector Search

Project Overview

This project demonstrates similarity search over Hacker News articles using a web API and natural language processing techniques. It consists of three main components:

  1. A web scraper that collects articles from Hacker News
  2. A vector embedding system that converts article text into numerical representations
  3. A FastAPI-based search API that finds similar articles based on user queries
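
The second and third components come down to ranking stored vectors by their similarity to a query vector. Below is a minimal sketch of that idea, assuming cosine similarity as the measure (this README does not state which measure the project uses). `toy_embed` is a hypothetical stand-in for a Sentence Transformers model so that the example runs with the standard library alone; a real model produces dense vectors that capture meaning, not just word overlap.

```python
import math
from collections import Counter


def toy_embed(text: str, vocabulary: list[str]) -> list[float]:
    """Hypothetical stand-in for an embedding model: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: str, titles: list[str], vocabulary: list[str]) -> str:
    """Return the title whose vector is closest to the query vector."""
    query_vec = toy_embed(query, vocabulary)
    return max(
        titles,
        key=lambda title: cosine_similarity(query_vec, toy_embed(title, vocabulary)),
    )


# Two titles borrowed from the sample search response in this README.
titles = [
    "Swift is a more convenient Rust",
    "Understanding the Y Combinator",
]
vocabulary = sorted({word for title in titles for word in title.lower().split()})
best = most_similar("rust", titles, vocabulary)
```

In the actual project the vectors would be stored in PostgreSQL and compared there or in application code; the ranking step itself has the same shape.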

Technologies Used

  • FastAPI
  • Pydantic
  • PostgreSQL
  • SQLAlchemy
  • BeautifulSoup
  • Sentence Transformers

Installation

git clone https://github.com/mo1ein/vector_search.git
cd vector_search
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

If large packages (such as torch) cause a timeout error during installation, you can increase pip's timeout:

pip install -r requirements.txt --default-timeout=1000

Run

Set your database configuration in the .env file. Create a database named vector_db in your PostgreSQL instance, then run the migrations. I recommend using the PyCharm database extension or DataGrip for this.

python main.py

Then open the interactive API docs at http://127.0.0.1:8500/docs#/ and enjoy the app!
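
For the database configuration step, the sketch below shows one way a PostgreSQL connection URL could be assembled from environment variables. The variable names (`DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`), their defaults, and the `build_database_url` helper are all assumptions for illustration, not taken from the project; check the repository's settings code for the names it actually reads. Only the database name `vector_db` comes from this README.

```python
import os
from urllib.parse import quote


def build_database_url(env=None) -> str:
    """Build a postgresql:// URL from hypothetical DB_* variables."""
    env = os.environ if env is None else env
    # Percent-encode credentials so characters like '@' or spaces stay valid in a URL.
    user = quote(env.get("DB_USER", "postgres"), safe="")
    password = quote(env.get("DB_PASSWORD", ""), safe="")
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env.get("DB_NAME", "vector_db")  # database name from this README
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


# Passing a dict instead of os.environ keeps the example deterministic.
example_url = build_database_url({"DB_USER": "app", "DB_PASSWORD": "p@ss word"})
```

A URL of this form is what SQLAlchemy's `create_engine` accepts for PostgreSQL.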

Endpoints

Scrape the data, embed it as vectors, and insert it into the database. The request body is empty. This operation may take several seconds to complete.

POST /

Search with a string query and return the most similar text.

POST /search

Call the / endpoint first to populate the database; after that you can use /search.
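
The required call order (first `/`, then `/search`) can also be sketched from Python using only the standard library. The host, port, and `query` field are taken from this README; the helper names `build_scrape_request`, `build_search_request`, and `send`, along with the 120-second timeout, are hypothetical. Nothing is sent over the network until `send` is called against a running server.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8500"  # host and port from the docs URL in this README


def build_scrape_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """POST / with an empty body: scrape, embed, and store articles."""
    return urllib.request.Request(base_url + "/", data=b"", method="POST")


def build_search_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """POST /search with a JSON body of the form {"query": "..."}."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0):
    """Send a request and decode the JSON response (requires the app to be running)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_scrape_request())` followed by `send(build_search_request("rust"))` performs the scrape and then the search.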

Examples

Scrape data

curl -X POST http://127.0.0.1:8500

response:

"Text extracted, embedded and saved to db successfully!" 

Search

curl -X POST \
  http://127.0.0.1:8500/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "rust"
}'

response:

{"similar_text":"Swift is a more convenient Rust Understanding the Y Combinator"}