This project demonstrates semantic search over Hacker News articles using modern web technologies and natural language processing techniques. It consists of three main components:
- A web scraper that collects articles from Hacker News
- A vector embedding system that converts article text into numerical representations
- A FastAPI-based search API that finds similar articles based on user queries
Technologies used:
- FastAPI
- Pydantic
- PostgreSQL
- SQLAlchemy
- BeautifulSoup
- Sentence Transformers
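To make the embedding and search components more concrete, here is a minimal sketch of similarity ranking over stored vectors. It assumes cosine similarity as the metric, which this README does not confirm, and it uses hand-written 3-dimensional toy vectors in place of real Sentence Transformers embeddings. The names `cosine_similarity`, `most_similar`, and `stored` are hypothetical and not taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as not similar.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_vec, stored):
    """Return the (text, score) pair whose vector is closest to query_vec."""
    return max(
        ((text, cosine_similarity(query_vec, vec)) for text, vec in stored.items()),
        key=lambda pair: pair[1],
    )


# Toy vectors standing in for real embeddings (hypothetical values).
stored = {
    "Swift is a more convenient Rust": [0.9, 0.1, 0.0],
    "Understanding the Y Combinator": [0.1, 0.9, 0.2],
}

# A query vector pointing mostly along the first axis.
best_text, best_score = most_similar([1.0, 0.0, 0.0], stored)
```

In the real app the vectors would come from the embedding model and the stored ones would be read from PostgreSQL; only the ranking step is shown here.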
git clone https://github.com/mo1ein/vector_search.git
cd vector_search
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
If some packages are large (like torch) and you get a timeout error, you can use this command instead:
pip install -r requirements.txt --default-timeout=1000
Set your database configuration in the .env file. Create a database named vector_db in your connected PostgreSQL instance, then run the migrations.
I recommend using the PyCharm database extension or DataGrip for this.
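As a rough illustration of how settings from .env could become a connection string, the sketch below builds a PostgreSQL URL from environment-style values. The variable names (`DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`) and the helper `build_database_url` are assumptions made for this example; check the project's own .env for the real keys.

```python
import os
from urllib.parse import quote_plus


def build_database_url(env=None):
    """Assemble a PostgreSQL URL from environment-style settings.

    All key names here are hypothetical, not the project's actual ones.
    """
    env = os.environ if env is None else env
    user = env.get("DB_USER", "postgres")
    # Escape characters such as "@" that would otherwise break the URL.
    password = quote_plus(env.get("DB_PASSWORD", ""))
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env.get("DB_NAME", "vector_db")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


example_url = build_database_url(
    {"DB_USER": "u", "DB_PASSWORD": "p@ss", "DB_HOST": "h", "DB_PORT": "5433"}
)
```

A URL of this form is what SQLAlchemy's `create_engine` accepts for PostgreSQL.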
python main.py
Then, enjoy the app!
http://127.0.0.1:8500/docs#/
Scrape the data, embed it into vectors, and insert it into the database. The request body is empty. This operation may take several seconds to complete.
POST /
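The scraping step behind this endpoint can be pictured with the sketch below, which pulls article titles out of a small HTML sample. The project itself uses BeautifulSoup; this example swaps in the standard library's `html.parser` so it runs without extra packages. The `titleline` class name and the sample markup are assumptions about the Hacker News page structure, and `TitleParser` is a hypothetical name.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first link inside each span.titleline."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._expect_link = False  # saw span.titleline, waiting for its <a>
        self._in_link = False      # currently inside the title <a>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "titleline" in classes:
            self._expect_link = True
        elif tag == "a" and self._expect_link:
            self._expect_link = False
            self._in_link = True
            self.titles.append("")

    def handle_data(self, data):
        if self._in_link:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False


# Hand-written sample markup (assumed structure, not fetched from the site).
sample_html = """
<tr class="athing"><td class="title">
  <span class="titleline"><a href="https://example.com/a">Swift is a more convenient Rust</a>
    <span class="sitebit comhead"> (<a href="from?site=example.com">example.com</a>)</span></span>
</td></tr>
<tr class="athing"><td class="title">
  <span class="titleline"><a href="https://example.com/b">Understanding the Y Combinator</a></span>
</td></tr>
"""

parser = TitleParser()
parser.feed(sample_html)
titles = parser.titles
```

Only the first link in each title span is kept, so the trailing site link is ignored.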
Search with a query string and return the most similar article text.
POST /search
Run the / endpoint first to collect the data; after that you can use /search.
Scrape data
curl -X POST http://0.0.0.0:8500
response:
"Text extracted, embedded and saved to db successfully!"
Search
curl -X POST \
http://0.0.0.0:8500/search \
-H 'Content-Type: application/json' \
-d '{
"query": "rust"
}'
response:
{"similar_text":"Swift is a more convenient Rust Understanding the Y Combinator"}
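The same two calls can be made from Python. The sketch below uses only the standard library and mirrors the curl commands above; the helper names (`build_scrape_request`, `build_search_request`, `send`) are hypothetical, and the base URL assumes the app is running locally on port 8500.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8500"  # assumes the app is running locally


def build_scrape_request(base_url=BASE_URL):
    """Prepare POST / with an empty body."""
    return urllib.request.Request(f"{base_url}/", data=b"", method="POST")


def build_search_request(query, base_url=BASE_URL):
    """Prepare POST /search with a JSON body of the form {"query": ...}."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, call `send(build_scrape_request())` once to load the data, then `send(build_search_request("rust"))` to search.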