⚡️ Blazing fast NRC search

[EXPERIMENTAL]

This repo aims to build a blazing fast semantic neural search using a multilingual sentence-embedding model and the Qdrant (pronounced "quadrant") vector database. It uses the following technologies:

Qdrant
FastEmbed
Sentence Transformer
distiluse-base-multilingual-cased-v1 (model)
- produces an aligned vector space, i.e., similar inputs in different languages are mapped close together
- 14 languages (incl. NL)
Scalar Quantization (for faster inference time and memory efficiency)
- 8-bit
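
To make the 8-bit scalar quantization idea concrete, here is a minimal pure-Python sketch. It illustrates the general technique only and is not Qdrant's internal implementation; the function names `quantize` and `dequantize` are hypothetical. Each 32-bit float component is replaced by a one-byte code, which is where the roughly 4x memory saving comes from.

```python
from typing import List, Tuple


def quantize(vector: List[float]) -> Tuple[List[int], float, float]:
    """Map each float component to an integer code in [0, 255].

    Illustrative only: the range is taken from the vector's own min and max,
    which is an assumption of this sketch, not a statement about Qdrant.
    """
    lo, hi = min(vector), max(vector)
    # A constant vector would give a zero step size, so fall back to 1.0.
    scale = (hi - lo) / 255 or 1.0
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Approximately reconstruct the original floats from the 8-bit codes."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    original = [-0.42, 0.0, 0.13, 0.87]
    codes, lo, scale = quantize(original)
    restored = dequantize(codes, lo, scale)
    print(codes)
    print(restored)
```

The reconstruction error per component is at most about half a quantization step, which is the accuracy traded for the smaller footprint and faster distance computations.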

📊 Data

The data was scraped with the NRC scraper API for a given set of categories. The resulting dataset contains more than 6,500 NRC article items and more than 45,000 chunks taken from full articles.
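
This README does not say how full articles are split into chunks. As a rough illustration only, the sketch below assumes fixed-size word windows with a small overlap; the function name `chunk_words` and the `size` and `overlap` defaults are hypothetical and not taken from this repo.

```python
from typing import List


def chunk_words(text: str, size: int = 100, overlap: int = 20) -> List[str]:
    """Split text into windows of `size` words, each sharing `overlap` words
    with the previous window. Assumed strategy, for illustration only."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the current window already reaches the end of the text.
        if start + size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    article = " ".join(f"w{i}" for i in range(250))
    for chunk in chunk_words(article):
        print(len(chunk.split()), chunk[:30])
```

Overlapping windows are a common choice because a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk.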

🚀 Installation

pip install -r requirements.txt

📖 Usage

Qdrant

http://localhost:6333

API

http://localhost:8000

Web UI

index.html

Swagger

http://localhost:8000/docs
