NLPWarehouse

Scalable Data Warehouse for LLM Fine-Tuning: API Design for High-Throughput Data Ingestion and RAG Retrieval. This project collects, cleans, processes, and stores text and audio data for the Swahili language. It includes web scraping, database management, API development, and automated workflows to enhance NLP capabilities for African languages.

Primary language: Jupyter Notebook