Sheila-Murugi/NLPWarehouse
Scalable Data Warehouse for LLM Finetuning: API Design for High-Throughput Data Ingestion and RAG Retrieval. This project collects, cleans, processes, and stores text and audio data for the Swahili language. It includes web scraping, database management, API development, and automated workflows to enhance NLP capabilities for African languages.
Jupyter Notebook