Ask any question related to DOSM data.
SETUP
- download embeddings.zip from http://gofile.me/72cUv/p7lOkrbZL (password: dosm)
- extract embeddings.zip
- activate your virtual environment, if you use one
- pip install -r requirements.txt
- python 03_build_map.py
- python 04_build_index.py
- python app.py
My pipeline:
- Scrape using 01_scrape.py
- Clean the CSV files into their own groups with 01.5_clean.ipynb
- Create embeddings using OpenAI ada embeddings in 02_create_embeddings.py
- Build the mapping between index and file+row for the embeddings in 03_build_map.py
- Build the index using faiss in 04_build_index.py
- Run the Gradio web app in app.py
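To give a rough idea of how the map and index steps fit together, here is a minimal sketch. It is not the code in 03_build_map.py or 04_build_index.py. The file names, the toy 2-d vectors, and the helpers `build_map`, `cosine` and `search` are all made up for illustration, and a plain brute-force cosine search stands in for faiss so the example runs with only the standard library. It also assumes the embeddings are stored in the same order in which the files and rows are walked.

```python
import csv
import io
import math

# Hypothetical stand-ins for the cleaned CSV files (names and values are made up).
FILES = {
    "population.csv": "year,value\n2020,32.4\n2021,32.6\n",
    "cpi.csv": "year,value\n2020,120.1\n",
}


def build_map(files):
    """Return a list where position i holds the (file, row) of embedding i."""
    index_map = []
    for name in sorted(files):  # fixed order so positions are reproducible
        reader = csv.DictReader(io.StringIO(files[name]))
        for row_number, _ in enumerate(reader):
            index_map.append((name, row_number))
    return index_map


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(query, vectors, k=1):
    """Brute-force nearest neighbours; a faiss index would replace this."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]


index_map = build_map(FILES)

# Toy vectors, one per entry in index_map (real embeddings are much longer).
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

hits = search([0.1, 0.9], vectors, k=2)      # positions of the closest vectors
results = [index_map[i] for i in hits]       # translate positions to file+row
print(results)
```

The point of the map is that the index only returns positions, so a separate lookup is needed to get back to the file and row that should go into the prompt.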
I'm too lazy to clean up my code, so feel free to ask questions if you don't understand it.
At first I wanted to deploy this on Hugging Face Spaces, but the embeddings are too big and I was too lazy to store them in a cloud service like Pinecone. Feel free to improve my prompt in case you want to do other things, like visualization.