This is an app that lets you ask questions about any data source by leveraging embeddings, vector databases, large language models, and last but not least LangChain.
- Upload any `file(s)` or enter any `path` or `url`
- The data source is detected and loaded into text documents
- The text documents are embedded using OpenAI embeddings
- The embeddings are stored as a vector dataset in Activeloop's database hub
- A LangChain chain is created consisting of an LLM (`gpt-3.5-turbo` by default) and the vector store as retriever
- When you ask the app a question, the chain embeds the input prompt, runs a similarity search in the vector store, and uses the best results as context for the LLM to generate an appropriate response
- Finally, the chat history is cached locally to enable a ChatGPT-like Q&A conversation
- The app only runs on `py>=3.10`!
- By default, this git repository is used as context, so you can start asking questions about its functionality right away without choosing your own data source.
- To run locally or deploy somewhere, execute `cp .env.template .env` and set credentials in the newly created `.env` file. Alternatively, set system environment variables manually or, when hosting via Streamlit, store them in `.streamlit/secrets.toml`.
- If you have set credentials as explained above, you can just hit `submit` in the authentication step without re-entering your credentials in the app.
- Your data won't load? Feel free to open an Issue or PR and contribute!
- Yes, Chad in `DataChad` refers to the well-known meme
- DataChad V2 does not support local mode, but many features will come soon. Stay tuned!
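
The retrieval steps in the list above (embed the documents, store the vectors, run a similarity search for the question, pass the best results to the LLM as context) can be illustrated with a minimal, dependency-free sketch. This is not the app's actual code: the bag-of-words `embed` function is a toy stand-in for OpenAI embeddings, `InMemoryVectorStore` is a hypothetical stand-in for the Activeloop dataset, and the LLM call itself is left out, so the sketch stops at building the prompt.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database (assumption, not the real store)."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, documents: list[str]) -> None:
        # Embed each document once and keep the text next to its vector.
        for doc in documents:
            self.entries.append((doc, embed(doc)))

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        # Embed the query and return the k most similar documents.
        query_vec = embed(query)
        ranked = sorted(
            self.entries, key=lambda e: cosine(query_vec, e[1]), reverse=True
        )
        return [doc for doc, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine the retrieved context and the question into one LLM prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


store = InMemoryVectorStore()
store.add([
    "The app only runs on Python 3.10 or newer.",
    "Credentials are read from a .env file.",
    "Chat history is cached locally.",
])
question = "Which Python version does the app need?"
top_docs = store.similarity_search(question, k=1)
prompt = build_prompt(question, top_docs)
print(prompt)
```

In a real setup the prompt would be sent to the LLM, and the count vectors would be replaced by dense embeddings, but the ranking logic stays the same: the documents closest to the question become the context.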
If you would like to contribute, feel free to grab any task:
- Refactor utils, especially the loaders
- Add option to choose model and embeddings
- Enable fully local / private mode
- Add option to upload multiple files to a single dataset
- Decouple datachad modules from streamlit
- Remove all local mode and other V1 stuff
- Load existing knowledge bases
- Delete existing knowledge bases
- Enable streaming responses
- Show retrieved context
- Refactor UI
- Introduce smart FAQs
- Exchange downloaded file storage with tempfile
- Add user creation and login
- Add chat history per user
- Make all I/O asynchronous
- Implement FastAPI routes and backend app
- Implement a proper frontend (react or whatever)
- Containerize the app