Create a proof-of-concept implementation of a secure endpoint fronting Langchain's conversation retrieval chain. Use the following stack:
- FastAPI
- Langchain
- OpenAI
- Chroma (easily swappable)
The endpoint should:
- be secured with OAuth2 and JWT tokens
- return the source documents as JSON payload
- stream the completion response in real time
- support multiple conversations, identifiable via an ID and backed by ConversationBufferWindowMemory
- enable easy modification of default prompts (CONDENSE_QUESTION_PROMPT, QA_PROMPT)
- be easy to containerise + configure via environment
This is a demonstration that meets the above requirements in a simple standalone FastAPI server.
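The per-conversation memory requirement can be illustrated with a small, stdlib-only sketch: each conversation ID maps to a bounded window of recent turns, mimicking the windowing behaviour of `ConversationBufferWindowMemory`. The class and store below are hypothetical illustrations, not the actual implementation in `crc.py`:

```python
from collections import defaultdict, deque


class WindowMemoryStore:
    """Toy per-conversation memory: keeps only the last k (human, ai) turns
    per conversation ID, mirroring ConversationBufferWindowMemory's window."""

    def __init__(self, k: int = 3):
        self.k = k
        # each conversation gets its own bounded deque; old turns fall out
        self._turns = defaultdict(lambda: deque(maxlen=self.k))

    def add_turn(self, conversation_id: str, human: str, ai: str) -> None:
        self._turns[conversation_id].append((human, ai))

    def history(self, conversation_id: str) -> list:
        return list(self._turns[conversation_id])


store = WindowMemoryStore(k=2)
store.add_turn("123", "Hi", "Hello!")
store.add_turn("123", "Q1", "A1")
store.add_turn("123", "Q2", "A2")  # oldest turn falls out of the window
print(store.history("123"))  # [('Q1', 'A1'), ('Q2', 'A2')]
```

In the real chain the memory holds message objects rather than tuples, but the eviction behaviour is the same: only the last `k` exchanges are replayed into the prompt.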
API design, effectively representing a multi-part combination of a) JSON documents and b) a text stream, was a requirement for a fairly niche use case. In practice, this makes clients and response processing trickier; separating streaming from document retrieval would normally be the better option.
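To make the client-side consequence concrete, here is a sketch of consuming such a combined payload, under the hypothetical assumption that the first line is a JSON array of source documents and every subsequent line is a streamed completion token (the actual wire format lives in `api.py` and `client.py`):

```python
import json
from typing import Iterable, Iterator, List, Tuple


def split_response(lines: Iterable[str]) -> Tuple[List[dict], Iterator[str]]:
    """Split a combined response into (source documents, token stream).

    Hypothetical framing: documents arrive first as one JSON line, then
    completion tokens follow and are consumed lazily as they stream in.
    """
    it = iter(lines)
    docs = json.loads(next(it))  # the JSON documents part
    return docs, it              # the remaining text-stream part


raw = ['[{"source": "ai_report.pdf", "page": 3}]', "The", " total", " is", " 496,010"]
docs, tokens = split_response(raw)
print(docs)             # [{'source': 'ai_report.pdf', 'page': 3}]
print("".join(tokens))  # The total is 496,010
```

The awkward part is visible even in this toy version: the client must know where one representation ends and the other begins, which is exactly why two separate endpoints would usually be cleaner.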
The implementation consists of:
- `api.py` provides the API
- `auth.py` implements JWT authentication helpers
- `client.py` is a Python client for the endpoint, showing how a response can be consumed and processed
- `config.py` provides a point of access for external configuration
- `crc.py` provides the Conversation Retrieval Chain functionality on top of document and retriever (vector store) management
- `dao.py` contains a skeletal implementation of a client credential store and a conversation store; a production implementation would replace these
- `tests/` contains a pytest test suite
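The JWT helpers in `auth.py` are not reproduced here, but the mechanics they rely on can be sketched with the standard library alone: an HS256 token is two base64url-encoded JSON segments plus an HMAC signature over them. The secret and claims below are made up for illustration:

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # base64url without padding, as JWT requires
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def sign_jwt(claims: dict, secret: str) -> str:
    """Build a minimal HS256 JWT: header.payload.signature."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"


def verify_jwt(token: str, secret: str) -> dict:
    """Recompute the signature; raise if it mismatches, else return the claims."""
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(_b64url(expected), sig):
        raise ValueError("invalid signature")
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


token = sign_jwt({"sub": "demo_client_id"}, "not-a-real-secret")
print(verify_jwt(token, "not-a-real-secret"))  # {'sub': 'demo_client_id'}
```

In practice a library such as `python-jose` or `PyJWT` would handle this, including expiry claims; the point of the sketch is only to show what the `/token` endpoint hands back and what the bearer check later validates.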
Supporting notebooks:
- `notebooks/init_chroma_vectorstore.ipynb` shows how to initialise the vector store
- `notebooks/crc.ipynb` is a notebook for playing with the CRC interface
- Load the desired doc(s) into the vector store by modifying `notebooks/init_chroma_vectorstore.ipynb`
- Run the API server (in dev mode, with reload enabled):

  ```shell
  poetry run python crc_api/api.py
  ```

- Get a token:

  ```shell
  curl -X 'POST' \
    -H 'Content-Type: application/x-www-form-urlencoded' \
    -d 'client_id=demo_client_id&client_secret=demo_client_secret' \
    http://127.0.0.1:8000/token
  ```

- Invoke the completion endpoint, replacing `<TOKEN>` with the one from the previous step:

  ```shell
  curl --no-buffer -X POST \
    -H 'accept: text/event-stream' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <TOKEN>' \
    -d '{"conversation_id": "123", "question": "What is the total number of AI publications in 2021?"}' \
    http://localhost:8000/ask
  ```
The client performs the same steps as the curl commands: it gets a token and makes a request, then unpicks the response to extract the docs and print completion tokens as they arrive.

```shell
poetry run python crc_api/client.py
```
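A rough, stdlib-only outline of the requests the client builds is shown below. The base URL and endpoint paths come from the curl examples above; everything else (function names, placeholder credentials) is illustrative, and no network call is made here:

```python
import json
from urllib import parse, request

BASE_URL = "http://127.0.0.1:8000"  # matches the curl examples


def token_request(client_id: str, client_secret: str) -> request.Request:
    """Form-encoded POST to /token, as in the first curl example."""
    body = parse.urlencode({"client_id": client_id, "client_secret": client_secret})
    return request.Request(
        f"{BASE_URL}/token",
        data=body.encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def ask_request(token: str, conversation_id: str, question: str) -> request.Request:
    """JSON POST to /ask, carrying the bearer token obtained from /token."""
    payload = {"conversation_id": conversation_id, "question": question}
    return request.Request(
        f"{BASE_URL}/ask",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = ask_request("<TOKEN>", "123", "What is the total number of AI publications in 2021?")
print(req.full_url)
```

Sending either request with `urllib.request.urlopen` (or `requests`/`httpx`) and iterating over the streamed body is what `client.py` does on top of this.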