This is my attempt to recreate LangStream's Webcrawler example application in LangChain.
The application has two custom chains:
- crawler: Crawls the site, renders the HTML, and splits the text into tokens. The output is a collection of LangChain documents.
- writeToAstra: Accepts the collection of documents and stores them using LangChain's Cassandra vector store. The Cassandra database is hosted on DataStax Astra.
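To make the data passed between the two chains concrete, here is a minimal standard-library sketch of the splitting step. It is not the application's actual code: `Doc` is a hypothetical stand-in for a LangChain document (which likewise carries `page_content` and `metadata`), `split_into_chunks` and its parameters are illustrative names, and splitting on whitespace-separated words is an assumption in place of whatever tokenizer the crawler really uses.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_into_chunks(text, source, chunk_size=200, overlap=20):
    """Split text into overlapping chunks of whitespace-separated words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    docs = []
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        docs.append(Doc(" ".join(chunk), {"source": source, "start_word": start}))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return docs
```

A collection like the one returned here is what the second chain would receive and write to the vector store, with the metadata recording where each chunk came from.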
The application takes one argument that selects what to run. The options are:
- crawl-site: Kicks off the crawler chain.
- chat: Kicks off a RetrievalQA chatbot that uses the Cassandra vector store and the AzureChatOpenAI LLM.
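The argument handling described above could be sketched with `argparse` as follows. This assumes the option is passed as a single positional argument; `run_crawl_site`, `run_chat`, `CHAINS`, and `main` are hypothetical names, and the two functions are placeholders for the real chain and chatbot.

```python
import argparse


def run_crawl_site():
    # Placeholder: the real application would start the crawler chain here.
    return "crawl-site"


def run_chat():
    # Placeholder: the real application would start the RetrievalQA chatbot here.
    return "chat"


# Map each command-line option to the function that starts it.
CHAINS = {"crawl-site": run_crawl_site, "chat": run_chat}


def build_parser():
    parser = argparse.ArgumentParser(description="Run one part of the application.")
    parser.add_argument("chain", choices=sorted(CHAINS), help="which chain to run")
    return parser


def main(argv=None):
    # With argv=None, argparse reads the arguments from sys.argv.
    args = build_parser().parse_args(argv)
    return CHAINS[args.chain]()
```

Restricting the argument with `choices` means an unknown option is rejected with a usage message before any chain starts.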
Create a .env file with the following:
OPENAI_API_KEY="<replace>"
OPENAI_API_BASE="<replace>"
OPENAI_API_TYPE="<replace>"
OPENAI_API_VERSION="<replace>"
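Before starting either option, it can help to check that none of these values still holds the `<replace>` placeholder. The sketch below is an assumption about how such a check might look, not part of the application; `parse_env_lines` and `missing_vars` are hypothetical helpers, and the parser only handles simple `KEY="value"` lines like the ones above.

```python
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "OPENAI_API_BASE",
    "OPENAI_API_TYPE",
    "OPENAI_API_VERSION",
)


def parse_env_lines(lines):
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    values = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values, required=REQUIRED_VARS):
    """Return required names that are unset or still hold the placeholder."""
    return [name for name in required if values.get(name, "") in ("", "<replace>")]
```

For example, `missing_vars(parse_env_lines(open(".env")))` returns the names that still need a real value. A library such as python-dotenv (`load_dotenv()`) is a common way to load the file into the process environment instead of parsing it by hand.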