/langchain-webcrawler

I used the design and concepts from the LangStream webrawler example and did it with LangChain.

Primary LanguagePython

langchain-webcrawler

Based on LangStream's example application called Webcrawler, I’ve attempted to recreate the application in LangChain.

Chains

There are 2 custom chains in this application:

  • crawler: This crawls the site, renders the html, and splits the text into tokens. The output is a collection of LangChain documents.

  • writeToAstra: This accepts the collection of documents and uses LangChain’s Cassandra vector store to store the documents. The Cassandra db is hosted on DataStax Astra.

Running the application

The application takes an argument of which chain to run. The options are:

  • crawl-site: This will kick off the crawler chain.

  • chat: This will kick off a RetrievalQA chatbot that uses the Cassandra vector store and the AzureChatOpenAI llm model.

Creds

Create a .env file with the following:

OPENAI_API_KEY="<replace>"
OPENAI_API_BASE="<replace>"
OPENAI_API_TYPE="<replace>"
OPENAI_API_VERSION="<replace>"

Fair warning

There are a TON of hardcoded things in here. This can be a solid start to creating something real, but it’s not there yet.