OpenAI Enterprise knowledge search ChatBot and QA

Primary language: Python. License: MIT.

OpenAIEnterpriseChatBotAndQA

This repo uses LangChain to integrate vector search with Azure OpenAI, supporting enterprise knowledge search in a chatbot scenario.

High level architecture

[Image: high-level architecture diagram]

Installation

  1. Install the Python runtime (this repo was developed with Python 3.11.2).
  2. Download and install the Microsoft Visual C++ Redistributable packages (choose the x64 version if you set up inside a Windows VM on Azure).
  3. Clone the project to your local Windows machine and install the Python dependencies:
pip install -r ./requirements.txt
  4. Create your Azure OpenAI service and get your OPENAI_API_BASE and OPENAI_API_KEY.
  5. Deploy the OpenAI models: deploy at least text-embedding-ada-002 and text-davinci-003, and keep each deployment name the same as its model name; otherwise you need to change the source files Enterprise_KB_Chatbot.py and Enterprise_KB_Ingest.py accordingly.
  6. (Optional) Create an Azure Speech service and get SPEECH_KEY and SPEECH_REGION according to this KB.
  7. (Optional) Create an Azure Cognitive Services Translator resource and get TRANSLATOR_KEY, TRANSLATOR_LOCATION, and TRANSLATOR_ENDPOINT according to this KB.
  8. Create a .env file in the project folder and provide all the environment variables you obtained in the steps above, as in the example below. The Azure Speech and Translator keys are optional if you don't need the speech and translation services integrated.
OPENAI_API_KEY=00000000000000000000000000000000
OPENAI_BASE=https://<youroai>.openai.azure.com
SPEECH_KEY=00000000000000000000000000000000
SPEECH_REGION=chinaeast2
TRANSLATOR_KEY=00000000000000000000000000000000
TRANSLATOR_LOCATION=chinaeast2
TRANSLATOR_ENDPOINT=https://api.translator.azure.cn/
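
Before starting the application it can help to check that the .env file contains what you expect. The following is a minimal sketch using only the standard library; it is not how the repo itself loads the file, and the split into "required" and "optional" names is an assumption based on the example above.

```python
# Variable names are taken from the example .env above; which ones the
# application treats as mandatory is an assumption made for this sketch.
REQUIRED = ("OPENAI_API_KEY", "OPENAI_BASE")
OPTIONAL = (
    "SPEECH_KEY",
    "SPEECH_REGION",
    "TRANSLATOR_KEY",
    "TRANSLATOR_LOCATION",
    "TRANSLATOR_ENDPOINT",
)


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_required(values):
    """Return the required names that are absent or empty."""
    return [name for name in REQUIRED if not values.get(name)]


sample = """OPENAI_API_KEY=00000000000000000000000000000000
OPENAI_BASE=https://example.openai.azure.com
"""
settings = parse_env_text(sample)
print("missing:", missing_required(settings))
```

To check a real file, read it with open(".env", encoding="utf-8").read() and pass the text to parse_env_text.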

Prepare VectorDB ingestion

This repo ships two demo documents in the Doc_Store folder. You can replace them with your own enterprise documents (PDF, DOCX, and PPTX are supported so far), then run the following command to rebuild the vector DB.

python ./Enterprise_KB_Ingest.py

The VectorDB engine is currently FAISS. For enterprise production use, it can be replaced by another vector database engine (e.g. Qdrant, Weaviate, Milvus, or Elastic, all of which are supported by LangChain).
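
To illustrate what the vector DB does at query time, here is a brute-force nearest-neighbour search over toy vectors. This is only a sketch of the idea, not the FAISS or LangChain API; the three-dimensional vectors and chunk texts are made up, whereas real embeddings come from the embedding model and have many more dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_chunks(query_vec, index, k=2):
    """Return the texts of the k chunks whose vectors are closest to the query."""
    ranked = sorted(index, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, embedding) pairs standing in for ingested document chunks.
toy_index = [
    ("vacation policy", [0.9, 0.1, 0.0]),
    ("expense report", [0.1, 0.9, 0.0]),
    ("vpn setup", [0.0, 0.1, 0.9]),
]

print(nearest_chunks([0.8, 0.2, 0.0], toy_index, k=1))
```

A dedicated engine performs the same lookup with indexing structures that avoid scanning every vector, which is why swapping engines does not change the rest of the pipeline.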

NOTE: If your documents are in Chinese, it's recommended to change the following line of code in Enterprise_KB_Ingest.py before rebuilding the vector DB.

text_splitter = RecursiveCharacterTextSplitter(chunk_size=ENGLISH_CHUCK_SIZE, chunk_overlap=0) 

to

text_splitter = RecursiveCharacterTextSplitter(chunk_size=CHINESE_CHUNK_SIZE, chunk_overlap=0) 
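
If your document store mixes languages, one option is to pick the chunk size per document instead of editing the source by hand. The sketch below is an assumption-laden illustration: the two size constants are hypothetical placeholders (the repo defines its own values), and the 30% threshold is arbitrary.

```python
# Hypothetical values for illustration only; use the constants defined
# in Enterprise_KB_Ingest.py in a real run.
HYPOTHETICAL_ENGLISH_CHUNK_SIZE = 1000
HYPOTHETICAL_CHINESE_CHUNK_SIZE = 500


def cjk_ratio(text):
    """Fraction of non-whitespace characters in the CJK Unified Ideographs block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def pick_chunk_size(text, threshold=0.3):
    """Choose the Chinese chunk size when enough of the text is CJK."""
    if cjk_ratio(text) >= threshold:
        return HYPOTHETICAL_CHINESE_CHUNK_SIZE
    return HYPOTHETICAL_ENGLISH_CHUNK_SIZE


print(pick_chunk_size("Enterprise knowledge search"))
print(pick_chunk_size("企业知识库搜索"))
```

The returned value would then be passed as chunk_size when constructing the text splitter.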

Run chatbot

  1. Run the following command in your project folder:
python ./Enterprise_KB_Chatbot.py
  2. Use a local browser to access http://127.0.0.1:7860/
  3. If you want to make your application internet accessible, change the last line of code in Enterprise_KB_Chatbot.py from
demo.launch()

to

demo.launch(auth=("admin", "pass1234"), share=True)

Then run your application and note the internet-accessible URL printed on the screen. You need to keep your application running locally while others access it through this public URL. (NOTE: the public URL is only available for a maximum of 72 hours.)
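
Rather than hard-coding the username and password, you may prefer to read them from the environment. This is a minimal sketch, not part of the repo: the variable names CHATBOT_USER and CHATBOT_PASSWORD are hypothetical, and the helper only builds the keyword arguments for launch().

```python
import os


def build_launch_kwargs(env=None):
    """Build keyword arguments for demo.launch() from environment variables.

    CHATBOT_USER and CHATBOT_PASSWORD are hypothetical names chosen for this
    sketch. Sharing is only switched on when both are set, so a public link
    is never created without authentication.
    """
    env = os.environ if env is None else env
    user = env.get("CHATBOT_USER")
    password = env.get("CHATBOT_PASSWORD")
    if user and password:
        return {"auth": (user, password), "share": True}
    return {}


# In Enterprise_KB_Chatbot.py the last line would then become:
#     demo.launch(**build_launch_kwargs())
print(build_launch_kwargs({"CHATBOT_USER": "admin", "CHATBOT_PASSWORD": "s3cret"}))
```

With neither variable set, the helper returns an empty dictionary and the application launches locally as before.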

How to change chain type

This project relies on LangChain, so you can change the chain type according to your needs. Open Enterprise_KB_Chatbot.py and find the following line

lc_chatbot = CustomConversationalRetrievalChain.from_llm(lc_chatbot_llm, vectorstore.as_retriever(
), condense_question_prompt=MyPromptCollection.CONDENSE_QUESTION_PROMPT, chain_type="stuff") 

and change chain_type to any of stuff, refine, map_reduce, or map_rerank.
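
LangChain spells these chain type names with underscores, so a hyphenated value such as "map-reduce" is easy to get wrong. The helper below is a small sketch (not part of the repo or of LangChain) that normalises and validates a value before it is passed as chain_type.

```python
VALID_CHAIN_TYPES = ("stuff", "refine", "map_reduce", "map_rerank")


def normalize_chain_type(name):
    """Lower-case the name, turn hyphens into underscores, and validate it."""
    candidate = name.strip().lower().replace("-", "_")
    if candidate not in VALID_CHAIN_TYPES:
        raise ValueError(
            f"unknown chain type {name!r}; expected one of {VALID_CHAIN_TYPES}"
        )
    return candidate


print(normalize_chain_type("Map-Reduce"))
```

The normalised string can then be used in place of the literal "stuff" in the line shown above.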

Enable Speech and Translation

The speech and translation functions are disabled by default. If you configured the Speech and Translator API keys earlier, you can enable them as follows.

Open Enterprise_KB_Chatbot.py and find the following code

GlobalContext.ENABLE_TRANSLATION = False 
GlobalContext.ENABLE_VOICE = False  

and change to

GlobalContext.ENABLE_TRANSLATION = True 
GlobalContext.ENABLE_VOICE = True  

Run the application and the web page changes as in the capture below. (Note: speech only works when you run the application locally.)

[Screenshot: web page with speech and translation enabled]
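
Enabling a feature without its keys will likely fail at runtime, so one option is to derive the flags from the environment. This is a sketch under assumptions: the repo sets the flags as literals, and grouping the variables this way is inferred from the .env example rather than from the source code.

```python
import os

# Groupings inferred from the .env example; treat them as assumptions.
SPEECH_VARS = ("SPEECH_KEY", "SPEECH_REGION")
TRANSLATOR_VARS = ("TRANSLATOR_KEY", "TRANSLATOR_LOCATION", "TRANSLATOR_ENDPOINT")


def can_enable(names, env=None):
    """True only when every listed variable is present and non-empty."""
    env = os.environ if env is None else env
    return all(env.get(name) for name in names)


example_env = {"SPEECH_KEY": "0000", "SPEECH_REGION": "chinaeast2"}
print("voice:", can_enable(SPEECH_VARS, example_env))
print("translation:", can_enable(TRANSLATOR_VARS, example_env))

# In Enterprise_KB_Chatbot.py the flags could then be set as:
#     GlobalContext.ENABLE_VOICE = can_enable(SPEECH_VARS)
#     GlobalContext.ENABLE_TRANSLATION = can_enable(TRANSLATOR_VARS)
```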

Enable single turn Q&A

Single turn Q&A uses VectorDBQAWithSourcesChain and provides a single question-and-answer experience rather than a conversational chatbot, which can support some special enterprise requirements. To enable it, open Enterprise_KB_Chatbot.py and find the following code

GlobalContext.SHOW_SINGLE_TURN_QA = False

and change to

GlobalContext.SHOW_SINGLE_TURN_QA = True  

Run the application and the web page shows an additional section at the bottom, as in the capture below.

[Screenshot: web page with the single turn Q&A section]

Interaction example for multi-turn conversation

[Screenshots: multi-turn conversation example]

Interaction example for multi language support with cognitive translation integrated

[Screenshots: multi-language conversation example]

Interaction example for question un-related to the enterprise KBs

[Screenshots: questions unrelated to the enterprise KBs]

NOTE: The chatbot returns "not found" for most unrelated questions, but the model sometimes still tries to answer well-known questions even though the prompt instructs it not to. This is an area for further prompt refinement.