📣 We're looking for front-end developers interested in knowledge QA with LLMs to help us separate the front end and back end of the current implementation.
- Q&A based on a local knowledge base + LLM.
- Reason:
  - The idea for this project comes from Langchain-Chatchat.
  - I have used that project before, but it is not very flexible and not easy to deploy.
  - This project borrows ideas from How to build a knowledge question answering system with a large language model and serves as a practice implementation.
- Advantage:
  - The whole project is modular and does not depend on the `langchain` library; each part can easily be replaced, and the code is simple and easy to understand.
  - Apart from the large language model interface, which needs to be deployed separately, every other part can run on CPU.
  - Supports documents in common formats, including `txt`, `md`, `pdf`, `docx`, `pptx`, `excel`, etc. Other document types can also be supported through customization.
- Parse the document and store it in the database:

```mermaid
flowchart LR
    A([Documents]) --ExtractText--> B([sentences])
    B --Embeddings--> C([Embeddings])
    C --Store--> D[(DataBase)]
```
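As a rough illustration of this flow, here is a minimal sketch; `extract_text`, `encoder`, and `db` are hypothetical stand-ins for the project's `file_loader`, `encoder`, and `vector_utils` modules, not its actual API.

```python
# Minimal sketch of the indexing flow above; all helper names are
# hypothetical stand-ins for the file_loader / encoder / vector_utils modules.
from pathlib import Path

def index_document(path: Path, extract_text, encoder, db) -> None:
    sentences = extract_text(path)          # Documents --ExtractText--> sentences
    embeddings = encoder.encode(sentences)  # sentences --Embeddings--> vectors
    db.store(sentences, embeddings)         # vectors --Store--> database
```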
- Retrieve and answer questions:

```mermaid
flowchart LR
    E([Query]) --Embedding--> F([Embeddings]) --> H[(Database)] --Search--> G([Context])
    E --> I([Prompt])
    G --> I --> J([LLM]) --> K([Answer])
```
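The corresponding question-answering flow, again with hypothetical helpers; the prompt template below is a simplified assumption, not the project's exact prompt.

```python
# Minimal sketch of the retrieve-and-answer flow above; helper names and
# the prompt template are assumptions, not the project's exact code.
def answer(query: str, encoder, db, llm, top_k: int = 3) -> str:
    query_vec = encoder.encode([query])          # Query -> embedding
    context = db.search(query_vec, top_k=top_k)  # Database --Search--> context
    prompt = "Context:\n{}\n\nQuestion: {}".format("\n".join(context), query)
    return llm(prompt)                           # Prompt -> LLM -> answer
```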
- Clone the whole repo to a local directory:

```bash
git clone https://github.com/RapidAI/Knowledge-QA-LLM.git
```
- Install the requirements:

```bash
cd Knowledge-QA-LLM
pip install -r requirements.txt
```
- Download the `moka-ai/m3e-small` model and put it in the `assets/models/m3e-small` directory. This model is used to vectorize text content (a minimal sketch of using it follows).
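For reference, a hedged sketch of vectorizing text with this model, assuming it was downloaded in `sentence-transformers` format; the project's own wrapper lives in `knowledge_qa_llm/encoder`.

```python
# Minimal sketch: extract embeddings with m3e-small via sentence-transformers.
# Assumes the model was downloaded to assets/models/m3e-small; the project's
# own encoder wrapper lives in knowledge_qa_llm/encoder.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("assets/models/m3e-small")
sentences = ["知识库问答", "Knowledge base question answering"]
embeddings = encoder.encode(sentences)  # one vector per sentence
print(embeddings.shape)
```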
- Separately deploy the `chatglm2-6b` interface; for how to start the interface, see ChatGLM2-6B API. For the specific usage, see `knowledge_qa_llm/llm/chatglm2_6b.py`. A hedged sketch of such a call is shown below.
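As a rough illustration, here is a minimal sketch of calling such a deployed interface over HTTP. The URL, port, and payload/response fields follow the upstream ChatGLM2-6B `api.py` demo and are assumptions about your deployment; the project's actual wrapper is `knowledge_qa_llm/llm/chatglm2_6b.py`.

```python
# Minimal sketch: call a separately deployed ChatGLM2-6B HTTP API.
# URL, port, and payload/response fields follow the upstream api.py demo
# and may differ from your deployment -- treat them as assumptions.
import requests

def chat(prompt, history=None, url="http://127.0.0.1:8000"):
    payload = {"prompt": prompt, "history": history or []}
    resp = requests.post(url, json=payload, timeout=60)
    resp.raise_for_status()
    data = resp.json()
    return data["response"], data["history"]

answer, history = chat("你好")
print(answer)
```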
- Write the address of the deployed `llm_api` to the `llm_api_url` field in the configuration file `knowledge_qa_llm/config.yaml` (a minimal sketch of reading it back is shown below).
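To sanity-check the configuration, a minimal sketch of reading the field back; everything in `config.yaml` beyond the documented `llm_api_url` field is an assumption and is not relied on here.

```python
# Minimal sketch: read the configured LLM API URL back from config.yaml.
# Only the llm_api_url field is documented; nothing else in the file
# layout is assumed here.
import yaml

with open("knowledge_qa_llm/config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

print(config["llm_api_url"])
```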
- Run:

```bash
streamlit run webui.py
```
- UI Demo
- CLI Demo:

```bash
python cli.py
```
- Document analysis: `extract_office_content`, `rapidocr_pdf`, `rapidocr_onnxruntime`
- Feature vector extraction: `moka-ai/m3e-small`
- Vector storage: `sqlite`
- Vector retrieval: `faiss` (a combined storage/retrieval sketch follows this list)
- UI: `streamlit>=1.25.0`
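To make the storage/retrieval pairing concrete, here is a minimal, self-contained sketch that persists embeddings in `sqlite` and searches them with `faiss`. The table and column names are hypothetical; the project's actual logic lives in `knowledge_qa_llm/vector_utils`.

```python
# Minimal sketch: persist embeddings in sqlite and search them with faiss.
# Table/column names are hypothetical; see knowledge_qa_llm/vector_utils
# for the project's actual implementation.
import sqlite3

import faiss
import numpy as np

DIM = 512  # embedding size; must match the encoder model's output

con = sqlite3.connect("assets/db/knowledge.db")
con.execute("CREATE TABLE IF NOT EXISTS embeds (sentence TEXT, vector BLOB)")

def store(sentence, vector):
    # Keep the raw float32 bytes next to the original sentence.
    con.execute(
        "INSERT INTO embeds VALUES (?, ?)",
        (sentence, np.asarray(vector, dtype=np.float32).tobytes()),
    )
    con.commit()

def search(query_vec, top_k=3):
    rows = con.execute("SELECT sentence, vector FROM embeds").fetchall()
    sentences = [row[0] for row in rows]
    mat = np.vstack([np.frombuffer(row[1], dtype=np.float32) for row in rows])
    index = faiss.IndexFlatL2(DIM)  # exact L2 search over all stored vectors
    index.add(mat)
    _, idx = index.search(np.asarray(query_vec, dtype=np.float32).reshape(1, -1), top_k)
    return [sentences[i] for i in idx[0]]
```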
```text
.
├── assets
│   ├── db                 # stores the vector database
│   ├── models             # place the embedding-extraction model here
│   └── raw_upload_files
├── knowledge_qa_llm
│   ├── __init__.py
│   ├── config.yaml        # configuration file
│   ├── file_loader        # handles documents in various formats
│   ├── encoder            # extracts embeddings
│   ├── llm                # LLM interface; the model is deployed separately and called via its API
│   ├── utils
│   └── vector_utils       # embedding storage and search
├── LICENSE
├── README.md
├── requirements.txt
├── tests
├── cli.py
└── webui.py               # UI implementation based on streamlit
```
Changelog:
- 2023-08-29 v0.0.8 update:
  - Fixed missing `embedding_extract`.
  - Fixed default parameters of the LLM.
- 2023-08-11 v0.0.7 update:
  - Optimized the layout, removed the plugin option, and moved the embedding-model option to the home page.
  - Translated the tips into English for easier communication.
  - Added the project logo: 🧐
  - Updated the CLI module code.
- 2023-08-05 v0.0.6 update:
  - Adapted more LLM APIs, including online ones such as ERNIE-Bot-Turbo.
  - Added a status display for extracting embeddings.
- 2023-08-04 v0.0.5 update:
  - Fixed duplicate data being inserted into the database.
- 2023-07-29 v0.0.4 update:
  - Reorganized the UI based on `streamlit==1.25.0`.
  - Optimized the code.
  - Recorded a GIF demo of the UI.
- 2023-07-28 v0.0.3 update:
  - Finished the `file_loader` part.
- 2023-07-25 v0.0.2 update:
  - Standardized the existing directory structure to be more compact, and extracted some variables into `config.yaml`.
  - Improved the documentation.
- Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
- Please make sure to update tests as appropriate.