English | 简体中文

HuixiangDou is a group chat assistant based on LLM (Large Language Model).

Advantages:

Design a three-stage pipeline of preprocess, rejection and response to cope with group chat scenario, answer user questions without message flooding, see 2401.08772, 2405.02817, Hybrid Search and Precision Report.
Low cost, requiring only 1.5GB memory and no need for training
Offers a complete suite of Web, Android, and pipeline source code, which is industrial-grade and commercially viable

Check out the scenes in which HuixiangDou are running and join WeChat Group to try AI assistant inside.

If this helps you, please give it a star ⭐

🔆 News

🔥 Join the InternLM Summer Camp3 now!

The web portal is available on OpenXLab, where you can build your own knowledge assistant without any coding, using WeChat and Feishu groups.

Visit web portal usage video on YouTube and BiliBili.

[2024/07] Hybrid knowledge graph and dense retrieval boost 1.7% F1 score 🎯
[2024/07] config.ini support LLM Reranker
[2024/06] Evaluation of Chunk Size, Splitter and Model🎯
[2024/05] wkteam WeChat access, support image, URL and reference resolution in group chat
[2024/05] Add Coreference Resolution fine-tune 🎯

🤗 LoRA-Qwen1.5-14B LoRA-Qwen1.5-32B alpaca data arXiv
[2024/04] Add SFT data annotation and examples
[2024/04] Update preprint
[2024/04] Release web server source code 👍
[2024/03] New wechat integration method with prebuilt android apk !
[2024/02] [experimental] Integrated multimodal model into our wechat group for OCR

📖 Support

Model	File Format	Retrieve Method	IM Application
InternLM2 Qwen/Qwen2 KIMI DeepSeek Step GLM (ZHIPU) SiliconCloud Xi-Api OpenAOE	pdf word excel ppt html markdown txt	Knowledge Graph BCEmbedding bge/bge-m3	WeChat Lark

📦 Hardware

The following are the hardware requirements for running. It is suggested to follow this document, starting with the basic version and gradually experiencing advanced features.

Version	GPU Memory Requirements	Features
Cost-effective Edition	1.5GB	Use openai API (e.g., kimi and deepseek) to handle source code-level issues Free within quota
Standard Edition	19GB	Deploy local LLM can answer basic questions
Complete Edition	40GB	Fully utilizing search + long-text, answer source code-level questions

🔥 Run

First agree BCE license and login huggingface.

huggingface-cli login

Then install requirements.

# parsing `word` format requirements
apt update
apt install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig libpulse-dev
# python requirements
pip install -r requirements.txt

Standard Edition

The standard edition runs text2vec, rerank and a 7B model locally.

STEP1. First, without rejection pipeline, run test cases:

# Standalone mode
# main creates a subprocess to run the LLM API, then sends requests to the subprocess
python3 -m huixiangdou.main --standalone
..
+-------------------------+-------------------------+---------------+------------+
|          Query          |          State          | Part of Reply | References |
+=========================+=========================+===============+============+
| How to install mmpose ? | Topics unrelated to the | ..            |            |
|                         | knowledge base..        |               |            |
+------------  -----------+-------------------------+---------------+------------+

You can see that the result of handling the example question in main.py is the same, whether it's about mmpose installation or How's the weather tomorrow?

STEP2. Use mmpose and test documents to build a knowledge base and enable the rejection pipeline

Copy all the commands below (including the '#' symbol) and execute them.

# Download knowledge base documents
cd HuixiangDou
mkdir repodir
git clone https://github.com/open-mmlab/mmpose --depth=1 repodir/mmpose
git clone https://github.com/tpoisonooo/huixiangdou-testdata --depth=1 repodir/testdata

# Save the features of repodir to workdir
mkdir workdir
python3 -m huixiangdou.service.feature_store

Note

If restarting local LLM is too slow, first python3 -m huixiangdou.service.llm_server_hybrid, then open a new terminal, and only execute python3 -m huixiangdou.main without restarting LLM.

Then rerun main, Huixiangdou will be able to answer mmpose installation and reject casual chats.

python3 -m huixiangdou.main --standalone

+-----------------------+---------+--------------------------------+-----------------+
|         Query         |  State  |         Part of Reply          |   References    |
+=======================+=========+================================+=================+
| How to install mmpose?| success | To install mmpose, please..    | installation.md |
+-----------------------+---------+--------------------------------+-----------------+

Please adjust the repodir documents, good_questions, and bad_questions to try your own domain knowledge (medical, financial, power, etc.).

STEP3. Test sending messages to Feishu group (optional)

This step is just for testing algorithm pipeline, STEP4 also support IM applications.

Click Create Feishu Custom Bot to obtain the callback WEBHOOK_URL and fill it in config.ini

# config.ini
...
[frontend]
type = "lark"
webhook_url = "${YOUR-LARK-WEBHOOK-URL}"

Run. After the end, the technical assistant's response will be sent to Feishu group.

python3 -m huixiangdou.main --standalone

STEP4. WEB service and IM applications

We provide a complete front-end UI and backend service that supports:

Multi-tenant management
Zero-programming access to Feishu, WeChat groups

See the effect at OpenXlab APP, please read the web deployment document.

Cost-effective Edition

If your machine only has 2G GPU memory, or if you are pursuing cost-effectiveness, you only need to read this Zhihu document.

The cost-effective version only discards the local LLM and uses the remote LLM instead, and other functions are the same as the standard version.

Take kimi as an example, fill in the API KEY applied from the official website into config-2G.ini

# config-2G.ini
[llm]
enable_local = 0
enable_remote = 1
...
remote_type = "kimi"
remote_api_key = "YOUR-API-KEY-HERE"

Note

The worst case for each Q&A is to call the LLM 7 times, subject to the free user RPM limit, you can modify the rpm parameter in config.ini

Execute the command to get the Q&A result

python3 -m huixiangdou.main --standalone --config-path config-2G.ini # Start all services at once

Complete Edition

The HuixiangDou deployed in the WeChat group is the complete version.

When 40G of GPU memory is available, long text + retrieval capabilities can be used to improve accuracy.

Please read following topics

🛠️ FAQ

What if the robot is too cold/too chatty?
- Fill in the questions that should be answered in the real scenario into resource/good_questions.json, and fill the ones that should be rejected into resource/bad_questions.json.
- Adjust the theme content in repodir to ensure that the markdown documents in the main library do not contain irrelevant content.
Re-run feature_store to update thresholds and feature libraries.

⚠️ You can directly modify reject_throttle in config.ini. Generally speaking, 0.5 is a high value; 0.2 is too low.
Launch is normal, but out of memory during runtime?

LLM long text based on transformers structure requires more memory. At this time, kv cache quantization needs to be done on the model, such as lmdeploy quantization description. Then use docker to independently deploy Hybrid LLM Service.
How to access other local LLM / After access, the effect is not ideal?
- Open hybrid llm service, add a new LLM inference implementation.
- Refer to test_intention_prompt and test data, adjust prompt and threshold for the new model, and update them into worker.py.
What if the response is too slow/request always fails?
- Refer to hybrid llm service to add exponential backoff and retransmission.
- Replace local LLM with an inference framework such as lmdeploy, instead of the native huggingface/transformers.
What if the GPU memory is too low?

At this time, it is impossible to run local LLM, and only remote LLM can be used in conjunction with text2vec to execute the pipeline. Please make sure that config.ini only uses remote LLM and turn off local LLM.

No module named 'faiss.swigfaiss_avx2' locate installed faiss package

import faiss
print(faiss.__file__)
# /root/.conda/envs/InternLM2_Huixiangdou/lib/python3.10/site-packages/faiss/__init__.py

add soft link

# cd your_python_path/site-packages/faiss
cd /root/.conda/envs/InternLM2_Huixiangdou/lib/python3.10/site-packages/faiss/
ln -s swigfaiss.py swigfaiss_avx2.py

🍀 Acknowledgements

KIMI: long context LLM
BCEmbedding: Bilingual and Crosslingual Embedding (BCEmbedding) in English and Chinese
Langchain-ChatChat: ChatGLM Application based on Langchain
GrabRedEnvelope: Grab Wechat RedEnvelope

📝 Citation

@misc{kong2024huixiangdou,
      title={HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance},
      author={Huanjun Kong and Songyang Zhang and Jiaying Li and Min Xiao and Jun Xu and Kai Chen},
      year={2024},
      eprint={2401.08772},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{kong2024huixiangdoucr,
      title={HuixiangDou-CR: Coreference Resolution in Group Chats},
      author={Huanjun Kong},
      year={2024},
      eprint={2405.02817},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

boshallen/HuixiangDou