
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

Primary language: Python. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).


HuixiangDou is a group chat assistant based on LLMs (Large Language Models).

Advantages:

  1. A three-stage pipeline (preprocess, rejection, and response) designed for group chat scenarios, which answers user questions without flooding the chat. See 2401.08772, 2405.02817, Hybrid Retrieval and Precision Report.
  2. Low cost: requires as little as 2GB of GPU memory and no training.
  3. A complete suite of Web, Android, and pipeline source code that is industrial-grade and commercially viable.
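The toy Python sketch below makes the three-stage idea concrete. It is not HuixiangDou's implementation. The names `preprocess`, `should_reject`, `pipeline`, and `Reply` are hypothetical, and so is the 0.35 default threshold. The relevance scorer and the responder are passed in as plain functions that stand in for retrieval and the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Reply:
    state: str  # "success" or "unrelated"
    text: str = ""
    references: List[str] = field(default_factory=list)


def preprocess(message: str) -> Optional[str]:
    """Stage 1 (hypothetical): normalize whitespace and drop trivial messages."""
    text = " ".join(message.split())
    # Very short messages such as "ok" or "thx" are unlikely to be questions.
    return text if len(text) >= 6 else None


def should_reject(score: float, reject_threshold: float) -> bool:
    """Stage 2 (hypothetical): reject when relevance to the knowledge base is too low."""
    return score < reject_threshold


def pipeline(message: str,
             relevance: Callable[[str], float],
             respond: Callable[[str], Reply],
             reject_threshold: float = 0.35) -> Reply:
    """Run preprocess -> rejection -> response; rejected messages get no reply."""
    query = preprocess(message)
    if query is None or should_reject(relevance(query), reject_threshold):
        return Reply(state="unrelated")  # stay silent instead of flooding the group
    return respond(query)  # Stage 3: retrieval + LLM answer (stubbed by the caller)
```

The key design point is that rejection returns early with no reply. That early exit is what keeps the assistant from flooding a group with answers to off-topic messages.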

Check out the scenarios where HuixiangDou is running, and join the WeChat group to try the AI assistant.

If this helps you, please give it a star ⭐

🔆 New Features

Our Web version has been released on OpenXLab. There you can create a knowledge base, update positive and negative examples, turn on web search, test the chat, and integrate it into Feishu/WeChat groups. See BiliBili and YouTube!

📖 Support Status

Supported categories: LLM, File Format, Retrieval Method, Instant Messaging, Preprocessing.

File Format:
  • pdf
  • word
  • excel
  • ppt
  • html
  • markdown
  • txt

Instant Messaging:
  • WeChat
  • Lark
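You can check which files in a document directory (such as the repodir used in the steps below) fall into the formats listed above by counting them by extension. This is a minimal sketch. The extension-to-format mapping is an assumption, not the exact list HuixiangDou's parser accepts. Hidden directories such as `.git` are skipped.

```python
from collections import Counter
from pathlib import Path

# Assumed extension-to-format mapping for the formats listed under Support Status.
SUPPORTED = {
    ".pdf": "pdf",
    ".doc": "word", ".docx": "word",
    ".xls": "excel", ".xlsx": "excel",
    ".ppt": "ppt", ".pptx": "ppt",
    ".html": "html", ".htm": "html",
    ".md": "markdown",
    ".txt": "txt",
}


def scan_repodir(repodir: str) -> Counter:
    """Count files under repodir by format; unknown extensions count as 'unsupported'."""
    root = Path(repodir)
    counts: Counter = Counter()
    for path in root.rglob("*"):
        rel_parts = path.relative_to(root).parts
        # Skip hidden files and directories such as .git.
        if any(part.startswith(".") for part in rel_parts):
            continue
        if path.is_file():
            counts[SUPPORTED.get(path.suffix.lower(), "unsupported")] += 1
    return counts
```

For example, `print(scan_repodir("repodir"))` shows how many documents of each format would be candidates for the knowledge base.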

📦 Hardware Requirements

The following table lists the GPU memory requirements for different features. The configurations differ only in which options are turned on.

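+-------------------------------+---------+------------------------------------------------------------------------------------+
| Configuration Example         | GPU Mem | Description                                                                        |
+===============================+=========+====================================================================================+
| config-2G.ini                 | 2GB     | OpenAI-compatible API (kimi, deepseek, stepfun, siliconcloud), text retrieval only |
| config-multimodal.ini         | 10GB    | OpenAI-compatible API for the LLM, image and text retrieval                        |
| [Standard Edition] config.ini | 19GB    | Local LLM deployment, single modality                                              |
| config-advanced.ini           | 80GB    | Local LLM, anaphora resolution, single modality; practical for WeChat groups       |
+-------------------------------+---------+------------------------------------------------------------------------------------+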
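If you script deployments, a hypothetical helper like the one below can pick an example config based on available GPU memory. It encodes the requirements table above and picks the configuration with the highest requirement that still fits. That preference is an assumption, because config-multimodal.ini and config.ini trade off different features (image retrieval versus a local LLM).

```python
# GPU memory requirements (GB) per example config, taken from the requirements table.
CONFIG_REQUIREMENTS = [
    ("config-advanced.ini", 80),
    ("config.ini", 19),
    ("config-multimodal.ini", 10),
    ("config-2G.ini", 2),
]


def pick_config(gpu_mem_gb: float) -> str:
    """Return the config with the highest GPU memory requirement that still fits."""
    for name, required_gb in CONFIG_REQUIREMENTS:  # ordered from largest to smallest
        if gpu_mem_gb >= required_gb:
            return name
    raise ValueError(f"{gpu_mem_gb} GB is below the 2 GB minimum")
```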

🔥 Running the Standard Edition

We use the standard edition (LLM running locally, text retrieval) as the introductory example. Other editions differ only in their configuration options.

I. Download and install dependencies

Accept the BCE model agreement on Hugging Face, then log in:

huggingface-cli login

Install dependencies

# parsing `word` format requirements
apt update
apt install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig libpulse-dev
# python requirements
pip install -r requirements.txt
# For python3.8, install faiss-gpu instead of faiss

II. Create knowledge base and ask questions

We use the mmpose documents to build the mmpose knowledge base and to filter questions. If you have your own documents, just put them under repodir.

Copy and execute all the following commands (including the '#' symbol).

# Download the knowledge base, we only take the documents of mmpose as an example. You can put any of your own documents under `repodir`
cd HuixiangDou
mkdir repodir
git clone https://github.com/open-mmlab/mmpose --depth=1 repodir/mmpose

# Save the features of repodir to workdir, and update the positive and negative example thresholds into `config.ini`
mkdir workdir
python3 -m huixiangdou.service.feature_store
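The feature_store step derives a rejection threshold from positive and negative examples. The sketch below shows one plausible way to do that; it is not necessarily what feature_store computes. Given relevance scores for questions that should be answered (good) and questions that should be rejected (bad), it sweeps candidate thresholds and keeps the one with the best F1 score. The function name and the scores in the test are hypothetical.

```python
from typing import List, Tuple


def choose_reject_threshold(good_scores: List[float],
                            bad_scores: List[float]) -> Tuple[float, float]:
    """Pick the threshold that best separates good (answer) and bad (reject) questions.

    A question is answered when its score >= threshold. Returns (threshold, F1),
    with good questions treated as the positive class.
    """
    best = (0.0, -1.0)
    for t in sorted(set(good_scores) | set(bad_scores)):
        tp = sum(s >= t for s in good_scores)  # good questions answered
        fp = sum(s >= t for s in bad_scores)   # bad questions wrongly answered
        fn = len(good_scores) - tp             # good questions wrongly rejected
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[1]:
            best = (t, f1)
    return best
```

For example, `choose_reject_threshold([0.6, 0.7, 0.8], [0.1, 0.2, 0.3])` returns `(0.6, 1.0)`, because 0.6 is the lowest threshold that answers every good question and rejects every bad one.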

Once it finishes, test with python3 -m huixiangdou.main --standalone. The assistant should reply to mmpose-related questions (those related to the knowledge base) and should not respond to questions about the weather.

python3 -m huixiangdou.main --standalone

+---------------------------+-------------+----------------------------+-----------------+
| Query                     | State       | Part of Reply              | References      |
+===========================+=============+============================+=================+
| How to install mmpose?    | success     | To install mmpose, plea..  | installation.md |
+---------------------------+-------------+----------------------------+-----------------+
| How is the weather today? | unrelated.. | ..                         |                 |
+---------------------------+-------------+----------------------------+-----------------+
🔆 Input your question here, type `bye` to exit:
..

Note

If restarting the LLM every time is too slow, first run python3 -m huixiangdou.service.llm_server_hybrid. Then, in a new window, run only python3 -m huixiangdou.main each time, so the LLM is not restarted.

💡 You can also launch a simple Web UI with gradio, which binds to port 7860 by default:

python3 -m huixiangdou.gradio

💡 Or run the simple gradio Web UI from the tests:

python3 -m tests.test_query_gradio

Or run a server listening on port 23333:

python3 -m huixiangdou.server

# Test the async (streaming) API
curl -X POST http://127.0.0.1:23333/huixiangdou_stream -H "Content-Type: application/json" -d '{"text": "how to install mmpose","image": ""}'
# Test the sync API
curl -X POST http://127.0.0.1:23333/huixiangdou_inference -H "Content-Type: application/json" -d '{"text": "how to install mmpose","image": ""}'
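The same request can be sent from Python using only the standard library. This sketch mirrors the sync cURL example above, using the same port, endpoint, and JSON body. It returns the raw response body because the response schema is not documented here. `build_request` and `query` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:23333"  # port used by `python3 -m huixiangdou.server`


def build_request(text: str, image: str = "",
                  endpoint: str = "huixiangdou_inference") -> urllib.request.Request:
    """Build a POST request with the same JSON body as the cURL examples."""
    body = json.dumps({"text": text, "image": image}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query(text: str, timeout: float = 60.0) -> str:
    """Send a query to the sync API and return the raw response body as text."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `print(query("how to install mmpose"))` should print the same response the sync cURL command returns.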

Update the documents in repodir, good_questions, and bad_questions to try your own domain knowledge (medical, financial, power, etc.).

III. Integration into Feishu and WeChat groups

IV. Deploy web front and back end

We provide TypeScript front-end and Python back-end source code:

  • Multi-tenant management supported
  • Connect to Feishu and WeChat with zero programming
  • k8s friendly

This is the same as the OpenXLab app; please read the web deployment document.

🍴 Other Configurations

2G Cost-effective Edition

Use this edition if your GPU memory exceeds 1.8GB or if you want to minimize cost. It drops the local LLM and uses a remote LLM instead. Everything else is the same as the standard edition.

Taking siliconcloud as an example, fill the API token obtained from its official website into config-2G.ini:

# config-2G.ini
[llm]
enable_local = 0   # Turn off local LLM
enable_remote = 1  # Only use remote
..
remote_type = "siliconcloud"   # Choose siliconcloud
remote_api_key = "YOUR-API-KEY-HERE" # Your API key
remote_llm_model = "alibaba/Qwen1.5-110B-Chat"

Note

In the worst case, each Q&A calls the LLM 7 times, which counts against the free-tier RPM (requests per minute) limit. You can adjust the rpm parameter in config.ini.
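Here is a worked example of the note above; the RPM figures are illustrative, not any provider's actual quota. With 7 LLM calls per question in the worst case, a limit of 60 requests per minute safely allows 60 // 7 = 8 questions per minute. Calls should also be spaced at least 60 / 60 = 1 second apart. The helper functions are hypothetical.

```python
CALLS_PER_QUESTION_WORST_CASE = 7  # from the note: up to 7 LLM calls per Q&A


def max_questions_per_minute(rpm: int,
                             calls_per_question: int = CALLS_PER_QUESTION_WORST_CASE) -> int:
    """Questions per minute that stay within an RPM limit even in the worst case."""
    return rpm // calls_per_question


def min_call_interval_seconds(rpm: int) -> float:
    """Minimum spacing between consecutive LLM calls to respect the RPM limit."""
    return 60.0 / rpm


print(max_questions_per_minute(60), min_call_interval_seconds(60))  # 8 1.0
```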

Execute the following to get the Q&A results

python3 -m huixiangdou.main --standalone --config-path config-2G.ini # Start all services at once

10G Multimodal Edition

With 10GB of GPU memory, you can additionally support image and text retrieval. Just change the models used in config.ini:

# config-multimodal.ini
# !!! Download `https://huggingface.co/BAAI/bge-visualized/blob/main/Visualized_m3.pth` to the `bge-m3` folder !!!
embedding_model_path = "BAAI/bge-m3"
reranker_model_path = "BAAI/bge-reranker-v2-minicpm-layerwise"

Note:

Run gradio to test; see the image and text retrieval results here.

python3 tests/test_query_gradio.py

80G Complete Edition

The HuixiangDou bot in the WeChat experience group has all features enabled:

  • Serper search and SourceGraph search enhancement
  • Group chat images, WeChat public account parsing
  • Text coreference resolution
  • Hybrid LLM
  • A knowledge base covering 12 OpenMMLab repositories (1700 documents), with small talk refused

Please read the following topics:

🛠️ FAQ

  1. What if the robot is too cold/too chatty?

    • Add questions that should be answered in real scenarios to resource/good_questions.json, and those that should be rejected to resource/bad_questions.json.
    • Adjust the topic content in repodir so that the markdown documents in the main library do not contain irrelevant content.

    Re-run feature_store to update thresholds and feature libraries.

    ⚠️ You can directly modify reject_throttle in config.ini. Generally speaking, 0.5 is a high value; 0.2 is too low.

  2. Launch is normal, but out of memory during runtime?

    Long-text inference with transformer-based LLMs requires more memory. In this case, apply KV cache quantization to the model (see the lmdeploy quantization description). Then use docker to deploy the Hybrid LLM Service independently.

  3. How to access other local LLM / After access, the effect is not ideal?

  4. What if the response is too slow/request always fails?

    • Refer to the hybrid LLM service to add exponential backoff and retries.
    • Serve the local LLM with an inference framework such as lmdeploy instead of native huggingface/transformers.

  5. What if the GPU memory is too low?

    In this case a local LLM cannot run, so only a remote LLM combined with text2vec can execute the pipeline. Make sure config.ini uses only the remote LLM and that the local LLM is turned off.

  6. No module named 'faiss.swigfaiss_avx2'?

    Locate the installed faiss package:

    import faiss
    print(faiss.__file__)
    # /root/.conda/envs/InternLM2_Huixiangdou/lib/python3.10/site-packages/faiss/__init__.py

    Add a symbolic link:

    # cd your_python_path/site-packages/faiss
    cd /root/.conda/envs/InternLM2_Huixiangdou/lib/python3.10/site-packages/faiss/
    ln -s swigfaiss.py swigfaiss_avx2.py
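FAQ item 2 says long text needs more memory, and a back-of-the-envelope estimate shows why. The KV cache stores one key vector and one value vector per layer, KV head, and token. Its size is therefore 2 × layers × KV heads × head dimension × tokens × bytes per element. The model dimensions below are illustrative, not those of a specific model. Quantizing the cache from 16-bit to 8-bit or 4-bit elements shrinks it proportionally.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: float, batch_size: int = 1) -> float:
    """KV cache size: keys and values (factor 2) for every layer, KV head, and token."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


GiB = 1024 ** 3
# Illustrative dimensions only: 32 layers, 8 KV heads, head_dim 128, 32768 tokens.
fp16 = kv_cache_bytes(32, 8, 128, seq_len=32768, bytes_per_elem=2) / GiB
int8 = kv_cache_bytes(32, 8, 128, seq_len=32768, bytes_per_elem=1) / GiB
int4 = kv_cache_bytes(32, 8, 128, seq_len=32768, bytes_per_elem=0.5) / GiB
print(f"fp16: {fp16:.1f} GiB, int8: {int8:.1f} GiB, int4: {int4:.1f} GiB")
```

This prints `fp16: 4.0 GiB, int8: 2.0 GiB, int4: 1.0 GiB`, on top of the model weights themselves.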
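For FAQ item 4, exponential backoff with jitter can be sketched as a generic wrapper. This is not the hybrid LLM service's code; `call_with_backoff` and its defaults are hypothetical. In practice you would catch only transient errors (timeouts, HTTP 429/5xx) rather than every exception.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T], max_retries: int = 5,
                      base_delay: float = 1.0, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on exception with exponentially growing, jittered delays."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: re-raise the last error
            delay = min(max_delay, base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids synchronized retries
    raise AssertionError("unreachable")
```

Wrap any remote LLM call, for example `call_with_backoff(lambda: my_llm_call(prompt))`, where `my_llm_call` stands for your own client function. The `sleep` parameter is injectable so the retry schedule can be tested without waiting.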
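For FAQ item 6, the same fix can be applied from Python, for example in a setup script. `importlib.util.find_spec` locates the faiss package without running its `__init__.py`. The helper names are hypothetical, and the link target is relative, matching the `ln -s` command above.

```python
import importlib.util
import os
from pathlib import Path


def faiss_package_dir() -> Path:
    """Locate the installed faiss package directory without importing faiss."""
    spec = importlib.util.find_spec("faiss")
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError("faiss is not installed in this environment")
    return Path(spec.origin).parent


def add_avx2_alias(pkg_dir: Path) -> Path:
    """Python equivalent of `ln -s swigfaiss.py swigfaiss_avx2.py` inside pkg_dir."""
    link = pkg_dir / "swigfaiss_avx2.py"
    if not (link.exists() or link.is_symlink()):
        os.symlink("swigfaiss.py", link)  # relative target, as in the shell command
    return link
```

Usage: `add_avx2_alias(faiss_package_dir())`. Calling it again is harmless because an existing link is left in place.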

🍀 Acknowledgements

📝 Citation

@misc{kong2024huixiangdou,
      title={HuiXiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance},
      author={Huanjun Kong and Songyang Zhang and Jiaying Li and Min Xiao and Jun Xu and Kai Chen},
      year={2024},
      eprint={2401.08772},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{kong2024huixiangdoucr,
      title={HuiXiangDou-CR: Coreference Resolution in Group Chats},
      author={Huanjun Kong},
      year={2024},
      eprint={2405.02817},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}