KwaiAgents is a series of agent-related works open-sourced by KwaiKEG at Kuaishou Technology. The open-sourced content includes:
- KAgentSys-Lite: a lite version of the KAgentSys described in the paper. It retains part of the original system's functionality but differs from the full-featured version in four ways: (1) a more limited set of tools; (2) no memory mechanisms; (3) slightly reduced performance; and (4) a different codebase, as it evolves from open-source projects like BabyAGI and Auto-GPT. Despite these differences, KAgentSys-Lite remains competitive among the many open-source agent systems available.
- KAgentLMs: a series of large language models with agent capabilities such as planning, reflection, and tool-use, acquired through the Meta-agent tuning proposed in the paper.
- KAgentInstruct: over 200k agent-related instruction fine-tuning examples (partially human-edited), proposed in the paper.
- KAgentBench: over 3,000 human-edited examples for automated evaluation of agent capabilities, with evaluation dimensions including planning, tool-use, reflection, concluding, and profiling.
Models | Training Data | Benchmark Data |
---|---|---|
Qwen-7B-MAT, Baichuan2-13B-MAT | KAgentInstruct (upcoming) | KAgentBench |
- 2023.12.13 - The benchmark and evaluation code [link] released
- 2023.12.08 - Technical report [link] released
- 2023.11.17 - Initial release
- Benchmark Results
Model | Scale | Planning | Tool-use | Reflection | Concluding | Profile | Overall Score |
---|---|---|---|---|---|---|---|
GPT-3.5-turbo | - | 18.55 | 15.89 | 5.32 | 37.26 | 35.42 | 21.72 |
Llama2 | 13B | 0.15 | 0.23 | 0.08 | 16.60 | 17.73 | 5.22 |
ChatGLM3 | 6B | 7.87 | 6.82 | 4.49 | 30.01 | 30.14 | 13.82 |
Qwen | 7B | 13.34 | 10.87 | 4.73 | 36.24 | 34.99 | 18.36 |
Baichuan2 | 13B | 6.70 | 10.11 | 4.25 | 24.97 | 19.08 | 12.54 |
ToolLlama | 7B | 0.20 | 3.44 | 0.54 | 15.62 | 10.66 | 5.50 |
AgentLM | 13B | 0.17 | 0.09 | 0.05 | 16.30 | 15.22 | 4.86 |
Qwen-MAT | 7B | 31.64 | 28.26 | 29.50 | 44.85 | 44.78 | 34.20 |
Baichuan2-MAT | 13B | 37.27 | 34.82 | 32.06 | 48.01 | 41.83 | 38.49 |
- Human evaluation. Each result cell shows the pass rate (%) and the average score (in parentheses)
Model | Scale | NoAgent | ReACT | Auto-GPT | KAgentSys |
---|---|---|---|---|---|
GPT-4 | - | 57.21% (3.42) | 68.66% (3.88) | 79.60% (4.27) | 83.58% (4.47) |
GPT-3.5-turbo | - | 47.26% (3.08) | 54.23% (3.33) | 61.74% (3.53) | 64.18% (3.69) |
Qwen | 7B | 52.74% (3.23) | 51.74% (3.20) | 50.25% (3.11) | 54.23% (3.27) |
Baichuan2 | 13B | 54.23% (3.31) | 55.72% (3.36) | 57.21% (3.37) | 58.71% (3.54) |
Qwen-MAT | 7B | - | 58.71% (3.53) | 65.67% (3.77) | 67.66% (3.87) |
Baichuan2-MAT | 13B | - | 61.19% (3.60) | 66.67% (3.86) | 74.13% (4.11) |
We recommend using vLLM and FastChat to deploy the model inference service. First, you need to install the corresponding packages (for detailed usage, please refer to the documentation of the two projects):
- For Qwen-7B-MAT, install the corresponding packages with the following commands
pip install vllm
pip install "fschat[model_worker,webui]"
- For Baichuan2-13B-MAT, install the corresponding packages with the following commands
pip install "fschat[model_worker,webui]"
pip install vllm==0.2.0
pip install transformers==4.33.2
To deploy KAgentLMs, you first need to start the controller in one terminal.
python -m fastchat.serve.controller
Second, run the following command in another terminal to deploy a single-GPU inference service:
python -m fastchat.serve.vllm_worker --model-path $model_path --trust-remote-code
Here, `$model_path` is the local path of the downloaded model. If the GPU does not support bfloat16, add `--dtype half` to the command line.
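When deployment is scripted, the worker command above can be assembled programmatically. The following is a minimal sketch under stated assumptions: the helper `build_worker_command` and its `bf16_supported` parameter are hypothetical and not part of FastChat or KwaiAgents, and only the flags shown above are used.

```python
import shlex


def build_worker_command(model_path: str, bf16_supported: bool = True) -> list[str]:
    """Return the argv list for the FastChat vLLM worker (hypothetical helper)."""
    cmd = [
        "python", "-m", "fastchat.serve.vllm_worker",
        "--model-path", model_path,
        "--trust-remote-code",
    ]
    if not bf16_supported:
        # Fall back to float16 on GPUs without bfloat16 support.
        cmd += ["--dtype", "half"]
    return cmd


# The model path is a placeholder; replace it with your local download.
print(shlex.join(build_worker_command("/path/to/model", bf16_supported=False)))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with paths that contain spaces.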
Third, start the REST API server in a third terminal.
python -m fastchat.serve.openai_api_server --host localhost --port 8888
Finally, you can invoke the model with curl, using the same request format as the OpenAI API. Here's an example:
curl http://localhost:8888/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "kagentlms_qwen_7b_mat", "messages": [{"role": "user", "content": "Who is Andy Lau"}]}'
Here, change `kagentlms_qwen_7b_mat` to the model you deployed.
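The same request can also be sent from Python. The sketch below uses only the standard library and assumes the API server started above is listening on localhost:8888. The helper names `build_chat_payload` and `chat` are illustrative, and the response parsing assumes the usual OpenAI chat-completions layout (`choices[0].message.content`).

```python
import json
import urllib.request

# Host and port match the openai_api_server command above.
API_URL = "http://localhost:8888/v1/chat/completions"


def build_chat_payload(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {"model": model, "messages": [{"role": "user", "content": user_message}]}


def chat(model: str, user_message: str, url: str = API_URL) -> str:
    """POST the payload and return the assistant's reply text."""
    data = json.dumps(build_chat_payload(model, user_message)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes the OpenAI response layout: choices -> message -> content.
    return body["choices"][0]["message"]["content"]


# Requires the running service; uncomment to try it.
# print(chat("kagentlms_qwen_7b_mat", "Who is Andy Lau"))
```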
Download and install KwaiAgents (Python>=3.10 is recommended):
git clone git@github.com:KwaiKEG/KwaiAgents.git
cd KwaiAgents
python setup.py develop
- ChatGPT usage
Declare some environment variables:
export OPENAI_API_KEY=sk-xxxxx
export WEATHER_API_KEY=xxxxxx
The WEATHER_API_KEY is optional, but you need to set it before asking weather-related questions. You can obtain the API key from this website (the same applies to local model usage).
kagentsys --query="Who is Andy Lau's wife?" --llm_name="gpt-3.5-turbo" --lang="en"
- Local model usage
To use a local model, first deploy the corresponding model service as described in the previous section:
kagentsys --query="Who is Andy Lau's wife?" --llm_name="kagentlms_qwen_7b_mat" \
--use_local_llm --local_llm_host="localhost" --local_llm_port=8888 --lang="en"
Full command arguments:
options:
-h, --help show this help message and exit
--id ID ID of this conversation
--query QUERY User query
--history HISTORY History of conversation
--llm_name LLM_NAME the name of llm
--use_local_llm Whether to use local llm
--local_llm_host LOCAL_LLM_HOST
The host of local llm service
--local_llm_port LOCAL_LLM_PORT
The port of local llm service
--tool_names TOOL_NAMES
the name of llm
--max_iter_num MAX_ITER_NUM
the number of iteration of agents
--agent_name AGENT_NAME
The agent name
--agent_bio AGENT_BIO
The agent bio, a short description
--agent_instructions AGENT_INSTRUCTIONS
The instructions of how agent thinking, acting, or talking
--external_knowledge EXTERNAL_KNOWLEDGE
The link of external knowledge
--lang {en,zh} The language of the overall system
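To call the CLI from a script, the invocation can be assembled as an argument list. This is a minimal sketch: `build_kagentsys_command` is a hypothetical helper, not part of KwaiAgents, and it uses only flags that appear in the listing above.

```python
import shlex


def build_kagentsys_command(query, llm_name, lang="en", local_host=None, local_port=None):
    """Assemble a kagentsys invocation from the documented flags (hypothetical helper)."""
    cmd = ["kagentsys", f"--query={query}", f"--llm_name={llm_name}", f"--lang={lang}"]
    if local_host is not None and local_port is not None:
        # Point the agent at a locally deployed model service.
        cmd += [
            "--use_local_llm",
            f"--local_llm_host={local_host}",
            f"--local_llm_port={local_port}",
        ]
    return cmd


cmd = build_kagentsys_command(
    "Who is Andy Lau's wife?",
    "kagentlms_qwen_7b_mat",
    local_host="localhost",
    local_port=8888,
)
# shlex.join quotes the query so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` runs the agent without going through a shell, so the apostrophe in the query needs no manual escaping.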
Note:
- If you need to use the `browse_website` tool, you need to configure chromedriver on your server.
- If the search fails multiple times, it may be because the network cannot access duckduckgo_search. You can solve this by setting `http_proxy`.
Evaluating agent capabilities takes only two commands, run from the benchmark directory:
cd benchmark
python infer_qwen.py qwen_benchmark_res.jsonl
python benchmark_eval.py ./benchmark_eval.jsonl ./qwen_benchmark_res.jsonl
The commands above produce results like:
plan : 31.64, tooluse : 28.26, reflextion : 29.50, conclusion : 44.85, profile : 44.78, overall : 34.20
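To compare runs, the result line can be parsed into a dictionary. The sketch below assumes the output format exactly as shown above (comma-separated `name : value` pairs) and keeps the key spellings as printed. `parse_scores` is a hypothetical helper, not part of the benchmark code.

```python
def parse_scores(line: str) -> dict[str, float]:
    """Parse 'name : value, name : value, ...' into a dict of floats."""
    scores = {}
    for field in line.split(","):
        name, _, value = field.partition(":")
        scores[name.strip()] = float(value)
    return scores


sample = (
    "plan : 31.64, tooluse : 28.26, reflextion : 29.50, "
    "conclusion : 44.85, profile : 44.78, overall : 34.20"
)
scores = parse_scores(sample)
print(scores["overall"])  # 34.2
```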
Please refer to benchmark for more details.
@article{pan2023kwaiagents,
author = {Haojie Pan and
Zepeng Zhai and
Hao Yuan and
Yaojia Lv and
Ruiji Fu and
Ming Liu and
Zhongyuan Wang and
Bing Qin
},
title = {KwaiAgents: Generalized Information-seeking Agent System with Large Language Models},
journal = {CoRR},
volume = {abs/2312.04889},
year = {2023}
}