Workflow AI is a high-performance C++ library, functionally similar to LangChain and LangGraph, for interacting with Large Language Models (LLMs). It extends the serial/parallel task orchestration capabilities of C++ Workflow.
Modern LLM applications increasingly require not just API requests but also complex tool calls and agent orchestration. By abstracting the complexity of mixed I/O and compute workflows, this library makes LLM integration easy and efficient for any C++ application.
👉 [ Chinese documentation is here! ]
- 💬 Chatbot: Basic demo that can be used directly as a chatbot
- 🔧 Tool Calls: Native support for LLM function/tool calling
- 📒 Memory: Simple memory module for multi-round sessions
- ➡️ Task DAG: Build DAGs with tasks based on C++ Workflow
- 🌊 Streaming Support: Real-time streaming responses with Sync / Async / Task APIs
- ⚡️ Parallel Execution: Execute multiple tool calls in parallel
- 🚀 High Performance: Efficient asynchronous non-blocking Network I/O and Computation
- 📮 Client / Proxy: Use as a Client or a Proxy. Server is coming soon
View Development Roadmap (Click to expand)
This is the very beginning of a multi-layer LLM interaction framework. Here's the implementation status and future plans:
- Model Interaction Layer
- chat : Task → Model → Callback
- tool : Task → Model → Function → Model → Callback
- chat : Task + Get Memory → Model → Callback + Save Memory
- Streaming response by SSE
- Streamable HTTP protocol
- Model KVCache loading
- Prefill/decode optimization
- Tool Calling Layer
- Single tool execution
- Parallel tool execution
- Workflow native task example (In progress)
- MCP Framework (Multi-tool Coordination)
- Local command execution (e.g., ls, grep)
- Remote RPC integration
- Memory Storage Layer
- Context in-memory storage
- Offload local disk storage
- Offload distributed storage
- Asynchronous Task API
- Synchronous API (Done. 2025.August.01)
- Semi-Sync API (Done. 2025.August.16)
- Text-to-text
- Text-to-image
- Text-to-speech
- Embeddings
- DeepSeek API
- OpenAI-compatible APIs
- Claude API (Planned)
- Local model integration (Planned)
- Client mode
- Proxy mode (Partial)
- Server mode (In progress)
- Session state management
- Preset task templates (translation/summary/code)
- Prompt engineering
- Few-shot integration
- Dynamic prompt building
- JSON
- Protobuf
- Custom formats
Easy to Compile with Bazel or CMake (Click to expand)
- Requires C++11 or later
```bash
git clone https://github.com/holmes1412/workflow-ai.git
cd workflow-ai

# Build all targets
bazel build ...

# Run basic DeepSeek chatbot
bazel run :deepseek_chatbot -- <your_api_key>
```

```bash
# Download workflow source code and make for the first time
# git clone https://github.com/sogou/workflow.git /PATH/TO/WORKFLOW
# cd /PATH/TO/WORKFLOW && make

# build
mkdir cmake.build && cd cmake.build
cmake .. -D Workflow_DIR=/PATH/TO/WORKFLOW
make

# run sync_demo
./sync_demo <your_api_key>
```
This example shows the basic steps to chat with LLMs.
Here is a one-round chat; we can use any of the three kinds of APIs: synchronous, asynchronous, and task-based.
```
🧑‍💻 user request 'hi'
        ↓
 ┌───────────┐
 │ Chat Task │   // asynchronous task API:
 │  to LLMs  │   // send request, get response
 └───────────┘
        ↓
🧑‍💻 extract()    // for streaming, get each chunk
        ↓
🧑‍💻 callback()   // callback, then the task ends
```
🤖 1. Synchronous API
Let's begin with the simplest one: `chat_completion_sync()`.
```cpp
int main()
{
    LLMClient client("YOUR_API_KEY"); // construct the client with `api_key`; a custom `base_url` is supported, DeepSeek is the default
    ChatCompletionRequest request;
    request.messages.push_back({"user", "hi"});

    ChatCompletionResponse response;
    SyncResult result = client.chat_completion_sync(request, response);

    if (result.success)
        printf("%s\n", response.choices[0].message.content.c_str());
    else
        printf("Request Failed: %s\n", result.error_message.c_str());

    return 0;
}
```
🤖 2. Asynchronous API
If we use streaming mode to receive every chunk from the LLM server, we can use `chat_completion_async()`, which works like a generator in Python.
```cpp
int main()
{
    LLMClient client("YOUR_API_KEY");
    ChatCompletionRequest request;
    request.stream = true; // set this to use streaming
    request.messages.push_back({"user", "hi"});

    AsyncResult result = client.chat_completion_async(request);
    // ... you may do anything else until you need the data ...

    while (true)
    {
        ChatCompletionChunk *chunk = result.get_chunk();
        if (!chunk /*non-streaming*/ || chunk->state != RESPONSE_SUCCESS)
            break;

        if (!chunk->choices.empty() && !chunk->choices[0].delta.content.empty())
            printf("%s", chunk->choices[0].delta.content.c_str());

        if (chunk->last_chunk())
            break;
    }

    // in non-streaming mode, use get_response() instead
    //ChatCompletionResponse *response = result.get_response();
}
```
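For completeness, the non-streaming path mentioned in the last comment looks roughly like this. A minimal sketch: it assumes `get_response()` waits for the reply if it has not arrived yet and returns a null pointer on failure, which is worth confirming against the headers; includes are omitted as in the examples above.

```cpp
int main()
{
    LLMClient client("YOUR_API_KEY");
    ChatCompletionRequest request;               // request.stream stays false
    request.messages.push_back({"user", "hi"});

    AsyncResult result = client.chat_completion_async(request);
    // ... do other work while the request is in flight ...

    // fetch the complete response only when we actually need it
    ChatCompletionResponse *response = result.get_response();
    if (response && !response->choices.empty())
        printf("%s\n", response->choices[0].message.content.c_str());
    return 0;
}
```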
🤖 3. Task API
The task-based API is useful for organizing our task graph.
In this example we use `create_chat_task()` to create a WFHttpChunkedTask, which can be used with any other Workflow task: push it back into a SeriesWork, a ParallelWork, or a DAG in Workflow (a sketch of chaining it into a SeriesWork follows the example below).
```cpp
int main()
{
    LLMClient client("YOUR_API_KEY");
    ChatCompletionRequest request;
    request.model = "deepseek-reasoner"; // set to use the DeepSeek-R1 model
    request.messages.push_back({"user", "hi"});

    auto *task = client.create_chat_task(request, extract, callback);
    task->start();
    // pause here, or use wait_group.wait()
}

// here we get each chunk
void extract(WFHttpChunkedTask *task, ChatCompletionRequest *req, ChatCompletionChunk *chunk)
{
    if (!chunk->choices.empty())
    {
        if (!chunk->choices[0].delta.reasoning_content.empty())
            printf("%s", chunk->choices[0].delta.reasoning_content.c_str());
        else if (!chunk->choices[0].delta.content.empty())
            printf("%s", chunk->choices[0].delta.content.c_str());
    }
}

// here we get the final response, in both streaming and non-streaming mode
void callback(WFHttpChunkedTask *task, ChatCompletionRequest *req, ChatCompletionResponse *resp)
{
    // if (task->get_state() == WFT_STATE_SUCCESS)
    if (req->model == "deepseek-reasoner")
        printf("%s\n", resp->choices[0].message.reasoning_content.c_str());
    printf("%s\n", resp->choices[0].message.content.c_str());
}
```
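Because `create_chat_task()` returns an ordinary Workflow task, it can be chained like any other. Below is a minimal sketch that reuses `extract()` and `callback()` from the example above, puts the chat task at the head of a SeriesWork, and keeps `main()` alive with `WFFacilities::WaitGroup`; the headers and calls (`Workflow::create_series_work`, `WaitGroup`) come from C++ Workflow, while how the real demos wait may differ.

```cpp
#include "workflow/Workflow.h"
#include "workflow/WFFacilities.h"

static WFFacilities::WaitGroup wait_group(1);

int main()
{
    LLMClient client("YOUR_API_KEY");
    ChatCompletionRequest request;
    request.messages.push_back({"user", "hi"});

    auto *task = client.create_chat_task(request, extract, callback);

    // Put the chat task at the head of a series; more Workflow tasks could be
    // pushed into the same series before it starts.
    SeriesWork *series = Workflow::create_series_work(task,
        [](const SeriesWork *) { wait_group.done(); });

    series->start();
    wait_group.wait(); // block main() until the whole series has finished
    return 0;
}
```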
This example shows how to use function calls as tools.
The following task flow looks more complicated only because it shows the internal architecture. There are only 3 steps we need to pay attention to.
```
👩‍💻 preparation: register functions
        ↓
👩‍💻 user request: ask for multiple pieces of information
        ↓
 ┌───────────┐
 │ Chat Task │   // asynchronous network task
 │  to LLMs  │   // send request, get response
 └───────────┘
        ↓
 LLM responds with tool_calls
        ↓
 create WFGoTasks for local function computing
        ↓
 ┌─────────┬─────────┬─────────┐
 │ Tool A  │ Tool B  │ Tool C  │   // execute in parallel
 │ Series1 │ Series2 │ Series3 │   // by compute threads
 └─────────┴─────────┴─────────┘
        ↓
 collect results
        ↓
 ┌───────────┐
 │ Chat Task │   // send all the context and results
 │  to LLMs  │   // multi-round supported by the memory module
 └───────────┘
        ↓
👩‍💻 extract()
        ↓
👩‍💻 callback()
```
🤖 Code Example.
Step-1: Define our function, which is the tool to call.
This preparation only needs to be done once, before any requests.
The parameters of every function are fixed:
- arguments: the arguments from the LLM in JSON format, e.g. {"location":"Shenzhen"}
- result: the return value for our function to fill
```cpp
void get_current_weather(const std::string& arguments, FunctionResult *result)
{
    result->success = true;
    result->result = "Weather: 25°C, Sunny";
}
```
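The stub above ignores `arguments`; since they arrive as a JSON string such as {"location":"Shenzhen"}, a real tool would usually parse them first. Here is a minimal, hypothetical sketch with a hand-rolled lookup (the project may well prefer a proper JSON parser; `json_string_field` and `get_weather_with_location` are names invented here for illustration):

```cpp
#include <string>

// Hypothetical helper: pull a string field such as "location" out of the
// JSON arguments, e.g. {"location":"Shenzhen"}. Hand-rolled for brevity.
static std::string json_string_field(const std::string& json, const std::string& key)
{
    std::string pattern = "\"" + key + "\"";
    size_t pos = json.find(pattern);
    if (pos == std::string::npos)
        return "";
    pos = json.find(':', pos + pattern.size());
    if (pos == std::string::npos)
        return "";
    size_t begin = json.find('"', pos + 1);
    size_t end = (begin == std::string::npos) ? std::string::npos : json.find('"', begin + 1);
    if (begin == std::string::npos || end == std::string::npos)
        return "";
    return json.substr(begin + 1, end - begin - 1);
}

// Hypothetical variant of the weather tool that actually uses the arguments.
void get_weather_with_location(const std::string& arguments, FunctionResult *result)
{
    std::string location = json_string_field(arguments, "location");
    result->success = true;
    result->result = "Weather in " + (location.empty() ? std::string("unknown") : location) +
                     ": 25°C, Sunny";
}
```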
Step-2: Register the function into the FunctionManager and add the FunctionManager to the client.
This is a one-time preparation, too.
```cpp
int main()
{
    LLMClient client("your_api_key");

    FunctionManager func_mgr;
    client.set_function_manager(&func_mgr);

    // Register the function
    FunctionDefinition weather_func = {
        .name = "get_weather",
        .description = "Get current weather information"
    };
    func_mgr.register_function(weather_func, get_current_weather);
    ...
}
```
Step-3: Start a request with tools.
As long as we have functions registered in the manager and set request.tool_choice, the LLM will tell us which tools to call and with what arguments. Once this information is returned, the library automatically executes the tools (that is, the functions we registered), sends the results back to the LLM so it can generate a summary based on them, and the LLM returns the final answer to us.
```cpp
{
    ChatCompletionRequest request;
    request.model = "deepseek-chat";
    request.messages.push_back({"user", "What's the weather like?"});
    request.tool_choice = "auto"; // set `auto` or `required` to enable tool use

    ChatCompletionResponse response;
    auto result = client.chat_completion_sync(request, response);

    if (result.success)
        printf("%s\n", response.choices[0].message.content.c_str());
    // "Shenzhen has sunny weather today, with a temperature of 25°C and pleasant weather."
}
```
The library automatically detects when multiple tool calls are returned by the LLM and executes them in parallel using Workflow's ParallelWork.
In example/parallel_tool_call.cc, we can make a request like this:
```cpp
// When the LLM returns multiple tool calls, they execute in parallel
request.messages.push_back({"user", "Tell me the weather in Beijing and Shenzhen, and the current time"});
```
This will execute the weather queries for both cities and the time query simultaneously, significantly reducing response time.
```
./bazel-bin/parallel_tool_call <API_KEY>

registered weather and time functions successfully.
Starting parallel tool calls test...
function calling...get_current_weather()
function calling...get_current_time()
parameters: {"location": "Beijing"}
function calling...get_current_weather()
parameters: {"location": "Shenzhen"}
parameters: {}
Response status: 200
Response Content:
The current temperature in Beijing is 30°C and in Shenzhen it is 28°C. The time now is 10:05 PM on Friday, August 8, 2025.
```
- LLMClient: Main client for LLM interactions
- FunctionManager: Manages function registration and execution
- ChatCompletionRequest: Request sent by users
- ChatCompletionResponse: Response data structure from LLMs
| Example | Description |
|---|---|
| sync_demo.cc | The simplest demo, using the synchronous API |
| async_demo.cc | Asynchronous API, showing how to receive streaming chunks |
| task_demo.cc | Task API that creates tasks for a task graph; also asynchronous |
| deepseek_chatbot.cc | DeepSeek chatbot implementation for multi-round sessions with memory |
| tool_call.cc | Basic function calling with a single tool |
| parallel_tool_call.cc | Demonstrates parallel execution of multiple tools |
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Sogou Workflow v0.11.9
- OpenSSL
- pthread
For more examples, check the examples/ directory. Detailed documentation is coming soon.
Have questions, suggestions, or want to contribute? Feel free to reach out!
✉️ Email: liyingxin1412@gmail.com
🧸 GitHub: https://github.com/holmes1412
We welcome all feedback and contributions to make Workflow AI even better!