Sogou C++ Workflow
As Sogou's C++ server engine, Sogou C++ Workflow supports almost all of Sogou's back-end C++ online services, including all search services, cloud input method and online advertising, handling more than 10 billion requests every day. It is an enterprise-level programming engine with a light and elegant design that can satisfy most C++ back-end development requirements.
You can use it:
- To quickly build an HTTP server:
```cpp
#include <stdio.h>
#include "workflow/WFHttpServer.h"

int main()
{
    WFHttpServer server([](WFHttpTask *task) {
        task->get_resp()->append_output_body("<html>Hello World!</html>");
    });

    if (server.start(8888) == 0) { // start server on port 8888
        getchar(); // press "Enter" to end.
        server.stop();
    }

    return 0;
}
```
- As a multifunctional asynchronous client that currently supports the HTTP, Redis, MySQL and Kafka protocols.
- To implement clients and servers on a user-defined protocol and build your own RPC system.
  - srpc, an independent open-source project, is built on it and supports the srpc, brpc and thrift protocols.
- To build asynchronous workflows, with support for common series and parallel structures as well as arbitrary DAG structures.
- As a parallel computing tool. In addition to networking tasks, Sogou C++ Workflow also schedules computing tasks, and all types of tasks can be put into the same flow.
- As an asynchronous file IO tool on Linux, with performance exceeding that of any system call. Disk file IO is also a task.
- To realize any high-performance and high-concurrency back-end service with a very complex relationship between computing and networking.
- To build a microservice system.
  - This project has built-in service governance and load balancing features.
Compiling and running environment
- This project supports Linux, macOS, Windows and other operating systems. The Windows version is currently released as an independent branch, using iocp to implement asynchronous networking. All user interfaces are consistent with the Linux version.
- Supports all CPU platforms, including 32- and 64-bit x86 processors and big-endian or little-endian arm processors.
- Relies on OpenSSL; OpenSSL 1.1 and above is recommended. If you don't want SSL, you may check out the nossl branch, but you still need to link crypto for md5 and sha1.
- Uses the C++11 standard and therefore should be compiled with a compiler that supports C++11. Does not rely on boost or asio.
- No other dependencies. However, if you need the Kafka protocol, some compression libraries should be installed, including lz4, zstd and snappy.
Try it!
- Client
- Server
- Parallel tasks and Series
- Important topics
- Computing tasks
- Asynchronous File IO tasks
- User-defined protocol
- Timing tasks and counting tasks
- Service governance
- Connection context
- Built-in protocols
System design features
We believe that a typical back-end program = protocol + algorithm + workflow, and that these three parts should be developed completely independently.
- Protocol
  - In most cases, users use built-in common network protocols, such as HTTP, Redis or various RPC protocols.
  - Users can also easily define their own network protocols. To do so, they only need to provide serialization and deserialization functions to build their own client/server.
- Algorithm
  - In our design, the algorithm is a concept symmetrical to the protocol.
  - If a protocol call is an RPC, then an algorithm call is an APC (Async Procedure Call).
  - We have provided some general algorithms, such as sort, merge, psort and reduce, which can be used directly.
  - Compared with user-defined protocols, user-defined algorithms are much more common. Any complicated computation with clear boundaries should be packaged into an algorithm.
- Workflow
  - Workflow is the actual business logic, which puts the protocols and algorithms into a flow graph.
  - The typical workflow is a closed series-parallel graph. Complex business logic may be a non-closed DAG.
  - The workflow graph can be constructed directly or generated dynamically based on the result of each step. All tasks are executed asynchronously.
Basic task, task factory and complex task
- Our system contains six basic tasks: networking, file IO, CPU, GPU, timer and counter.
- All tasks are generated by the task factory and automatically recycled after their callbacks.
  - The server task is a special kind of networking task: the framework generates it by calling the task factory and hands it over to the user through the process function.
- In most cases, a task that the user creates through the task factory is a complex task, but this is transparent to the user.
  - For example, an HTTP request may involve many asynchronous steps (DNS resolution, redirection), but to the user it is just one networking task.
  - File sorting looks like an algorithm, but it actually involves many complex interactions between file IO and CPU computation.
  - If you think of business logic as building circuits from well-designed electronic components, then each electronic component may itself be a complex circuit.
Asynchrony and encapsulation based on C++11 std::function
- Not based on user-mode coroutines. Users need to know that they are writing asynchronous programs.
- All calls are executed asynchronously, and almost no operation occupies a thread.
  - Although we also provide some facilities with semi-synchronous interfaces, they are not core features.
- We try to avoid requiring users to derive from our classes, and encapsulate user behavior with std::function instead, including:
  - The callback of any task.
  - Any server's process. This conforms to the FaaS (Function as a Service) idea.
  - The realization of an algorithm, which is simply a std::function. An algorithm can also be implemented by derivation, though.
Memory reclamation mechanism
- Every task is automatically reclaimed after its callback. If a task is created but the user does not want to run it, the user needs to release it through the dismiss method.
- Any data in the task, such as the response of a network request, is also reclaimed with the task. In the callback, the user can use std::move() to move out the data that is still needed.
- SeriesWork and ParallelWork are two kinds of framework objects, which are also reclaimed after their callbacks.
  - When a series is a branch of a parallel, it is reclaimed after the callback of the parallel that it belongs to.
- This project doesn't use std::shared_ptr to manage memory.
More design documents
To be continued...