Stream Kernel
Max-Meldrum opened this issue · 3 comments
Moving forward we will need a stream kernel
for the data processing layer that is specifically designed for Arcon. This issue serves as a direction (I.e not prioritised in short term).
Related issues:
Kernel
The Kernel represents an application-level OS that manages Task scheduling, memory management, and I/O. The idea is to cooperatively schedule a set of tasks on a single core in order to get better CPU utilisation + locality between tasks + avoid context switches. As noted here, storage and networking are no longer the bottlenecks in a modern data center, but CPUs are.
Running a Thread-Per-Core model with cooperative scheduling is not something unique itself. Down below are some data-parallel systems that execute it with success.
Rough Overview Sketch
The Kernel has the following context that is shared between tasks that it executes:
- Time (Watermark)
- Epoch
- Logger
- State Backend(s)
Task
A cooperative async long-running future that drives the execution of a dataflow node. A Task must always check if it should yield back control to other tasks in order to not block progress.
// source example
let source = ...;
let task: Task = async move {
loop {
while let Some(elem) = source.poll_next().await {
// output elem to next task.
// yield back control to scheduler at times..
}
}
};
Tasks may send their output in 3 different ways:
- Intra-Kernel with
Rc<RefCell<Vec<T>>>
- Local Inter-Kernel with explicit queues (e.g., spsc)
- Remote Inter-Kernel over the wire
Application-level Task Scheduling
API levels
Suggested by @segeljakt
High-level API: builtin operators (map, filter, window, keyby)
Mid-level API: operator constructors + event handlers
Low-level API: tasks/async functions + channels
Async-friendly Runtime
Currently, it is hard to support async
interfaces/crates. Two prime examples are source implementations and supporting state backends that are async.
// Rough sketching of source, operator, and state interfaces.
#[async_trait]
pub trait Source {
type Item: ArconType;
async fn poll_next(&mut self) -> Result<Poll<Self::Item>>;
}
#[async_trait]
pub trait Operator {
async fn handle_elem(&mut self, elem: ArconElement<Self::IN>);
}
#[async_trait]
pub trait ValueIndex<V: Value> {
async fn get(&mut self) -> Result<Option<V>>;
async fn put(&mut self, value: V) -> Result<()>;
}
Glommio
Glommio is a Seastar inspired TPC cooperative threading framework built in Rust. It relies on Linux and its io_uring
functionality. This is the only notable downside of adopting Gloomio for Arcon. That is, making it a Linux only system. But then again, data-intensive systems such as Arcon are supposed to run on Linux anyway.
Another downside is that Glommio runs on the assumption that the machine has NVMe storage. Specific OS + Hardware requirements will make it harder to run or to contribute to Arcon.
Pros:
- Configurable scheduling priority (latency matters vs. not)
- First-class
io_uring
and Direct I/O support (future backend) - Designed to be used in data and I/O intensive systems (Arcon)
- Offers placement strategies (NUMA etc..)
Cons:
- Linux + Hardware Requirements
- Not as many users/contributors as for example
tokio
Article about Gloomio may be found here.
Other Async runtime candidates
Tokio
The other approach is to adopt tokio and use a similar approach to the actix runtime where they spawn a runtime per thread and combine it with a LocalSet to support !Send
futures.
Going with tokio as the default kernel runtime is a bit safer and makes it easier to contribute but also play around with Arcon. I believe it should be possible to support glommio later on with a glommio_rt
feature flag.
I wonder if we could combine kompact and tokio to get "the best of both worlds". Tokio integrates us with the async ecosystem which I think is crucial for streaming applications. Kompact on the other hand gives us networking (while tokio only gives us sockets).
If it's possible, I think a nice solution could be to implement some type of network channel using kompact which tokio tasks on different machines can use for communication.
@Max-Meldrum although distributed Arcon is not currently planned, what do you think?
I wonder if we could combine kompact and tokio to get "the best of both worlds". Tokio integrates us with the async ecosystem which I think is crucial for streaming applications. Kompact on the other hand gives us networking (while tokio only gives us sockets).
If it's possible, I think a nice solution could be to implement some type of network channel using kompact which tokio tasks on different machines can use for communication.
@Max-Meldrum although distributed Arcon is not currently planned, what do you think?
yeah, worth seeing if it's possible to combine or perhaps reuse implementation parts of the networking.