A flash sympathetic data stream, leveraging latest io_uring features for fast asynchronous IO.
2018 edition Rust Linux, io_uring is a Linux only feature Kernel version 5.6+, certain io_uring features used require a relatively modern kernel version
Eventually this will be an event stream similar to, and inspired by Apache Kafka. The ultimate goal is to create a modern event stream borrowing new concepts from academia, and built to leverage modern hardware and operating system features.
In progress:
- Leverage io_uring features for network and filesystem access
- More optimal API, less system calls, less interupts
- this is especially good in the times where patches for Specter and Meltdown reduce system call performance
- it's easier to share buffers between user and kernel space meaning less copying
- hypervisor passthru features can improve efficiency during use in many clouds
- mmap and buffered IO are nice, but we can't control the kernel behavior entirely implementing our own virtual memory layer allows us to control our own cache.
- More optimal API, less system calls, less interupts
- Safety and reliability are important, anything unsafe needs to be well tested and fuzzed.
- Copying is not fast, don't copy unless it's unsafe not to, opt for safe zero-copy always.
- This may include avoiding copying memory between user and kernel space.
- KISS, keep things as simple as possible for desired features
- Networking should be simple. No need for complex serialization. Wire protocol is a simple binary protocol.
Planned:
- Virtual memory system
- Leverage O_DIRECT to bypass kernel cacheing
- Implement a purpose built concurrent paging system
- Lock free would be great(study the design of sled project)
- Partitioning
- Partitions are split by a provided key using hash
- Replication
- Partitions are replicated by a given replication factor
- Membership, consensus, and gossip
- TBD
- AP is probably more important than CP in this case, because consistency to the reader doesn't mean much since they're mostly waiting for events.
- consistency only really matters to ensure order from multiple producers, which is still not important for most cases.
- Probably have a raft group per partition
- Ideas:
- instead of something like zookeeper perhaps a gossip to share cluster configuration
- global leaders aren't very scalable, perhaps partition leaders or completely decentralized writes(write to any node of with that partition).
- clients should be smart
- clients should send data to nodes who are concerned with that partition, rather than
having nodes have to forward data they may never store.
- in a leader system, probably send to the partition leader
- in a decentralized system any node with that partition will do
- clients should be informed of the cluster configuration through some mechanism
so they can make optimal network decisions.
- either they are part of the gossip "ring"
- or they periodically grab the configuration from a node
- access should be read only for clients
- clients should send data to nodes who are concerned with that partition, rather than
having nodes have to forward data they may never store.
- Be more efficient than Apache Kafka
- Already don't have a garbage collector
- Probably avoid using Arc pointers for things that don't live long
- fewer allocations since we don't need the heap as much
- the stack is fast
- io_uring means less system calls, and less copying
- machine code baby!
- Already don't have a garbage collector