Distributed systems consist of multiple logical stages running on multiple machines. Data is split into entities whose state is modified while flowing through the system. The union of the entities' states in the logical stages comprises the overall state of the distributed system.
The StateDB project implements a scalable database to manage the state of entities in a distributed system.
The StateDB supports multi-tenancy by storing multiple system instances in namespaces (e.g. for development use cases).
Within each namespace, a collection of components stores a set of state machines for the entities of the system.
Each entity is identified by a correlation id.
The state machine consists of a set of states and transitions between the states.
Each entity is in exactly one state.
The transitions from one state to another are stored in state transitions.
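The model described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the StateDB implementation; the class name `StateMachine`, its methods, and the entity id `order-1` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StateMachine:
    """Hypothetical in-memory model of one component's state machine."""

    states: set[str]
    transitions: set[tuple[str, str]]
    initial_state: str
    # Maps correlation id -> current state, so each entity is in exactly one state.
    entities: dict[str, str] = field(default_factory=dict)

    def create(self, entity_id: str) -> None:
        """Register a new entity in the initial state."""
        if entity_id in self.entities:
            raise ValueError(f"entity {entity_id!r} already exists")
        self.entities[entity_id] = self.initial_state

    def transition(self, entity_id: str, to_state: str) -> None:
        """Move an entity along a declared transition, or raise ValueError."""
        from_state = self.entities[entity_id]
        if (from_state, to_state) not in self.transitions:
            raise ValueError(f"transition {from_state!r} -> {to_state!r} is not declared")
        self.entities[entity_id] = to_state


machine = StateMachine(
    states={"created", "in-process", "completed"},
    transitions={("created", "in-process"), ("in-process", "completed")},
    initial_state="created",
)
machine.create("order-1")
machine.transition("order-1", "in-process")
```

Storing transitions as `(from, to)` pairs keeps the check for an allowed move to a single set lookup.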
This section summarizes the API to interact with the StateDB.
The namespace is created with a POST request on /namespace:
{
  "namespace": "<namespace>"
}
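As a rough client-side illustration, the request could be built with Python's standard library as follows. The base URL, the `Content-Type` header, the helper name `build_post`, and the namespace name `development` are assumptions; the source does not specify the host, port, or required headers.

```python
import json
import urllib.request

# Assumption: host and port of the StateDB service are not given in the source.
BASE_URL = "http://localhost:8080"


def build_post(path: str, payload) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a StateDB endpoint."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_post("/namespace", {"namespace": "development"})

# To send the request against a running service:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The same helper would cover the other POST endpoints by changing the path and payload.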
The component is created with a POST request on /<namespace>/component:
{
  "component": "<component>",
  "states": [
    "created",
    "in-process",
    "completed"
  ],
  "transitions": [
    {
      "from": "created",
      "to": "in-process"
    },
    {
      "from": "in-process",
      "to": "completed"
    }
  ],
  "initial_state": "created",
  "meta_data": {
    "latitude": "double",
    "longitude": "double",
    "location": "string"
  }
}
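A component definition is only coherent if the initial state and every transition endpoint refer to a declared state. The following sketch checks this on the client side before the request is sent. Whether the StateDB performs the same checks is not stated in the source; the function name `validate_component` and the component name `orders` are hypothetical.

```python
def validate_component(definition: dict) -> list[str]:
    """Return a list of consistency problems; an empty list means none were found."""
    problems = []
    states = definition.get("states", [])
    known = set(states)
    if len(known) != len(states):
        problems.append("duplicate state names")
    if definition.get("initial_state") not in known:
        problems.append("initial_state is not a declared state")
    for transition in definition.get("transitions", []):
        for key in ("from", "to"):
            state = transition.get(key)
            if state not in known:
                problems.append(f"transition {key!r} state {state!r} is not declared")
    return problems


component = {
    "component": "orders",  # hypothetical component name
    "states": ["created", "in-process", "completed"],
    "transitions": [
        {"from": "created", "to": "in-process"},
        {"from": "in-process", "to": "completed"},
    ],
    "initial_state": "created",
}
problems = validate_component(component)
```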
The entities are created with a POST request on /<namespace>/<component>:
[
  {
    "entity_id": "<correlation_id1>",
    "meta_data": {
      "latitude": 10.0,
      "longitude": 20.0,
      "location": "location"
    }
  },
  {
    "entity_id": "<correlation_id2>"
  }
]
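The entity meta data presumably has to match the types declared in the component's "meta_data" schema. A minimal sketch of such a check is shown below. The mapping of the type names "double" and "string" to Python types, the function name `check_meta_data`, and the entity ids are assumptions, and the example treats meta data as optional because the second entity above omits it.

```python
# Assumption: the schema type names map to Python types as follows.
TYPE_CHECKS = {
    "double": lambda value: isinstance(value, (int, float)) and not isinstance(value, bool),
    "string": lambda value: isinstance(value, str),
}


def check_meta_data(schema: dict, entity: dict) -> list[str]:
    """Return type problems for one entity; entities without meta data pass."""
    problems = []
    for key, value in entity.get("meta_data", {}).items():
        declared = schema.get(key)
        check = TYPE_CHECKS.get(declared)
        if declared is None:
            problems.append(f"{entity['entity_id']}: unknown field {key!r}")
        elif check is None:
            problems.append(f"{entity['entity_id']}: unsupported type {declared!r}")
        elif not check(value):
            problems.append(f"{entity['entity_id']}: {key!r} is not a {declared}")
    return problems


schema = {"latitude": "double", "longitude": "double", "location": "string"}
entities = [
    {
        "entity_id": "id-1",  # hypothetical correlation id
        "meta_data": {"latitude": 10.0, "longitude": 20.0, "location": "location"},
    },
    {"entity_id": "id-2"},
]
problems = [problem for entity in entities for problem in check_meta_data(schema, entity)]
```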
The state of one or multiple entities is updated with a POST request on /<namespace>/<component>/entities:
[
  {
    "entity_id": "<correlation_id1>",
    "meta_data": {
      "latitude": 11.0,
      "longitude": 21.0,
      "location": "location"
    },
    "from_state": "created",
    "to_state": "in-process",
    "compute_instance": "<instance-id>"
  },
  {
    "entity_id": "<correlation_id2>",
    "from_state": "in-process",
    "to_state": "completed",
    "compute_instance": "<instance-id>"
  }
]
All state updates in one API call are performed as a single transaction: either all updates succeed or none are applied.
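The all-or-nothing behaviour can be illustrated with a small in-memory sketch that stages the whole batch on a copy and only returns the result if every update is valid. This is not the StateDB implementation. The function name `apply_updates` is hypothetical, and the check that "from_state" matches the entity's current state is an assumption about how the field is used.

```python
def apply_updates(
    current: dict[str, str],
    allowed: set[tuple[str, str]],
    updates: list[dict],
) -> dict[str, str]:
    """Return the new entity -> state mapping, or raise and leave `current` untouched."""
    staged = dict(current)  # work on a copy so a failure changes nothing
    for update in updates:
        entity_id = update["entity_id"]
        from_state, to_state = update["from_state"], update["to_state"]
        # Assumption: "from_state" must match the entity's current state.
        if staged.get(entity_id) != from_state:
            raise ValueError(f"{entity_id}: not in state {from_state!r}")
        if (from_state, to_state) not in allowed:
            raise ValueError(f"{entity_id}: {from_state!r} -> {to_state!r} is not declared")
        staged[entity_id] = to_state
    return staged


allowed = {("created", "in-process"), ("in-process", "completed")}
states = {"id-1": "created", "id-2": "in-process"}
batch = [
    {"entity_id": "id-1", "from_state": "created", "to_state": "in-process"},
    {"entity_id": "id-2", "from_state": "in-process", "to_state": "completed"},
]
new_states = apply_updates(states, allowed, batch)
```

If any update in the batch raises, the caller still holds the original `states` mapping, which mirrors the "all or none" guarantee.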
The entities are queried with a GET request on /<namespace>/<component>/query:
{
  "entity_id": "<glob_expression>",
  "state": "<state1>|<state2>|...|<stateN>",
  "metadata": "latitude > 0.4 and longitude > 10.0 and location == 'location'",
  "aggregation": "count()"
}
The query operation reads a consistent view of the database even while state updates are performed in parallel.
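The intended semantics of the "entity_id" and "state" filters with a "count()" aggregation can be sketched as follows. The sketch assumes that the glob expression follows Unix shell-style matching as implemented by Python's `fnmatch` module, which the source does not confirm. It leaves out the "metadata" expression, and the function name `count_matching` and the entity ids are hypothetical.

```python
from fnmatch import fnmatchcase


def count_matching(entities: dict[str, str], entity_glob: str, state_filter: str) -> int:
    """Count entities whose id matches the glob and whose state is one of the alternatives."""
    wanted = set(state_filter.split("|"))  # "<state1>|<state2>" lists alternatives
    return sum(
        1
        for entity_id, state in entities.items()
        if fnmatchcase(entity_id, entity_glob) and state in wanted
    )


entities = {
    "order-1": "created",
    "order-2": "in-process",
    "order-3": "completed",
    "invoice-1": "created",
}
open_orders = count_matching(entities, "order-*", "created|in-process")
```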
The statedb executable is the central UI to interact with the database.
A new database is created with:
statedb create --working-dir <working_dir> --config <config_file>
The database service is launched with:
statedb launch --working-dir <working_dir> --port <database_port>
The database console may be launched with:
statedb console --port <database_port> --user <user> --password <password>
statedb> create namespace <namespace_file>
statedb> create component <component_file>
statedb> create entities <entities_file>
statedb> update entities <entities_file>
statedb> query entities <query_file>
The database design shall meet the following non-functional requirements:
- 1 million entities shall be bulk-created in under 1 minute
- state updates shall be performed in parallel from several thousand compute instances
- entity queries in a component of 100 million entities shall complete in under 10 seconds
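To put the first and last requirement into perspective, the implied sustained rates can be computed directly. The query figure is only an upper bound on the work: it applies if a query has to touch every entity in the component, which is an assumption, since an index could avoid a full scan.

```python
# Bulk creation: 1 million entities in 60 seconds.
bulk_create_rate = 1_000_000 / 60  # entities per second

# Query: 100 million entities in 10 seconds, assuming every entity is examined.
query_scan_rate = 100_000_000 / 10  # entities per second

print(f"bulk create: {bulk_create_rate:,.0f} entities/s")
print(f"full-scan query: {query_scan_rate:,.0f} entities/s")
```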