eclipse-opendut/opendut

CARL should persist its state

Closed this issue · 4 comments

kKdH commented

CARL currently does not persist any state. After a restart of CARL, all resources (PeerDescriptor, ClusterConfiguration, ClusterDeployment, ...) are gone.

  • determine which resources need to be persistent.
  • select a suitable persistence architecture (relational, event sourced, document based, etc.).
  • select a suitable DBMS.
  • #187
  • implement the required logic to persist resources.
  • implement the required logic to restore resources.
  • #309

Depends on #46.

Sub-Tasks:

mbfm commented

Initial architecture thoughts:

Persistables

  • PeerDescriptors
    • DeviceDescriptors
  • ClusterConfigurations
  • Users → Keycloak will do that
  • Mapping ClientId → PeerId (#149)

Restore (after CARL start-up)

  • Clear out NetBird state

  • Recreate Peers

  • Recreate Devices

  • Recreate Clusters

  • Ensure EDGARs don't need to be set up again.

  • Ensure Keycloak ClientIds are still linked properly.
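
The restore order above could be sketched roughly as follows. This is only an illustration under assumptions, not CARL's actual code: `PersistedState`, `Resources`, `VpnBackend`, `RecordingVpn` and `restore` are hypothetical names, plain string IDs stand in for the real descriptor types, and the recording backend replaces a real NetBird server so the call order can be inspected.

```rust
/// Hypothetical snapshot of the rows loaded from the database at start-up.
#[derive(Debug, Clone, Default)]
pub struct PersistedState {
    pub peers: Vec<String>,             // peer IDs
    pub devices: Vec<(String, String)>, // (peer ID, device ID)
    pub clusters: Vec<String>,          // cluster IDs
}

/// Simplified in-memory resources that are available after the restore.
#[derive(Debug, Default, PartialEq)]
pub struct Resources {
    pub peers: Vec<String>,
    pub devices: Vec<(String, String)>,
    pub clusters: Vec<String>,
}

/// Hypothetical stand-in for the NetBird-facing client.
pub trait VpnBackend {
    fn clear_all(&mut self);
    fn create_peer(&mut self, peer_id: &str);
    fn create_cluster(&mut self, cluster_id: &str);
}

/// Test double that only records which calls were made, in order.
#[derive(Debug, Default)]
pub struct RecordingVpn {
    pub calls: Vec<String>,
}

impl VpnBackend for RecordingVpn {
    fn clear_all(&mut self) {
        self.calls.push("clear_all".to_string());
    }
    fn create_peer(&mut self, peer_id: &str) {
        self.calls.push(format!("create_peer {}", peer_id));
    }
    fn create_cluster(&mut self, cluster_id: &str) {
        self.calls.push(format!("create_cluster {}", cluster_id));
    }
}

/// Rebuilds resources in the order listed above: purge, peers, devices, clusters.
pub fn restore(state: &PersistedState, vpn: &mut dyn VpnBackend) -> Resources {
    let mut resources = Resources::default();

    // 1. Clear out stale VPN state before anything is re-created.
    vpn.clear_all();

    // 2. Recreate peers, both in the VPN backend and in memory.
    for peer_id in &state.peers {
        vpn.create_peer(peer_id);
        resources.peers.push(peer_id.clone());
    }

    // 3. Recreate devices, keeping only those whose peer was restored.
    for (peer_id, device_id) in &state.devices {
        if resources.peers.contains(peer_id) {
            resources.devices.push((peer_id.clone(), device_id.clone()));
        }
    }

    // 4. Recreate clusters last, since they refer to peers and devices.
    for cluster_id in &state.clusters {
        vpn.create_cluster(cluster_id);
        resources.clusters.push(cluster_id.clone());
    }

    resources
}
```

Keeping the VPN client behind a trait means the ordering can be checked in a unit test without a running NetBird instance.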

Testing

  • Maybe restore into separate database/ResourcesManager and expect same data.
  • Diesel has test_transaction for integration tests. (Requires a running PostgreSQL.)
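
The first idea, restoring into a separate ResourcesManager and expecting the same data, might look roughly like this. It is a minimal sketch under assumptions: the `Storage` trait, `InMemoryStorage`, and this heavily simplified `ResourcesManager` are hypothetical stand-ins, and a `BTreeMap` takes the place of PostgreSQL so the example runs without a database.

```rust
use std::collections::BTreeMap;

/// Hypothetical persistence boundary; a real backend would talk to PostgreSQL.
pub trait Storage {
    fn save(&mut self, id: &str, name: &str);
    fn load_all(&self) -> BTreeMap<String, String>;
}

/// Backend that keeps rows in memory, usable in unit tests.
#[derive(Debug, Default)]
pub struct InMemoryStorage {
    rows: BTreeMap<String, String>,
}

impl Storage for InMemoryStorage {
    fn save(&mut self, id: &str, name: &str) {
        self.rows.insert(id.to_string(), name.to_string());
    }
    fn load_all(&self) -> BTreeMap<String, String> {
        self.rows.clone()
    }
}

/// Simplified manager holding peer names keyed by peer ID.
#[derive(Debug, Default, PartialEq)]
pub struct ResourcesManager {
    peers: BTreeMap<String, String>,
}

impl ResourcesManager {
    /// Inserts into memory and writes through to the storage backend.
    pub fn insert_peer(&mut self, storage: &mut dyn Storage, id: &str, name: &str) {
        self.peers.insert(id.to_string(), name.to_string());
        storage.save(id, name);
    }

    /// Builds a fresh manager purely from what the storage backend returns.
    pub fn restore(storage: &dyn Storage) -> Self {
        ResourcesManager { peers: storage.load_all() }
    }

    pub fn peer_count(&self) -> usize {
        self.peers.len()
    }
}

/// Persists two peers, restores into a second manager, and compares both.
pub fn round_trip_matches() -> bool {
    let mut storage = InMemoryStorage::default();
    let mut original = ResourcesManager::default();
    original.insert_peer(&mut storage, "peer-a", "lab-bench-1");
    original.insert_peer(&mut storage, "peer-b", "lab-bench-2");

    let restored = ResourcesManager::restore(&storage);
    restored == original && restored.peer_count() == 2
}
```

With a real database behind the same trait, the comparison at the end would stay the same; only the backend passed in would change.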

Storage

  • PostgreSQL
  • Diesel
    • Use separate model structs and convert between them
    • Introduce common conversion error type, akin to proto-conversions
  • Needs to be disableable, for unit tests.
    • ResourcesManager as abstraction layer? Or some StorageController below that?
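
The "separate model structs" idea with a common conversion error type could be sketched as follows. Everything here is an assumption for illustration: this `PeerDescriptor` has only two invented fields, `PersistablePeerDescriptor`, `ConversionError` and `load_peer` are hypothetical names, and the Diesel derives a real model struct would carry are left out so the example has no dependencies.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Domain type (simplified; the fields are illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub struct PeerDescriptor {
    pub id: u32,
    pub name: String,
}

/// Row-shaped model, as it might be stored in a table.
#[derive(Debug, Clone, PartialEq)]
pub struct PersistablePeerDescriptor {
    pub peer_id: i64, // signed, because PostgreSQL has no unsigned integer types
    pub name: String,
}

/// One error type shared by all model-to-domain conversions.
#[derive(Debug, Clone, PartialEq)]
pub struct ConversionError {
    pub from: &'static str,
    pub to: &'static str,
    pub details: String,
}

impl fmt::Display for ConversionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "could not convert {} to {}: {}", self.from, self.to, self.details)
    }
}

impl std::error::Error for ConversionError {}

/// Domain to model never fails: every u32 fits into an i64.
impl From<PeerDescriptor> for PersistablePeerDescriptor {
    fn from(value: PeerDescriptor) -> Self {
        PersistablePeerDescriptor { peer_id: i64::from(value.id), name: value.name }
    }
}

/// Model to domain can fail, because the database may hold invalid rows.
impl TryFrom<PersistablePeerDescriptor> for PeerDescriptor {
    type Error = ConversionError;

    fn try_from(row: PersistablePeerDescriptor) -> Result<Self, Self::Error> {
        let error = |details: String| ConversionError {
            from: "PersistablePeerDescriptor",
            to: "PeerDescriptor",
            details,
        };
        let id = u32::try_from(row.peer_id)
            .map_err(|cause| error(format!("peer_id {} is out of range: {}", row.peer_id, cause)))?;
        if row.name.is_empty() {
            return Err(error("name must not be empty".to_string()));
        }
        Ok(PeerDescriptor { id, name: row.name })
    }
}

/// Convenience wrapper around the fallible conversion.
pub fn load_peer(row: PersistablePeerDescriptor) -> Result<PeerDescriptor, ConversionError> {
    PeerDescriptor::try_from(row)
}
```

Keeping the fallible direction in `TryFrom` makes invalid rows surface as a typed error at the storage boundary instead of leaking into the domain types.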

TODO

  • CARL Architecture overview

  • All Persistables in Actions? → How do we not re-persist, if persisting is part of the Action?
    → Made more sense to hook them up directly to the ResourcesManager.

  • Maybe do restoration separately at program start, launching the Services with inherent state?
    → We store and read the state directly from the database.

  • Identify state kept outside the ResourcesManager.

    • PeerMessagingBroker → Does not need to be persisted.
    • ClusterManager → None
    • PeerManager → None (actions)
    • NetBird Server → Yes, need to purge and re-create.
      • ClusterManager does create_cluster() and delete_cluster()
      • actions::peers does create_peer() and delete_peer()
        • create_peer should be moved to later: after the peer initially connects, not after its peer_descriptor was created (right? why?)
        • delete_peer should also be done on generate_peer_setup()
        • generate_vpn_peer_configuration() should be renamed; it triggers a (re-)creation of the Peer-Group
          • The code needs documentation on why we do this. Maybe this should also be broken up? Not sure whether there was a reason we pulled everything underneath the interface.
    • Keycloak → Yes, need to co-exist, restore linking ClientId ←→ PeerId

kKdH commented

I can recommend rusqlite. The advantage is that you can run a SQLite database within the same process, so it is not required to extend THEO. For simple unit tests, it is even possible to run it completely in memory.

It is as easy as opening a file:

use rusqlite::Connection;

let path = "./my_database.db";
let db = Connection::open(path)?;

mbfm commented

> I can recommend rusqlite. The advantage is, that you can run a SQLite Database within the same process. Therefore it is not required to extend THEO. For simple unit tests it is even possible to run it in memory completely.
>
> it is as easy as opening a file:
>
> let path = "./my_database.db";
> let db = Connection::open(path)?;

We decided to use testcontainers for spawning a Postgres container in unit tests, so we wouldn't need to also support SQLite specifics.

mbfm commented

Implemented as part of v0.3.0.