/Orleans.Tournament

Orleans sample with clustering, implicit stream handling, authentication, authorization, websockets

Primary LanguageC#MIT LicenseMIT

Tournament Demo Project

This project is a backend oriented demo with websocket capabilities that relies heavily in the Actor Model Framework implementation of Microsoft Orleans. I gave myself the luxury of experimenting with some technologies and practices that might be useful on a real world distributed system.

Table of Contents
Demo
Domain
DDD and immutability
CQRS, projections, event sourcing and eventual consistency
Sagas
Async communication via websockets
Authentication
Infrastructure (Kubernetes)
How to run it
Use the websockets client
Tests

Demo

Click to view it on Loom with sound.

demo

Domain

The domain contains two aggregate roots: Teams and Tournaments.

A Team contains a name, a list of players (strings) and a list of tournament IDs on which the team is registered.

A Tournament contains a name, a list of team IDs participating on it, and a Fixture that contains information of each different Phase: quarter finals, semi finals and finals.

For a Tournament to start, the list of teams must have a count of 8. When it starts, the participating teams are shuffled randomly, and the Fixture's quarter Phase is populated.

A Phase is a container for a list of Matches that belong to the bracket. Each Match contains the LocalTeamId, the AwayTeamId and a MatchResult. In order not to use null values, the MatchResult is created with default values of 0 goals for each team and the property Played set to false.

When the user intents the command SetMatchResult the correct result will be assigned to the corresponding Match on the corresponding Phase. When all the Matches on a Phase are played, the next Phase is generated. Meanwhile the quarter finals are generated by a random shuffle of the participating teams, the semifinals and the finals are not generated randomly but considering the results of previous brackets.

DDD and immutability

In Actor Model Framework, each Actor (instance of an aggregate root) contains a state that mutates when applying events. Orleans is not different, and each Grain (Actor) acts the same way.

  1. Each TournamentGrain contains a TournamentState.
  2. When a TournamentGrain receives a Command, it first checks if the command is valid business wise.
  3. If the Command is valid, it will publish an Event informing what happened and modifying the State accordingly.

The TournamentState is not coupled to Orleans framework. This means that the class is just a plain one that exposes methods that acts upon certain events. This allows for replayability of events and event sourcing as a whole. The state is then mutable, but that does not mean that the properties exposed by it are mutable as well. For example the TournamentState contains the Fixture.

The Fixture is a value object implemented with C# records. This means that the methods exposed by it does not cause side effects and instead returns a new Fixture with the changes reflected. This enables an easier testability and predictability as all the methods are deterministic; you can execute a method a hundred times and the result is the same.

CQRS, projections, event sourcing and eventual consistency

One of the advantages of using an Actor Model Framework is the innate segregation of commands and queries. The commands are executed against a certain Grain and causes side effects, such as publishing an Event. However, the queries do not go against a Grain, instead they go against a read database that gets populated via projections.

CQRS and projections.

This is, in fact, eventual consistency. There are some milliseconds between the source of truth changes (Grain) are reflected in the projections which one can query. This should not be a problem, but one needs to make sure to enable retry strategies in case the database is down while consuming an event from the stream, etc.

In the scenario of a Grain being killed because of inactivity, or the Silo resetting and losing the in memory state of the Grain; each Grain can be recovered by applying the events in order:

  1. TournamentGrain is not on memory.
  2. Initialize the TournamentGrain by replaying the Events stored in the write database.
  3. Executes the SetMatchResult command, etc.

As you can see, the Grain will never use the read state as the source of truth, and instead it will rely on the event sourcing mechanism, to apply all the events again one after another until the state is up to date.

Sagas

Saga is a pattern for a distributed transaction that impacts more than one aggregate. In this case, lets look at the scenario when a Team wants to join a Tournament.

  1. TeamGrain receives the command CreateTeam and the TeamState is initialized.
  2. TournamentGrain receives the command AddTeam for the team created above, validations kick in:
    • Does the team exist? (Note: here you are not supposed to query your read database, as it not the source of truth, but actually send a message to the TeamGrain).
    • Does the tournament contain less than 8 teams?
    • Is the tournament already started?
  3. If validations are ok, publishes the TeamAdded event.

So far, the TournamentGrain is aware of the Team joining, but the TeamGrain is not aware of the participation on the Tournament. Enter the TeamAddedSaga Grain.

  1. TeamAddedSaga subscribes implicitly to an Orleans Stream looking for TeamAdded events.
  2. When it receives an event, it gets a reference for the TeamGrain with the corresponding ID.
  3. Sends the JoinTournament command to the TeamGrain.
  4. There are no extra validations needed, as everything was already validated before.
  5. TeamGrain publishes the TeamJoinedTournament event.

In this case there is not a rollback strategy implemented but these capabilities can definitely be handled with an "intermediate" Grain such as the Saga.

Async communication via websockets

Given the async nature of Orleans, the commands executed by a user should not return results on how does the resource looks after a command succeeded, not even IF the command succeeded.

Instead the response will contain a TraceId as a GUID representation on the intent of the user. The user will receive a message via websockets indicating what happened for that TraceId. The frontend can reflect the changes accordingly.

Websockets

For the graph above the work, the user should establish a websocket connection with the API. The user will only get the results for those commands he/she invoked and not for everything happening in the system.

Authentication

As not all users should receive the results for all the commands, there is a need for distinguishing users. For this purpose a super simple authentication and authorization mechanism is implemented. If you need to implement auth in a real world app, please do not reinvent the wheel and check existing solutions such as AD or Okta.

As mentioned before, each command executed will result on a TraceId, but internally there is also an InvokerUserId propagated that can also serve as an audit log for each event.

The user will only get websocket messages for those events on which the InvokerUserId matches with the one on the JWT Token.

Fallback mechanism

It makes sense to have a fallback mechanism in case a websocket event did not arrive due to network issues. This could be a key value database where a user can search for a TraceId to find the response in order to react to a command executed. This is not currently implemented but will be added later.

Infrastructure (Kubernetes)

The solution consists on different projects that need to be executed at the same time:

  • API.Identity: Create user and generate token.
  • API: Entrypoint for the user to invoke commands, connects to the Orleans cluster as a client.
  • Silo: Orleans cluster, handles everything related to Orleans such as placement, streams, etc.
  • Silo.Dashboard: Orleans client that displays metrics and information.

It also requires an instance of Postgres to be running and accepting connections.

Taking in consideration that the API could be split in smaller pieces and the Silos count being scalable, I decided to create Kubernetes manifests to host this application. As it is not a common practice to have your own Kubernetes cluster locally, I also decided on using Kind which uses the docker engine to host a single node of Kubernetes locally.

I am not going to enter into details on how this works, but just so you know that Docker and Kind are required to be installed to run this solution. Of course, one can choose to run locally instead, bear in mind that in that scenario you need to be able to connect to a Postgres instance as mentioned above.

All the Kubernetes configuration files can be found on the kubernetes folder. NOTE: There are also "secrets" on the mentioned folder, for a real world application please do not store your secrets in plain text on your repository.

How to run it

As mentioned before, it is required to have docker installed as well as kind command line and kubectl. It is also suggested to have make support, so you don't have to manually go through the commands one by one.

Initial bootstrap:

make run

Trigger rebuild of images and recreation of all the pods:

make restart

If you see something like this, everything should be up and running:

running terminal

Use kubectl instead of kb as it is an alias that I am used to.

Now you can access with the following endpoints:

You can use the Postman collection below to try out the functionality as you can see on the demo video. This collection assigns values from previous responses so you don't have to modify the requests manually but just execute them in order.

Run in Postman

Use the websockets client

As this client is not containerized, dotnet sdk is required. After creating a user and getting the token, you can execute on the utils/Websockets.Client directory:

dotnet run

Paste your user token, and when choosing an environment type K and press enter key. A message saying connected should appear. Now you should be able to get the results of all the requests fired through the API using that user token.

Tests

Tests are not yet implemented, but you can see a simple example on the Domain.Tests project. I would like the unit tests just to cover the domain functionality:

  • Given a state, apply an event, assert the modified state.
  • Given a value object, invoke a method, assert the returned record.