Async web server

Demo of a web server using pub/sub and background processing, written in Go.

The app consists of two commands:

  message-consumer Runs the pool of workers that consumes messages from pub/sub and processes them
  server           Runs the web server that exposes an api for creating messages

Server

The server exposes a single API endpoint POST /message and expects JSON. It simply shards the data and sends it to pub/sub. Example request body:

[
    {"timestamp":"2019-10-12T08:10:53.52Z","value":"D"},
    {"timestamp":"2019-10-12T08:10:52.52Z","value":"E"},
    {"timestamp":"2019-10-12T08:10:54.52Z","value":"F"}
]

Only the timestamp is required for each entry. A 201 status means the messages were sent to the queue.
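
A request can be sent with curl, for example (assuming the server listens on localhost:8080; the actual address depends on how the server is configured):

  curl -X POST http://localhost:8080/message \
    -H "Content-Type: application/json" \
    -d '[{"timestamp":"2019-10-12T08:10:53.52Z","value":"D"}]'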

Message consumer

  • Each shard has a separate worker pool (fixed number of workers for now)
  • Worker processing timeout
  • Each worker buffers the data it receives and writes the messages in batches to files (whose names contain the worker id); see the sketch after this list
  • Displays a history of workers and how many messages each has processed in the logs
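
A minimal sketch of what a single worker's loop might look like under these points (the function names, batch size, and file naming are illustrative assumptions, not the actual implementation):

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// worker drains msgs, buffers what it receives, and flushes every
// batchSize messages to a file whose name contains the worker id.
// timeout bounds how long a single message may take to process.
func worker(ctx context.Context, id int, msgs <-chan string, batchSize int, timeout time.Duration, outDir string) {
	var batch []string
	processed := 0
	for msg := range msgs {
		msgCtx, cancel := context.WithTimeout(ctx, timeout)
		err := process(msgCtx, msg)
		cancel()
		if err != nil {
			log.Printf("worker %d: processing failed: %v", id, err)
			continue
		}
		batch = append(batch, msg)
		processed++
		if len(batch) >= batchSize {
			flush(id, outDir, batch)
			batch = batch[:0]
		}
	}
	if len(batch) > 0 {
		flush(id, outDir, batch)
	}
	// The worker's "history": how many messages it has processed.
	log.Printf("worker %d processed %d messages", id, processed)
}

// process stands in for the real message processing; it respects the
// per-message timeout via the context.
func process(ctx context.Context, msg string) error {
	select {
	case <-time.After(10 * time.Millisecond):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// flush writes one batch to a new file named after the worker id.
func flush(id int, outDir string, batch []string) {
	name := filepath.Join(outDir, fmt.Sprintf("worker-%d-%d.json", id, time.Now().UnixNano()))
	if err := os.WriteFile(name, []byte(strings.Join(batch, "\n")), 0o644); err != nil {
		log.Printf("worker %d: flush failed: %v", id, err)
	}
}

func main() {
	msgs := make(chan string)
	go func() {
		for i := 0; i < 12; i++ {
			msgs <- fmt.Sprintf(`{"timestamp":"2019-10-12T08:10:%02d.00Z"}`, i)
		}
		close(msgs)
	}()
	worker(context.Background(), 1, msgs, 5, time.Second, os.TempDir())
}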

Prerequisites

  • RabbitMQ 3.12.2
  • Go 1.20
  • Docker (optional, handy for running RabbitMQ)

How to run it:

  1. Run RabbitMQ. Docker is recommended: docker run --rm --hostname my-rabbit -p 5672:5672 --name some-rabbit rabbitmq:3
  2. Create a directory for the processed data output and set its path in the DATA_OUTPUT_DIR environment variable
  3. Run go build
  4. Run async-data-processor server and async-data-processor message-consumer in separate console windows

Assumptions:

  • Sharding:
    • It is ok to shard by timestamp, because timestamp data is distributed more or less uniformly
  • Regarding the request:
    • The value field can consist of several characters and is not required
  • Regarding the server:
    • It's OK to include some internal messages in the API response when an internal server error is returned
  • Regarding workers:
    • We want one of the workers in the pool to process five messages as soon as there are five messages in the queue. That's why the workers in a pool share the same batch. The shared batch can become a bottleneck here, but it felt more reasonable than each worker keeping its own batch and competing with the other workers for unbatched messages (see the sketch after this list).
    • Regarding the requirement "Displays a history of workers and how many messages each has processed": I assumed that console logs are fine.
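
A minimal sketch of the shared-batch idea from the workers assumption above (the type name and the batch size of five are illustrative, not the actual code):

package main

import (
	"fmt"
	"sync"
)

// sharedBatch is the batch shared by all workers in a pool: whichever
// worker adds the fifth message takes the whole batch and processes it.
type sharedBatch struct {
	mu    sync.Mutex
	items []string
	size  int
}

// add appends a message; once the batch is full it hands the whole
// batch back to the caller and resets the buffer, so the flush happens
// outside the lock.
func (b *sharedBatch) add(msg string) ([]string, bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.items = append(b.items, msg)
	if len(b.items) < b.size {
		return nil, false
	}
	full := b.items
	b.items = nil
	return full, true
}

func main() {
	batch := &sharedBatch{size: 5}
	var wg sync.WaitGroup
	for w := 1; w <= 3; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < 5; i++ {
				if full, ok := batch.add(fmt.Sprintf("msg-%d-%d", id, i)); ok {
					// In the real consumer this is where the batch would be written to a file.
					fmt.Printf("worker %d flushes %d messages\n", id, len(full))
				}
			}
		}(w)
	}
	wg.Wait()
}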

TODO/Corners cut:

  • There is no graceful shutdown on a termination signal. That is, CTRL+C can leave some messages unprocessed (those still sitting in the batch).
  • The requirement "It starts with 3 workers and has at most 4 workers at any time" is not fulfilled; my worker pool spawns a fixed number of 3 workers (configurable).
  • Very naive error handling in the worker (I would use errgroup if I had more time)
  • The messages buffer could be refactored; GetMessages and Reset should be combined into one function
  • Test coverage is low