My name is Thomas Cruveilher. This repository is my submission for an assignment given by Cogent Labs during their recruitment process.
The folder and code structure follows the Clean/Hexagonal/Onion architecture, as all three revolve around the same principle of separated layers.
Here, we have domain and adapter directories.
Primary adapters are where we get inputs from. This could be a UI if the project needed one, but in our case they consist of two HTTP servers, worker and api.
Their only purpose is to get inputs from users/consumers and send them to the service/business layer of the domain.
Each primary adapter directory is dependent on the framework used. I chose to use Nest.
This server receives HTTP requests to:
- trigger a thumbnail creation job
- get a job's status
- get the list of all jobs
- get thumbnail file
It communicates with the worker by using a JobSender implementation. The current one is webhook/HTTP based, so it sends a POST request to the worker.
This whole part needs to be replaced with an OpenAPI/Swagger page.
Example request (to run in your favorite HTTP client; mine is embedded in JetBrains IDEs, and VS Code has this extension to do so: https://marketplace.visualstudio.com/items?itemName=humao.rest-client)
The file path assumes you are running it from the project's root directory.
POST http://localhost:4242/thumbnail/create
Content-Type: multipart/form-data; boundary=WebAppBoundary
--WebAppBoundary
Content-Disposition: form-data; name="file"; filename="cat-200-200"
< ./domain/picture/spec/fixture-200-200
--WebAppBoundary--
Returns job information
{
"type": "thumbnail",
"data": {
"name": "cat-200-200"
},
"id": "57a61121-2383-40c0-9943-eb7fae462685"
}
Example request
GET http://localhost:4242/job/57a61121-2383-40c0-9943-eb7fae462685
Accept: application/json
Returns job information
{
"id": "57a61121-2383-40c0-9943-eb7fae462685",
"status": "success",
"lastChangeDate": "2023-11-04T02:28:03.925Z",
"type": "thumbnail"
}
Example request
GET http://localhost:4242/job/
Accept: application/json
Returns jobs information
[
{
"id": "5db69565-db75-456b-ae26-d2d52c3c0f12",
"type": "thumbnail",
"status": "success",
"lastChangeDate": "2023-11-04T02:26:04.227Z"
},
{
"id": "57a61121-2383-40c0-9943-eb7fae462685",
"type": "thumbnail",
"status": "success",
"lastChangeDate": "2023-11-04T02:28:03.925Z"
}
]
Example request
GET http://localhost:4242/thumbnail/fixture-200-200
Accept: application/json
Returns image with Content-Type: application/octet-stream
This server receives HTTP requests to execute jobs. It contains only one endpoint, which is called by the API.
POST http://localhost:4343/job/execute
Content-Type: application/json
{
"type": "thumbnail",
"id": "cool-id-2",
"data": {
"name": "fixture-200-200"
}
}
It always sends back "ok" and stores the job status.
Secondary adapters are the "output" of the business/domain logic. Think about storage such as the filesystem, S3 buckets or a database, and outward communication to external APIs via HTTP requests, messages in a queue, websockets, etc.
They are implementations of interfaces from the domain, so the folder architecture is similar.
In-memory/filesystem adapters are used by the domain during unit tests, so they do not have their own tests.
Other adapters, those using SQL for example, have their own integration tests (described below).
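To illustrate the idea of a secondary adapter implementing a domain interface, here is a minimal sketch. The names JobEvent, JobEventStore and InMemoryJobEventStore, as well as the status values, are assumptions made for this example and may not match the actual code in this repository.

```typescript
// Hypothetical event shape; the real one lives in the domain.
type JobStatus = "created" | "running" | "error" | "success";

interface JobEvent {
  jobId: string;
  status: JobStatus;
  date: Date;
}

// Port declared by the domain. It knows nothing about SQL or memory.
interface JobEventStore {
  append(event: JobEvent): Promise<void>;
  findByJobId(jobId: string): Promise<JobEvent[]>;
}

// In-memory adapter, the kind of implementation used in unit tests.
class InMemoryJobEventStore implements JobEventStore {
  private readonly events: JobEvent[] = [];

  async append(event: JobEvent): Promise<void> {
    this.events.push(event);
  }

  async findByJobId(jobId: string): Promise<JobEvent[]> {
    // Return a filtered copy so callers cannot mutate the internal array.
    return this.events.filter((event) => event.jobId === jobId);
  }
}
```

A SQL adapter would implement the same interface, so the domain code would not need to change when swapping one for the other.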
Store Job Event data somewhere.
Implementation:
- In Memory
- SQL using TypeORM
Other possible implementations:
- SQL with another framework
- Any other database type
Send job data where necessary.
Implementation:
- In Memory
- HTTP request (webhook style)
Other possible implementations:
- Add a message to a message queue (AMQP, MQTT)
- Websocket
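The HTTP (webhook style) job sender could look roughly like the sketch below. The Job shape mirrors the JSON payloads shown earlier, but HttpJobSender and the HttpPost function type are hypothetical names introduced for this example; the HTTP call is injected so the sketch stays independent of any particular client.

```typescript
// Assumed job shape, based on the example payloads in this README.
interface Job {
  id: string;
  type: string;
  data: { name: string };
}

// Port declared by the domain: "send this job wherever it must go".
interface JobSender {
  send(job: Job): Promise<void>;
}

// Minimal HTTP abstraction (hypothetical) so the adapter can run without a network.
type HttpPost = (
  url: string,
  jsonBody: string,
) => Promise<{ ok: boolean; status: number }>;

// Webhook-style adapter: POSTs the job as JSON to the worker endpoint.
class HttpJobSender implements JobSender {
  private readonly workerUrl: string;
  private readonly post: HttpPost;

  constructor(workerUrl: string, post: HttpPost) {
    this.workerUrl = workerUrl;
    this.post = post;
  }

  async send(job: Job): Promise<void> {
    const response = await this.post(this.workerUrl, JSON.stringify(job));
    if (!response.ok) {
      throw new Error(`Worker rejected job ${job.id} with status ${response.status}`);
    }
  }
}
```

In a real deployment the injected function would wrap an actual HTTP client, and a message queue or websocket sender would simply be another class implementing JobSender.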
Store picture data.
Implementation:
- Filesystem
Other possible implementations:
- Bucket in cloud providers
Generate thumbnail from source image.
Implementation:
- Using the image-thumbnail library
Other possible implementations:
- Any other library
- External API
The domain is everything that is related to the business need, the "what are we doing" part.
Looking at the service.ts methods should answer this question.
Here are the current features:
- save source image
- create thumbnail from source image
- get thumbnail by giving source image name
- create a job
- get a job status
- get the status of all jobs
- execute a job with a given strategy
All domain features are unit tested using in-memory/filesystem adapters. This allows nearly instant feedback, and thus TDD.
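As an illustration of "execute a job with a given strategy", here is a minimal sketch. The function executeJob, the JobStrategy type and the lookup by job type are assumptions made for this example, not the actual service.ts code.

```typescript
// Assumed job shape, based on the example payloads in this README.
interface Job {
  id: string;
  type: string;
  data: { name: string };
}

// A strategy knows how to perform one kind of job (hypothetical type).
type JobStrategy = (job: Job) => Promise<void>;

// Picks the strategy matching the job type and reports the outcome.
async function executeJob(
  job: Job,
  strategies: Partial<Record<string, JobStrategy>>,
): Promise<"success" | "error"> {
  const strategy = strategies[job.type];
  if (strategy === undefined) {
    return "error"; // no strategy registered for this job type
  }
  try {
    await strategy(job);
    return "success";
  } catch {
    return "error"; // the strategy itself failed
  }
}
```

Because the strategies are plain functions passed in, a unit test can supply fake ones and check the outcome without touching the filesystem or a database.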
Deployed system has 4 components:
- API to be accessed freely
- Worker which is triggered by API via webhook
- Postgres Database for job events storage
- A shared volume to store/access thumbnail/source image
Jobs are not stored in the database "as-is" and then updated to reflect their status.
Instead, it's an event-sourced system where we store events related to jobs, such as job created, job is running, job has error, job succeeded. A few benefits are easier monitoring and stats creation.
I think that this architecture comes with a high cognitive load for developers with no experience with it.
However, anybody can learn to work with it, and I believe that in the long run, such an architecture with no coupling, thanks to abstraction, paired with cohesive team discipline (TDD for example), drastically increases maintainability and flexibility (such as moving from one DB provider to another, or one framework to another).
I chose to use a webhook-based worker for the sake of simplicity and speed. However, this fast implementation comes with some trade-offs: the lack of a retry mechanism or pressure handling (which kinda depends on the former).
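To make the event-sourcing idea concrete, the sketch below derives a job's current status from its stored events. The JobEvent and JobView shapes, the sequence field (standing in for an auto-increment ID) and the projectJob function are assumptions for illustration; the output shape mirrors the job status response shown earlier.

```typescript
// Hypothetical stored event; "sequence" stands in for an auto-increment ID.
interface JobEvent {
  sequence: number;
  jobId: string;
  type: string;
  status: "created" | "running" | "error" | "success";
  date: string; // ISO 8601 timestamp
}

// What the API returns when asked for a job status.
interface JobView {
  id: string;
  type: string;
  status: JobEvent["status"];
  lastChangeDate: string;
}

// The current state of a job is simply its most recent event.
function projectJob(jobId: string, events: JobEvent[]): JobView | undefined {
  const own = events
    .filter((event) => event.jobId === jobId)
    .sort((a, b) => a.sequence - b.sequence);
  const last = own[own.length - 1];
  if (last === undefined) {
    return undefined; // no event recorded for this job
  }
  return {
    id: jobId,
    type: last.type,
    status: last.status,
    lastChangeDate: last.date,
  };
}
```

Since every intermediate event is kept, the same list can also feed monitoring or statistics, for instance by measuring the time between the "running" and "success" events.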
I love this framework for the following reasons:
- Typescript first
- Opinionated Clean/Hexa/Onion architecture
- Dependency injection system
- Can generate/host OpenAPI/Swagger documentation from code
- Can be used for HTTP APIs, Websocket servers, Workers (handling many queue engines), Standalone apps
- Huge ecosystem
- Awesome documentation
- Overall DX
But it comes with a learning curve, like everything.
I chose image-thumbnail because its installation is easy.
Potentially more performant libraries all require the ImageMagick CLI to be installed beforehand.
Cogent Labs uses a SQL database, so it looked like the obvious choice.
Also, event sourcing requires an auto-increment ID.
- Node v20.9.0
- Yarn 1.22.19
- Docker. Tested with v20.10.6
- Docker compose. Tested with v1.19.1
This project uses yarn workspaces; simply run yarn at the project root folder.
docker compose up
Will spawn all components.
API is accessible via port 4242
We have 3 testing strategies: unit, integration and e2e. All can be run from the root folder.
Integration and e2e tests require a running DB (use docker compose up).
This strategy tests the business logic/domain code by using in-memory/filesystem secondary adapters. These tests are fast, and intended to always be running with file watching while you are programming.
yarn test
yarn test:watch
This strategy makes sure that we correctly implemented the external dependencies, such as the SQL database, in our secondary adapters.
Requires a running DB (use docker compose up).
yarn test:integration
yarn test:integration:watch
This strategy makes sure that we integrated everything correctly in our primary adapters, API and Worker.
These tests are "end-to-end" relative to each adapter's context. They are not tests from the end user/consumer perspective with all systems running.
For example, API tests call all API endpoints and ensure that we get the expected response, but do not check that the worker did what it has to do.
Requires a running DB (use docker compose up).
yarn test:e2e
- Retry mechanism for webhook delivery
- Add reasons on error when getting job(s) status
- Snapshots to offload processing
- Pagination on get job (cannot be done efficiently without snapshots)
- Handle image extensions
- Setup OpenAPI/Swagger documentation
- Handle all non-optimistic edge cases, such as when images/jobs do not exist or external providers fail.
- Improve Job data and JobEvent data attributes, probably with generics
- Harmonise linters between domain and adapters (prettier is used in Nest applications)
- Run "real" e2e test with containers running, and api really calling the worker
- Add readme in domain and adapters folders
- Prevent access to the worker's endpoint from outside (only other pods should be able to access it)