The scenario is as follows:
- Developers are required to build new APIs consistently to achieve specific business objectives.
- The technical competence to specify and support the building of these APIs is currently a scarce resource, which slows the company down.
- The mobile view shown in Appendix 1 is a frontend view that lacks a backend API.
- API definition and implementation often occur too late in the SDLC, which we would like to improve.
- Product Managers have only limited technical knowledge and need help describing technical implementation details to less experienced developers.
- The frontend developers are competent and can be frustrated by delays in API availability.
- We want work between frontend and backend to happen in parallel.
It seems that the core problem is that the growing business outpaces the capacity of the backend engineering teams.
Improving the productivity of the backend developers would help solve the problem.
Hence the DORA framework is a suitable methodology to apply in this case.
Our goal is to improve these DORA metrics:
- Frequency of deployments
- Lead time for changes
- Change failure rate
- Time to recovery
Also, I would like to add one additional metric:
- Time from idea to code
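
To make the metrics above concrete, here is a minimal sketch of how they could be computed from deployment records. The `DeploymentRecord` shape, its field names, and the reading of "time from idea to code" as the gap between an idea being recorded and the first commit are all assumptions made for illustration; they are not taken from the actual pipeline.

```typescript
// Hypothetical record of one production deployment. All field names are
// assumptions made for this sketch.
interface DeploymentRecord {
  ideaAt: Date;           // when the idea or ticket was recorded (assumed definition)
  firstCommitAt: Date;    // first commit for the change
  deployedAt: Date;       // when the change reached production
  causedFailure: boolean; // whether the deployment caused an incident
  restoredAt?: Date;      // when service was restored, if it failed
}

interface DoraSummary {
  deploymentsPerDay: number;
  meanLeadTimeHours: number;
  changeFailureRate: number;
  meanTimeToRecoveryHours: number;
  meanIdeaToCodeHours: number;
}

const MS_PER_HOUR = 60 * 60 * 1000;

function hoursBetween(start: Date, end: Date): number {
  return (end.getTime() - start.getTime()) / MS_PER_HOUR;
}

function average(values: number[]): number {
  if (values.length === 0) return 0;
  let total = 0;
  for (const v of values) total += v;
  return total / values.length;
}

// Summarizes the records observed over a period of `periodDays` days
// (assumed to be greater than zero).
function summarizeDora(records: DeploymentRecord[], periodDays: number): DoraSummary {
  const leadTimes: number[] = [];
  const ideaToCode: number[] = [];
  const recoveryTimes: number[] = [];
  let failures = 0;
  for (const r of records) {
    leadTimes.push(hoursBetween(r.firstCommitAt, r.deployedAt));
    ideaToCode.push(hoursBetween(r.ideaAt, r.firstCommitAt));
    if (r.causedFailure) {
      failures += 1;
      if (r.restoredAt !== undefined) {
        recoveryTimes.push(hoursBetween(r.deployedAt, r.restoredAt));
      }
    }
  }
  return {
    deploymentsPerDay: records.length / periodDays,
    meanLeadTimeHours: average(leadTimes),
    changeFailureRate: records.length === 0 ? 0 : failures / records.length,
    meanTimeToRecoveryHours: average(recoveryTimes),
    meanIdeaToCodeHours: average(ideaToCode),
  };
}
```

Using plain averages keeps the sketch short; medians or percentiles may be more robust when a few changes take unusually long.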
Micro-services are much easier to understand and develop than a single large service. Product managers get shorter feedback loops, even when working with less experienced developers.
This approach shifts the burden of understanding and managing complexity from developers to architects.
In this case, the service is divided into a few micro-services, each with a single responsibility:
- The Account micro-service
- The Invoice micro-service
- The Transaction micro-service
- The Card micro-service
- The Customer-service micro-service
AWS Cloud Map is used for service discovery.
Also, an RPC client is provided.
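
As a rough illustration of how an RPC client can sit on top of service discovery, the sketch below resolves a logical service name to a base URL and then sends the call. The names `RpcClientSketch`, `ResolveService`, and `SendJson`, and the `/rpc/<method>` URL layout, are hypothetical; the lookup is injected as a function rather than calling AWS Cloud Map directly, so this is not the provided client's actual API.

```typescript
// Resolves a logical service name (e.g. "account") to a base URL. In a real
// deployment this would be backed by service discovery; here it is injected.
type ResolveService = (serviceName: string) => Promise<string>;

// Minimal transport abstraction so the sketch does not depend on an HTTP library.
type SendJson = (url: string, body: unknown) => Promise<unknown>;

class RpcClientSketch {
  private readonly resolve: ResolveService;
  private readonly send: SendJson;

  constructor(resolve: ResolveService, send: SendJson) {
    this.resolve = resolve;
    this.send = send;
  }

  // Looks up the target service, then forwards the call to it.
  async call(serviceName: string, method: string, params: unknown): Promise<unknown> {
    const baseUrl = await this.resolve(serviceName);
    return this.send(`${baseUrl}/rpc/${method}`, params);
  }
}
```

Injecting the resolver and the transport keeps callers unaware of where a service runs and makes the client easy to exercise in tests with stubs.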
The well-defined interfaces are demonstrated in:
- The OpenAPI Spec
- The interfaces defined in TypeScript
- The validation of interfaces
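
As an illustration of the last two items, the sketch below pairs a TypeScript interface with a runtime check that a payload conforms to it. `InvoiceDto`, its fields, and `isInvoiceDto` are hypothetical names chosen for this example; the project's real interfaces and validation approach may differ.

```typescript
// Hypothetical payload shape shared by frontend and backend.
interface InvoiceDto {
  id: string;
  accountId: string;
  amountCents: number;
  currency: string;
}

// Runtime type guard: TypeScript interfaces are erased at compile time, so
// data arriving over the network has to be checked explicitly.
function isInvoiceDto(value: unknown): value is InvoiceDto {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const amount = v["amountCents"];
  return (
    typeof v["id"] === "string" &&
    typeof v["accountId"] === "string" &&
    typeof amount === "number" &&
    isFinite(amount) &&
    typeof v["currency"] === "string"
  );
}
```

A frontend developer can build against `InvoiceDto` with mock data while the backend is still being written, and the same guard can reject malformed payloads on either side.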
With interfaces well defined and respected, frontend and backend engineers are able to work in parallel.
This is demonstrated in the GitHub Actions workflow.
Everything is code:
- The business logic
- The infrastructure
- The access control policy
- The configuration and secrets
- The test data
- The pipelines themselves
One thing that stops less experienced developers from releasing is the fear of causing incidents.
To mitigate this fear, we should ensure that systems are reliable, secure and scalable out of the box.
- Integrate lint tools and the unit test suite into the pipeline.
- Run the end-to-end test suite in the pipeline.
A decent infrastructure architecture is provided out of the box.
- The database is always backed up.
- The database is always highly available (HA) in the production environment.
- A load balancer is always in front of a service.
- Network access control is in place to limit the blast radius if some services are compromised.
- IAM access control serves the same purpose.
- A healthcheck endpoint is required and checked.
- Database foreign keys are not used.
- All services are stateless.
- Logs and metrics are provided out of the box. Tracing can be enabled.
- A common logger is provided as a module.
- In the worst case, we can rebuild the whole system in another region.
- All the code and artifacts are version controlled.
- DB Performance Insights can be enabled if needed.
- Run db operations and S3 presigned-url operation in parallel.
- Refine IAM policies to apply the principle of least privilege.
- Use Flyway or Liquibase to integrate DB schema management into the CI/CD pipeline. At the moment, tables are created manually.
- Add customized security groups for containers and DBs.
- The infrastructure should be split into three parts: the foundation (ECR, cluster), the service, and the dependencies (DB, S3).
- HTTPS is not used, since the certificate does not follow a pay-as-you-use pricing model.
- Communication among services is not encrypted.
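
For the item above about running DB operations and the S3 pre-signed URL operation in parallel, one possible shape is sketched below. `prepareUpload`, `SaveDocumentRecord`, and `CreateUploadUrl` are hypothetical names; the real DB and S3 calls are passed in as functions, so the sketch assumes nothing about the actual data layer or AWS SDK usage.

```typescript
// Hypothetical operations; in the real service these would wrap the DB
// write and the S3 pre-signed URL generation.
type SaveDocumentRecord = (documentId: string) => Promise<void>;
type CreateUploadUrl = (documentId: string) => Promise<string>;

// Starts both operations before awaiting either, so total latency is roughly
// that of the slower one rather than the sum of both.
async function prepareUpload(
  documentId: string,
  saveRecord: SaveDocumentRecord,
  createUploadUrl: CreateUploadUrl,
): Promise<string> {
  const [, uploadUrl] = await Promise.all([
    saveRecord(documentId),
    createUploadUrl(documentId),
  ]);
  return uploadUrl;
}
```

This only applies when the two operations are independent. `Promise.all` rejects as soon as either operation fails, so the caller may need to clean up whichever half succeeded.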