The elements of a good information system
The purpose of an information system is threefold: 1) preserve data; 2) present data; 3) transform data.
Below I outline the seven properties that I believe are essential to any good information system.
-
Recoverability of data: the data should be preserved in a recoverable way that accounts for parts of the system failing. Only a highly unlikely event should result in the loss of a significant part of the data.
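One way to picture recoverability is to keep more than one copy of each record and to verify a copy before trusting it. The sketch below is only an illustration under assumed names: `save`, `load`, the file layout, and the use of local files as "replicas" are all hypothetical, not part of the system described here.

```python
import hashlib
import json
import os
import tempfile


def save(record, paths):
    """Write the same record, together with a checksum, to every path."""
    body = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256(body.encode()).hexdigest()
    for path in paths:
        with open(path, "w") as f:
            json.dump({"sha256": digest, "body": body}, f)


def load(paths):
    """Return the record from the first copy that exists and verifies."""
    for path in paths:
        try:
            with open(path) as f:
                stored = json.load(f)
            body = stored["body"]
            if hashlib.sha256(body.encode()).hexdigest() == stored["sha256"]:
                return json.loads(body)
        except (OSError, ValueError, KeyError, TypeError, AttributeError):
            continue  # missing or corrupted copy: try the next one
    raise RuntimeError("no intact copy found")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        replicas = [os.path.join(d, "a.json"), os.path.join(d, "b.json")]
        save({"id": 1, "name": "example"}, replicas)
        os.remove(replicas[0])  # simulate one part of the system failing
        print(load(replicas))
```

With two independent copies, losing the record requires both to fail at once, which is the "highly unlikely event" the property asks for.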
-
Understandability of data flows: the data flows should be crystal clear.
- Data in transit: Each endpoint, whether REST or a queue, should show the inputs and outputs. Errors are also outputs.
- We should know which endpoints, when hit, trigger calls to other endpoints. A flow is a list of endpoints hit in sequence.
- The inputs and outputs should be shown as serializable data. No abstractions!
- Data at rest: show the structure of the database or structured files.
- This also goes for communication with third parties.
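As a rough illustration of the points above, endpoints can be described as plain serializable data, with errors listed among the outputs and a flow derived as the list of endpoints hit in sequence. The endpoint names, fields, and the `flow` helper are hypothetical and chosen only for this sketch.

```python
import json

# Hypothetical endpoint descriptions: every input and output (errors included)
# is plain, JSON-serializable data. "calls" lists the endpoints each one triggers.
ENDPOINTS = {
    "POST /orders": {
        "input": {"customer_id": "string", "items": [{"sku": "string", "qty": "integer"}]},
        "outputs": {
            "201": {"order_id": "string"},
            "400": {"error": "string"},
        },
        "calls": ["queue:payments"],
    },
    "queue:payments": {
        "input": {"order_id": "string", "amount_cents": "integer"},
        "outputs": {
            "ok": {"payment_id": "string"},
            "error": {"error": "string"},
        },
        "calls": [],
    },
}


def flow(start):
    """Return the endpoints hit in sequence, starting from `start`."""
    sequence = []
    pending = [start]
    while pending:
        name = pending.pop(0)
        if name in sequence:
            continue  # guard against cycles between endpoints
        sequence.append(name)
        pending.extend(ENDPOINTS[name]["calls"])
    return sequence


if __name__ == "__main__":
    print(flow("POST /orders"))  # ['POST /orders', 'queue:payments']
    print(json.dumps(ENDPOINTS["POST /orders"], indent=2))
```

Because the description is just data, it can be printed, diffed, and checked without reading the implementation.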
-
Testability of the system: the system should have complete and almost fully automatic tests.
- There should be a test suite that covers every possible case in the code.
- Because the set of possible inputs is potentially infinite, completeness can only be achieved by writing the tests in a white-box way, so that they follow the order of execution of the code.
- Everything should be automatic, except for tasks requiring manual authentication when integrating with third-party providers.
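The white-box idea above can be sketched as one test case per execution path, rather than an attempt to sample an infinite input space. The `shipping_cost` function and its pricing rule are hypothetical, made up for this example.

```python
def shipping_cost(weight_kg):
    """Hypothetical function with exactly three execution paths."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")  # path 1
    if weight_kg <= 1:
        return 5                                     # path 2
    return 5 + 2 * (weight_kg - 1)                   # path 3


def test_shipping_cost():
    # Path 1: invalid input is rejected.
    try:
        shipping_cost(0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    # Path 2: flat rate, checked at the boundary of the branch.
    assert shipping_cost(1) == 5
    # Path 3: flat rate plus a per-kilogram surcharge.
    assert shipping_cost(3) == 9


if __name__ == "__main__":
    test_shipping_cost()
    print("all paths covered")
```

The tests are written in the same order as the branches, so a reader can check coverage by reading the function and the test side by side.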
-
Simplicity of implementation:
- Minimize lines of code and files.
- Minimize dependencies.
- Minimize technologies.
-
Observability of operations:
- All the errors generated by the system should be available.
- All the logs and metrics of the system should be queryable.
- Any abnormal situation should be immediately reported to the responsible parties.
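A minimal sketch of these three points, assuming an in-memory list as a stand-in for a real log store and a hypothetical `notify` function as a stand-in for paging or email: every event is recorded as structured data, can be queried by field, and errors are reported as soon as they are recorded.

```python
import json
import time

LOG = []     # in-memory stand-in for a log store (assumption for this sketch)
ALERTS = []  # stand-in for notifications sent to the responsible parties


def notify(event):
    """Hypothetical alert hook; a real system might page or email someone."""
    ALERTS.append(event)


def record(level, message, **fields):
    """Store a structured event and report it immediately if it is an error."""
    event = {"time": time.time(), "level": level, "message": message, **fields}
    LOG.append(json.dumps(event))
    if level == "error":
        notify(event)
    return event


def query(**criteria):
    """Return every logged event whose fields match all the given criteria."""
    events = (json.loads(line) for line in LOG)
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    record("info", "order created", endpoint="POST /orders")
    record("error", "payment failed", endpoint="queue:payments")
    print(query(level="error"))
    print(len(ALERTS))
```

Storing events as serialized data rather than free-form strings is what makes them queryable by any field later on.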
-
Scalability:
- The system should be able to be deployed automatically.
- The system should be able to grow in data flow and data at rest.
- The tradeoffs between consistency and availability should be minimized and defined explicitly.
-
Performance:
- Any operation that has no reason to be slow should be very fast.
- Any operation that has reason to be slow should be reasonably fast.