Group project for the Distributed Systems course at the University of Oulu.
Name | Student ID | GitHub |
---|---|---|
Perttu Kärnä | 2465119 | ppeerttu |
Guanchen LI | 2627656 | Ivan-Lee99 |
Juho Kantola | 2519793 | knatola |
Xin Shu | 2627520 | Mr-Sushi |

The system can be run locally with docker-compose by following these steps:

- Prepare a `.env` file from `example.env`
  - Do the same for the `example.*.env` files in the following directories:
    - auth-service: `.db.env`, `.auth.env`
    - user-service: `.db.env`
- Run `sh run_docker_compose.sh` to create the required network and volumes, and finally launch the containers
  - Subsequent runs can be done with a simple `docker-compose up`

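
The env file preparation can also be scripted. The following is a minimal sketch, assuming that every template is named `example<target>` (so `example.db.env` is the template for `.db.env`) and that it is run against the repository root. The helper name `prepare_env_files` is hypothetical and not part of the repository.

```python
import shutil
from pathlib import Path

# Directories and target env files as listed in the steps above.
ENV_FILES = {
    ".": [".env"],
    "auth-service": [".db.env", ".auth.env"],
    "user-service": [".db.env"],
}


def prepare_env_files(root: Path) -> list:
    """Copy each example template to its real env file unless it already exists.

    Assumption: the template for ``.db.env`` is named ``example.db.env`` etc.
    Returns the list of files that were created.
    """
    created = []
    for directory, targets in ENV_FILES.items():
        for target in targets:
            template = root / directory / f"example{target}"
            destination = root / directory / target
            if destination.exists() or not template.exists():
                continue  # never overwrite local settings; skip missing templates
            shutil.copyfile(template, destination)
            created.append(destination)
    return created
```

Calling `prepare_env_files(Path.cwd())` from the repository root only copies the templates. The copied files still hold the example values, so they should be reviewed before the containers are launched.
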
This project is a coursework implementation for the Distributed Systems course at the University of Oulu. The group members (and collaborators) are listed in the group members section.
This system is a naive clone of Instagram, containing mainly the backend functionality. We're not considering the client implementation in this project. The functionality that the system covers is as follows:
- Create and manage account
- Authentication and access control
- Follow users
- Post, read and delete images
- Comment on images
- Like images
The aim of the project lies mainly within the field of distributed systems: to build a scalable, fault-tolerant, loosely coupled and highly available distributed system.
The system consists of isolated, loosely coupled components called services. The workload of the system has been divided among the services based on logical functionality areas of the system: images, comments, users etc. The services communicate with each other over gRPC, and service discovery happens via Consul, which can be considered one of the components as well. The system exposes one public HTTP REST API for clients; all gRPC APIs and communication are supposed to be private.
NOTE: The system can be run without Consul as well, since e.g. Kubernetes does the same job for us (health checking, service discovery, load balancing).
See the system architecture below.
The authentication service is responsible for account and authentication management. It stores account credentials and uses JSON Web Tokens for authorization. While all services could use the auth service for fine-grained access control, for the sake of simplicity in our implementation the requests are validated only by the REST API.
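
To illustrate what validating a request at the REST API involves, here is a minimal sketch of JSON Web Token verification using only the Python standard library. It assumes HS256 tokens signed with a shared secret; the README does not state which algorithm or library the project uses, and the function names `sign_token` and `verify_token` are hypothetical. A real deployment would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_token(claims: dict, secret: bytes) -> str:
    """Create an HS256 token (illustration only, e.g. for local testing)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if signature and expiry are valid, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
        if not isinstance(header, dict) or not isinstance(claims, dict):
            raise ValueError("header and payload must be JSON objects")
    except (ValueError, TypeError) as exc:
        raise ValueError("malformed token") from exc
    # Pin the algorithm instead of trusting whatever the header claims.
    if header.get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        raise ValueError("invalid signature")
    if "exp" in claims and claims["exp"] < (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

Because only the REST API performs this check, the gRPC services behind it trust every call they receive, which is why those APIs are meant to stay private.
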
The user service is responsible for managing user information related to the user profile, such as biography and followers. In our implementation it is kept really simple. Accounts from the auth service are mapped one-to-one to users in the user service, even though in real life these could be kept separate (one account could hold access to multiple users/profiles).
The image service is responsible for managing posted images. It stores image metadata, likes and the actual image data. While in real-life implementations the actual image files might be kept on external CDN servers, in our implementation the images are stored directly on the file system of the image service container. Shared volumes between containers enable scaling.
The comment service is simply responsible for managing comments of images.
The REST API component exposes the public HTTP REST API consumed by clients. To serve those requests, it consumes all the other gRPC services.
The service discovery component is simply Consul, which enables service registration and discovery via HTTP endpoints.
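
As a rough illustration of discovery over HTTP, the sketch below queries Consul's health endpoint (`/v1/health/service/<name>?passing`) and picks one healthy instance. The default address `http://localhost:8500` and the function names `pick_instance` and `discover` are assumptions made for this example; how the services in this project actually register and resolve each other is not shown here.

```python
import json
import random
import urllib.request


def pick_instance(health_entries: list) -> str:
    """Pick one "host:port" from a parsed Consul health-endpoint response."""
    candidates = []
    for entry in health_entries:
        service = entry["Service"]
        # Service.Address may be empty; fall back to the address of the node.
        host = service.get("Address") or entry["Node"]["Address"]
        candidates.append(f"{host}:{service['Port']}")
    if not candidates:
        raise LookupError("no healthy instances registered")
    return random.choice(candidates)  # naive client-side load balancing


def discover(service_name: str, consul_url: str = "http://localhost:8500") -> str:
    """Ask Consul for passing instances of a service and return one address."""
    url = f"{consul_url}/v1/health/service/{service_name}?passing"
    with urllib.request.urlopen(url, timeout=5) as response:
        return pick_instance(json.load(response))
```

A caller such as the REST API could then open its gRPC channel against the returned address, for example `discover("image-service")`, assuming the service is registered under that name.
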
The message broker chosen for this project is Kafka, a distributed streaming platform. While Kafka can be used for much more advanced scenarios, and some alternative might be more lightweight for this project, we wanted to give it a shot as it is certainly a very interesting technology in the distributed systems world.
The topics, consumer groups and publishers are as follows:
Topic | Consumer group(s) | Publisher(s) | Event types |
---|---|---|---|
`accounts` | `user-service` | `auth-service` | `CREATED`, `DELETED` |
`users` | `image-service` | `user-service` | `CREATED`, `DELETED` |
`images` | `comment-service` | `image-service` | `LIKED`, `CREATED`, `DELETED` |

The messaging model is not perfectly tuned; it is there just to give an idea of how data can be managed in this kind of distributed system. Here is an example of what happens when an account gets deleted:

- Account `account-1` deleted
  - Published by `auth-service`
  - Consumed by `user-service`
- User for `account-1` deleted
  - Published by `user-service`
  - Consumed by `image-service`
- Images `image-1`, `image-2`, ... deleted (posted by `account-1`)
  - Published by `image-service`
  - Consumed by `comment-service`
- Comments for images `image-1`, `image-2`, ... deleted
  - This is not a published event, but an action that `comment-service` does

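
The deletion cascade can be simulated without Kafka. The sketch below uses a tiny in-memory broker with synchronous delivery as a stand-in; the class `InMemoryBroker`, the payload shape `{"id": ...}`, the reuse of the account id as the user id, and publishing one `DELETED` event per image are all assumptions made for illustration, not the project's actual message format.

```python
from collections import defaultdict


class InMemoryBroker:
    """Tiny stand-in for Kafka: a topic fans out to every subscribed handler."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []  # (topic, event_type, payload) in publish order

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, event_type, payload):
        self.log.append((topic, event_type, payload))
        for handler in self.handlers[topic]:
            handler(event_type, payload)  # synchronous, unlike a real broker


def wire_services(broker, images_by_user, comments_by_image):
    """Register one consumer per topic, mirroring the cascade described above."""

    def user_service(event_type, payload):
        if event_type == "DELETED":
            # Assumption: accounts map one-to-one to users, so the id is reused.
            broker.publish("users", "DELETED", {"id": payload["id"]})

    def image_service(event_type, payload):
        if event_type == "DELETED":
            for image_id in images_by_user.pop(payload["id"], []):
                broker.publish("images", "DELETED", {"id": image_id})

    def comment_service(event_type, payload):
        if event_type == "DELETED":
            # Final step: a local action, nothing is published.
            comments_by_image.pop(payload["id"], None)

    broker.subscribe("accounts", user_service)
    broker.subscribe("users", image_service)
    broker.subscribe("images", comment_service)


broker = InMemoryBroker()
images = {"account-1": ["image-1", "image-2"]}
comments = {"image-1": ["nice!"], "image-2": ["wow"], "image-9": ["unrelated"]}
wire_services(broker, images, comments)
broker.publish("accounts", "DELETED", {"id": "account-1"})  # what auth-service would do
print([topic for topic, _, _ in broker.log])
# ['accounts', 'users', 'images', 'images']
print(comments)
# {'image-9': ['unrelated']}
```

Each service reacts only to the topic it consumes and knows nothing about the services further down the chain, which is what keeps the components loosely coupled.
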
Each service has its own isolated database. The databases have been selected as follows:

- MongoDB
  - Image service
  - Comment service
- PostgreSQL
  - Auth service
  - User service

The data of the system is stored in the following structure.