Clone of Instagram for distributed systems course

instagram-clone

Group project for Distributed Systems course at the University of Oulu.

Group members

Name          Student ID  GitHub
Perttu Kärnä  2465119     ppeerttu
Guanchen LI   2627656     Ivan-Lee99
Juho Kantola  2519793     knatola
Xin Shu       2627520     Mr-Sushi

Running the system

The system can be run locally with docker-compose by following these steps:

  1. Create a .env file based on example.env
  2. Do the same for the example.*.env files in the following directories
  3. Run sh run_docker_compose.sh to create the required network and volumes and finally launch the containers
    • Subsequent runs only need a plain docker-compose up

Project description

This project is the coursework implementation for the Distributed Systems course at the University of Oulu. The group members (and collaborators) are listed in the Group members section.

This system is a naive clone of Instagram, containing mainly the backend functionality. We're not considering the client implementation in this project. The functionality that the system covers is as follows:

  1. Create and manage account
  2. Authentication and access control
  3. Follow users
  4. Post, read and delete images
  5. Comment on images
  6. Like images

The aim of the project lies mainly within the field of distributed systems: to build a scalable, fault-tolerant, loosely coupled and highly available distributed system.

System design

The system consists of isolated, loosely coupled components called services. The workload of the system is divided among the services by logical functionality area: images, comments, users etc. The services communicate with each other over gRPC, and service discovery happens via Consul, which can be considered one of the components as well. The system exposes one public HTTP REST API for clients. All gRPC APIs and inter-service communication are meant to be private.

NOTE: The system can be run without Consul as well, since e.g. Kubernetes does the same job for us (health checking, service discovery, load balancing).

See the system architecture below.

System architecture

Authentication service

The authentication service is responsible for account and authentication management. It stores account credentials and uses JSON Web Tokens for authorization. While all services could use the auth service for fine-grained access control, for the sake of simplicity in our implementation the requests are validated only by the REST API.
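To illustrate the kind of token check the REST API could perform before forwarding a request, here is a minimal sketch of signing and verifying a JSON Web Token with Node's built-in crypto module. It assumes the HS256 algorithm and a shared secret. The function names signToken and verifyToken, the secret value and the payload fields are hypothetical and not taken from this repository, which may well use a JWT library instead.

```javascript
const crypto = require('crypto');

// Encode a string as base64url, the encoding used by JWT segments.
const b64url = (input) => Buffer.from(input).toString('base64url');

// Create an HS256-signed token. `secret` is a hypothetical shared key.
function signToken(payload, secret) {
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = b64url(JSON.stringify(payload));
  const signature = crypto
    .createHmac('sha256', secret)
    .update(`${header}.${body}`)
    .digest('base64url');
  return `${header}.${body}.${signature}`;
}

// Return the payload if the signature matches and the token has not
// expired, otherwise return null.
function verifyToken(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts;

  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${header}.${body}`)
    .digest();
  const given = Buffer.from(signature, 'base64url');
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return null;
  }

  let payload;
  try {
    payload = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  } catch {
    return null;
  }
  // `exp` is the standard expiry claim, expressed in seconds since the epoch.
  if (typeof payload.exp === 'number' && payload.exp <= nowSeconds) return null;
  return payload;
}
```

In this sketch the REST API would call verifyToken on the token taken from the incoming request and reject the request when the result is null, so the gRPC services behind it never see unauthenticated traffic.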

User service

The user service is responsible for managing user information related to the user profile, such as biography and followers. In our implementation it is kept really simple. Accounts from auth service are directly mapped one-to-one with users in user service, even though in real life these could be kept separate (one account holds access to multiple users/profiles).

Image service

The image service is responsible for managing posted images. It stores image metadata, likes and the actual image data. While in real-life implementations the actual image files might be kept on external CDN servers, in our implementation the images are stored directly on the file system of the image service container. Volumes shared between the containers enable scaling.

Comment service

The comment service is simply responsible for managing comments of images.

REST API

The REST API component exposes the public HTTP REST APIs consumed by the clients. To serve them, it consumes all the other gRPC services.

Service discovery

The service discovery component is simply Consul, which enables service registration and discovery via HTTP endpoints.
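As a rough sketch of discovery over HTTP, the snippet below queries Consul's health endpoint (/v1/health/service/<name>?passing=true) and turns the response into a list of host:port addresses. The helper names healthUrl, toAddresses and discover, as well as the base URL in the usage note, are assumptions made for illustration and do not come from this repository.

```javascript
// Build the URL of Consul's health endpoint for one service.
// `passing=true` asks Consul to return only instances whose checks pass.
function healthUrl(consulBaseUrl, serviceName) {
  return `${consulBaseUrl}/v1/health/service/${encodeURIComponent(serviceName)}?passing=true`;
}

// Convert the entries of Consul's response into "host:port" strings.
function toAddresses(entries) {
  return entries.map((entry) => {
    // Service.Address may be empty, in which case the node address applies.
    const host = entry.Service.Address || entry.Node.Address;
    return `${host}:${entry.Service.Port}`;
  });
}

// Look up the healthy instances of a service. `fetchFn` is injectable so the
// function can be exercised without a running Consul agent.
async function discover(consulBaseUrl, serviceName, fetchFn = globalThis.fetch) {
  const response = await fetchFn(healthUrl(consulBaseUrl, serviceName));
  if (!response.ok) {
    throw new Error(`Consul responded with status ${response.status}`);
  }
  return toAddresses(await response.json());
}
```

A caller could then run something like discover('http://localhost:8500', 'image-service') and pick one of the returned addresses for its gRPC client. The base URL and the service name used here are placeholders.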

Message broker

The message broker chosen for this project is Kafka, a distributed streaming platform. While Kafka can be used for much more advanced scenarios, and some other alternative might be more lightweight for this project, we wanted to give it a shot, as it is certainly a very interesting technology in the distributed systems world.

The topics, consumer groups and publishers are as follows:

Topic     Consumer group(s)  Publisher(s)   Event types
accounts  user-service       auth-service   CREATED, DELETED
users     image-service      user-service   CREATED, DELETED
images    comment-service    image-service  LIKED, CREATED, DELETED

The messaging model is not tuned to perfection; it is there just to give an idea of how data can be managed in this kind of distributed system. Here is an example of what happens when an account gets deleted:

  1. Account account-1 deleted
    • Published by auth-service
    • Consumed by user-service
  2. User for account-1 deleted
    • Published by user-service
    • Consumed by image-service
  3. Images image-1, image-2, ... deleted (posted by account-1)
    • Published by image-service
    • Consumed by comment-service
  4. Comments for images image-1, image-2, ... deleted
    • This is not a published event, but an action that comment-service performs
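The deletion cascade above can be sketched with a small in-memory stand-in for the broker. This is not Kafka client code: delivery here is synchronous and there are no consumer groups or partitions, whereas real Kafka consumers receive events asynchronously. The InMemoryBroker class, the wireServices function and the event field names (type, accountId, userId, imageId) are hypothetical and serve only to illustrate the flow between the topics.

```javascript
// Minimal in-memory stand-in for a message broker: topic -> handlers.
class InMemoryBroker {
  constructor() {
    this.handlers = new Map();
    this.log = []; // every published event, in publication order
  }

  subscribe(topic, handler) {
    if (!this.handlers.has(topic)) this.handlers.set(topic, []);
    this.handlers.get(topic).push(handler);
  }

  publish(topic, event) {
    this.log.push({ topic, ...event });
    for (const handler of this.handlers.get(topic) || []) handler(event);
  }
}

// Wire up the three consumers described in the topic table.
function wireServices(broker, state) {
  // user-service consumes `accounts` and publishes to `users`.
  broker.subscribe('accounts', (event) => {
    if (event.type !== 'DELETED') return;
    state.users.delete(event.accountId);
    broker.publish('users', { type: 'DELETED', userId: event.accountId });
  });

  // image-service consumes `users` and publishes to `images`.
  broker.subscribe('users', (event) => {
    if (event.type !== 'DELETED') return;
    // Iterate over a copy because entries are deleted inside the loop.
    for (const [imageId, ownerId] of [...state.images]) {
      if (ownerId !== event.userId) continue;
      state.images.delete(imageId);
      broker.publish('images', { type: 'DELETED', imageId });
    }
  });

  // comment-service consumes `images` and only cleans up its own data.
  broker.subscribe('images', (event) => {
    if (event.type !== 'DELETED') return;
    state.comments.delete(event.imageId);
  });
}

// Example data: account-1 owns two images, account-2 owns one.
const broker = new InMemoryBroker();
const state = {
  users: new Set(['account-1', 'account-2']),
  images: new Map([
    ['image-1', 'account-1'],
    ['image-2', 'account-1'],
    ['image-3', 'account-2'],
  ]),
  comments: new Map([
    ['image-1', ['nice']],
    ['image-3', ['hello']],
  ]),
};
wireServices(broker, state);

// auth-service publishes the deletion of account-1.
broker.publish('accounts', { type: 'DELETED', accountId: 'account-1' });
```

After the last line runs, only account-2, image-3 and the comments for image-3 remain in state, and broker.log holds one accounts event, one users event and two images events.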

Data model and databases

Each service has its own isolated database. The databases have been selected as follows:

The data of the system is stored in the following structure.

Data model