
A highly scalable video-streaming website built with microservices (Go, Docker, Kubernetes, PostgreSQL, gRPC, Redis, RabbitMQ, Prometheus, Grafana, Jaeger/Zipkin).

Primary language: Go

CloudFlix

Current frontend (in progress): [image: in-progress design]

Video view (in progress): [image: video view]

Initial backend design: [image: my vision so far]

Running

  1. Install Kubernetes (minikube):
sh ./minikube.sh
  • Note: kube-dns may fail when minikube runs with --vm-driver=none; see the following issue for a workaround:
  • kubernetes/minikube#2027
  • tl;dr:
  • sudo systemctl stop systemd-resolved
  • sudo systemctl disable systemd-resolved
  • edit /etc/resolv.conf so that its only line is nameserver 8.8.8.8
  • delete the kube-dns pod so that it is recreated with the new DNS settings
  1. Initialize Helm:
helm init 
  1. Install Minio:
helm install --name minio --set persistence.size=100Gi,accessKey=minio,secretKey=minio123,service.type=LoadBalancer stable/minio  
  1. Log in to Minio at the URL printed by the command below and create a bucket named videos:
minikube service minio --url
  • Then create a bucket named thumb and set its policy to ReadOnly (in the left sidebar)
  1. Clone this repository
cd $GOPATH/src/github.com # mkdir github.com if needed
mkdir agxp && cd agxp
git clone --recurse-submodules -j8 https://github.com/agxp/cloudflix.git
  1. Install protobuf
# Get the protocol compiler release
wget https://github.com/google/protobuf/releases/download/v3.5.1/protoc-3.5.1-linux-x86_64.zip
# Extract it so that protoc ends up on your PATH (~/.local/bin is fine)
unzip protoc-3.5.1-linux-x86_64.zip -d ~/.local/
# Get the protoc plugin that generates Go code
go get -u github.com/golang/protobuf/protoc-gen-go
# Get the protoc plugin that generates go-micro code
go get -u github.com/micro/protoc-gen-micro
  1. Install PostgreSQL (note: persistence must be disabled because of bugs on minikube):
helm install --name postgres --set persistence.enabled=false,postgresUser=postgres,postgresPassword=postgres123,postgresDatabase=videos,metrics.enabled=true stable/postgresql  
  1. Install pgAdmin
docker run --net="host" -e "PGADMIN_DEFAULT_EMAIL=admin@localhost" -e "PGADMIN_DEFAULT_PASSWORD=pgadmin123" -d dpage/pgadmin4
  1. Forward the Postgres port:
kubectl port-forward <postgres-pod-name> 5432
  1. Add the database schema:
  • Log in to pgAdmin (localhost:80)
  • Add the Postgres server (localhost:5432)
  • Create the database schema using videos_schema.sql
Alternatively, copy the schema into the pgAdmin container and load it with psql:
docker cp ./videos_schema.sql <pgadminContainerID>:/videos_schema.sql
docker exec -it <pgadminContainerID> /bin/sh
psql -U postgres -h localhost -d videos -f /videos_schema.sql
  1. Add the incubator repo to Helm:
helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com/
  1. Install Jaeger
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-kubernetes/master/all-in-one/jaeger-all-in-one-template.yml
  1. Install Redis (two slaves, persistence disabled as with Postgres):
helm install --name redis --set persistence.enabled=false,cluster.slaveCount=2,usePassword=false,metrics.enabled=true stable/redis
  1. Install RabbitMQ
helm install --name rabbit --set rabbitmq.username=admin,rabbitmq.password=password,persistence.enabled=false stable/rabbitmq
  1. Fix the service account permissions (warning: this is insecure, because it grants cluster-admin to the default service account in the default namespace):
kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=default:default
  1. Install Prometheus and Grafana
kubectl create -f ./monitor/kubernetes-prometheus/manifests-all.yaml
  • Wait until the pods are running (~1 minute), then initialize the dashboards
  • The dashboard import job may error out, so delete and recreate it:
kubectl --namespace monitoring delete job grafana-import-dashboards    
kubectl apply --filename ./monitor/kubernetes-prometheus/manifests/grafana/import-dashboards/job.yaml
  • Then wait ~1 minute to initialize
  1. Finally, cd into each service folder and run:
make build-local && make deploy-local
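The helm install steps above fix the credentials the services connect with (postgres/postgres123 on database videos, admin/password for RabbitMQ). As a minimal sketch, assuming a Go service turns those values into connection URLs, the standard net/url package can assemble them with proper escaping. The function names `postgresDSN` and `amqpURL` are hypothetical, not the project's code; `sslmode=disable` is an assumption (nothing above enables TLS), localhost:5432 matches the port-forward step, and using localhost for RabbitMQ assumes a similar port-forward.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// postgresDSN builds a PostgreSQL connection URL.
// Assumption: sslmode=disable, since TLS is not configured in the setup steps.
func postgresDSN(user, password, host string, port int, dbname string) string {
	u := url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(user, password), // escapes special characters
		Host:     net.JoinHostPort(host, strconv.Itoa(port)),
		Path:     "/" + dbname,
		RawQuery: url.Values{"sslmode": {"disable"}}.Encode(),
	}
	return u.String()
}

// amqpURL builds a RabbitMQ connection URL. No path is set, so clients
// fall back to the default vhost.
func amqpURL(user, password, host string, port int) string {
	u := url.URL{
		Scheme: "amqp",
		User:   url.UserPassword(user, password),
		Host:   net.JoinHostPort(host, strconv.Itoa(port)),
	}
	return u.String()
}

func main() {
	// Values taken from the helm install commands; hosts are assumptions.
	fmt.Println(postgresDSN("postgres", "postgres123", "localhost", 5432, "videos"))
	fmt.Println(amqpURL("admin", "password", "localhost", 5672)) // 5672: default AMQP port
}
```

Building the URLs with `url.URL` rather than string concatenation means a password containing characters such as `@` or `/` is percent-encoded instead of silently corrupting the URL.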

Progress

Some simple (naive) load testing for latency: [image: Load testing]

Jaeger view: [image]

Jaeger view at 10,000 requests/s: [image]

  • Compared with the first Jaeger view, where requests took 1-2 ms on average, a heavy load of 10,000 requests/s produces widely varying (but still reasonable) response times of up to 3 seconds

Barebones skeleton UI: [image: User interface]

Grafana view during naive load testing of video-host without caching (get URL, data, etc.): [image: Grafana]

Heavy load testing of video-search-svc at 1,000 requests/s (it crashed): [image: Failed]

  • Here the obvious bottleneck is the excessive number of database calls, so adding caching is a good idea

Heavy load testing of video-hosting-svc GetVideoInfo (with caching, e.g. Redis):

  • Notice that Postgres CPU usage is 0, since requests are served from the cache
  • Throughput capped at 10,000 requests/s with 10 router pods and 10 video-host pods (it crashed beyond that): [image: Success]

Prometheus view during load testing: [image: Prometheus]