tattle-made/docs

Health and Log Monitoring


For all the services, we want to be able to do the following:

  • See an overview of their health statuses
  • Get automated alarms on appropriate channels (Slack, email) when they go down or become unhealthy
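
To make the alarm idea a bit more concrete, here is a rough sketch (not a finished implementation) of turning a set of health results into a Slack message. It assumes a Slack incoming webhook, which accepts a JSON body with a `text` field. The function names `build_alert` and `send_alert`, the shape of the `statuses` dict, and the webhook URL passed in are all hypothetical.

```python
import json
import urllib.request


def build_alert(statuses):
    """Build a Slack-style payload listing unhealthy services.

    `statuses` is a hypothetical mapping of service name -> bool (True = healthy).
    Returns None when every service is healthy, so no alert is sent.
    """
    unhealthy = sorted(name for name, ok in statuses.items() if not ok)
    if not unhealthy:
        return None
    return {"text": "Unhealthy services: " + ", ".join(unhealthy)}


def send_alert(webhook_url, payload):
    """POST the payload as JSON to an incoming-webhook URL and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

For example, `build_alert({"service-a": True, "service-b": False})` would produce a message naming only `service-b`. A hosted tool would normally handle this for us; the sketch only shows the shape of what we are asking for.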

Let's add all the features that might fit within the scope of this.

In the interest of time, I'm trying to see if we can use a hosted service for monitoring and log management.

I know that elastic.co offers a solution for this in the trifecta of Logstash (collecting logs from your application), Elasticsearch (storing and searching logs) and Kibana (visualizing them),
but somehow the idea of setting up 3 things for this sounds daunting to me :P and maybe a bit of overkill.

So I'd been looking into some managed solutions. Here's one that I like: https://timber.io/
It has a guide on forwarding logs from Kubernetes to its server here: https://docs.timber.io/setup/platforms/kubernetes

Can you give it a quick read (not urgent) and see if it is simple to hook up with our current setup?

Evaluation Criteria

  • Monitoring application logs when deployed on the k8s cluster
  • Viewing "health" status of applications/Pods/Nodes/Cluster
  • Alarms for any health issues on Slack/GitHub
  • Feasibility of slicing-and-dicing logs across multiple components/applications
  • Checking real-time resource consumption of different Pods/Nodes in the Cluster
  • Learning curve and ease of implementation of the framework
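
As an illustration of what "slicing-and-dicing" could mean in practice, here is a minimal sketch that counts log records by arbitrary fields. It assumes the applications emit one JSON object per log line and that the records carry fields such as `component` and `level`; those field names and the helper `count_by` are assumptions made for the example, not something our services are known to emit.

```python
import json
from collections import Counter


def count_by(lines, *fields):
    """Count JSON log lines grouped by the given fields.

    Lines that are not valid JSON objects are skipped. A missing field is
    counted under the placeholder value "unknown".
    """
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        key = tuple(record.get(field, "unknown") for field in fields)
        counts[key] += 1
    return counts
```

Calling `count_by(lines, "component", "level")` gives a count per (component, level) pair, which is roughly the kind of query we would want a log tool to answer across applications without custom code.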

Important Considerations for Health/Logs Monitoring in k8s:

  • Cluster-level, Node-level, Pod-level logs monitoring
  • k8s Events monitoring
  • Pod-wise health status
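
For pod-wise health, a simple sketch of the check we have in mind is below. It reads the JSON that `kubectl get pods -o json` produces and uses the standard Pod fields `status.phase` and `status.containerStatuses[].ready`. The labels `healthy`, `unhealthy`, `pending` and `completed`, and the function names, are made up for this example, and the rules are deliberately simplified (restart counts and probe details are ignored).

```python
def pod_health(pod):
    """Classify one Pod object (a dict parsed from kubectl JSON output)."""
    status = pod.get("status", {})
    phase = status.get("phase", "Unknown")
    if phase == "Succeeded":
        return "completed"
    if phase == "Pending":
        return "pending"
    if phase != "Running":
        # Covers the Failed and Unknown phases.
        return "unhealthy"
    containers = status.get("containerStatuses", [])
    # A running pod counts as healthy only if every container reports ready.
    if containers and all(c.get("ready") for c in containers):
        return "healthy"
    return "unhealthy"


def summarize(pod_list):
    """Map pod name -> health label for a parsed `kubectl get pods -o json` list."""
    return {
        pod["metadata"]["name"]: pod_health(pod)
        for pod in pod_list.get("items", [])
    }
```

Usage would be something like `summarize(json.load(open("pods.json")))` after dumping the pod list to a file; whichever tool we pick should give us at least this view out of the box.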

Promising Evaluations for Logs and Health Monitoring

  • Sematext
  • Prometheus + Grafana
  • Timber.io

We have evaluated Sematext, Prometheus and Timber.io, and are planning to proceed with Sematext for now because:

  • Built for both log search and cluster monitoring (most tools do one or the other)
  • Built on Elasticsearch; has full integration with Logstash/Beats/Kibana; even exposes the Elasticsearch API for custom integrations; but comes with none of the Elasticsearch maintenance and scaling issues (a major challenge with a self-managed ELK stack)
  • Has paid plans for enterprise scale and support, and customer support has Elasticsearch expertise as well
  • Incredible breadth of documentation, as well as other useful content (like comparisons, guides, etc.)

Sematext Deployment Status:
Sematext has been deployed in the k8s cluster for logs and infra monitoring.

Next Steps:

  • Add k8s container monitoring in Sematext
  • Add k8s audit logs monitoring in Sematext
  • Check alerts options
  • Check for a logs download option, and explore syncing logs to S3 buckets

Low-priority issue: I just noticed that the dashboards you created are not accessible to regular users (denny@tattle.co.in).
I could only access them when I was logged in as admin@tattle.co.in.

Sematext Deployment Status:

  • k8s container monitoring is now enabled
  • Requested customer support to help with k8s audit logs

PS - Users can access dashboards created by admin@tattle.co.in by 'switching' their account after logging into Sematext.

New dashboards were created for Logs monitoring of SCS and Khoj, as well as Infra monitoring of new Dev and Prod clusters.

Closing this issue as all monitoring components have been implemented.