tattle-made/docs

Configure, Deploy, Monitor Cron Jobs

Key requirements I'd like to meet

  • Checking cron configuration in to GitHub
  • Being able to monitor cron job results for up to the last month
  • Standardization across all services

Happy to hear about any other features we could add to this.

Hey @whymath, I have defined the acceptance criteria for https://github.com/tattle-made/sharechat-scraper/issues/7 on the issue page.

It would be ideal if log monitoring didn't require SSHing in to the remote machine.

Framework evaluations:

  • Python-based schedulers - sched, Advanced Python Scheduler, Schedule
  • Node-based Cron wrappers - Cron Job Manager, Cron
  • GUI frameworks - Crontab UI
  • Configuration-based - Kubernetes CronJob, Ansible Cron
  • Server-based - AWS Lambda, Chronos

Primary Approaches Under Consideration

  • Language-based - SOPs would be created for each language-based framework. Python services would use sched or APS (Advanced Python Scheduler), while JS services might use Cron Job Manager. We would define basic processes and best practices. The advantage is that developers have maximum visibility and ownership of the schedulers. Logs might need to be consolidated for monitoring. Python schedulers will have slightly higher resource consumption.
  • Kubernetes-based - Since we are already in the process of k8s-based deployment, scheduling can be included and handled together with it. The advantage is a single framework for both container orchestration and scheduling, and a single point of implementation. However, it increases dependency on DevOps and reduces developer visibility of scheduler implementations.
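
To make the language-based option more concrete, here is a minimal sketch of a recurring job built on the standard-library sched module. The function names, the interval, and the run count are hypothetical and chosen only for illustration; this is not the scheduler setup of any existing service.

```python
import sched
import time


def run_periodically(job, interval_s, max_runs):
    """Call `job` every `interval_s` seconds, `max_runs` times in total."""
    if max_runs < 1:
        return []

    scheduler = sched.scheduler(time.monotonic, time.sleep)
    results = []

    def tick(runs_left):
        results.append(job())
        if runs_left > 1:
            # sched has no built-in repeat, so each run re-arms the next one.
            scheduler.enter(interval_s, 1, tick, (runs_left - 1,))

    scheduler.enter(0, 1, tick, (max_runs,))
    scheduler.run()  # Blocks until the event queue is empty.
    return results


def scrape_trending():
    # Hypothetical placeholder for a real scraper entry point.
    return "ok"
```

Calling `run_periodically(scrape_trending, interval_s=3600, max_runs=24)` would run the placeholder hourly for a day. Because the scheduler lives inside a long-running Python process, that process has to be kept alive and restarted on failure, which is part of the resource and monitoring trade-off noted above.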

Additional Considerations

  • Monitoring scheduler jobs remotely without manual SSH (might be handled as part of #3)

Current Approach

  • Deploy sharechat-scraper (SCS) on crontab
  • Evaluate and test a Python-based scheduler to check standardizability
  • Test PoC for Kubernetes CronJob to check ease of use and accessibility
  • Determine optimum framework for easy, scalable scheduling

Sprint 2 Objectives:

  • Primary: Deploy SCS on crontab
  • Secondary: Evaluate Python-based scheduler and Kubernetes CronJob for SCS
    (additional Sprint 2 Objectives in #1)

Sharechat Scraper Deployment Status:

  • sharechat-scraper is deployed as a Kubernetes pod
  • cron was manually configured in the container and jobs are getting triggered
  • Manual trigger of config_trending_cron.py partially succeeded: execution completed and the CSV was generated, but HTML file creation failed with an AttributeError
  • When the scraper was triggered from the cron job, execution failed with what looks like an env variable error (debugging WIP)
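
The cause had not been confirmed at this point, but cron typically starts jobs with a minimal environment rather than the one an interactive shell has, so a script that works when run by hand can fail under cron. A minimal sketch of a fail-fast check at script startup, assuming hypothetical variable names (the scraper's real variables are not listed in this thread):

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def require_env(required, environ=None):
    """Raise early, with a readable message, if any variable is missing."""
    missing = missing_env_vars(required, environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))


# Hypothetical names, for illustration only.
REQUIRED = ["SCS_DB_URL", "SCS_AWS_BUCKET"]
```

Calling `require_env(REQUIRED)` as the first step of the cron entry point would turn a confusing mid-run failure into a single log line that names the missing variables.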

Pending Activities:

  • Primary: Debugging and fixing env variable issue during cron job run
  • Primary: Reverting all tags in config_trending_cron.py and configuring production timings in the cron job
  • Primary: Updating missing env setup commands in Dockerfile
  • Secondary: Updating flow to ensure successful HTML file creation
  • Secondary: Triggering k8s rolling update for new version deployment

Sharechat Scraper Deployment Status:

  • SCS is now getting triggered correctly from the cron job with the environment variables
  • Production timings have been configured for both cron jobs
  • HTML file creation issue was also resolved

Pending Activities:

  • Primary: Monitor logs and push code/infra fixes to production
  • Secondary: Enable logs monitoring access
  • Secondary: Updating missing env setup commands in Dockerfile
  • Secondary: Triggering k8s rolling update for new version deployment

Sharechat Scraper Deployment Status

  • Tuesday's run hit an anomalous error: the cron job was triggered without the environment variables
  • This was tested with multiple runs of both jobs but could not be reproduced, and production timings were re-configured
  • The ASCII error during HTML file creation, caused by the cron locale issue, was resolved
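
The thread does not say how the locale issue was fixed. One common way to make file output independent of the locale cron provides is to pass the encoding explicitly instead of relying on the process default. A minimal sketch, with hypothetical helper names:

```python
def write_html(path, html):
    """Write `html` as UTF-8 regardless of the process locale."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


def read_html(path):
    """Read the file back with the same explicit encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
```

Without the `encoding` argument, `open` falls back to a locale-dependent default, which can be ASCII under a bare cron environment and then fails on non-ASCII post text.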

Pending Activities

  • Primary: Monitor logs and reports for next few days
  • Secondary: Updating missing env setup commands in Dockerfile
  • Secondary: Pushing generated report files to S3 instead of keeping on file system
  • Secondary: Enable logs monitoring access
  • Secondary: Triggering k8s rolling update for new version deployment

Sharechat Scraper Deployment Update

  • HTML file creation was failing with a ContentTooShort error; this was rectified with a sharechat_helper.py injection
  • One job also failed because of disk space; this was solved by clearing all data backups from the server
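
To catch the disk-space failure before a run rather than during it, a job could check free space up front. A minimal sketch using the standard-library shutil.disk_usage; the function name and any threshold passed to it are assumptions, not values from this deployment:

```python
import shutil


def has_free_space(path, min_free_bytes, usage=shutil.disk_usage):
    """Return True if the filesystem holding `path` has enough free bytes.

    `usage` is injectable so the check can be exercised without a real disk.
    """
    return usage(path).free >= min_free_bytes
```

For example, `has_free_space("/", 2 * 1024**3)` would report whether at least 2 GiB is free on the root filesystem.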

Sharechat Scraper Deployment Update:

  • Started getting a "returned empty dataframe" error due to changes in the consumed API; this was rectified with a sharechat_helper.py injection
  • Cron job configuration was updated through a config_trending_cron.py injection

Next Steps:

  • Explore Kubernetes CronJob (basically transient container deployments)
  • Compare against language-based frameworks for ease of implementation, convention, etc.
  • If required, evaluate k8s exception-handling with a single permanent pod deployment

Evaluation Status
The Kubernetes CronJob PoC was successful: we were able to schedule k8s jobs and monitor their output.

Next Steps:

  • Primary: Implement scheduled full run of the SCS cron job on k8s for both trending and fresh
  • Primary: Monitor implementation over a few days, and deprecate existing deployment
  • Secondary: Create parameterized version of config.py to handle both test and prod scenarios
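
One possible shape for a parameterized config.py is to select a settings block from an environment variable. This is only a sketch: the SCS_ENV variable and every setting shown are hypothetical, since the real config.py fields are not included in this thread.

```python
import os

# Hypothetical settings, for illustration only.
CONFIGS = {
    "test": {"max_pages": 1, "upload_reports": False},
    "prod": {"max_pages": 50, "upload_reports": True},
}


def load_config(environ=None):
    """Return a copy of the settings for the mode named by SCS_ENV."""
    env = os.environ if environ is None else environ
    mode = env.get("SCS_ENV", "test")  # Assumed variable name; defaults to test.
    if mode not in CONFIGS:
        raise ValueError(f"Unknown SCS_ENV {mode!r}; expected one of {sorted(CONFIGS)}")
    return dict(CONFIGS[mode])
```

Defaulting to the test settings means a job started without the variable cannot accidentally perform a production run.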

Deployment Status:

  • Full deployment of the trending and fresh cron jobs was completed over the weekend, and the jobs have triggered successfully since Sunday, 26th July
  • Monitoring of the Sharechat Scraper has been implemented in the Sematext app

CronJob manifest file creation and deployment to the k8s cluster has been standardized, and new virality scrapers were deployed successfully using the standardized process.
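
The standardized manifests themselves are not shown in this thread. As a rough illustration of what generating one could look like, here is a sketch that builds a CronJob manifest as a plain dict and serializes it to JSON, which kubectl accepts alongside YAML. The job name, image, schedule, and command in the usage note are placeholders, and the default `batch/v1` API version assumes Kubernetes 1.21 or later (older clusters use `batch/v1beta1`).

```python
import json


def cronjob_manifest(name, schedule, image, command, api_version="batch/v1"):
    """Build a Kubernetes CronJob manifest as a plain dict."""
    return {
        "apiVersion": api_version,
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,  # Standard five-field cron expression.
            "concurrencyPolicy": "Forbid",  # Skip a run if the previous one is still going.
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "OnFailure",
                            "containers": [
                                {"name": name, "image": image, "command": list(command)}
                            ],
                        }
                    }
                }
            },
        },
    }


def to_json(manifest):
    """Serialize a manifest for `kubectl apply -f`."""
    return json.dumps(manifest, indent=2)
```

For example, `to_json(cronjob_manifest("scs-trending", "30 2 * * *", "example/scs:latest", ["python", "config_trending_cron.py"]))` produces a manifest for a daily 02:30 run. Generating manifests from one function keeps the schedule, image, and command as the only per-service differences.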

Closing this issue.