rust-lang/simpleinfra

Fix disk space alerts on the dev-desktops

jdno opened this issue · 1 comments

jdno commented

The infra-team has been receiving alerts about less than 10% free disk space on the dev-desktops for a while now. The machines currently have 1TB hard drives attached to them and all machines still have >80GB free space, which means the urgency to fix this has been low. But as we look to increase the number of users and their quota, we need to find a more sustainable solution to this problem.

Investigation

Across the four machines that we operate, every additional 1TB of disk space costs us around $320/month. We have chosen reasonably performant disks to optimize local filesystem access, but these disks are more expensive than other block storage solutions.

When looking into the disk space usage, we found that we can reclaim ~260GB on dev-desktop-eu-1 by removing build and target directories that haven't been touched in 30 days. In other words, we can probably free up 20% of space by cleaning projects that users haven't worked on in a while.
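A measurement like the one above could be reproduced with something along these lines. This is only a sketch, not the script that was actually used: it assumes that "touched" means the newest modification time of anything inside the directory, that the cache directories are literally named `build` or `target`, and that the 30-day threshold applies. All function names are made up for illustration.

```python
import os
import time
from pathlib import Path

# Assumptions taken from the issue text: directory names and the 30-day threshold.
CACHE_DIR_NAMES = {"build", "target"}
MAX_AGE_DAYS = 30


def newest_mtime(directory: Path) -> float:
    """Return the most recent modification time found anywhere under directory."""
    newest = directory.lstat().st_mtime
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            try:
                newest = max(newest, os.lstat(os.path.join(root, name)).st_mtime)
            except OSError:
                continue  # entry vanished or is unreadable; ignore it
    return newest


def dir_size(directory: Path) -> int:
    """Sum the apparent size in bytes of all files under directory."""
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                continue
    return total


def find_stale_cache_dirs(root: Path, max_age_days: int = MAX_AGE_DAYS, now=None):
    """Return cache directories under root with no modification in max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for current, dirs, _files in os.walk(root):
        for name in list(dirs):
            if name not in CACHE_DIR_NAMES:
                continue
            candidate = Path(current) / name
            dirs.remove(name)  # never descend into a cache directory
            if candidate.is_symlink():
                continue  # skip symlinks to avoid measuring data elsewhere
            if newest_mtime(candidate) < cutoff:
                stale.append(candidate)
    return stale


def report(root: Path) -> int:
    """Print each stale cache directory with its size and return the total bytes."""
    total = 0
    for directory in find_stale_cache_dirs(root):
        size = dir_size(directory)
        total += size
        print(f"{size / 1e9:8.2f} GB  {directory}")
    print(f"{total / 1e9:8.2f} GB  reclaimable in total")
    return total
```

Calling `report(Path("/home"))` would print one line per stale directory plus a total; the root path is a placeholder. The sketch only reads metadata and deletes nothing.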

Tasks

  • Clean up disks
    • Create a script that can find and clean unused cache directories
    • Create a cronjob that runs that script
    • Deploy the script to staging and test it there
    • Deploy the script to production
  • Resize disks
    • Increase disk size on AWS-based machines by 1TB
    • Increase disk size on Azure-based machines by 1TB
    • Document process
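For the "Clean up disks" tasks, the deletion step might look roughly like the sketch below. It is not the deployed script. It assumes the list of stale directories has already been determined, that only directories named `build` or `target` may be removed, and that everything must live under a single root such as the users' home directories. The function name and the dry-run default are illustrative choices.

```python
import shutil
from pathlib import Path

# Assumption: only these directory names are ever eligible for deletion.
ALLOWED_NAMES = {"build", "target"}


def clean_cache_dirs(paths, allowed_root: Path, dry_run: bool = True):
    """Remove the given cache directories and return the ones affected.

    With dry_run=True (the default) nothing is deleted; the return value
    lists what would have been removed.
    """
    allowed_root = Path(allowed_root).resolve()
    affected = []
    for path in paths:
        path = Path(path)
        # Refuse symlinks and anything that is not an existing directory.
        if path.is_symlink() or not path.is_dir():
            continue
        resolved = path.resolve()
        # Refuse directories with unexpected names.
        if resolved.name not in ALLOWED_NAMES:
            continue
        # Refuse anything outside the allowed root.
        if allowed_root not in resolved.parents:
            continue
        if not dry_run:
            shutil.rmtree(resolved)
        affected.append(resolved)
    return affected
```

Defaulting to a dry run means a scheduled job has to opt in to deletion explicitly, which makes a test on staging safer. Scheduling and deployment are outside this sketch.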

Resources

jdno commented

I had originally planned to increase the disk size of the dev-desktops as well. But seeing how much space we freed up, I am postponing that until the issue becomes more pressing again. It will require quite some time to go through the process for all four machines (and document it this time), which we simply don't have right now. But this might be a good task for the new infrastructure engineer that we are hiring.