Troubleshoot what's taking up ephemeral storage causing containers to enter a bad state
Closed this issue · 1 comments
The regression which caused the API container to drop from 4Gi
to 2Gi
of ephemeral storage illuminated that much more space is being taken up than expected. I need to perform root cause analysis to ensure that this problem does not scale as traffic grows.
My suspicion is that the transformations that are happening on-the-fly via GQL queries are eating up storage space - these transformations are saved to disk before being sent along to GCS. I have set up an alert to go off once a container reaches 1Gi
of ephemeral storage. The only way to determine for certain what's going on is to use the following commands once the alert goes off:
Disk Utility:
du -h --max-depth=1
- or -
Disk Free Space:
df
The best place to run these commands is in the ./storage
folder in the container and then cd
up each subfolder until I find the culprit.