Behemoths periodically run out of inodes
lrytz opened this issue · 5 comments
Spinning off the discussion from scala/scala-dev#732 (comment) into a new ticket
Indeed it looks like inode exhaustion is more likely the issue than actual disk space. On behemoth-1:
admin@ip-172-31-2-3:~$ df -hi
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/xvdj 25M 25M 816K 97% /home/jenkins
while disk space looks fine:
/dev/xvdj 393G 244G 130G 66% /home/jenkins
The community build workspaces have huge numbers of files and directories. For example, for "scala-2.13.x-jdk11-integrate-community-build" there are currently 103 extraction directories
admin@ip-172-31-2-3:~$ ls /home/jenkins/workspace/scala-2.13.x-jdk11-integrate-community-build/target-0.9.17/extraction | wc -l
103
A single one of those has > 200k inodes:
admin@ip-172-31-2-3:~$ find /home/jenkins/workspace/scala-2.13.x-jdk11-integrate-community-build/target-0.9.17/extraction/82d4b745facaabd414be8c97dd8725a670038658 | wc -l
207593
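For ranking the worst offenders directly, GNU du can count inodes as well (a sketch; assumes coreutils 8.22+ for --inodes):
du --inodes -d 1 /home/jenkins/workspace/scala-2.13.x-jdk11-integrate-community-build/target-0.9.17/extraction | sort -n | tail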
Looking at things a bit, it seems we could save > 40% of inodes by not pulling in all the git refs to pull requests. They look similar to this:
/home/jenkins/workspace/scala-2.13.x-jdk11-integrate-community-build/target-0.9.17/extraction/82d4b745facaabd414be8c97dd8725a670038658/projects/93deaed81507c97b97bdf01b44a6723b14827dc1/.git/refs/pull/110
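If those refs come in via a catch-all fetch refspec (an assumption; I haven't checked what dbuild actually configures), restricting the refspec to branches would keep refs/pull/* from being materialized at all. A sketch, run inside one of the clones:
# fetch only branches from now on; refs/pull/* is never created locally
git config remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'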
Some directory counting:
admin@ip-172-31-2-3:~$ find /home/jenkins/workspace/scala-2.13.x-jdk11-integrate-community-build/target-0.9.17/extraction/82d4b745facaabd414be8c97dd8725a670038658 -type d | wc -l
80892
admin@ip-172-31-2-3:~$ find /home/jenkins/workspace/scala-2.13.x-jdk11-integrate-community-build/target-0.9.17/extraction/82d4b745facaabd414be8c97dd8725a670038658 -type d | grep '\/pull\/' | wc -l
43463
Looking at files in the extraction, we again see a large number of git refs corresponding to pull requests:
admin@ip-172-31-2-3:~$ find /home/jenkins/workspace/scala-2.13.x-jdk11-integrate-community-build/target-0.9.17/extraction/82d4b745facaabd414be8c97dd8725a670038658 -type f | wc -l
126693
admin@ip-172-31-2-3:~$ find /home/jenkins/workspace/scala-2.13.x-jdk11-integrate-community-build/target-0.9.17/extraction/82d4b745facaabd414be8c97dd8725a670038658 -type f | grep -E 'pull\/[0-9]+\/head' | wc -l
43463
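Even without changing what gets fetched, packing the loose refs would collapse each per-ref file (and the directories holding them) into a single packed-refs file per clone. A sketch over the extraction checkouts, with paths taken from the listings above:
for repo in /home/jenkins/workspace/scala-2.13.x-jdk11-integrate-community-build/target-0.9.17/extraction/*/projects/*; do
  git -C "$repo" pack-refs --all   # loose refs/* become entries in .git/packed-refs
done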
@SethTisue do you think we can do something about these git refs to pull requests?
Hmm — I remember talking with @cunei once about the possibility of shallow-cloning. I'll see if I can dig that conversation up.
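For reference, a minimal sketch of what a shallow clone looks like (whether dbuild can be told to clone this way is exactly what needs digging up):
# history truncated to a single commit; only the default branch is fetched
git clone --depth 1 https://github.com/scala/scala.git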
Alternatively we can create a new EBS volume with a new file system where we explicitly specify the number of inodes on creation (https://askubuntu.com/questions/600159/how-can-i-create-an-ext4-partition-with-an-extra-large-number-of-inodes), then copy over the files.
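The linked answer boils down to either an explicit inode count (-N, as used in the procedure further down) or a bytes-per-inode ratio; a sketch of the latter, with a hypothetical device name:
mkfs.ext4 -i 8192 /dev/xvdX   # one inode per 8 KiB of space, roughly 48M inodes on a 400G volume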
at present, on each behemoth I need to blow away the community build directories under workspace every 2 months or so. it's very little burden
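for the record, the cleanup amounts to something like this (a sketch; the exact set of directories varies by job):
rm -rf /home/jenkins/workspace/*-integrate-community-build/target-*/extraction/*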
closing, as I think the status quo here is okay. we've recently added JDK 20 which increases the pressure somewhat, so we'll see, but in the meantime, I don't think this needs to stay open.
did the new-volume migration successfully on behemoth-3:
- new EBS volume, gp3, 400g, default iops/throughput. us-west-1c
- attach to instance as /dev/xvdk
- lsblk
- mkfs -t ext4 -N 50000000 /dev/xvdk (50M inodes; the old volume has 25M)
- mkdir /home/jenkins-new
- chown jenkins:root /home/jenkins-new
- fstab:
/dev/xvdk /home/jenkins-new ext4 noatime 0 0
- systemctl daemon-reload
- mount -a
- chown jenkins:root /home/jenkins-new (again: the earlier chown applied to the underlying directory; the freshly mounted filesystem root is owned by root)
- rsync -a -H --info=progress2 --info=name0 /home/jenkins/ /home/jenkins-new/ (-H is important: git checkouts use hard links; see the sanity-check sketch after this list)
- edit fstab: mount the new volume at /home/jenkins, comment out the old volume
- systemctl daemon-reload
- reboot (old volume might be in use)
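Before switching fstab over, a quick sanity check that rsync -H actually preserved the hard links and that the new volume has headroom (the sketch referenced from the rsync step):
# hard-linked file counts should match between the two trees
find /home/jenkins -xdev -type f -links +1 | wc -l
find /home/jenkins-new -xdev -type f -links +1 | wc -l
# inode usage side by side
df -i /home/jenkins /home/jenkins-new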
after the reboot:
admin@ip-172-31-2-5:~$ df -hi
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/xvdk 48M 7.3M 41M 16% /home/jenkins