scala-ci.typesafe.com/artifactory is full? PR validation jobs are giving 413 errors
SethTisue opened this issue · 15 comments
so e.g. at https://scala-ci.typesafe.com/job/scala-2.12.x-validate-main/4276/console:
[error] (partest / publish) java.io.IOException: PUT operation to URL https://scala-ci.typesafe.com/artifactory/scala-pr-validation-snapshots;build.timestamp=1603922141869/org/scala-lang/scala-partest/2.12.13-bin-bfc824a-SNAPSHOT/scala-partest-2.12.13-bin-bfc824a-SNAPSHOT.pom failed with status code 413: Request Entity Too Large; Response Body: {
[error] "errors" : [ {
[error] "status" : 413,
[error] "message" : "Datastore disk usage is too high. Contact your Artifactory administrator to add additional storage space or change the disk quota limits."
[error] } ]
[error] }
iirc, last time it happened we dealt with this by zeroing out https://scala-ci.typesafe.com/artifactory/scala-pr-validation-snapshots/ — or did we not zero it out entirely, but just deleted the oldest builds? I can't remember
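For the record, either approach can be scripted against Artifactory's Delete Item REST call; a minimal sketch, assuming admin credentials in the environment, with a made-up folder path standing in for whatever old build we'd actually target:

# delete one old snapshot folder (the path here is a placeholder, not a real build)
curl -u "$ARTIFACTORY_USER:$ARTIFACTORY_PASS" -X DELETE \
  "https://scala-ci.typesafe.com/artifactory/scala-pr-validation-snapshots/org/scala-lang/scala-compiler/2.12.0-bin-0000000-SNAPSHOT/"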
the PR validation snapshots do have some value even after a PR is merged, since they enable per-commit bisecting of regressions. (in contrast to the mergelies at https://scala-ci.typesafe.com/artifactory/scala-integration/ , which enable coarser-grained bisecting: per PR. the mergelies, we intend to retain indefinitely)
recent article (October 1, 2020) with advice: https://jfrog.com/knowledge-base/artifactory-cleanup-best-practices/
I went through all the remote repositories and set them to expire cached artifacts after 720 hours (30 days), then ran "Cleanup Unused Cached Artifacts". This is unlikely to buy us more than a small amount of time, though: https://scala-ci.typesafe.com/artifactory/webapp/#/admin/advanced/storage_summary shows that the lion's share of storage is going to the PR validation snapshots and mergelies.
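(The same per-repo breakdown should also be available over the REST API, which is handier for keeping an eye on it from a script; a sketch, assuming admin credentials and jq on the box:)

# per-repository storage numbers, same data as the storage summary page
curl -s -u "$ARTIFACTORY_USER:$ARTIFACTORY_PASS" \
  "https://scala-ci.typesafe.com/artifactory/api/storageinfo" \
  | jq '.repositoriesSummaryList[] | {repoKey, usedSpace, itemsCount}'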
I'll run my script (#636 (comment)) to delete from pr-validation-snapshots what's older than 2019.
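(Not the actual script, but the shape of it: ask AQL for everything in the repo created before a cutoff, then delete each result. The cutoff date and credentials below are placeholders.)

#!/usr/bin/env bash
set -euo pipefail
HOST="https://scala-ci.typesafe.com/artifactory"
AUTH="$ARTIFACTORY_USER:$ARTIFACTORY_PASS"
CUTOFF="2019-01-01"

# AQL query for artifacts in the repo created before the cutoff
curl -s -u "$AUTH" -X POST -H "Content-Type: text/plain" "$HOST/api/search/aql" \
  -d "items.find({\"repo\":\"scala-pr-validation-snapshots\",\"created\":{\"\$lt\":\"$CUTOFF\"}})" \
  | jq -r '.results[] | "\(.repo)/\(.path)/\(.name)"' \
  | while read -r item; do
      echo "deleting $item"
      curl -s -u "$AUTH" -X DELETE "$HOST/$item" > /dev/null
    done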
Seems we already deleted what's older than 2019, and half of 2019 too. So I'm going for all artifacts for non-merged commits older than 2020.
This didn't help enough, we're still at 85%.
I noticed
- 28 gigs in scala-release-temp. I think we used this at some point for temporary builds, e.g. for benchmarking, to avoid putting them in scala-integration (scala/scala@6ff389166e). We can probably clean that up (a rough sketch of how is below this list).
- Artifactory's internal database takes 67 gigs (du -h /var/opt/jfrog/artifactory/data/derby). There's a function to compress it (https://www.jfrog.com/confluence/display/JFROG/Regular+Maintenance+Operations#RegularMaintenanceOperations-Storage); after failing a few times with an error message it eventually ran, but it didn't help. Still 67 gigs.
- I can delete more PR validation builds; that repo is at 110 gigs.
- scala-integration is at 207 gigs, and we probably don't ever want to remove anything from there.
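For the scala-release-temp cleanup, the File List API should make it easy to double-check that nothing in there is still wanted before deleting anything (a sketch; credentials are placeholders, and the actual delete would be the same Delete Item call as in the earlier sketch):

# recursively list everything in scala-release-temp before deciding what to delete
curl -s -u "$ARTIFACTORY_USER:$ARTIFACTORY_PASS" \
  "https://scala-ci.typesafe.com/artifactory/api/storage/scala-release-temp?list&deep=1" \
  | jq -r '.files[].uri'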
@SethTisue let me know what you think. IMO we can also bump the EBS volume size.
I've never used scala-release-temp or seen it used, so I have no objection to zeroing that one out.
IMO we can also bump the EBS volume size
Can that be done without a lot of rebuilding effort?
Resizing the EBS volume is simple; then I'll give the resize2fs command a try.
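Roughly this, in AWS CLI terms (the volume id is a placeholder, and the first two steps can just as well be done from the console):

# snapshot first, then grow the volume, then grow the filesystem on-line
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "artifactory data, pre-resize"
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 600
sudo resize2fs /dev/xvdk   # ext4 grows to fill the enlarged device while mounted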
Currently taking a snapshot of the volume (that's quite slow).
Resizing worked fine
- took a snapshot of the EBS volume (will keep it around for a while)
- using "Modify Volume" changed the size to 600 gigs
- changed the filesystem according to https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/recognize-expanded-volume-linux.html
admin@ip-172-31-10-237:~$ df -hT
...
/dev/xvdk ext4 493G 419G 52G 90% /var/opt/jfrog/artifactory/data
...
admin@ip-172-31-10-237:~$ lsblk
...
xvdk 202:160 0 600G 0 disk /var/opt/jfrog/artifactory/data
...
admin@ip-172-31-10-237:~$ sudo /sbin/resize2fs /dev/xvdk
resize2fs 1.43.4 (31-Jan-2017)
Filesystem at /dev/xvdk is mounted on /var/opt/jfrog/artifactory/data; on-line resizing required
old_desc_blocks = 32, new_desc_blocks = 38
The filesystem on /dev/xvdk is now 157286400 (4k) blocks long.
admin@ip-172-31-10-237:~$ df -hT
...
/dev/xvdk ext4 591G 419G 146G 75% /var/opt/jfrog/artifactory/data
....
Fixed. 🤞
@lrytz I don't remember if it was on a ticket somewhere or in private communication, but you asked if I wanted the behemoths resized to help the community build, and I said yes please — when you have time.
what I should have added: lately the most common disk space problem on the behemoths is actually inodes, not raw space. the community build is amazingly inode hungry. is there a way to get more inodes as well as more gigabytes?
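(For context, a quick way to see the difference on any of the machines; lately it's the inode column, not the space column, that hits 100% first:)

df -h   # disk space usage per filesystem
df -i   # inode usage per filesystem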