inbo/tutorials

Pull requests artefacts are clogging our free storage

Closed this issue ยท 23 comments

This repository creates a build for the website (25MB) for every pull request, so that reviewers can see the site locally:

Unfortunately those artefacts don't come for free. For the whole INBO organization, we have 2 GB free storage for such artefacts. The 34 above constitute 875MB.

@florisvdh do you know about other repositories with artefacts that could clog up the remaining 1.125MB?
@niconoe @florisvdh I will look into removing old artefacts (see https://github.com/marketplace/actions/remove-artifacts). Do you think we should do this INBO org wide or specific per repo?

I think this is best considered in a repo-specific way, given that repos will differ in number and size of artifacts, and because artifacts will by default only be kept available for 90 days. So for repos with frequent PRs (like this one) or with large artifacts, this may be more of an issue.

As a measure for the future, I think we should set the 'artifact retention period', at least for repos with many or large artifacts.

I implemented a pkgdown website artifact in the inbo/n2khab repo. I'm not sure how much it takes, or where to find that number. But sure I'm prepared to reduce the retention period to some small number of days; there's no need to keep artifacts for 90 days.

@inbo/bmk who knows of more places where artifacts are used?

I have reduced the retention period INBO wide to from 90 to 30 days. This will likely be sufficient for most repositories (and artefacts can be triggered quite easily). Let's see if that improves our usage.

You can see the artefacts for inbo/n2khab at https://github.com/inbo/n2khab/actions/workflows/site-devel.yaml They have been triggered 34 times, but are less than 1MB. I'm still not quite sure why those pkgdown websites need to be build and kept for every commit in a PR. Running tests: yes. Building pkgdown without committing to see any errors: maybe. Building pkgdown down on PR acceptance: yes.

I recently implemented artifacts for inbo/protocols repo to preview Rmarkdown html output of a protocol. The last one was 2 Mb.

@peterdesmet For pkgdown websites at PR, it's a mere convenience indeed. It allows extra reviewing without needing to build it locally, which is easier for contributors. It's not an essential need, but since the artifact is not large, I suggest to keep it but reduce the retention period.

Aha, good to see that artefacts are used often, but I think:

  • we should be a bit conservative in when to build them
  • don't keep them for long (hence 30 days)
  • potentially increase our free storage (I'm guessing $0.25/GB/month)

Honestly I didn't know yet it is possible to see the website before publication for this repo. It would be good if this was explained on the create_tutorial page, so all contributors can take advantage of it (as it is installed anyway, it is a pity that people do not use it just because they are not aware of it).

Also, @florisvdh, as it is a mere convenience, does this mean is is also implemented in the checklist package of @ThierryO ? In this case I completely overlooked this in the package documentation. If not, would it be a good idea to implement it (as an option), @ThierryO ? If implemented, I think it is important to document this, so people are aware of it and can use it. Otherwise it is just a lost of money, I think.

https://github.com/inbo/indicatoren uses artefacts as well. It provides the rendered version of the altered indicators in that PR for review purposes.

The generic R package checks generate artefacts too when a check fails. We could turn that off.

Our new checklistpackage sets up GitHub Actions for packages and R source repos (https://inbo.github.io/checklist/reference/setup_source.html). This is an easy way to add the remove artifacts GHA.

I think it makes sense to follow https://github.com/r-lib/actions templates and not have artefacts for packages (i.e. pkgdown builds). Artefacts to preview reports or tutorial pages are useful imo.

@ElsLommelen

It would be good if this was explained on the create_tutorial page

It is currently mentioned in README.md, where a specific md file is referred on the topic.

But indeed it sounds like a good idea to refer the md file in the create_tutorial page as well, because the latter is rendered on the website itself.

does this mean is is also implemented in the checklist package

I presume it is not? At the time I was looking at implementing GHA workflows in n2khab, I added the site-devel workflow as a minor tweak of the site-deploy workflow, as a nice-to-have in PRs, not being aware of storage limits. ๐Ÿ˜„

@florisvdh Maybe it would even be a good idea to make the README.md as a whole available somewhere on the website itself, or at least refer to it in the contributing guidelines on the first page of the tutorial? Honestly I hadn't discovered this information yet: as I know there are contributing guidelines on the website, I always stick to that. (Usually I'm working on packages and I'm used that the README.md is the first page of the website, so I didn't expect it to contain additional information to the website with regard to contributing...)

For me it might have to do with the fact that I'm used to collaborate on packages (where everything is on the website), but I wonder to what extent less experienced collaborators are able to find these different pieces of information on how to collaborate? I suppose they benefit as well if all relevant information would be grouped together on the website. Details on the building process might be too technical for some contributors, but I think they should at least be informed that there is an automatic built (that may sent some email messages). Details could be referred to via a link, so advanced users can have a look if they wish to. Or you may just want to mention it is in the readme on the github page, but don't expect people will find it (and do the effort to search for this information on different spots after they found one of the contribution guidelines).

Since https://github.com/inbo/tutorials/blob/master/.github/workflows/REVIEWING.md is short and not a standardized GitHub file, I suggest to include it in the pull_request_template.md, so itโ€™s loaded in the PR body. Reviewers can easily read it there.

@peterdesmet should this issue remain open - i.e. on the storage subject?

The subject on .github/workflows/REVIEWING.md is now continued in #252.

Yes, let's close it for now and see if the 30 days help.

We can't merge pull requests at the indicatoren repo because of this. Below is an R script that returns an overview of the artifact usage by repo.

gh("/orgs/inbo/repos", .limit = Inf) %>%
  map_chr("full_name") %>%
  sprintf(fmt = "/repos/%s/actions/artifacts") %>%
  map(gh, .limit = Inf) -> artifacts
total_count <- map_int(artifacts, "total_count")
artifacts[total_count > 0] %>%
  map("artifacts") %>%
  unlist(recursive = FALSE) %>%
  map_dfr(as.data.frame) %>%
  select(url, size_in_bytes, created_at, expires_at) %>%
  mutate(repo = str_replace(url, ".*repos/inbo/(.*)/actions.*", "\\1")) %>%
  group_by(repo) %>%
  summarise(
    n(),
    total_size_mb = sum(size_in_bytes) / (2 ^ 20),
    oldest = min(created_at)
  ) %>%
  arrange(desc(total_size_mb))

The results I get are below. Note that I might miss private repos that I can't access.

   repo           `n()` total_size_mb oldest
   <chr>          <int>         <dbl> <chr>               
 1 camtrapdp        124      2698.    2021-03-18T10:59:41Z
 2 tutorials         39       983.    2021-02-23T11:38:54Z
 3 indicatoren      131       606.    2020-08-25T17:49:36Z
 4 checklist        161       140.    2020-07-12T12:42:28Z
 5 protocolhelper    82       139.    2021-03-22T17:20:56Z
 6 n2khab            68        75.1   2021-02-05T14:12:30Z
 7 inborutils        34        73.3   2021-03-05T17:01:16Z
 8 INBOtheme          1        26.4   2020-09-04T12:38:26Z
 9 protocols         13        14.1   2021-04-20T13:04:22Z
10 ladybird           4        11.9   2021-03-04T12:56:17Z
11 gulltracking       8         1.83  2021-02-26T13:18:40Z
12 grtsdb             1         0.322 2021-01-13T08:54:53Z

With camtrapdp now removing its artefacts, can you merge PR for indicatoren again?

Yes. Currently the build of "indicatoren" fails with the error below.

Error: Create Artifact Container failed: Artifact storage quota has been hit. Unable to upload any new artifacts

Hmm, can you have a look at my suggestion for #256 (comment). Then we can merge that one too.

PR merged (retaining last 5). I will close this issue again.

We are still hitting the limit. The strange thing is that there are still some old artefacts lingering at the indicatoren repo although the cleanup GHA runs daily. Another thing is that total size is smaller than the 2(?) GB limit. Maybe there is another INBO repo to which I don't have access.

   repo           `n()` total_size_mb oldest              
   <chr>          <int>         <dbl> <chr>               
 1 indicatoren      107       512.    2020-08-25T17:49:36Z
 2 camtrapdp          5       147.    2021-06-05T21:00:18Z
 3 tutorials          5       128.    2021-06-09T08:48:48Z
 4 n2khab            68        75.1   2021-02-05T14:12:30Z
 5 inborutils        34        73.3   2021-03-05T17:01:16Z
 6 protocols         20        35.6   2021-04-20T13:04:22Z
 7 INBOtheme          1        26.4   2020-09-04T12:38:26Z
 8 checklist         40        19.1   2020-07-12T12:42:28Z
 9 ladybird           4        11.9   2021-03-04T12:56:17Z
10 protocolhelper     4         5.24  2021-05-26T16:51:36Z
11 datapackage       44         3.42  2021-06-07T12:57:48Z
12 gulltracking       8         1.83  2021-02-26T13:18:40Z
13 watina             1         0.496 2021-06-11T13:31:30Z
14 grtsdb             1         0.322 2021-01-13T08:54:53Z

some old artefacts lingering

Are files, images etc. which were uploaded in messages on GitHub (posts in issues, PRs) also regarded as an artifact?

Another thing is that total size is smaller than the 2(?) GB limit. Maybe there is another INBO repo to which I don't have access.

I looked into this. It is the inbo/vis repo. Will create an issue there.

This has be resolved.