actions/upload-artifact

[docs] Uploading different files to the same artifact from multiple jobs

PathogenDavid opened this issue · 5 comments

What files would you like to change?

upload-artifact/README.md

Lines 131 to 177 in f4ac36d

### Uploading to the same artifact
With the following example, the available artifact (named `artifact` by default if no name is provided) would contain both `world.txt` (`hello`) and `extra-file.txt` (`howdy`):
```yaml
- run: echo hi > world.txt
- uses: actions/upload-artifact@v2
  with:
    path: world.txt
- run: echo howdy > extra-file.txt
- uses: actions/upload-artifact@v2
  with:
    path: extra-file.txt
- run: echo hello > world.txt
- uses: actions/upload-artifact@v2
  with:
    path: world.txt
```
> **_Warning:_** Be careful when uploading to the same artifact via multiple jobs, as artifacts may become corrupted.

Each artifact behaves as a file share. Uploading to the same artifact multiple times in the same workflow can overwrite or append to already uploaded files:
```yaml
strategy:
  matrix:
    node-version: [8.x, 10.x, 12.x, 13.x]
steps:
  - name: Create a file
    run: echo ${{ matrix.node-version }} > my_file.txt
  - name: Accidentally upload to the same artifact via multiple jobs
    uses: actions/upload-artifact@v2
    with:
      name: my-artifact
      path: ${{ github.workspace }}
```
In the above example, four jobs will upload four different files to the same artifact but there will only be one file available when `my-artifact` is downloaded. Each job overwrites what was previously uploaded. To ensure that jobs don't overwrite existing artifacts, use a different name per job:
```yaml
uses: actions/upload-artifact@v2
with:
  name: my-artifact ${{ matrix.node-version }}
  path: ${{ github.workspace }}
```

What are your suggested changes?

The "Uploading to the same artifact" section warns that uploading to the same artifact from multiple jobs can have unexpected results. The proposed workaround is to upload a differently named artifact from each job in the matrix.

While this does work, it's not super convenient if you need to be able to download all artifacts from a matrixed job in one go.
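Inside a workflow, a later job can already collect every per-job artifact at once: when `name` is omitted, actions/download-artifact@v2 downloads all artifacts of the run, each into a directory named after the artifact. A minimal sketch, assuming the per-job naming from the README example; the job names `build` and `collect` and the `all-artifacts` path are hypothetical:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [8.x, 10.x, 12.x, 13.x]
    steps:
      - run: echo ${{ matrix.node-version }} > my_file.txt
      # One artifact per matrix job, named after the Node version.
      - uses: actions/upload-artifact@v2
        with:
          name: my-artifact ${{ matrix.node-version }}
          path: my_file.txt
  collect:
    # Starts only after every matrix job of `build` has finished.
    needs: build
    runs-on: ubuntu-latest
    steps:
      # No `name` input: download every artifact of this run,
      # one subdirectory per artifact under all-artifacts/.
      - uses: actions/download-artifact@v2
        with:
          path: all-artifacts
```

This only helps within the workflow itself; downloading from the run's summary page still gives one archive per artifact.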


Based on my understanding of how artifacts in GitHub Actions work, it seems to me like it should be possible to upload to the same artifact from multiple jobs as long as the file paths are all different. For example:

    strategy:
      matrix:
          node-version: [8.x, 10.x, 12.x, 13.x]
    steps:
        - name: Create a file
          # Note the change below ______________________vvvvvvvvvvvvvvvvvvvvvvvvvvv
          run: echo ${{ matrix.node-version }} > my_file_${{ matrix.node-version }}.txt
        - name: Accidentally upload to the same artifact via multiple jobs
          uses: actions/upload-artifact@v2
          with:
              name: my-artifact
              path: ${{ github.workspace }}

In practice this seems to work fine, but it would be nice to have the documentation clarify whether A) this is an intended, supported feature, or B) it's actually still problematic and I'm relying on a race condition happening to go my way.

If someone from GitHub who knows how the backend works can clarify whether this works intentionally, I can submit a PR adding the appropriate example/disclaimer.

I read somewhere (I can't find it at the moment) that uploading to the same artifact at the same time from multiple jobs, even with different files, can lead to archive corruption. It is probably very rare, or maybe it was fixed or is handled differently on the backend side and no longer happens. I will link to it if I find that information again.

//EDIT

Found this in current docs:

Warning: Be careful when uploading to the same artifact via multiple jobs as artifacts may become corrupted. When uploading a file with an identical name and path in multiple jobs, uploads may fail with 503 errors due to conflicting uploads happening at the same time. Ensure uploads to identical locations to not interfere with each other.

It is still unclear. Based on the second part of this warning, when you try to upload to the same artifact from multiple jobs, the first job should succeed and the rest should get a 503 error. I'm not really sure what the first part is about: the first upload will create an artifact (with the files from that one job), so it should be a valid archive, just an incomplete one because the rest of the jobs failed.

> In practice this seems to work fine,

It worked fine with actions/upload-artifact@v3.

With actions/upload-artifact@v4, an attempt to upload different files to the same artifact from multiple jobs fails explicitly with:

    Run actions/upload-artifact@v4
    With the provided path, there will be 1 file uploaded
    Artifact name is valid!
    Root directory input is valid!
    Error: Failed to CreateArtifact: Received non-retryable error: Failed request: (409) Conflict: an artifact with this name already exists on the workflow run

However, this does not happen with actions/upload-artifact@v3 (same artifact, different paths).
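The 409 is consistent with v4's model: v4 artifacts are immutable, and a given name can exist only once per workflow run, so several jobs can no longer add files to one shared artifact. A minimal sketch of one way to still end up with a single artifact, assuming upload-artifact v4 and its `actions/upload-artifact/merge` sub-action: each matrix job uploads under a unique name, then a follow-up job merges everything matching a pattern. The job names and the `my-artifact-*` naming scheme are illustrative, not taken from the thread:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [8.x, 10.x, 12.x, 13.x]
    steps:
      # Each job writes a differently named file, as in the proposal above.
      - run: echo ${{ matrix.node-version }} > my_file_${{ matrix.node-version }}.txt
      # A unique artifact name per matrix job avoids the 409 Conflict.
      - uses: actions/upload-artifact@v4
        with:
          name: my-artifact-${{ matrix.node-version }}
          path: my_file_${{ matrix.node-version }}.txt
  merge:
    # Starts only after every matrix job has uploaded its artifact.
    needs: build
    runs-on: ubuntu-latest
    steps:
      # Combines all artifacts matching the pattern into one new artifact.
      - uses: actions/upload-artifact/merge@v4
        with:
          name: my-artifact
          pattern: my-artifact-*
```

If only a later job needs the files, actions/download-artifact@v4 with `pattern: my-artifact-*` and `merge-multiple: true` extracts all matching artifacts into a single directory without creating a merged artifact.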