taverntesting/tavern

YAML syntax prevents uploading multiple files with the same "name"

SyntaxColoring opened this issue · 6 comments

It looks like Tavern's current YAML syntax prevents using Tavern for certain standard kinds of multi-file uploads.

If you have an HTML form input like this:

<input type="file" name="input_files" multiple="true" required="true"/>

Then the POST request should look like this:

POST /foo HTTP/1.1
User-Agent: PostmanRuntime/7.30.0
Accept: */*
Host: localhost:31950
Connection: keep-alive
Content-Type: multipart/form-data; boundary=--------------------------432665675269641474044671
Content-Length: 12345

----------------------------432665675269641474044671
Content-Disposition: form-data; name="input_files"; filename="file_1.txt"
Content-Type: application/octet-stream

first file contents
blah blah blah

----------------------------432665675269641474044671
Content-Disposition: form-data; name="input_files"; filename="file_2.txt"
Content-Type: application/octet-stream

second file contents
blah blah blah

----------------------------432665675269641474044671--

Notice how both files have name="input_files", even though they have different filenames and different contents.
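
To make the wire format above a bit more concrete, here's a minimal standard-library sketch that builds such a body from an ordered list of parts. `encode_multipart` and the fixed boundary are hypothetical and only for illustration; this is not how Requests or Tavern do it internally.

```python
import uuid


def encode_multipart(parts, boundary=None):
    """Build a multipart/form-data body from an ordered list of parts.

    Each part is (field_name, filename, content_bytes, content_type).
    Hypothetical helper, for illustration only.
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for field_name, filename, content, content_type in parts:
        lines.append(delimiter)
        lines.append(
            f'Content-Disposition: form-data; name="{field_name}"; '
            f'filename="{filename}"'.encode("utf-8")
        )
        lines.append(f"Content-Type: {content_type}".encode("ascii"))
        lines.append(b"")  # blank line separates part headers from content
        lines.append(content)
    lines.append(delimiter + b"--")  # closing delimiter
    lines.append(b"")
    body = b"\r\n".join(lines)
    return f"multipart/form-data; boundary={boundary}", body


# Two parts, same field name, different filenames and contents.
header, body = encode_multipart(
    [
        ("input_files", "file_1.txt", b"first file contents\n", "application/octet-stream"),
        ("input_files", "file_2.txt", b"second file contents\n", "application/octet-stream"),
    ],
    boundary="example-boundary",
)
print(header)
print(body.decode("utf-8"))
```

The input is a list rather than a dict on purpose: a list can repeat `input_files` and keeps the parts in the order given.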

Quoting from RFC 7578 section 4.3:

The form data for a form field might include multiple files.
...
To match widely deployed implementations, multiple files MUST be sent by supplying each file in a separate part but all with the same "name" parameter.

In Requests, you can do this by passing a list of tuples as the files argument:

import requests

# file_1 and file_2 are assumed to be open binary file objects; url is the upload endpoint.
multiple_files = [
    ('input_files', ('file_1.txt', file_1, 'text/plain')),
    ('input_files', ('file_2.txt', file_2, 'text/plain')),
]
r = requests.post(url, files=multiple_files)

But in Tavern's YAML syntax, this doesn't seem possible.

request:
  url: '{url}'
  method: POST
  files:
    input_files: 'file_1.txt'
    input_files: 'file_2.txt' # Invalid YAML because this is a duplicate key. :(
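
As a rough illustration of why the duplicate key can't work: mapping loaders generally produce a dictionary, and a dictionary holds one value per key. The sketch below uses Python's `json` module as a stand-in for a YAML loader. That's an assumption made so it runs with the standard library only; Tavern's actual loader may instead reject the document outright.

```python
import json

# JSON stands in for YAML here; the duplicate-key problem is the same in spirit.
document = '{"files": {"input_files": "file_1.txt", "input_files": "file_2.txt"}}'

files = json.loads(document)["files"]

# Only one entry survives, so the first file is silently lost.
print(files)
```

Either way, whether the loader errors out or keeps a single value, there's no way to get two parts with the same name out of a plain mapping.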

Proposed syntax change to allow this:

request:
  url: '{url}'
  method: POST
  files:
    # Send file_1.txt and file_2.txt, both with name="input_files", in the multipart data.
    input_files:
      - 'file_1.txt'
      - 'file_2.txt'

With the long style looking like this:

request:
  url: '{url}'
  method: POST
  files:
    # Send file_1.txt and file_2.txt, both with name="input_files", in the multipart data.
    input_files:
      - file_path: "file_1.txt"
        content_type: "application/customtype"
        content_encoding: "UTF16"
      - file_path: "file_2.txt"
        content_type: "application/customtype"
        content_encoding: "UTF16"

The current scalar syntax could still be supported for backwards compatibility. I think it would be equivalent to a list with 1 element.
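
Here's a minimal sketch of how that backwards compatibility could work, assuming the parsed YAML arrives as a plain dict. `normalize_files` is a hypothetical helper, not existing Tavern code, and it doesn't open any files.

```python
def normalize_files(files):
    """Expand the proposed `files` mapping into a flat list of (field_name, entry) pairs.

    Hypothetical helper: a scalar path or a single long-form mapping is treated
    as a list with one element.
    """
    pairs = []
    for field_name, value in files.items():
        entries = value if isinstance(value, list) else [value]
        for entry in entries:
            if isinstance(entry, str):
                # Short style: a bare path becomes a long-style entry.
                entry = {"file_path": entry}
            pairs.append((field_name, entry))
    return pairs


old_style = {"input_files": "file_1.txt"}
new_style = {"input_files": ["file_1.txt", "file_2.txt"]}

print(normalize_files(old_style))
print(normalize_files(new_style))
```

With this shape, the rest of the request code would only ever see a list of pairs, regardless of which style the test author wrote.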

[EDIT] This syntax might be flawed because it can't preserve order. See #833 (comment) for an update.

I don't think this would be a huge amount of work. It should just be a matter of updating the jsonschema and the file parsing in the REST request code.

@michaelboulton If it's helpful, I can take a shot at implementing this over this weekend or the next.

Hi @michaelboulton @SyntaxColoring any update on the above? Let me know if I can help.

@debugger24 I've cloned the repo to start on this, but I ran into some mysterious Docker-related errors that I haven't had a chance to debug. If you want to give it a whirl, I won't complain. :) Otherwise, I'll get to it eventually.

I've also realized that my initially proposed syntax is a bit flawed.

Instead of this, which I initially proposed:

request:
  url: '{url}'
  method: POST
  files:
    # Send file_1.txt and file_2.txt, both with name="input_files", in the multipart data.
    input_files:
      - file_path: "file_1.txt"
        content_type: "application/customtype"
        content_encoding: "UTF16"
      - file_path: "file_2.txt"
        content_type: "application/customtype"
        content_encoding: "UTF16"

I now think it should be this:

request:
  url: '{url}'
  method: POST
  files:
    # Send file_1.txt and file_2.txt, both with name="input_files", in the multipart data.
    - field_name: "input_files"
      file_path: "file_1.txt"
      content_type: "application/customtype"
      content_encoding: "UTF16"
    - field_name: "input_files"
      file_path: "file_2.txt"
      content_type: "application/customtype"
      content_encoding: "UTF16"

Because the ordering of parts in a multipart upload can be significant, apparently.
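
To sketch the ordering problem: if parts with different field names are interleaved, grouping them under one key per field name can't reproduce the original sequence, while a flat list can. The `metadata` field and `meta.json` file below are made up for the example.

```python
# Flat list style: one entry per part, in the order they should be sent.
entries = [
    {"field_name": "input_files", "file_path": "file_1.txt"},
    {"field_name": "metadata", "file_path": "meta.json"},
    {"field_name": "input_files", "file_path": "file_2.txt"},
]


def part_order(entries):
    """Order of parts as written in the flat list."""
    return [(e["field_name"], e["file_path"]) for e in entries]


def group_by_field(entries):
    """Regroup into the mapping style: one key per field name."""
    grouped = {}
    for e in entries:
        grouped.setdefault(e["field_name"], []).append(e["file_path"])
    return grouped


def order_from_grouped(grouped):
    """Best order recoverable from the mapping style."""
    return [(name, path) for name, paths in grouped.items() for path in paths]


print(part_order(entries))
print(order_from_grouped(group_by_field(entries)))
```

The second line printed has both `input_files` parts next to each other, so the interleaving with `metadata` is lost.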

Can you see whether #870 solves your problem? I've added some tests to it, but it would be good to get some third-party confirmation.