aws-beam/aws-elixir

S3 ListParts returns wrong types

Closed this issue Β· 7 comments

Hi,

I'm calling

def list_parts(
and realized the response Parts ETags and PartNumbers are kind of double encoded strings. This led to a subtle bug that was very hard to debug, before I landed on the below code. Notice the weird string unwrapping at the end:

    AWS.S3.list_parts(client, bucket, key, nil, nil, upload_id)
    |> case do
      {:ok,
       %{
         "ListPartsResult" => %{
           "IsTruncated" => "false",
           "Part" => parts
         }
       }, _resp} ->
        # parts are shaped like
        # [
        # %{"ETag" => "\"fa1d4ddd9daf8c27abb9491155a4df8d\"", "LastModified" => "2024-01-30T10:56:35.000Z", "PartNumber" => "1", "Size" => "5242880"},
        #  ...
        # ]
        parts
        |> Enum.map(fn %{"ETag" => etag, "PartNumber" => part_number} ->
          %{
            "ETag" => String.replace(etag, "\"", ""),
            "PartNumber" => String.to_integer(part_number)
          }
        end)
        |> then(&{:ok, &1})

The docs at https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListParts.html#API_ListParts_ResponseSyntax says the partnumber should be an integer at least.

CreateMultipartUpload does not suffer from the same issue:

def create_multipart_upload(%Client{} = client, bucket, key, input, options \\ []) do

From this repo's readme:

# Create the multipart request
{:ok,
 %{
   "InitiateMultipartUploadResult" => %{
     "UploadId" => upload_id
   }
 }, _} = AWS.S3.create_multipart_upload(client, bucket, filename, %{})

upload_id is not double encoded like etag above.

I had a quick little fiddle with this and even the aws cli returns the ETags in the escaped-quoted way. See also: aws/aws-sdk-net#815 which includes: aws/aws-sdk-net#815 (comment) as a "resolution".

See the raw JSON output from a aws s3api list-parts output:

{
    "Parts": [
        {
            "PartNumber": 1,
            "LastModified": "2024-01-31T09:58:28.758Z",
            "ETag": "\"6af8307c2460f2d208ad254f04be4b0d\"",
            "Size": 9
        }
    ],
    "ChecksumAlgorithm": null,
    "Initiator": null,
    "Owner": null,
    "StorageClass": "STANDARD"
}

Dealing with the quoted ETag might be tricky as I'd really like to avoid adding a String:replace/2 over every element for the sake of ETag handling and risk exploding things in other unknown places.

The integers could easily be solved by changing:

aws-elixir/lib/aws/xml.ex

Lines 129 to 132 in 01ab90f

defp hook_fun(text, global_state) when Record.is_record(text, :xmlText) do
text = xmlText(text, :value)
{:unicode.characters_to_binary(text), global_state}
end

to: (or any other alternative to your liking, happy to accept the PR if you'd like to go ahead with such a change at least πŸ‘)

defp hook_fun(text, global_state) when Record.is_record(text, :xmlText) do
  text = xmlText(text, :value)
  try do
    {:erlang:list_to_integer(text), global_state}
  catch
    _type, _reason ->
      {:unicode.characters_to_binary(text), global_state}
  end
end

We'd have to do something similar in aws-erlang but that's easy enough πŸ˜„

I'm tempted to close this issue seeing as aws-sdk-net isn't fixing this, I'm not convinced we should handle the ETag in any special way either. So besides the integer fix, I'm not convinced we should deviate from the official SDK(s).

@Doerge What's your thoughts? :-)

Yikes. Even in the HTTP header response the ETag is double-quotes wrapped.
Screenshot 2024-02-01 at 12 00 34
There is probably a good reason for that. Just tripped me up tremendously😬
I'm now a bit unsure if it is then safe to attempt to decode everything that looks like an integer too? Maybe there are similar scenarios where AWS would return "123" instead of 123 for some reason?

Then let's agree to leave this as is. I'm also a bit concerned as to existing code that already performs similar type of parsing of responses that you wrote that would suddenly have to change by us doing integer conversions on the aws-{elixir,erlang} side but I'd be open to the idea.

For now, let's simply leave this be and have this issue serve as a history for someone who encounters it πŸ‘ If we want to pick up integer conversion later on, we can πŸ‘

Hmm..
The type info is in the json file that is used to generate these modules:
https://github.com/aws/aws-sdk-go/blob/3e2c844a50e888261e19e7d4c0991346580c2b98/models/apis/s3/2006-03-01/api-2.json#L6313
However I see that AWS is deprecating the aws-sdk-go package anyway, and the aws-sdk-go-v2 https://github.com/aws/aws-sdk-go-v2/blob/main/codegen/sdk-codegen/aws-models/s3.json format is way different:

        "com.amazonaws.s3#PartNumber": {
            "type": "integer"
        },
        "com.amazonaws.s3#PartNumberMarker": {
            "type": "string"
        },

If any effort is put in, it should probably be in the migration to the new go package.

Yup I'm aware. We essentially have 6 months to migrate πŸ‘ It's on my TODO to start to investigate the effort and work required πŸ‘

Sorry! Wasn't meant like that! 😬 I immensely appreciate these packages! πŸ™
Was just curious if the types were even documented anywhere, so I dug into where the type info came from, and saw the deprecation notice!

No worries πŸ˜„ I know it's gonna be a big thing and it's been on my todo to look into but something keeps getting in my way to really sit down a weekend to dive into this.

Since most of the boilerplate is already in place in aws-codegen I suspect it's not gonna be a complete and utter rewrite but would still estimate it to be a decent amount of work πŸ‘