paws-r/paws

How to use additional checksum algorithms when using put_object()

Opened this issue · 16 comments

Using paws, I am looking to store additional checksum data when using s3$put_object(), specifically sha1 values. However, I'm having issues successfully setting this when executing put_object().

Is this something I can currently do with paws? Below is some basic sample code and the associated error message.

Sample code:

s3$put_object(
  Bucket = "test-bucket-name",
  Key = "test-file-key",
  Body = "test-file",
  ChecksumAlgorithm = 'sha1'
)

Associated error:

Running the example above produces the following error message:

Error: InvalidRequest (HTTP 400). x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found.

Looking through the paws documentation, I am not sure how to set the "headers" referenced in the error message above.
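For context, the headers named in the error correspond to the per-algorithm checksum fields of the S3 PutObject API (x-amz-checksum-sha1 for SHA-1), whose value is the base64-encoded binary digest of the body. Below is a minimal sketch of computing that value in R and passing it explicitly. It assumes the digest and base64enc packages are installed and that the installed paws version forwards the ChecksumSHA1 argument of put_object() as that header, which this thread does not confirm. sha1_base64() and put_object_with_sha1() are hypothetical helper names, and the uppercase "SHA1" follows the enum spelling in the S3 API reference.

```r
library(digest)     # SHA-1 digest as raw bytes
library(base64enc)  # base64 encoding of raw vectors

# Hypothetical helper: base64-encoded binary SHA-1 digest of a raw vector.
sha1_base64 <- function(body) {
  stopifnot(is.raw(body))
  # serialize = FALSE hashes the bytes themselves; raw = TRUE returns raw bytes
  raw_digest <- digest::digest(body, algo = "sha1", serialize = FALSE, raw = TRUE)
  base64enc::base64encode(raw_digest)
}

# Hypothetical helper: upload `body` (a raw vector) together with its SHA-1 checksum.
# `s3` is a client such as the one returned by paws::s3().
# Assumption: this paws version sends ChecksumSHA1 as the x-amz-checksum-sha1 header.
put_object_with_sha1 <- function(s3, bucket, key, body) {
  s3$put_object(
    Bucket = bucket,
    Key = key,
    Body = body,
    ChecksumAlgorithm = "SHA1",
    ChecksumSHA1 = sha1_base64(body)
  )
}
```

A call would look like put_object_with_sha1(paws::s3(), "test-bucket-name", "test-file-key", charToRaw("test-file")). If the library does not attach the header, the same InvalidRequest error is to be expected.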

What I'm ultimately hoping for is that, when using s3$put_object(), I can store the sha1 value for the object I am adding to S3, and later retrieve that value using something like s3$list_objects().
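On the retrieval side, the S3 API documents stored checksum values as being returned by HeadObject and GetObject when checksum mode is enabled, rather than as part of the listing results. A minimal sketch under that assumption follows. It presumes the installed paws version exposes the ChecksumMode argument on head_object() and a ChecksumSHA1 field in the response, which is not confirmed in this thread. get_sha1_checksum() is a hypothetical helper name.

```r
# Hypothetical helper: return the stored base64 SHA-1 checksum of an object,
# or NA if the response carries none.
# `s3` is a client such as the one returned by paws::s3().
get_sha1_checksum <- function(s3, bucket, key) {
  # Assumption: ChecksumMode = "ENABLED" asks S3 to include checksum fields
  resp <- s3$head_object(Bucket = bucket, Key = key, ChecksumMode = "ENABLED")
  checksum <- resp$ChecksumSHA1
  # Treat a missing or empty field as "no checksum stored"
  if (is.null(checksum) || length(checksum) == 0) NA_character_ else checksum
}
```

A call would look like get_sha1_checksum(paws::s3(), "test-bucket-name", "test-file-key").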

Thanks in advance.

Can you share the logs, please? You can enable them with options(paws.log_level=3). It will help debug the issue :)

Absolutely, thanks @DyfanJones!

Below are the logs you requested, along with some additional background information, in case it's helpful:

Background Info:

What I am looking to do/take advantage of is covered here. However, rather than using the additional checksums feature via, say, the S3 console, I'm hoping to do so programmatically via paws.

Requested Logs retrieved via options(paws.log_level=3)

INFO [2023-04-27 18:04:11.417]: -> PUT /test_data_1.csv HTTP/1.1
-> Host: exp-derived-data-dev.s3.us-east-2.amazonaws.com
-> Accept-Encoding: deflate, gzip, br
-> Accept: application/json, text/xml, application/xml, */*
-> User-Agent: paws/0.5.5 (R4.2.2; linux-gnu; x86_64)
-> x-amz-acl: 2
-> x-amz-sdk-checksum-algorithm: sha1
-> Content-Md5: BGGVlI07DfjTJ+gVmdqhMA==
-> Content-Length: 78
-> X-Amz-Security-Token: <REDACTED>
-> X-Amz-Date: 20230427T180411Z
-> X-Amz-Content-Sha256: 8f767239b0bfe5e2b03c04638784e96eef2cca51827685b087f430ee3dc1353d
-> Authorization: AWS4-HMAC-SHA256 Credential=ASIAVVZOODW4YOCBGRNS/20230427/us-east-2/s3/aws4_request, SignedHeaders=content-length;content-md5;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-sdk-checksum-algorithm;x-amz-security-token, Signature=0a6fc8d80861e67b66c094831ce21b40e003cdd0792cdb84991a589f4f0ff688
-> 
INFO [2023-04-27 18:04:11.417]: >> variable_name,value_1,value_2,value_3
>> apple,1,2,3
>> banana,4,2,4
>> orange,2,2,2

INFO [2023-04-27 18:04:11.433]: <- HTTP/1.1 400 Bad Request
INFO [2023-04-27 18:04:11.437]: <- x-amz-request-id: 6ZT7AVPQ2B48RKT7
INFO [2023-04-27 18:04:11.437]: <- x-amz-id-2: bxz2Z94/DGCj0ED6gqro63ilNL/g+tpAcE8kG2mFth50zbeW9dBcT3H+lglV194EYfA2huec1e4=
INFO [2023-04-27 18:04:11.437]: <- Content-Type: application/xml
INFO [2023-04-27 18:04:11.437]: <- Transfer-Encoding: chunked
INFO [2023-04-27 18:04:11.437]: <- Date: Thu, 27 Apr 2023 18:04:11 GMT
INFO [2023-04-27 18:04:11.438]: <- Server: AmazonS3
INFO [2023-04-27 18:04:11.438]: <- Connection: close
INFO [2023-04-27 18:04:11.438]: <- 
Error: InvalidRequest (HTTP 400). x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found.

Thanks, will have a look at the backend to see why the headers aren't being attached 🤔

From checking, I believe we don't currently support this functionality. We will need to investigate how the other SDKs implement this so that we can bring it over to paws.

Not 100% sure how to implement the crc32c algorithm. It looks like digest doesn't support it yet. I will raise a ticket to see if they are happy to implement it.

Raised a ticket with the digest package: eddelbuettel/digest#183

For the time being, I will focus on the other checksum algorithms. After they have been completed, we can loop back to crc32c.

Thanks for all the investigating/work thus far @DyfanJones! Please let me know if there's anything else I can provide.

No worries, I am on holiday for the next 2 weeks. I will start work on this when I get back. In the meantime please feel free to raise any PRs, more than happy to review them.

hi @DyfanJones! hope you had a good holiday...wanted to check in to see if there were any updates here.

Hi @tkwilos, we have some fantastic news: @eddelbuettel has implemented the crc32c algorithm https://github.com/eddelbuettel/crc32c. This means we can proceed with implementing the new checksum algorithms, possibly by investigating the botocore implementation https://github.com/boto/botocore/blob/develop/botocore/httpchecksum.py.

This feature will take a little time as I am fairly busy with a newborn. I will keep you updated on the progress of this feature.

Please feel free to raise a PR if you are able to get to this before me :)

Yep, meant to circle back too. It's all there but not yet fully wired up in the digest version on CRAN. However, crc32c is there and can be used and relied upon. We should circle back 'time permitting' to make better use of it in digest too.

Thanks so much for the updates and efforts @DyfanJones & @eddelbuettel!