buchgr/bazel-remote

ExtractLogicalSize returns abnormal size for s3 file

PI-Victor opened this issue · 2 comments

Hey folks,

I'm having an issue when trying to fetch a cached file from S3. Specifically, a segment of the ExtractLogicalSize function returns an unusually large file size:

var uncompressedSize int64
br := bytes.NewReader(earlyHeader[8:])
err = binary.Read(br, binary.LittleEndian, &uncompressedSize)
if err != nil {
	return nil, -1, err
}

You can see the value of uncompressedSize in the logs:

2023/06/16 21:02:28 S3 CONTAINS bazel-remote-temporary cas.v2/c9/c9c6ada24ecb197e6902b6c9454867973a68ec5a531df651729a811924ed8e5e OK
2023/06/16 21:02:28 S3 DOWNLOAD bazel-remote-temporary cas.v2/c9/c9c6ada24ecb197e6902b6c9454867973a68ec5a531df651729a811924ed8e5e OK
**uncompressedSize: 4048010878927438080**
2023/06/16 21:02:28 failed to get CAS c9c6ada24ecb197e6902b6c9454867973a68ec5a531df651729a811924ed8e5e from proxy backend size: -1 err: expected magic number not found
2023/06/16 21:02:29 GRPC ASSET FETCH https://eu.edge.kernel.org/fedora/updates/38/Everything/x86_64/Packages/c/curl-8.0.1-1.fc38.x86_64.rpm 404 Not Found

The actual file size on S3:

{
    "AcceptRanges": "bytes",
    "LastModified": "xxx",
    "ContentLength": 357972,
    "ETag": "\"b0c28af12e124b2487f06b12d255b1f7\"",
    "ContentType": "binary/octet-stream",
    "Metadata": {}
}

The file in question is a curl RPM that I uploaded to S3 from another internal application cache; it's a simple hashed blob. I've confirmed its integrity: the sha256sum matches, the file is installable, etc. So I'm confident the file is not corrupted.

[root@38c284c09f27 /]# rpm -qlp /curl-8.0.1-1.fc38.x86_64.rpm
/usr/bin/curl
/usr/lib/.build-id
/usr/lib/.build-id/3c
/usr/lib/.build-id/3c/8a7d82c4e99cca1db2514d545d384733133512
/usr/share/doc/curl
/usr/share/doc/curl/BUGS.md
/usr/share/doc/curl/CHANGES
/usr/share/doc/curl/FAQ
/usr/share/doc/curl/FEATURES.md
/usr/share/doc/curl/README
/usr/share/doc/curl/TODO
/usr/share/doc/curl/TheArtOfHttpScripting.md
/usr/share/man/man1/curl.1.gz
/usr/share/zsh
/usr/share/zsh/site-functions
/usr/share/zsh/site-functions/_curl

[root@38c284c09f27 /]# sha256sum curl-8.0.1-1.fc38.x86_64.rpm
c9c6ada24ecb197e6902b6c9454867973a68ec5a531df651729a811924ed8e5e  curl-8.0.1-1.fc38.x86_64.rpm

At this point, it's unclear whether Minio is returning some data that causes unexpected behavior in the code, or whether something else is the root cause. To try to resolve this, I updated the Minio dependency to the latest version, but the problem persists.
Part of the problem is that I don't understand which "magic number" the code expects to find. I'd appreciate any pointers!

Thanks!

Hi, I think the problem is that with bazel-remote's default --storage_mode zstd setting, the "cas.v2" blobs are stored in a specific compressed form, both in the disk cache and in the proxy backends. So when bazel-remote fetches this blob, which was uploaded by another tool, it doesn't receive the data in the format it expects.
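
To illustrate why the reported size looks random: the snippet from the issue reads bytes 8..15 of the blob as a little-endian int64, and for a raw RPM those bytes are not a size at all. The sketch below is a minimal reproduction, assuming the object starts with a standard RPM lead (magic ed ab ee db, version 3.0, type 0, architecture number 1, then the package name). These header bytes are an assumption and were not dumped from the actual S3 object; decodeSizeField and assumedRPMLead are hypothetical names, not bazel-remote functions.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// decodeSizeField mirrors the snippet quoted in the issue: it interprets
// bytes 8..15 of the header as a little-endian int64.
func decodeSizeField(header []byte) (int64, error) {
	if len(header) < 16 {
		return -1, fmt.Errorf("header too short: %d bytes", len(header))
	}
	var size int64
	err := binary.Read(bytes.NewReader(header[8:16]), binary.LittleEndian, &size)
	return size, err
}

// assumedRPMLead returns the first 16 bytes of the file, ASSUMING a
// standard RPM lead: 4 magic bytes, major/minor version, a 2-byte type,
// a 2-byte architecture number, then the start of the package name.
func assumedRPMLead() []byte {
	lead := []byte{
		0xed, 0xab, 0xee, 0xdb, // RPM lead magic
		0x03, 0x00, // version 3.0
		0x00, 0x00, // type: binary package
		0x00, 0x01, // architecture number (assumed to be 1)
	}
	return append(lead, "curl-8"...) // first six bytes of the name
}

func main() {
	size, err := decodeSizeField(assumedRPMLead())
	if err != nil {
		panic(err)
	}
	fmt.Println(size) // prints 4048010878927438080
}
```

Under these assumptions the result equals the uncompressedSize in the log, which is consistent with bazel-remote parsing the start of the RPM as if it were its own blob header.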

I think the best thing to do would be to re-upload this blob using bazel-remote itself (e.g. through bazel-remote's HTTP API), so that it ends up in the expected format.
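
A re-upload through the HTTP API could look roughly like the sketch below. It assumes the CAS endpoint is PUT /cas/<sha256> on a bazel-remote instance whose base URL you pass on the command line (for example http://localhost:8080, which is an assumption about your setup). casPath and uploadBlob are hypothetical helper names, and authentication or TLS settings are left out.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"os"
)

// casPath returns the request path for a blob: "/cas/" followed by the
// lowercase hex SHA-256 of its contents.
func casPath(data []byte) string {
	sum := sha256.Sum256(data)
	return "/cas/" + hex.EncodeToString(sum[:])
}

// uploadBlob PUTs the raw blob to baseURL + casPath(data) and treats any
// non-2xx response as an error.
func uploadBlob(client *http.Client, baseURL string, data []byte) error {
	req, err := http.NewRequest(http.MethodPut, baseURL+casPath(data), bytes.NewReader(data))
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("upload failed: %s", resp.Status)
	}
	return nil
}

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: upload <bazel-remote base URL> <file>")
		return
	}
	data, err := os.ReadFile(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	if err := uploadBlob(http.DefaultClient, os.Args[1], data); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("uploaded", casPath(data))
}
```

Because the blob passes through bazel-remote on the way in, the server itself decides how to store it in the disk cache and the proxy backend, so the client does not need to know the on-disk format.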

Heya @mostynb, thanks. It turns out I missed the fact that the default compression is zstd. I've turned it off and it now works OK.
Thanks!