Azure/azure-storage-fuse

Blobfuse2 issue with .usdz and .glb files

gurugit07 opened this issue · 13 comments

Which version of blobfuse was used?

2.0.5

Which OS distribution and version are you using?

If relevant, please share your mount command.

What was the issue encountered?

When uploading .usdz and .glb files, blobfuse2 corrupts the file content. I compared the MD5 hash of the original file with that of the blob in Azure Storage, and they differ. It also sets the content-type to "application/octet-stream". When I copy the file directly to Azure Storage with Storage Explorer, which uses AzCopy, the content-type stays "application/zip" and the MD5 hash stays the same.
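The comparison described above can be reproduced without any Azure tooling by hashing the local file and comparing the result with the blob's Content-MD5 property, which Azure Storage reports as a base64-encoded MD5 digest. A minimal Python sketch, assuming the stored value is copied by hand from the portal or Storage Explorer; the function names are illustrative and not part of blobfuse2:

```python
import base64
import hashlib


def local_content_md5(path, chunk_size=1024 * 1024):
    """Return the base64-encoded MD5 digest of a local file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large .usdz/.glb files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_blob_md5(path, blob_content_md5):
    """Compare a local file against a blob's Content-MD5 value (base64)."""
    return local_content_md5(path) == blob_content_md5.strip()
```

If `matches_blob_md5` returns False for a file written through the mount but True for the same file uploaded with AzCopy, the difference was introduced on the upload path.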

Looking at the code, I don't see support for usdz and glb in the content-type map:

std::map<std::string, std::string> contentTypeMap {

Is there a way to ignore the content type and keep the content unchanged during a fuse upload?

Have you found a mitigation/solution?

No

Please share logs if available.

We can add support for the content type.
As far as the MD5 sum and file corruption are concerned, are you using the file-cache or the streaming config?
It would be a great help if you could share your config file with us (remove all user credentials before sharing).

Thank you. Is there an ETA for adding the content type? Also, if there is a way to keep the MD5 sum intact so that the file doesn't get corrupted, please share it.
Here is the config file:

allow-other: true

logging:
  type: base
  level: log_info
  file-path: /home/{path}/blobFuse.log

components:
  - libfuse
  - attr_cache
  - azstorage

libfuse:
  attribute-expiration-sec: 120
  entry-expiration-sec: 120
  negative-entry-expiration-sec: 240

attr_cache:
  timeout-sec: 7200

azstorage:
  type: block
  account-name:
  account-key:
  endpoint:
  mode: key
  container:

You do not have a file-cache component in the pipeline, and that is the reason for the file corruption. Your pipeline should look like this:

components:
  - libfuse
  - file_cache
  - attr_cache
  - azstorage

In addition, provide the file-cache config as well, for example the tmp-path.
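A quick sanity check for this kind of mistake is to compare the configured component list against the pipeline recommended above. The following Python sketch is only an illustration: it takes the component names as a plain list (how the YAML is parsed is left out), and the helper names are hypothetical rather than anything blobfuse2 provides.

```python
# Pipeline order recommended in this thread.
EXPECTED_ORDER = ["libfuse", "file_cache", "attr_cache", "azstorage"]


def missing_components(components):
    """Return the recommended components absent from the configured list."""
    return [name for name in EXPECTED_ORDER if name not in components]


def in_expected_order(components):
    """Check that the recommended components present appear in the same order."""
    positions = [components.index(name) for name in EXPECTED_ORDER
                 if name in components]
    return positions == sorted(positions)
```

For the config posted earlier in the thread, `missing_components(["libfuse", "attr_cache", "azstorage"])` reports that `file_cache` is absent.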

For the other part of content-type, we will add it with our next release which shall happen towards end of this month.

Thank you, adding file_cache solved the corruption issue. We actually had file_cache initially, but we saw issues with it and removed it. Here are the older logs:

blobfuse2[30907]: LOG_ERR [file_cache.go (724)]: FileCache::DeleteFile : error {file_name}.glb [no such file or directory]
blobfuse2[30907]: LOG_ERR [libfuse_handler.go (874)]: Libfuse::libfuse_unlink : error deleting file {file_name}.glb [no such file or directory]

I think we saw these errors after restarting the VM where blobfuse2 is running. Here is our /etc/fstab entry:
/usr/bin/blobfuse2 /home/{container-path} fuse defaults,_netdev,--config-file=/home/{config-path}/blobfuse-config.yaml,allow_other 0 0

Please let me know if any change is required here.

This is not an issue with file-cache; it is just saying that you are trying to delete a file that does not exist. Does your file name contain these special characters, as in "{file_name}.glb"?

PR #1261
We have fixed the content-type for files with the extension ".usdz" to be "application/zip".
For ".glb" files, uploading through AzCopy also gives the content-type "application/octet-stream", so we haven't made any changes for those.
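The resulting behaviour amounts to an extension lookup with a generic fallback. The Python sketch below is a hypothetical illustration of that idea, not blobfuse2's actual contentTypeMap: the table holds only the one mapping confirmed in this thread, and the names `CONTENT_TYPES` and `guess_content_type` are made up for the example.

```python
import os

# Illustrative table only: mirrors what this thread reports, not the real map.
CONTENT_TYPES = {
    ".usdz": "application/zip",
}

# Fallback for extensions with no specific entry, such as ".glb".
DEFAULT_CONTENT_TYPE = "application/octet-stream"


def guess_content_type(file_name):
    """Pick a content type from the file extension, case-insensitively."""
    extension = os.path.splitext(file_name)[1].lower()
    return CONTENT_TYPES.get(extension, DEFAULT_CONTENT_TYPE)
```

With this table, `guess_content_type("model.usdz")` yields "application/zip", while `guess_content_type("scene.glb")` falls through to "application/octet-stream".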

Does your file name contain these special characters "{file_name}.glb"?

Yes, our file name contains an underscore.

By special characters I also meant "{" and "}". Are those in the file name too?

Currently we are not using them, but we plan to add them in the future. Would that cause any issues?

file_cache:
  path: /home/{path}
  timeout-sec: 10
  cleanup-on-start: true
  allow-non-empty-temp: true
  max-size-mb: 4096

We are considering these settings. Can you suggest whether we should add any other config for file_cache, or an optimal value for max-size-mb?
We anticipate multiple files with an overall size of at most 50 GB, and a maximum single-file size of 500 MB, during one push from the client.

  • For special characters in the file name, there should be no issue.
  • "max-size-mb: 4096": this is more of a soft limit. It says that when the disk usage of your cache reaches 85% of 4 GB, files should start being evicted earlier. Because it is a soft limit, if the application does not close its file handles, the files will keep occupying disk space and will not be deleted. Also, if you try to download a file that is larger than this limit, the download will still go ahead. The setting only exists to trigger early eviction of files.
  • If your application workflow uses each file only once while processing, it is better to set the file-cache timeout to "0" so that a file is evicted from the cache as soon as the application closes it. Keep files in the cache with a higher timeout only when the application needs to process the same file multiple times.
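To put numbers on the soft limit, the arithmetic can be sketched in Python. This only restates the figures from this thread (a 4096 MB limit, the 85% mark mentioned above, and 500 MB files); it does not model how blobfuse2 actually evicts files, and the function names are invented for the example.

```python
def eviction_threshold_mb(max_size_mb, threshold_percent=85):
    """Cache usage (in MB) at which early eviction would begin."""
    return max_size_mb * threshold_percent / 100


def files_before_eviction(max_size_mb, file_size_mb, threshold_percent=85):
    """Number of whole files of a given size that fit below the threshold."""
    threshold = eviction_threshold_mb(max_size_mb, threshold_percent)
    return int(threshold // file_size_mb)
```

Under these assumptions, `eviction_threshold_mb(4096)` is about 3481.6 MB, so roughly six 500 MB files fit before early eviction starts. A 50 GB push therefore depends on files being closed and evicted promptly rather than on the cache holding everything at once.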

Just to reconfirm: AzCopy does not set any specific content-type for ".glb" files either. We validated this locally, and blobfuse will likewise not set any specific type for them. For the other file type (".usdz"), "application/zip" will be set. Kindly confirm this is fine for your workflow.

Yes, I have tested the same with AzCopy as well. I confirm the change.