Azure/azure-storage-fuse

I want to display the latest BLOB data using streaming method.

Shunya-Seki opened this issue · 9 comments

Which version of blobfuse was used?

Blobfuse2 version: 2.3.0~preview.1

Which OS distribution and version are you using?

Rhel8.8

If relevant, please share your mount command.

◆mount command
blobfuse2 mount /blobmount --config-file=/etc/blobfuse2config.yaml -o allow_other

◆config.yaml

# Refer ./setup/baseConfig.yaml for full set of config parameters

logging:
  type: syslog
  level: log_debug

components:
  - libfuse
  - stream
  - attr_cache
  - azstorage

libfuse:
  attribute-expiration-sec: 0
  entry-expiration-sec: 0
  negative-entry-expiration-sec: 0
  direct-io: true

stream:
  block-size-mb: 0
  max-buffers: 0
  buffer-size-mb: 0

attr_cache:
  timeout-sec: 7200

azstorage:
  type: block
  account-name: xxxxx
  objid: xxxxxxxxx
  endpoint: xxxxxx
  mode: msi
  container: xxxx

What was the issue encountered?

I want to use the Stream method and ensure that the data in Blob Storage is always up to date on the OS side where the mount is performed.
Is the above configuration okay?
I want to confirm just in case.

Have you found a mitigation/solution?

The configuration seems to be working fine, and the latest BLOB data is being displayed without any issues.

Please share logs if available.

If you want to refresh the contents locally as and when they are updated on the container, this configuration will not work. What you need here is the '-o direct_io' CLI parameter. 'stream' is not a stable component, so you can migrate to 'block-cache' instead. Sample command and config below:

blobfuse2 mount /blobmount --config-file=/etc/blobfuse2config.yaml -o allow_other -o direct_io
logging:
  type: syslog
  level: log_debug

components:
  - libfuse
  - block_cache
  - attr_cache
  - azstorage

libfuse:
  attribute-expiration-sec: 0
  entry-expiration-sec: 0
  negative-entry-expiration-sec: 0

block_cache:
  block-size-mb: 8
  mem-size-mb: 2048
  prefetch: 12
  parallelism: 64

attr_cache:
  timeout-sec: 7200

azstorage:
  account-name: xxxxx
  objid: xxxxxxxxx
  mode: msi
  container: xxxx

I see you are using objid for MSI-based authentication. It's advised to switch to appid-based authentication, as objid-based authentication is not natively supported and also requires azcli to be installed. If you are using an Azure VM, you can assign the identity to the VM itself and skip providing any appid/objid in the config file.
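As a rough sketch of the two options described above, the azstorage section might look like the following. The values are the same redacted placeholders used in the sample config, and the 'appid' key name is taken from the wording of this comment; verify it against ./setup/baseConfig.yaml before relying on it.

```yaml
# Option 1 (sketch): MSI authentication using the identity's application (client) ID.
# "appid" replaces the "objid" line from the original config; the value is a placeholder.
azstorage:
  account-name: xxxxx
  appid: xxxxxxxxx
  mode: msi
  container: xxxx

# Option 2 (sketch): identity assigned to the Azure VM itself,
# so neither appid nor objid is listed.
# azstorage:
#   account-name: xxxxx
#   mode: msi
#   container: xxxx
```

Option 2 is commented out so that the fragment contains only one azstorage key if pasted into a config file.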

Closing this as there is no action item on blobfuse here. Feel free to post your questions/queries here.

Thank you for the information. I've implemented the provided config and mount, but it's not updating.
I'm checking the content with the following steps:
①Configuration settings (Received config file)

②Mounting
blobfuse2 mount /blobmount --config-file=/etc/blobfuse2config.yaml -o allow_other -o direct_io

③Confirming the content with the following command
cat /blobmount/contents

④Updating the content (From Azure Portal)

⑤Reconfirming the content with the following command
cat /blobmount/contents

Additionally, I was able to mount it without any issues, skipping the "objid".
(I'm using Azure VM)

Remove "attr_cache" from the "components" section in your config file and remount.
Since you have enabled debug logging, you can check the logs when you issue the cat command for the second time. You should see a file open call for it, followed by some downloads. If that's not happening, please share the log files with us.
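For reference, the components list after that change would presumably look like the sketch below. It assumes the block_cache based sample config from earlier in this thread, with only the attr_cache entry dropped.

```yaml
# Sketch: components list with attr_cache removed.
# Order is kept as in the earlier sample config.
components:
  - libfuse
  - block_cache
  - azstorage
```

The separate "attr_cache:" settings block is not mentioned in the advice above, so whether to keep or delete it is left open here.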

I removed 'attr_cache' from the config and remounted. Now the latest content is being displayed. Thank you for your assistance.
Could you provide additional information?
Would it be okay for all settings of 'block_cache' to be set to '0' when displaying always-updated content, as in this question?

block-size-mb:
mem-size-mb:
prefetch:
parallelism:

No, in the block-cache model you cannot set all these parameters to 0, as that would mean no memory is allocated to hold the incoming data. You can tune these parameters based on your available memory and average file size.

Thank you.
The memory of the Azure VM and the average size of the files read are as follows. Are there any recommended values for these parameters in this case?

Memory of the Azure VM: 32GiB
Average size of the files read (read only): 0.8GB

◆Parameters of Block_cache
block-size-mb:
mem-size-mb:
prefetch:
parallelism:

Is there a workload to determine the parameters for "block-cache"?

Keep "block-size-mb" at 16 and allocate "mem-size-mb" based on the available memory. I see you have 32GB of memory in your VM, so you can set this value to 20GB (if you are mounting only one instance of blobfuse and there are no other memory-hungry applications running on the same node). You can set "prefetch" to 50, as the average file size is not too large. For "parallelism", you can set it to 50 as well.
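Put together, the values suggested above would give a block_cache section roughly like the sketch below. The only number not stated in the thread is 20480, which assumes that "20GB" is meant as 20 x 1024 MB, since the key is expressed in MB.

```yaml
# Sketch of the suggested tuning for a 32GB VM reading files of about 0.8GB.
block_cache:
  block-size-mb: 16
  mem-size-mb: 20480   # assumption: 20GB written as 20 * 1024 MB
  prefetch: 50
  parallelism: 50
```

As a sanity check on these numbers, 50 blocks of 16 MB each come to 800 MB, which is close to the stated average file size of 0.8GB.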