Use blobfuse2 for streamable TesInputs
MattMcL4475 opened this issue · 0 comments
Problem:
Customers need to perform random reads on large genomics reference files through a file system, without downloading each file in full. Full downloads cost more and put extra load on the storage account.
Solution:
- If any `TesInput.Streamable` is set to `true`, the TES runner should download and install blobfuse2
- It should aggregate all of the container mounts and only mount the minimum required mounts with `blobfuse2 mount`
- It should ensure the `path` specified for the `TesInput.path` works
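As a rough illustration of the aggregation step, here is a minimal sketch that reduces a list of streamable input URLs to the distinct account/container pairs that would need a mount. The function names, the `/mnt/blobfuse2/<account>/<container>` layout, and the per-container config file names are all hypothetical and not part of TES. The mount commands are only printed, not executed.

```shell
#!/bin/bash
# Hypothetical helpers; names and the mount layout are assumptions, not TES code.

# Print "<account> <container>" for a blob URL of the form
# https://<account>.blob.core.windows.net/<container>/<blob path>[?<sas>]
parse_blob_url() {
  local url="${1%%[?]*}"         # drop any SAS query string
  local rest="${url#https://}"   # <account>.blob.core.windows.net/<container>/<blob>
  local host="${rest%%/*}"       # host name up to the first slash
  local path="${rest#*/}"        # <container>/<blob path>
  echo "${host%%.*} ${path%%/*}"
}

# Read streamable input URLs on stdin (one per line) and print each
# distinct account/container pair exactly once.
minimal_mounts() {
  local url
  while IFS= read -r url; do
    if [ -n "$url" ]; then
      parse_blob_url "$url"
    fi
  done | sort -u
}

# Print (do not run) one mount command per distinct container.
# The mount directory and config file naming are assumed for illustration.
print_mount_commands() {
  local account container
  minimal_mounts | while read -r account container; do
    echo "blobfuse2 mount /mnt/blobfuse2/$account/$container --config-file=./$account-$container.yaml"
  done
}
```

Piping the URLs of all inputs marked streamable into `print_mount_commands` would yield one command per container, however many inputs share that container. The runner would still need to make each `TesInput.path` resolve to the matching file under the mount, for example with a symlink.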
I confirmed that random reads in blobfuse2 work as expected:
blobfuse2 mount /ref --config-file=./b2.yaml
dd if=stLFR.split_read.1.fq.gz skip=50000000000 bs=1 count=128 iflag=skip_bytes 2>/dev/null | xxd
#!/bin/bash
# Azure Blob URL - NOTE SAS has been removed
blob_url="https://mattmcl.blob.core.windows.net/inputs/stLFR.split_read.1.fq.gz"
# Byte range to download: Example uses the range from 50000000000 to 50000000127
range_start=50000000000
range_end=50000000127
# Using curl to download the specified byte range
curl -s -o downloaded_bytes.bin -H "Range: bytes=$range_start-$range_end" "$blob_url"
echo "From REST:"
# Display downloaded bytes in hex format for comparison
xxd downloaded_bytes.bin
echo "From blobfuse:"
# Optional: Compare with bytes extracted from the local file using dd
dd if=/ref/stLFR.split_read.1.fq.gz skip=$range_start bs=1 count=$((range_end - range_start + 1)) iflag=skip_bytes,count_bytes 2>/dev/null | xxd
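The hex dumps above are compared by eye. As a possible follow-up, the check could be automated with `cmp`. This is a minimal sketch, assuming GNU `dd`; `read_range` and `range_matches` are hypothetical helper names, not part of blobfuse2 or TES.

```shell
#!/bin/bash
# Hypothetical helpers for comparing a byte range read through the mount
# against bytes fetched over REST. Requires GNU dd (skip_bytes, count_bytes).

# Print bytes [start, end] (inclusive) of a file to stdout.
read_range() {
  local file="$1" start="$2" end="$3"
  dd if="$file" skip="$start" bs=1 count=$((end - start + 1)) \
     iflag=skip_bytes,count_bytes 2>/dev/null
}

# Return 0 when bytes [start, end] of <file> are identical to the whole
# content of <expected> (for example, the output of a ranged curl request).
range_matches() {
  local file="$1" start="$2" end="$3" expected="$4"
  read_range "$file" "$start" "$end" | cmp -s - "$expected"
}
```

With the script above, `range_matches /ref/stLFR.split_read.1.fq.gz "$range_start" "$range_end" downloaded_bytes.bin` would return success only if the bytes served through the blobfuse2 mount equal the bytes returned by the ranged REST request.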