Azure/azure-sdk-for-python

start_copy_from_url raises CannotVerifyCopySource when DefaultAzureCredential is used

LiliDeng opened this issue · 6 comments

We now have to stop using the shared access key, so we initialize BlobServiceClient with DefaultAzureCredential as shown below.
But when we invoke the start_copy_from_url method, it raises a CannotVerifyCopySource exception, and we can't use a SAS or a public storage account. How can we find a solution in this situation? Thanks!

from azure.storage.blob import BlobServiceClient
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
import logging

logging.basicConfig(filename='c:/temp/log2.log', level=logging.DEBUG)
# Details of your Azure Storage Accounts
source_account_url = "https://lilitest001.blob.core.windows.net"
destination_account_url = "https://lilitest0002.blob.core.windows.net"
source_blob_name = "test.vhd" # 30 GB
destination_blob_name = "test.vhd"
# Name of the source and destination container
source_container_name = 'vhds'
destination_container_name = 'vhds'

# Initialize Azure credentials
credential = DefaultAzureCredential(
    exclude_environment_credential=True,
    exclude_managed_identity_credential=True,
    exclude_powershell_credential=True,
    exclude_shared_token_cache_credential=True,
    exclude_visual_studio_code_credential=True,
)

source_blob_service_client = BlobServiceClient(account_url=source_account_url, credential=credential)
destination_blob_service_client = BlobServiceClient(account_url=destination_account_url, credential=credential)

source_blob_url = f"{source_account_url}/{source_container_name}/{source_blob_name}"
dest_blob_url = f"{destination_account_url}/{destination_container_name}/{destination_blob_name}"
destination_blob_client = destination_blob_service_client.get_blob_client(container=destination_container_name, blob=destination_blob_name)

copy = destination_blob_client.start_copy_from_url(source_blob_url)

Log

INFO:azure.core.pipeline.policies.http_logging_policy:Response status: 401
Response headers:
    'Content-Length': '297'
    'Content-Type': 'application/xml'
    'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'
    'x-ms-request-id': '77834f6d-d01e-0025-5784-a2738d000000'
    'x-ms-client-request-id': '35c95c1e-0e78-11ef-8019-00249b6d4810'
    'x-ms-version': 'REDACTED'
    'x-ms-error-code': 'CannotVerifyCopySource'
    'Date': 'Fri, 10 May 2024 02:51:35 GMT'

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jalauzon-msft @vincenttran-msft.

Hi @LiliDeng, the start_copy_from_url method requires you to provide authentication for both the destination and the source. The destination is handled via the destination client, but the source authentication info is not always handled for you. When using DefaultAzureCredential (i.e. OAuth) for the destination client, you will need to provide authorization to the source resource yourself.

There are a couple of ways to do this. If you want to use OAuth to authenticate the source as well, you can generate an OAuth token from your credential and provide that to the source_authorization keyword. You will then also need to provide the requires_sync=True option because OAuth is only supported on server-side sync copy. Here is a sample of that:

token = f"Bearer {credential.get_token("https://storage.azure.com/.default").token}"
destination_blob_client.start_copy_from_url(source_blob_url, source_authorization=token, requires_sync=True)

# Can also try with `upload_blob_from_url`
destination_blob_client.upload_blob_from_url(source_blob_url, source_authorization=token, overwrite=True)

NOTE: start_copy_from_url with requires_sync=True has a size limit of 256 MiB for the source. If your Blob is larger, try upload_blob_from_url, which has a limit of 5000 MiB.

Your other option is to use a read-only SAS for the source Blob and include it in the source URL. I can provide more details on this if needed.
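
For reference, here is a minimal sketch of that route, continuing from the variables in the original snippet. It uses a user delegation SAS, so no account key is involved; it assumes the credential has a role on the source account that is allowed to request a user delegation key (for example, Storage Blob Data Reader):

from datetime import datetime, timedelta, timezone
from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# Ask the source account for a short-lived user delegation key (OAuth, no shared key).
start = datetime.now(timezone.utc)
expiry = start + timedelta(hours=1)
delegation_key = source_blob_service_client.get_user_delegation_key(start, expiry)

# Build a read-only SAS scoped to just the source blob.
sas = generate_blob_sas(
    account_name="lilitest001",  # source account name from the snippet above
    container_name=source_container_name,
    blob_name=source_blob_name,
    user_delegation_key=delegation_key,
    permission=BlobSasPermissions(read=True),
    expiry=expiry,
)

# Append the SAS to the source URL; the service then reads the source directly.
copy = destination_blob_client.start_copy_from_url(f"{source_blob_url}?{sas}")

Because the source is authorized by the SAS, this can run as a normal asynchronous copy, so the 256 MiB sync limit does not apply.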

Hi @LiliDeng. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

Thank you.
I tried start_copy_from_url, and it raised this exception: azure.core.exceptions.ResourceExistsError: The copy source must be a block blob.
With upload_blob_from_url, I saw: azure.core.exceptions.ResourceExistsError: The source request body is too large and exceeds the maximum permissible limit (5000MB).

The file we are copying is a 30+ GB VHD, and we can't use SAS for security reasons. Any other suggestions?

Hi @LiliDeng, sorry I missed your Blob size originally. For a Blob size of 30 GiB, I don't think you'll be able to use either of the above methods. If you want to use OAuth to copy that Blob, you'll have to split the transfer up and use a series of "chunk" copies from the source to the destination. This is due to service limitations on copying data asynchronously with OAuth. The details vary based on the type of Blob you want at the destination, but at a pseudocode level, it would look like this:

# Depending on how long this takes, you may need to refresh this token
token = f"Bearer {credential.get_token('https://storage.azure.com/.default').token}"

# Page Blob
destination_blob_client.create_page_blob(size)
while data left in source:
    destination_blob_client.upload_pages_from_url(source_url, offset, length, source_offset, source_authorization=token)

# Block Blob
blocks = []
while data left in source:
    block_id = <unique id>
    blocks.append(BlobBlock(block_id))
    destination_blob_client.stage_block_from_url(block_id, source_url, source_offset, source_length, source_authorization=token)
destination_blob_client.commit_block_list(blocks)

I also see you linked a change using a User Delegation SAS. That is also a fine approach if it meets your security needs.

One final note: if you are willing to go outside Python, we offer the AzCopy command-line tool, which can perform this copy using OAuth by internally doing the operations I shared above.
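
To make the Page Blob variant of the pseudocode concrete (a VHD is typically a page blob), here is a rough, runnable sketch. The account, container, and blob names are the ones from the original snippet; it assumes the credential has data-plane access to both accounts, and the 4 MiB chunk size reflects the Put Page From URL per-call limit:

import os
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobClient

credential = DefaultAzureCredential()

# Names taken from the original snippet above.
source_blob_url = "https://lilitest001.blob.core.windows.net/vhds/test.vhd"
dest_blob_client = BlobClient(
    account_url="https://lilitest0002.blob.core.windows.net",
    container_name="vhds",
    blob_name="test.vhd",
    credential=credential,
)
source_blob_client = BlobClient.from_blob_url(source_blob_url, credential=credential)

# Bearer token that authorizes the service to read the source. For a copy
# this long it may expire; refresh it by calling get_token again.
token = f"Bearer {credential.get_token('https://storage.azure.com/.default').token}"

blob_size = source_blob_client.get_blob_properties().size  # page blobs are 512-byte aligned
CHUNK = 4 * 1024 * 1024  # Put Page From URL accepts at most 4 MiB per call

dest_blob_client.create_page_blob(size=blob_size)
offset = 0
while offset < blob_size:
    length = min(CHUNK, blob_size - offset)
    dest_blob_client.upload_pages_from_url(
        source_blob_url,            # bare URL; the token authorizes the read
        offset=offset,              # destination offset
        length=length,
        source_offset=offset,       # copy the matching range from the source
        source_authorization=token,
    )
    offset += length

The Block Blob variant is analogous: call stage_block_from_url in the same loop (blocks can be up to 4000 MiB each), collecting BlobBlock ids, then finish with commit_block_list.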

Thank you so much for your response!