Azure/azure-sdk-for-python

[engsys] Global Sanitizers inconsistently sanitize storage account names, recordings unreplayable

kdestin opened this issue · 1 comments

Describe the bug

#35196 introduced a collection of "global" sanitizers that scrub secrets from recordings as they are written to disk.

I'm currently writing a test whose code path involves:

  1. Fetching details about a storage account

  2. Using those details to build the URI for the next request

The following sanitizer redacts the storage account name from the Step 1 response when the recording is written:

{"json_path": "$..accountName", "value": SANITIZED},

However, there is no "global" sanitizer that sanitizes storage account names in request URLs.

This leaves my recording un-replayable.

When the test runs from the recording, the code receives the sanitized response and builds the URL for the subsequent request from the sanitized values: https://sanitized.blob.core.windows.net. But the recording stored an unsanitized URL for that subsequent request, https://account-name.blob.core.windows.net, so the proxy is unable to find a match.
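The mismatch can be illustrated without the test proxy. The following is a minimal sketch, not the proxy's actual implementation: `sanitize_account_names` and `build_blob_url` are hypothetical helpers, the placeholder value `"Sanitized"` is an assumption, and the host is lowercased on the assumption that the HTTP stack normalizes hostnames.

```python
SANITIZED = "Sanitized"  # assumed placeholder value; the real constant may differ


def sanitize_account_names(node):
    """Hypothetical stand-in for the {"json_path": "$..accountName"} sanitizer:
    recursively replace every "accountName" value in a JSON-like structure."""
    if isinstance(node, dict):
        return {
            key: SANITIZED if key == "accountName" else sanitize_account_names(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [sanitize_account_names(item) for item in node]
    return node


def build_blob_url(datastore: dict) -> str:
    """Hypothetical client logic: derive the blob container URL from the datastore details."""
    props = datastore["properties"]
    # Assumption: hostnames are case-insensitive and end up lowercased on the wire.
    host = f'{props["accountName"]}.blob.{props["endpoint"]}'.lower()
    return f'{props["protocol"]}://{host}/{props["containerName"]}'


live_response = {
    "name": "workspaceblobstore",
    "properties": {
        "accountName": "account-name",
        "containerName": "d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore",
        "endpoint": "core.windows.net",
        "protocol": "https",
    },
}

# While recording, the client sees the real response, so the follow-up URL is real
# and (with no URL sanitizer) is stored as-is.
recorded_url = build_blob_url(live_response)

# When replaying, the client only sees the sanitized response body.
playback_url = build_blob_url(sanitize_account_names(live_response))

print(recorded_url)
print(playback_url)
```

The two URLs differ only in the host, which is the same difference the proxy reports when it fails to match the request.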

To Reproduce
Steps to reproduce the behavior:

  1. Successfully record a test in live mode that:

    1. Fetches some response with details about a storage account
    // Example response
    {
            "id": "/subscriptions/00000000-0000-0000-0000-000000000/resourceGroups/00000/providers/Microsoft.MachineLearningServices/workspaces/00000/datastores/workspaceblobstore",
            "name": "workspaceblobstore",
            "type": "Microsoft.MachineLearningServices/workspaces/datastores",
            "properties": {
              ...,
              "subscriptionId": "00000000-0000-0000-0000-000000000",
              "resourceGroup": "resource-group",
              "datastoreType": "AzureBlob",
              "accountName": "account-name",
              "containerName": "d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore",
              "endpoint": "core.windows.net",
              "protocol": "https",
              "serviceDataAccessAuthIdentity": "WorkspaceSystemAssignedIdentity"
            },
            "systemData": {
                ...
            }
    
    }
    2. Uses that response to build the URL for a subsequent request

    https://account-name.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/path/to/files

  2. Attempt to re-run the test in playback mode, against the recording

Expected behavior

The test should run against the recording and pass.

Actual behavior

The test fails

ERROR    root:proxy_fixtures.py:312 

-----Test proxy playback error:-----

Unable to find a record for the request PUT https://sanitized.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/LocalUpload/0e7abff4dcb2ddd489d3e72fa2039bf6/README.md?sv=2021-10-04&si=azureml-system-datastore-policy&sr=c&sig=Sanitized
Method doesn't match, request <PUT> record <HEAD>
Uri doesn't match:
    request <https://sanitized.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/LocalUpload/0e7abff4dcb2ddd489d3e72fa2039bf6/README.md?sv=2021-10-04&si=azureml-system-datastore-policy&sr=c&sig=Sanitized>
    record  <https://account-name.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/LocalUpload/0e7abff4dcb2ddd489d3e72fa2039bf6/README.md?sv=2021-10-04&si=azureml-system-datastore-policy&sr=c&sig=Sanitized>
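For illustration, here is a rough sketch of the kind of URL rewrite that would make the two URIs above comparable. It uses only Python's `re` module; the pattern, the list of storage sub-domains, and the `sanitize_storage_host` helper are assumptions for this example, not an existing sanitizer.

```python
import re

# Assumed pattern: the account name is the host label immediately before a
# storage service sub-domain. The service list here is illustrative only.
STORAGE_HOST = re.compile(
    r"(?<=://)[a-z0-9-]+(?=\.(?:blob|file|queue|table|dfs)\.core\.windows\.net)"
)


def sanitize_storage_host(url: str, value: str = "sanitized") -> str:
    """Hypothetical helper: replace the storage account name in a URL with a placeholder."""
    return STORAGE_HOST.sub(value, url)


recorded = "https://account-name.blob.core.windows.net/container/README.md?sig=Sanitized"
requested = "https://sanitized.blob.core.windows.net/container/README.md?sig=Sanitized"

# After the rewrite, the recorded and requested URLs are identical.
print(sanitize_storage_host(recorded))
print(sanitize_storage_host(recorded) == sanitize_storage_host(requested))
```

If a rewrite like this were applied both to the stored recording and to incoming requests, the account name in the URL would be redacted consistently with the `accountName` field in the response body.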
