[engsys] Batch Sanitizers have the potential to mangle JSON responses and make them invalid
kdestin opened this issue · 2 comments
Describe the bug
#35196 introduced a collection of "global" sanitizers that scrub secrets from recordings as they are written to disk.
The sanitization is Regex based and not "file format aware". Some of the sanitizers as defined have the potential to leave responses in a corrupt state (e.g. as invalid json).
The sanitizers for query strings were the source of my issue, but this might be an issue with other sanitizers too:
To Reproduce
Steps to reproduce the behavior:
- Write a test that receives an HTTP response which'll get sanitized. Example (
&sig=
query param)
{
"secretsType": "Sas",
"sasToken": "sv=2021-10-04&si=azureml-system-datastore-policy&sr=c&sig=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}
-
Observe that your test passes when run in live mode (with the following environment variable set
AZURE_TEST_RUN_LIVE=true
). -
Try to replay the test in recording mode (with the following environment variable set
AZURE_TEST_RUN_LIVE=false
).
Expected behavior
The test should pass, using the sanitized responses from the record.
Actual Behavior
This sanitizer mangles the JSON response:
- It "eats" the trailing double quote (
"
) that terminates the JSON string, which leaves the response as invalid JSON - It "eats" the leading ampersand (
&
), which corrupts the preceding query parameter.
except ValueError as err:
> raise DecodeError(
message="JSON is invalid: {}".format(err),
response=response,
error=err,
E azure.core.exceptions.DecodeError: JSON is invalid: Invalid control character at: line 3 column 78 (char 103)
E Content: {
E "secretsType": "Sas",
E "sasToken": "sv=2021-10-04&si=azureml-system-datastore-policy&sr=cSanitized
E }
../../../../venv/lib/python3.8/site-packages/azure/core/pipeline/policies/_universal.py:616: DecodeError
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
ML team has a regex for query parameters that might be helpful in finding a fix
azure-sdk-for-python/sdk/ml/azure-ai-ml/tests/conftest.py
Lines 72 to 87 in 434a248
@mccoyp can you take a look?