Should to_cmd_args pass complex types through json.dumps instead of str?
croth1 opened this issue · 1 comments
I am currently hunting down a bug that I could track back into this in sagemaker-training-toolkit.
In my case, I am passing a list as a hyperparameter type, just like:
estimator = Estimator(
image_uri=docker_img,
role=role,
instance_type=instance_type,
instance_count=1,
output_path=output_s3,
hyperparameters={
"str_param": "foobar",
"list_param": ["foo", "bar"]
}
)
Background
To my understanding, sagemaker now uses json.dumps
to convert all the values to string and again dumps the resulting dictionary to hyperparamters.json
In sage-maker-toolkit the transformation is reversedhere: https://github.com/aws/sagemaker-training-toolkit/blob/master/src/sagemaker_training/environment.py#L210, leaving us with the original dictionary and values that contain complex objects such as lists.
Issue
When I use env.to_cmd_args()
to convert my hyperparameters to command line args, complex values such as lists are not converted back to a json string but simply passed through the str()
function. This makes it difficult to reverse the encoding.
hyperparameters={
"str_param": "foobar",
"list_param": ["foo", "bar"]
}
Will end up in following command line args:
["--str_param", "foobar", "--list_param", "['foo', 'bar']"]
. Note that the value for --list_param
is a list converted to a string with the str
function, not json.dumps
.
A proposed solution
If json.dumps
was used instead of str
, (e.g. return obj if instanceof(obj, str) else json.dumps(obj)
) the transformation could be easily reversed with a command line parser:
parser.add_argument("--list_param", type=json.loads)
This seems to be a misunderstanding. As far as I understand now there is no json.dumps logic when passing the hyperparameters to the training instance, so the issue seems invalid.