Disable string to date auto conversion
ThorbenJ opened this issue · 18 comments
Good day,
It seems this module will try and convert strings that look like dates into date objects. This means the original string is lost, and it appears the date is then converted to local time, in local format.. so completely mangled from the original data that was expected.
This issue is another aspect of the problem reported in #69 and #86
It would be best if this could be disabled and let those who know the meaning and purpose of the data/string decide if a particular string should be converted to another data type (such as date). Perhaps a flag to toggle the feature?
Maybe a unit test to confirm that $input eq. $output when $output = $( $input | ConvertFrom-Yaml | ConvertTo-Yaml )
Thank you
-Thorben
BTW I have this running on MacOS
sound of frustration
I thought I'd work around this with something like:
$r.created_at | Write-Debug
$r.created_at = [String]$( $r.created_at.ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ssZ") )
$r.created_at | Write-Debug
Outputting
DEBUG: 03/30/2022 12:55:04
DEBUG: 2022-03-30T10:55:04Z
So appears ok, but then the ConvertTo-Json | Write-File results is a date to local timezone conversion again. So it not just powershell-yaml mangling data, its systemic...
Hi @ThorbenJ !
I understand your frustration.
In hindsight, converting date strings to powershell dates was not a great idea. Dealing with dates is complicated and you can never know how someone will prefer that date to be handled.
A quick fix will be, as you suggested, a flag to disable parsing dates. I can add it tomorrow. You will have to set that flag to true, as changing defaults will probably break things for a lot of people.
A longer term fix would be adding tag support to powershell-yaml
allowing users of this module to specify the data type of the values by adding things like !!string
, !!null
, !!date
, etc. But this will mean a bit more work and we would move away from the "thin wrapper" approach.
There is a new branch with the changes, here: https://github.com/cloudbase/powershell-yaml/tree/add-dont-convert-flag
It adds a new flag to ConvertFrom-Yaml
called -DontConvertValues
. This will disable all type conversion for values, and will leave them all as string. You will have to make sure you cast your own values to whatever type you're expecting, not just dates.
This also means that when converting back to yaml, all string values that have the potential to be ambiguous, will be quoted. For example:
test: 1
test2: "1"
When converted from yaml
will yield a hash map where the values of both test
and test2
will be strings. When converting back to yaml, both values will be quoted:
PS /tmp/powershell-yaml> Convertfrom-yaml $data | convertTo-yaml
test2: "1"
test: 1
PS /tmp/powershell-yaml> Convertfrom-yaml $data -DontConvertValues| convertTo-yaml
test2: "1"
test: "1"
This happens because we can't really save metadata about the values and their style, tags, etc in base powershell types. A future version of this module may migrate towards a PSCustomObject
which will give us more flexibility.
Let me know if this helps.
Hi. Thank you for looking at this.
However I am not sure I understand, why is the test integer converted to a string, when one asks not to convert? I had not before, but I just looked at the code and branch diff.
Instead of by-passing Convert-ValueToProperType entirely, I suggest let handle JSON/YAML native types (string, number, boolean, null) - but let this flag disable conversion of non-native types (currently just date).
I'd do this by moving the "Null" conversion before "Date" and have an early function return before the "Date" block if the flag is set.
Also for performance, you might want to build/compile that date detecting regex once, earlier, and not repeatedly with every call of Convert-ValueToProperType - which I assume happens for nearly every value in the original doc.
However I am not sure I understand, why is the test integer converted to a string, when one asks not to convert? I had not before, but I just looked at the code and branch diff.
This was a heavy handed approach that simply passed along the $Node.Value
we get from YamlDotNet
without any inference of type, which is not ideal.
Instead of by-passing Convert-ValueToProperType entirely, I suggest let handle JSON/YAML native types (string, number, boolean, null) - but let this flag disable conversion of non-native types (currently just date).
That might be best for now. Will need to find some cycles to work on this, and hopefully resolve some of the issues (including this one). Will hopefully free up some time in the coming weeks.
Sorry for the frustration this caused.
I understand your frustration.
In hindsight, converting date strings to powershell dates was not a great idea. Dealing with dates is complicated and you can never know how someone will prefer that date to be handled.
[...]
It's not only dates. I have a similar problem with Integers vs. Strings - e.g. international phone numbers. "+4922812345678" (string) becomes 4922812345678 (int) in two steps. Should I create a separate issue for this?
It's not only dates. I have a similar problem with Integers vs. Strings - e.g. international phone numbers. "+4922812345678" (string) becomes 4922812345678 (int) in two steps. Should I create a separate issue for this?
sigh... At least for ConvertFrom-Yaml
, I will try to properly parse tags. This way at least we avoid these kinds of issues. Sadly, without migrating to something like PSCustomObject
, it won't be enough to enable round-tripping, but at least we can remove all the type inference, and hand that over to the user.
phone: !!str +4922812345678
should then decode to string. For dates, there is an unofficial tag. We could support it, with the added caveat that it may lead to issues such as the one described here, and if omitted, we could just default to string.
I am still in the process of clearing my backlog, but I will hopefully have time for this in the next couple of days.
Okay, this is stuck to my brain now, so I have to fix it.
I will add some tag support in Convert-ValueToProperType
which will function as follows:
- If a tag is specified on the scalar, we will attempt to cast to that type without any further checks
- If no tag is specified, we attempt to do auto conversion
- Dates will default to
string
unless!!timestamp
is used as a tag
The following tags will be supported: http://yaml.org/spec/1.2-old/spec.html#id2804923
The seq
and map
tags should already be mapped by YamlDotNet
to YamlDotNet.RepresentationModel.YamlMappingNode
and YamlDotNet.RepresentationModel.YamlSequenceNode
which we already handle.
How does this sound?
Sounds like an improvement, thank you for taking the time to look at this.
I would still vote on a flag to control autoconversion. e.g. on/off or even an autoconversion level: none, native, all - native being just the YAML/JSON native types (string, num, bool), not derived types like dates or numbers in strings.
However the post storkzipisten suggests that a string with a number in it, is treated as a number, which is wrong - the native type would have been string. Is this a limitation of YamlDotNet? If so sounds like your plan is the best.
sigh... At least for
ConvertFrom-Yaml
, I will try to properly parse tags. This way at least we avoid these kinds of issues. Sadly, without migrating to something likePSCustomObject
, it won't be enough to enable round-tripping, but at least we can remove all the type inference, and hand that over to the user.
FWIW, for round tripping you could still | Add-Member ...
any value (even non-pscustom objects), but it'd have a perf cost...
I feel like making custom de/serializer easier to register would be a better way, but haven't had time to look I to into it.
FWIW, for round tripping you could still
| Add-Member ...
any value (even non-pscustom objects), but it'd have a perf cost...
This sounds great actually. We can have a flag -EnableRoundTrip
(or something) which defaults to $false
.. A custom de/serializer would be nice, but I have no bandwidth for it (sadly).
Added an initial implementation. I save the original YamlDotNet object as a member in the top level node that gets unserialized. We use extra memory to persist two copies of the object, but we can at least round-trip now.
To use it:
PS /tmp> $data = @"
>> aString: !!str 12345
>> aBool: !!bool TRUE
>> aNull: !!null "0"
>> aFloat: !!float 111.1111
>> aPhoneNumber: !!str +4922812345678
>> aDateWithNOTagsShouldBeString: 2022-03-30T10:55:04Z
>> aDateWithTag: !!timestamp 2022-03-30T10:55:04Z
>> "@
PS /tmp>
PS /tmp> $converted = ConvertFrom-Yaml $data -RoundTrip
PS /tmp> $converted.OrigObject
Key Value
--- -----
aString 12345
aBool TRUE
aNull 0
aFloat 111.1111
aPhoneNumber +4922812345678
aDateWithNOTagsShouldBeString 2022-03-30T10:55:04Z
aDateWithTag 2022-03-30T10:55:04Z
PS /tmp> $converted.OrigObject.GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True False YamlMappingNode YamlDotNet.RepresentationModel.YamlNode
PS /tmp> ConvertTo-Yaml $converted
aString: !!str 12345
aBool: !!bool TRUE
aNull: !!null "0"
aFloat: !!float 111.1111
aPhoneNumber: !!str +4922812345678
aDateWithNOTagsShouldBeString: 2022-03-30T10:55:04Z
aDateWithTag: !!timestamp 2022-03-30T10:55:04Z
While still having native powershell objects to work with:
PS /tmp> $converted.aString.GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True String System.Object
The Add-Member
commandlet is called only once, so performance impact should be minimal. This may actually be faster, as we don't process the input object in any way when converting back to yaml.
@gaelcolas Thanks a lot for the Add-Member
suggestion!
@storkzipisten Is #100 enough to fix the phone nr issue?
Folks, if you have any input in regards to #100, feel free to add it there. Will leave it open for a couple of days.
Merging this as is for now.
Ideally, in the future we'll have a custom de/serializer that will allow us to set tags on objects we craft in powershell, not just objects we get after de-serializing a yaml. For now, we do parse tags when de-serializing, and we save the original YamlDotNet object as a member in the return value. Child items do not have the extra member. The root node is saved only in the return value we get from ConvertFrom-Yaml
.
This allows us (at the cost of using extra memory), to round-trip later if we specify the -Options RoundTrip
to ConvertTo-Yaml
.
Will have to find a better way to enable round-tripping, as we may mutate the object after de-serializing and that would basically be ignored if we just serialize it back. Will look into it more. For now, I'll just leave in the tag parsing support, which should fix the original issue reported here. When time permits, a proper de/serializer will be written.