cloudbase/powershell-yaml

Disable string to date auto conversion

ThorbenJ opened this issue · 18 comments

Good day,

It seems this module will try and convert strings that look like dates into date objects. This means the original string is lost, and it appears the date is then converted to local time, in local format.. so completely mangled from the original data that was expected.

This issue is another aspect of the problem reported in #69 and #86

It would be best if this could be disabled and let those who know the meaning and purpose of the data/string decide if a particular string should be converted to another data type (such as date). Perhaps a flag to toggle the feature?

Maybe a unit test to confirm that $input eq. $output when $output = $( $input | ConvertFrom-Yaml | ConvertTo-Yaml )

Thank you
-Thorben

BTW I have this running on MacOS

sound of frustration
I thought I'd work around this with something like:

$r.created_at | Write-Debug
$r.created_at = [String]$( $r.created_at.ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ssZ") )
$r.created_at | Write-Debug

Outputting

DEBUG: 03/30/2022 12:55:04
DEBUG: 2022-03-30T10:55:04Z

So appears ok, but then the ConvertTo-Json | Write-File results is a date to local timezone conversion again. So it not just powershell-yaml mangling data, its systemic...

Hi @ThorbenJ !

I understand your frustration.

In hindsight, converting date strings to powershell dates was not a great idea. Dealing with dates is complicated and you can never know how someone will prefer that date to be handled.

A quick fix will be, as you suggested, a flag to disable parsing dates. I can add it tomorrow. You will have to set that flag to true, as changing defaults will probably break things for a lot of people.

A longer term fix would be adding tag support to powershell-yaml allowing users of this module to specify the data type of the values by adding things like !!string, !!null, !!date, etc. But this will mean a bit more work and we would move away from the "thin wrapper" approach.

There is a new branch with the changes, here: https://github.com/cloudbase/powershell-yaml/tree/add-dont-convert-flag

It adds a new flag to ConvertFrom-Yaml called -DontConvertValues. This will disable all type conversion for values, and will leave them all as string. You will have to make sure you cast your own values to whatever type you're expecting, not just dates.

This also means that when converting back to yaml, all string values that have the potential to be ambiguous, will be quoted. For example:

test: 1
test2: "1"

When converted from yaml will yield a hash map where the values of both test and test2 will be strings. When converting back to yaml, both values will be quoted:

PS /tmp/powershell-yaml> Convertfrom-yaml $data | convertTo-yaml
test2: "1"
test: 1

PS /tmp/powershell-yaml> Convertfrom-yaml $data -DontConvertValues| convertTo-yaml
test2: "1"
test: "1"

This happens because we can't really save metadata about the values and their style, tags, etc in base powershell types. A future version of this module may migrate towards a PSCustomObject which will give us more flexibility.

Let me know if this helps.

Hi. Thank you for looking at this.

However I am not sure I understand, why is the test integer converted to a string, when one asks not to convert? I had not before, but I just looked at the code and branch diff.

Instead of by-passing Convert-ValueToProperType entirely, I suggest let handle JSON/YAML native types (string, number, boolean, null) - but let this flag disable conversion of non-native types (currently just date).

I'd do this by moving the "Null" conversion before "Date" and have an early function return before the "Date" block if the flag is set.

Also for performance, you might want to build/compile that date detecting regex once, earlier, and not repeatedly with every call of Convert-ValueToProperType - which I assume happens for nearly every value in the original doc.

However I am not sure I understand, why is the test integer converted to a string, when one asks not to convert? I had not before, but I just looked at the code and branch diff.

This was a heavy handed approach that simply passed along the $Node.Value we get from YamlDotNet without any inference of type, which is not ideal.

Instead of by-passing Convert-ValueToProperType entirely, I suggest let handle JSON/YAML native types (string, number, boolean, null) - but let this flag disable conversion of non-native types (currently just date).

That might be best for now. Will need to find some cycles to work on this, and hopefully resolve some of the issues (including this one). Will hopefully free up some time in the coming weeks.

Sorry for the frustration this caused.

I understand your frustration.

In hindsight, converting date strings to powershell dates was not a great idea. Dealing with dates is complicated and you can never know how someone will prefer that date to be handled.

[...]

It's not only dates. I have a similar problem with Integers vs. Strings - e.g. international phone numbers. "+4922812345678" (string) becomes 4922812345678 (int) in two steps. Should I create a separate issue for this?

It's not only dates. I have a similar problem with Integers vs. Strings - e.g. international phone numbers. "+4922812345678" (string) becomes 4922812345678 (int) in two steps. Should I create a separate issue for this?

sigh... At least for ConvertFrom-Yaml, I will try to properly parse tags. This way at least we avoid these kinds of issues. Sadly, without migrating to something like PSCustomObject, it won't be enough to enable round-tripping, but at least we can remove all the type inference, and hand that over to the user.

phone: !!str +4922812345678

should then decode to string. For dates, there is an unofficial tag. We could support it, with the added caveat that it may lead to issues such as the one described here, and if omitted, we could just default to string.

I am still in the process of clearing my backlog, but I will hopefully have time for this in the next couple of days.

Okay, this is stuck to my brain now, so I have to fix it.

I will add some tag support in Convert-ValueToProperType which will function as follows:

  • If a tag is specified on the scalar, we will attempt to cast to that type without any further checks
  • If no tag is specified, we attempt to do auto conversion
  • Dates will default to string unless !!timestamp is used as a tag

The following tags will be supported: http://yaml.org/spec/1.2-old/spec.html#id2804923

The seq and map tags should already be mapped by YamlDotNet to YamlDotNet.RepresentationModel.YamlMappingNode and YamlDotNet.RepresentationModel.YamlSequenceNode which we already handle.

How does this sound?

Sounds like an improvement, thank you for taking the time to look at this.
I would still vote on a flag to control autoconversion. e.g. on/off or even an autoconversion level: none, native, all - native being just the YAML/JSON native types (string, num, bool), not derived types like dates or numbers in strings.

However the post storkzipisten suggests that a string with a number in it, is treated as a number, which is wrong - the native type would have been string. Is this a limitation of YamlDotNet? If so sounds like your plan is the best.

Okay, could you try out:

And let me know if this is acceptable?

sigh... At least for ConvertFrom-Yaml, I will try to properly parse tags. This way at least we avoid these kinds of issues. Sadly, without migrating to something like PSCustomObject, it won't be enough to enable round-tripping, but at least we can remove all the type inference, and hand that over to the user.

FWIW, for round tripping you could still | Add-Member ... any value (even non-pscustom objects), but it'd have a perf cost...

I feel like making custom de/serializer easier to register would be a better way, but haven't had time to look I to into it.

FWIW, for round tripping you could still | Add-Member ... any value (even non-pscustom objects), but it'd have a perf cost...

This sounds great actually. We can have a flag -EnableRoundTrip (or something) which defaults to $false.. A custom de/serializer would be nice, but I have no bandwidth for it (sadly).

@gaelcolas,

Added an initial implementation. I save the original YamlDotNet object as a member in the top level node that gets unserialized. We use extra memory to persist two copies of the object, but we can at least round-trip now.

To use it:

PS /tmp> $data = @"                                    
>> aString: !!str 12345
>> aBool: !!bool TRUE
>> aNull: !!null "0"
>> aFloat: !!float 111.1111
>> aPhoneNumber: !!str +4922812345678
>> aDateWithNOTagsShouldBeString: 2022-03-30T10:55:04Z
>> aDateWithTag: !!timestamp 2022-03-30T10:55:04Z
>> "@
PS /tmp> 
PS /tmp> $converted = ConvertFrom-Yaml $data -RoundTrip
PS /tmp> $converted.OrigObject                         

Key                           Value
---                           -----
aString                       12345
aBool                         TRUE
aNull                         0
aFloat                        111.1111
aPhoneNumber                  +4922812345678
aDateWithNOTagsShouldBeString 2022-03-30T10:55:04Z
aDateWithTag                  2022-03-30T10:55:04Z

PS /tmp> $converted.OrigObject.GetType()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    YamlMappingNode                          YamlDotNet.RepresentationModel.YamlNode

PS /tmp> ConvertTo-Yaml $converted      
aString: !!str 12345
aBool: !!bool TRUE
aNull: !!null "0"
aFloat: !!float 111.1111
aPhoneNumber: !!str +4922812345678
aDateWithNOTagsShouldBeString: 2022-03-30T10:55:04Z
aDateWithTag: !!timestamp 2022-03-30T10:55:04Z

While still having native powershell objects to work with:

PS /tmp> $converted.aString.GetType()      

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     String                                   System.Object

The Add-Member commandlet is called only once, so performance impact should be minimal. This may actually be faster, as we don't process the input object in any way when converting back to yaml.

@gaelcolas Thanks a lot for the Add-Member suggestion!

@storkzipisten Is #100 enough to fix the phone nr issue?

Folks, if you have any input in regards to #100, feel free to add it there. Will leave it open for a couple of days.

Merging this as is for now.

Ideally, in the future we'll have a custom de/serializer that will allow us to set tags on objects we craft in powershell, not just objects we get after de-serializing a yaml. For now, we do parse tags when de-serializing, and we save the original YamlDotNet object as a member in the return value. Child items do not have the extra member. The root node is saved only in the return value we get from ConvertFrom-Yaml.

This allows us (at the cost of using extra memory), to round-trip later if we specify the -Options RoundTrip to ConvertTo-Yaml.

Will have to find a better way to enable round-tripping, as we may mutate the object after de-serializing and that would basically be ignored if we just serialize it back. Will look into it more. For now, I'll just leave in the tag parsing support, which should fix the original issue reported here. When time permits, a proper de/serializer will be written.