ingydotnet/yaml-pm

Load(Dump($perl)) roundtrip causes errors: escaped double quote

hadjiprocopis opened this issue · 6 comments

I am not sure whether this is actually an issue. But roundtripping a perl data structure to a YAML string and back to a Perl data structure causes errors when a string starts with an escaped double quote and contains a single quote (perhaps there are other cases):

use Test::More;
use Test2::Plugin::UTF8;
use YAML;
use Data::Dumper;

$Data::Dumper::Useperl = 1;
$Data::Dumper::Useqq='utf8';

my $perl = [
	{
		"\"aaa'bbb" => "aaa",
		"bbb" => 1,
	}
];
my $yamlstr = eval { Dump($perl) };
ok(defined($yamlstr), "YAML::Dump() : called and got defined result.")
  or BAIL_OUT(Dumper($perl)."above data structure failed for YAML::Dump(): $@");
my $pd = eval { Load($yamlstr) };
ok(defined($pd), "YAML::Load() : called and got defined result.")
  or BAIL_OUT(Dumper($perl)."and YAML string:\n$yamlstr\nabove data structure failed for YAML::Load(): $@");

done_testing;
...
above data structure failed for YAML::Load(): YAML Error: Inconsistent indentation level
   Code: YAML_PARSE_ERR_INCONSISTENT_INDENTATION
   Line: 3
   Document: 1
 at /opt/perlbrew/perls/perl-5.36.0-O3/lib/site_perl/5.36.0/YAML/Loader.pm line 804.
...

I have discovered this when testing another module (Data::Roundtrip) which uses YAML with a random data structure generator and hundreds of trials to check a lot of corner cases.

I used something like this:

use Test::More;
use Test2::Plugin::UTF8;

use Data::Random::Structure;
use Data::Random::Structure::UTF8;
use YAML;
use Data::Dumper;

$Data::Dumper::Useperl = 1;
$Data::Dumper::Useqq='utf8';

my $randomiser = Data::Random::Structure->new(
	max_depth => 50,
	max_elements => 200,
);
my $randomiser_utf8 = Data::Random::Structure::UTF8->new(
	max_depth => 50,
	max_elements => 200,
);
for my $trials (1..1000){
  for my $perl_data_structure (
	$randomiser->generate(),
	$randomiser_utf8->generate()
  ){
	my $yamlstr = eval { Dump($perl_data_structure) };
	ok(defined($yamlstr), "YAML::Dump() : called and got defined result.")
          or BAIL_OUT(Dumper($perl_data_structure)."above data structure failed for YAML::Dump().");
	my $pd = eval { Load($yamlstr) };
	ok(defined($pd), "YAML::Load() : called and got defined result.")
          or BAIL_OUT(Dumper($perl_data_structure)."and YAML string:\n$yamlstr\nabove data structure failed for YAML::Load().");
  }
}

done_testing;

I am not sure if this is really an issue.

Bottomline is Load(Dump($perl)) fails for specific data structures.

It doesn't surprise me.

It seems I could fix it in #225, but no test so far.
There is other stuff I have to do which is more important, so I would be glad if someone else wants to add a test.

However, YAML.pm has many more problems, and not all are that easy to fix, or close to impossible without a major rewrite.
But it is not necessary, as there is a replacement. It is recommended to use YAML::PP instead.

AFAIK @ingydotnet at some point wants to update the YAML.pm docs, among other things to add a link to YAML::PP.

The docs were now updated in 1.31: https://metacpan.org/release/INGY/YAML-1.31

@hadjiprocopis I just stumbled upon your module https://metacpan.org/pod/Data::Roundtrip. I guess the issue was about that.
I noticed a couple of things:

  • The requirements of the module still say that YAML.pm is required, and YAML::PP only for building. I think that's wrong. edit: sorry, I think I was looking at an older version
  • Also the docs say:

A valid Perl variable may kill YAML::PP::Load because of escapes and quotes. For example this:

my $yamlstr = <<'EOS';
---
- 682224
- "\"w": 1
EOS
my $pv = eval { YAML::PP::Load($yamlstr) };
if( $@ ){ die "failed(1): ". $@ }
# it's dead

and I assume this was true for YAML.pm, but not anymore for YAML::PP.

thank you for spotting this. APOLOGIES. It is an error I made in Data::Roundtrip's documentation. I wanted to say YAML and not YAML::PP. I have corrected that now (v0.26)

Inside Data::Roundtrip's xt/deficiencies there are 3 testfiles for testing this edge case on all 3: YAML, YAML::PP and YAML::XS. Both YAML::PP and YAML::XS pass the tests. And that is true for quite some time now, i.e not just the last version. There is no problem with YAML::PP as far as I am concerned and this is what I am currently using internally.

I think I mixed my YAMLs :)

No need to apologize! I already assumed it was a simple typo; just wanted to note it :)