OpusVL/perldoc.perl.org

substr(undef, 0, 1)

philiprbrenan opened this issue · 2 comments

Please consider modifying:

https://perldoc.perl.org/functions/substr.html

to note the surprising result that:

substr(undef, 0, 1) eq q()

as in:

./perl -e 'print "!!!\n" if substr(undef, 0, 1) eq q();'
!!!

./perl -v
This is perl 5, version 31, subversion 6 (v5.31.6) built for x86_64-linux

One might think that adding -w would help, but it actually makes things much worse: the warning is not reported until an attempt is made to use the result of the substr operation, which may be many light years away, as in:

https://stackoverflow.com/questions/31567699/why-does-encode-raise-use-of-uninitialized-value-within

tm604 commented

Any string-related function (lc, uc, fc, index, etc.) will stringify its input, which converts undef to an empty string. This causes a warning (assuming those warnings are enabled). Since this isn't unique to substr, the note would need to be repeated for every one of these functions to be consistent.
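A minimal sketch to check that claim for several of these functions at once, assuming an ordinary `use warnings` script: it traps warnings through `$SIG{__WARN__}` and calls each function on an undefined scalar in plain rvalue context. The variable names are illustrative only, and fc is left out because it needs `use feature 'fc'`.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them immediately, so we can see
# which operator each "uninitialized" warning names.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $undef;    # deliberately left undefined

# Each call stringifies $undef to "" and warns on this very line.
my $lower  = lc $undef;              # ""
my $upper  = uc $undef;              # ""
my $pos    = index $undef, "x";      # -1: "x" is not found in ""
my $prefix = substr $undef, 0, 1;    # "": plain assignment, rvalue context

print for @warnings;
```

Each captured warning should name its own operator (lc, uc, index, substr) and point at the line of the call, including the substr one, because here its result is assigned rather than passed to a sub.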

The warning happens as soon as substr is called, though:

$ perl -e'use strict; use warnings; my $rslt = substr undef, 0, 1;'
Use of uninitialized value in substr at -e line 1.

Do you have an example where that's happening later? The linked Stack Overflow post is about Encode.pm's handling specifically, and https://stackoverflow.com/a/31568522 also shows an example where substr warns directly.

 1 use warnings;
 2 use strict;
 3 
 4 sub SomeOtherPackage::aaa($)
 5  {my ($s) = @_;
 6   my $a = $s;
 7  }
 8 
 9 SomeOtherPackage::aaa(substr(undef, 0, 1));
10 SomeOtherPackage::aaa(lc undef);

Please note that the above code produces confusing warnings:

Use of uninitialized value within @_ in list assignment at /home/phil/perl/z/substr/bug.pl line 5.
Use of uninitialized value in lc at /home/phil/perl/z/substr/bug.pl line 10.

For substr, the warning is reported only at line 5, not at line 9 as one would expect, and the message makes no mention of substr.

lc does not exhibit this problem: the warning is correctly reported as occurring at line 10 and mentions lc.

The discrepancy arises from the substr optimization noted in the Encode discussion: when substr is passed directly as a subroutine argument, it returns an lvalue, and the substring (and therefore the warning) is only evaluated when the callee reads its argument. The result looks as if the problem lies in the called module when the fault is actually local to the caller. This wastes the time of the called module's developer, who appears to be responsible for a problem they did not cause.
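A minimal sketch of that mechanism, assuming a perl recent enough to defer the extraction (as the 5.31.6 build in the report does). `overwrite_first` and `read_first` are hypothetical stand-ins for `SomeOtherPackage::aaa`. Case 1 shows that a substr passed straight to a sub is aliased into `@_` as an lvalue, case 2 shows the warning being raised inside the callee, and case 3 shows the usual workaround of copying the result into a lexical first.

```perl
use strict;
use warnings;

# Collect warnings so each case below can be inspected separately.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Hypothetical callees, stand-ins for SomeOtherPackage::aaa.
sub overwrite_first { $_[0] = "X" }                   # writes through @_
sub read_first      { my ($copy) = @_; return $copy } # only reads @_

# Case 1: substr passed straight to a sub is aliased into @_ as an
# lvalue, so the callee can write back into the original string.
my $s = "abc";
overwrite_first(substr($s, 0, 1));
print "case 1: $s\n";

# Case 2: the substring is only extracted when read_first copies $_[0],
# so the "uninitialized" warning is raised inside the callee.
my $undef_a;
@warnings = ();
read_first(substr($undef_a, 0, 1));
my @deferred = @warnings;

# Case 3: copying into a lexical first forces the extraction, and the
# warning, to happen at the substr call itself.
my $undef_b;
@warnings = ();
my $first = substr($undef_b, 0, 1);
read_first($first);    # $first is "", so no further warning
my @immediate = @warnings;

print "case 2: $_" for @deferred;
print "case 3: $_" for @immediate;
```

Case 3 is the pattern tm604's one-liner relies on: assigning the substr result before passing it on keeps the warning on the caller's line and makes it name substr.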

May I therefore urge that this unique behavior of substr be fully documented in the description of the substr function itself, rather than elsewhere?