08-screen.t fails with PERL_UNICODE and 5.22.1
Opened this issue · 11 comments
Migrated from rt.cpan.org #111143 (status was 'open')
Requestors:
Attachments:
From cjm@cpan.org (@madsen) on 2016-01-13 05:38:40:
With Perl 5.22.1, LANG=en_US.UTF-8, and PERL_UNICODE=SAL, 08-screen.t fails. It succeeds if I set either LANG=C or PERL_UNICODE=0.
With Perl 5.20.2, the test passes even with PERL_UNICODE=SAL.
I've attached the result of running
perl5.22.1 t/00-report-prereqs.t &>/tmp/00-report-prereqs.txt
along with the results of 00-report-prereqs.t for both 5.20.2 and 5.22.1.
From cjm@cpan.org (@madsen) on 2016-01-13 05:39:29:
1..1
#
# Versions for all modules listed in static metadata (including optional ones):
#
# === Configure Requires ===
#
# Module Want Have
# -------------------- ---- ----
# Dist::CheckConflicts 0.02 0.11
# ExtUtils::MakeMaker any 6.98
#
# === Test Requires ===
#
# Module Want Have
# ------------------- ---- --------
# Data::Dumper any 2.154
# Exporter any 5.71
# ExtUtils::MakeMaker any 6.98
# File::Spec any 3.48_01
# File::Temp any 0.2304
# FindBin any 1.51
# Getopt::Long any 2.42
# IO::File any 1.16
# IPC::Run3 any 0.048
# PerlIO any 1.09
# Test::Fatal any 0.014
# Test::More 0.96 1.001014
# Test::Requires any 0.08
# lib any 0.63
# utf8 any 1.13_01
#
# === Test Recommends ===
#
# Module Want Have
# ---------- -------- --------
# CPAN::Meta 2.120900 2.143240
#
# === Runtime Requires ===
#
# Module Want Have
# ------------------------ ---- ------
# Carp any 1.3301
# Devel::GlobalDestruction any 0.13
# Dist::CheckConflicts 0.02 0.11
# Encode any 2.70
# Fcntl any 1.11
# IO::Handle any 1.35
# Module::Runtime any 0.014
# Params::Validate 1.03 1.18
# Scalar::Util any 1.41
# Sys::Syslog 0.28 0.33
# base any 2.22
# strict any 1.08
# warnings any 1.23
#
ok 1
From ether@cpan.org (@karenetheridge) on 2016-01-13 06:00:55:
relevant: http://www.dagolden.com/index.php/1771/why-perl_unicode-makes-me-sad/
From cjm@cpan.org (@madsen) on 2016-01-13 06:09:39:
On Wed, Jan 13, 2016 12:00:55 AM, ETHER wrote:
relevant: http://www.dagolden.com/index.php/1771/why-perl_unicode-makes-me-sad/
Possibly, although I don't use D, just S(tandard IO handles) and A(rgv). (L means that SA applies only if the locale is UTF-8, which is why LANG=C makes the test pass.)
From drolsky@cpan.org (@autarch) on 2016-01-13 06:16:27:
On Wed Jan 13 01:09:39 2016, CJM wrote:
On Wed, Jan 13, 2016 12:00:55 AM, ETHER wrote:
relevant: http://www.dagolden.com/index.php/1771/why-perl_unicode-
makes-me-sad/Possibly, although I don't use D, just S(tandard IO handles) and
A(rgv). (L means that SA applies only if the locale is UTF-8, which
is why LANG=C makes the test pass.)
It'd probably the S causing the problem.
That said, I don't think running arbitrary tests with env vars that change the Perl interpreter's behavior should be expected to work.
From cjm@cpan.org (@madsen) on 2016-01-13 06:26:59:
As xdg mentions in that article, it's useful to test modules the way they'll be used.
I'm of the opinion that a test should clean up its environment if it's doing something tricky that breaks if certain variables are set. In this case, the attached patch will work.
From dagolden@cpan.org (@dagolden) on 2016-01-13 15:08:48:
On Wed Jan 13 01:16:27 2016, DROLSKY wrote:
That said, I don't think running arbitrary tests with env vars that
change the Perl interpreter's behavior should be expected to work.
People put all sorts of stuff in their environment, so tests ought to reset any that cause problems. This is no different than PERL5LIB, PERL5OPT, etc. Or setting up a temporary HOME directly to avoid automatically picking up config files.
PERL_UNICODE is just a particularly uncommon one. :-/
From drolsky@cpan.org (@autarch) on 2016-01-13 15:11:05:
On Wed Jan 13 10:08:48 2016, DAGOLDEN wrote:
On Wed Jan 13 01:16:27 2016, DROLSKY wrote:
That said, I don't think running arbitrary tests with env vars that
change the Perl interpreter's behavior should be expected to work.People put all sorts of stuff in their environment, so tests ought to
reset any that cause problems. This is no different than PERL5LIB,
PERL5OPT, etc. Or setting up a temporary HOME directly to avoid
automatically picking up config files.PERL_UNICODE is just a particularly uncommon one. :-/
This is something that should be done in Test::More or something like that, then. Do we really expect every CPAN module to add boilerplate to every test to reset every possible env var that someone might set?
I'm of course fine with patching this test now that we know the issue and the simple fix, but as a general rule I think that if you run tests with random perl env vars set, you should expect random failures.
From ether@cpan.org (@karenetheridge) on 2016-01-13 17:50:36:
On 2016-01-13 07:11:05, DROLSKY wrote:
This is something that should be done in Test::More or something like
that, then. Do we really expect every CPAN module to add boilerplate
to every test to reset every possible env var that someone might set?I'm of course fine with patching this test now that we know the issue
and the simple fix, but as a general rule I think that if you run
tests with random perl env vars set, you should expect random
failures.
Having Test::More reset the variables entirely defeats the point of
being able to test a CPAN module working with particular variables set --
which is something an application may wish to do (providing, of course,
that all the CPAN modules it uses actually does work with those variables,
which is what this patch is about).
This isn't a random variable -- it's one that affects the behaviour of
filehandle operations -- and so it's not unreasonable to try to make this
module work when the variable is set to various values.
I see it as similar to making a module work in various locales, or with
different filesystems. It may be an implementation variant that the author
hadn't thought of, or he doesn't have configured by default on his development
system, but that's why there are smokers, as well as a large contingent of
helpful users that send patches when edge cases are discovered :D
(I'm also now pondering which of my modules do interesting things with filehandles
that might need testing with PERL_UNICODE...)
From drolsky@cpan.org (@autarch) on 2016-01-13 21:16:21:
On Wed Jan 13 12:50:36 2016, ETHER wrote:
Having Test::More reset the variables entirely defeats the point of being able to test a CPAN module working with particular variables set -- which is something an application may wish to do (providing, of
course, that all the CPAN modules it uses actually does work with those
variables, which is what this patch is about).
Good point, except that this patch explicitly breaks your ability to test this module with the env var in question! In fact, it will mislead you into thinking that this will work when it fact it will not.
This isn't a random variable -- it's one that affects the behaviour of filehandle operations -- and so it's not unreasonable to try to make this module work when the variable is set to various values.
There's a limit to how many different environments a module can be expected to work in. An environment that globally messes with every single filehandle is likely to break a lot of CPAN modules. This seems like one of those "use at your own risk or with lots of testing things".
That said, if someone wants to patch the module (not the test) to make it actually do the right thing here, I would happily apply such a patch. I'm not sure exactly what the right thing is or why it breaks though. I'm guessing there some sort of double encoding issue going on.
Also, this is testing a "utf8" option that you would want to not pass if you set this global env var in all your code. So maybe just a doc patch?
From mauke@cpan.org (@mauke) on 2016-02-27 16:52:21:
On Wed Jan 13 16:16:21 2016, DROLSKY wrote:
On Wed Jan 13 12:50:36 2016, ETHER wrote:
Having Test::More reset the variables entirely defeats the point of being able to test a CPAN module working with particular variables set -- which is something an application may wish to do (providing, of
course, that all the CPAN modules it uses actually does work with those variables, which is what this patch is about).Good point, except that this patch explicitly breaks your ability to test this module with the env var in question! In fact, it will mislead you into thinking that this will work when it fact it will not.
This isn't a random variable -- it's one that affects the behaviour of filehandle operations -- and so it's not unreasonable to try to make this module work when the variable is set to various values.
There's a limit to how many different environments a module can be expected to work in. An environment that globally messes with every single filehandle is likely to break a lot of CPAN modules. This seems like one of those "use at your own risk or with lots of testing things".
That said, if someone wants to patch the module (not the test) to make it actually do the right thing here, I would happily apply such a patch. I'm not sure exactly what the right thing is or why it breaks though. I'm guessing there some sort of double encoding issue going on.
Also, this is testing a "utf8" option that you would want to not pass if you set this global env var in all your code. So maybe just a doc patch?
I don't know about this problem but there's a kind of similar bug in DBI:
https://rt.cpan.org/Ticket/Display.html?id=71341
That one is caused by sending binary data over STDIN/STDOUT without calling binmode() on them. Which breaks with PERL_UNICODE=S because that makes the STD* handles start out as ":utf8".
Eight years later ...
Log::Dispatch tests are still failing (and writing mojibake to the screen):
t/binmode.t ................................ ok
t/close-after-write.t ...................... ok
t/email-exit.t ............................. skipped: This test only runs for the maintainer
t/file-locked.t ............................ ok
t/lazy-open.t .............................. ok
# Failed test 'got expected stdout from Screen output'
# at t/screen.t line 95.
# got: 'test message - á½ '
# expected: 'test message - á½ '
# Looks like you failed 1 test of 2.
# Failed test 'stderr = 0, utf8 = 1'
# at t/screen.t line 104.
# Failed test 'got expected stderr from Screen output'
# at t/screen.t line 101.
# got: 'test message - á½ '
# expected: 'test message - á½ '
# Looks like you failed 1 test of 2.
# Failed test 'stderr = 1, utf8 = 1'
# at t/screen.t line 104.
# Looks like you failed 2 tests of 5.
t/screen.t .................................
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/5 subtests
t/short-syntax.t ........................... ok
t/syslog-lock-without-preloaded-threads.t .. ok
t/syslog-threads.t ......................... ok
t/syslog.t ................................. ok
Test Summary Report
-------------------
t/screen.t (Wstat: 512 (exited 2) Tests: 5 Failed: 2)
Failed tests: 4-5
Non-zero exit status: 2
Files=13, Tests=84, 4 wallclock secs ( 0.05 usr 0.01 sys + 2.82 cusr 1.31 csys = 4.19 CPU)
Result: FAIL
Failed 1/13 test programs. 2/84 subtests failed.
The underlying cause of the problem is now understood: The test writes binary data to a file handle (STDOUT/STDERR) without switching it to binary mode first (with binmode()
), which implicitly leaves the handle in text mode. Thus (re-)encoding happens on some platforms.
Meanwhile, the DBI issue I mentioned above did turn out to be a very similar problem. A patch adding the missing binmode()
calls was applied (perl5-dbi/dbi#32) and has been working fine since DBI 1.635 (released in April 2016).
That said, if someone wants to patch the module (not the test) to make it actually do the right thing here, I would happily apply such a patch. I'm not sure exactly what the right thing is or why it breaks though. I'm guessing there some sort of double encoding issue going on.
Hard to do without breaking the existing documented behavior:
utf8 (0 or 1)
If this is true, then the output will be encoded using the UTF-8 encoding. If you have already applied an encoding layer to the relevant filehandle, "STDOUT" or "STDERR", then your output will end up double-encoded if this is true.
This defaults to false.
Morally, the right thing to do is to call binmode(STDOUT)
or binmode(STDERR)
in Log::Dispatch::Screen if utf8
mode is enabled because then the module tries to write binary data to a handle, so the handle better be in binary mode.
Also, this is testing a "utf8" option that you would want to not pass if you set this global env var in all your code. So maybe just a doc patch?
It's an environment variable, so people set it in their environment, not in code. In particular, this issue affects third-party applications that use Log::Dispatch internally, not just my own code.
Would you be willing to accept a patch that calls binmode
in Log::Dispatch::Screen? If not, I feel the right thing to do is document that setting utf8 => 1
with Log::Dispatch::Screen requires the programmer to call binmode
manually beforehand in any code that uses the option (and then do just that in t/screen.t).