PDLPorters/pdl-linearalgebra

0.27 fails to build on i386 due to test failures

sebastic opened this issue · 4 comments

The Debian package build for 0.27 failed on i386 due to test failures:

https://buildd.debian.org/status/package.php?p=libpdl-linearalgebra-perl

From the buildlog:

PERL_DL_NONLAZY=1 "/usr/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(1, 'blib/lib', 'blib/arch')" t/*.t

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 10.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 11.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 12.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 13.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 14.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 15.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 16.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 17.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 18.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# Looks like you failed 9 tests of 237.
t/1.t ....... 
[...]
ok 6 - mschur
not ok 7 - native complex mschur
ok 8 - mschur
not ok 9 - native complex mschur
ok 10 - mschur
not ok 11 - native complex mschur
ok 12 - mschur
not ok 13 - native complex mschur
ok 14 - mschur
not ok 15 - native complex mschur
ok 16 - mschur
not ok 17 - native complex mschur
ok 18 - mschur
not ok 19 - native complex mschur
ok 20 - mschur
not ok 21 - native complex mschur
ok 22 - mschur
not ok 23 - native complex mschur
[...]
Dubious, test returned 9 (wstat 2304, 0x900)
Failed 9/237 subtests 
[...]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 10.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 11.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 12.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 13.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 14.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 15.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 16.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 17.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 18.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]
# Looks like you failed 9 tests of 117.
t/legacy.t .. 
[...]
not ok 7 - PDL::Complex mschur
not ok 8 - PDL::Complex mschur
not ok 9 - PDL::Complex mschur
not ok 10 - PDL::Complex mschur
not ok 11 - PDL::Complex mschur
not ok 12 - PDL::Complex mschur
not ok 13 - PDL::Complex mschur
not ok 14 - PDL::Complex mschur
not ok 15 - PDL::Complex mschur
[...]
Dubious, test returned 9 (wstat 2304, 0x900)
Failed 9/117 subtests 

Test Summary Report
-------------------
t/1.t     (Wstat: 2304 Tests: 237 Failed: 9)
  Failed tests:  7, 9, 11, 13, 15, 17, 19, 21, 23
  Non-zero exit status: 9
t/legacy.t (Wstat: 2304 Tests: 117 Failed: 9)
  Failed tests:  7-15
  Non-zero exit status: 9
Files=4, Tests=404,  3 wallclock secs ( 0.06 usr  0.02 sys +  2.60 cusr  0.14 csys =  2.82 CPU)
Result: FAIL
Failed 2/4 test programs. 18/404 subtests failed.

Full buildlog

Thanks for the report! This captures the heart of it:

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 11.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

The Schur decomposition of matrix A (I've just had to finally actually learn what that is) finds Q and U so that:

A = Q x U x Q^-1

mschur uses (for its complex form) LAPACK's zgees. As we can see above, U (an upper-triangular matrix) has the same values on i386 as on x64, except that the sign of the off-diagonal element is switched (0.72 rather than -0.72). The test suite currently doesn't capture both Q and U to try reconstituting A and check that the result is correct, though possibly it should.

A quick check shows Debian uses ATLAS as the LAPACK it links this library with (for my ref: https://salsa.debian.org/perl-team/modules/packages/libpdl-linearalgebra-perl/-/blob/master/debian/control). Would it be easy for you to run this on i386 and post the results here? (Results from my x64 box with reference LAPACK are shown.)

$ perldl
pdl> use PDL::LinearAlgebra
pdl> p $A = pdl([0.43,0.03],[0.75,0.72])->r2C

[
 [0.43 0.03]
 [0.75 0.72]
]

pdl> p +($U, undef, $Z) = $A->mschur(1)

[
 [0.366373539549749             -0.72]
 [                0 0.783626460450251]
]
 [0.366373539549749 0.783626460450251] # just the eigenvalues, not important as same as diagonal of $U
[
 [ 0.426473531380713  0.904500042582455]
 [-0.904500042582455  0.426473531380713]
]

With that I can see if the $Z is genuinely different but still mathematically valid, or if something truly weird is going on (quite unlikely but best to be sure).

In an i386 chroot with pdl (1:2.077-1+b1) & libpdl-linearalgebra-perl (0.26-2+b1):

pdl> use PDL::LinearAlgebra

pdl> p $A = pdl([0.43,0.03],[0.75,0.72])->r2C

[
 [0.43 0.03]
 [0.75 0.72]
]

pdl> p +($U, undef, $Z) = $A->mschur(1)
Can't locate object method "initialize" via package "PDL::Complex" at /usr/lib/i386-linux-gnu/perl5/5.34/PDL/Core.pm line 2288, <STDIN> line 3.

And with pdl (1:2.077-1+b1) & libpdl-linearalgebra-perl (0.27-1) built without running tests:

pdl> use PDL::LinearAlgebra

pdl> p $A = pdl([0.43,0.03],[0.75,0.72])->r2C

[
 [0.43 0.03]
 [0.75 0.72]
]

pdl> p +($U, undef, $Z) = $A->mschur(1)

[
 [0.366373539549749              0.72]
 [                0 0.783626460450251]
]
 [0.366373539549749 0.783626460450251] 
[
 [-0.426473531380713  0.904500042582456]
 [ 0.904500042582456  0.426473531380713]
]

Thank you! So the Z I was seeing on x64 is:

[
 [ 0.426473531380713  0.904500042582455]
 [-0.904500042582455  0.426473531380713]
]

while on i386:

[
 [-0.426473531380713  0.904500042582456]
 [ 0.904500042582456  0.426473531380713]
]

Not very surprisingly, Z x U x Z' correctly multiplies out to the original A. Evidently, negating the top-right value of U means one has to negate the left-hand column of Z. That's probably very obvious to someone with basic matrix-maths competence, unlike me. I'll make sure the test can accommodate the alternative answer.
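
As a sanity check of that claim, here is a minimal Python sketch (not part of the PDL test suite) that rebuilds A from each platform's U and Z as printed in this thread. The helper names `matmul`, `transpose`, `reconstruct`, `close` and `is_valid_schur` are made up for illustration, the tolerance of 1e-9 is an arbitrary choice, and the plain transpose stands in for the conjugate transpose only because every imaginary part here is 0.

```python
# Values copied from the two perldl sessions in this thread.
A = [[0.43, 0.03],
     [0.75, 0.72]]

U_X64 = [[0.366373539549749, -0.72],
         [0.0, 0.783626460450251]]
Z_X64 = [[0.426473531380713, 0.904500042582455],
         [-0.904500042582455, 0.426473531380713]]

U_I386 = [[0.366373539549749, 0.72],
          [0.0, 0.783626460450251]]
Z_I386 = [[-0.426473531380713, 0.904500042582456],
          [0.904500042582456, 0.426473531380713]]


def matmul(x, y):
    """Multiply two matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    """Plain transpose (no conjugation: imaginary parts are 0 here)."""
    return [list(row) for row in zip(*x)]


def reconstruct(u, z):
    """Return Z U Z'."""
    return matmul(matmul(z, u), transpose(z))


def close(x, y, tol=1e-9):
    """Element-wise comparison within an absolute tolerance."""
    return all(abs(a - b) <= tol
               for row_x, row_y in zip(x, y)
               for a, b in zip(row_x, row_y))


def is_valid_schur(a, u, z, tol=1e-9):
    """Check the three properties of a (real, 2x2) Schur decomposition."""
    identity = [[1.0, 0.0], [0.0, 1.0]]
    return (abs(u[1][0]) <= tol                                # U upper-triangular
            and close(matmul(z, transpose(z)), identity, tol)  # Z orthogonal
            and close(reconstruct(u, z), a, tol))              # Z U Z' equals A


print("x64 :", is_valid_schur(A, U_X64, Z_X64))
print("i386:", is_valid_schur(A, U_I386, Z_I386))
```

A test written along these lines would accept either answer, because it checks the properties of the decomposition rather than comparing U against one fixed expected matrix.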

For my own education (this is simplified because the imaginary values in this scenario are 0, so m3 is just the plain transpose of m1, with z2 and z3 swapped rather than also conjugated):

A = Z U Z'

            "m1"    "m2"   "m3"
[a1 a2    [z1 z2  [u1 u2  [z1 z3
 a3 a4] =  z3 z4]   0 u4]  z2 z4]

m1 m2 = [z1u1 z1u2+z2u4
         z3u1 z3u2+z4u4]

m1m2 m3 = [(z1u1)z1+(z1u2+z2u4)z2 (z1u1)z3+(z1u2+z2u4)z4
           (z3u1)z1+(z3u2+z4u4)z2 (z3u1)z3+(z3u2+z4u4)z4]

The 0 in U is there because U is upper-triangular, and it really clarifies the last bit: when you negate u2, you also need to negate both z1 and z3 (the left-hand column of Z) in order to retain the same result, because every term of the product contains either none or exactly two of z1, z3 and u2, so the sign changes cancel.
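
The same cancellation can be checked mechanically. The following is only an illustrative Python sketch: `product_entries` is a hypothetical helper that spells out the four entries of m1 m2 m3 exactly as expanded above, and the integer values are arbitrary (they do not form an orthogonal Z), which is fine because the sign argument is purely algebraic.

```python
def product_entries(z1, z2, z3, z4, u1, u2, u4):
    """Entries of m1 m2 m3, written out term by term as in the expansion."""
    return [
        [(z1 * u1) * z1 + (z1 * u2 + z2 * u4) * z2,
         (z1 * u1) * z3 + (z1 * u2 + z2 * u4) * z4],
        [(z3 * u1) * z1 + (z3 * u2 + z4 * u4) * z2,
         (z3 * u1) * z3 + (z3 * u2 + z4 * u4) * z4],
    ]


# Arbitrary integers, chosen only so that the comparisons are exact.
z1, z2, z3, z4 = 2, 3, 5, 7
u1, u2, u4 = 11, 13, 17

original = product_entries(z1, z2, z3, z4, u1, u2, u4)

# Negate u2 together with the left-hand column of Z (z1 and z3).
flipped = product_entries(-z1, z2, -z3, z4, u1, -u2, u4)

# Negate u2 alone, leaving Z untouched.
only_u2 = product_entries(z1, z2, z3, z4, u1, -u2, u4)

print("u2, z1, z3 negated together:", original == flipped)
print("u2 negated alone:           ", original == only_u2)
```

The first comparison holds and the second does not, which matches what the two platforms returned: a U with the opposite off-diagonal sign only reproduces A when paired with the Z whose left-hand column is negated too.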