number_of_partitions returns wrong results under the Windows Subsystem for Linux
soehms opened this issue · 28 comments
Currently, installing Sage under WSL 2 (Ubuntu 18.04) produces 8 doctests failures in sage.combinat.partitions.number_of_partitions for example:
File "src/sage/combinat/partitions.pyx", line 72, in sage.combinat.partitions.number_of_partitions
Failed example:
number_of_partitions( n - (n % 385) + 369) % 385 == 0
Expected:
True
Got:
False
...
File "src/sage/combinat/partitions.pyx", line 95, in sage.combinat.partitions.number_of_partitions
Failed example:
len([n for n in [1..500] if number_of_partitions(n) != Partitions(n).cardinality(algorithm='pari')])
Expected:
0
Got:
21
In more detail:
Failures for 9.1.rc0 and 9.0 (installed from binaries):
sage: [(i, number_of_partitions(i, algorithm='bober') - number_of_partitions(i)) for i in range(17000) if number_of_partitions(i, algorithm='bober') != number_of_partitions(i)]
[(241, 1),
(243, 1),
(244, 1),
(245, 1),
(246, 1),
(248, 1),
(250, 1),
(252, 1),
(253, 1),
(256, 2),
(258, 2),
(259, 3),
(262, 1),
(263, 1),
(264, 3),
(266, 4),
(267, 4),
(268, 1),
(269, 5),
(270, -1),
(271, -1),
(9894, -1),
(9908, -1),
(16564, -1),
(16573, -1),
(16672, -1),
(16771, -1),
(16789, -1),
(16798, -1),
(16802, -1),
(16816, -1),
(16820, -1),
(16825, -1),
(16914, -1),
(16942, -1),
(16946, -1),
(16951, -1),
(16969, -1)]
Failures for 8.1 (installed using apt):
sage: [(i, number_of_partitions(i, algorithm='bober') - number_of_partitions(i)) for i in range(17000) if number_of_partitions(i, algorithm='bober') != number_of_partitions(i)]
[(239, 1),
(242, 1),
(248, 2),
(249, 1),
(250, 1),
(251, 2),
(252, 1),
(253, 1),
(256, -1),
(258, 2),
(259, 3),
(260, 3),
(261, 4),
(262, -3),
(263, -4),
(264, 4),
(265, 6),
(266, -2),
(267, 4),
(268, 1),
(269, 5),
(270, -1),
(271, 7),
(9894, -1),
(9908, -1),
(16564, -1),
(16573, -1),
(16672, -1),
(16771, -1),
(16789, -1),
(16798, -1),
(16802, -1),
(16816, -1),
(16820, -1),
(16825, -1),
(16914, -1),
(16942, -1),
(16946, -1),
(16951, -1),
(16969, -1)]
See also discussions on sage-devel and ticket #28549.
CC: @sagetrac-tmonteil @tscrim @mezzarobba @jdemeyer @dimpase @fredrik-johansson @sagetrac-bober
Component: combinatorics
Keywords: number of partitions WSL
Author: Thierry Monteil
Branch/Commit: 2c37f3d
Reviewer: Matthias Koeppe
Issue created by migration from https://trac.sagemath.org/ticket/29600
This might be related to: microsoft/WSL#830 (long double floating point calculations give inexact results)
The ./src/sage/combinat/partitions_c.cc piece of C code was added into Sage in 2007. I guess it was invented because there was no fast enough existing wheel for that: the doctest of number_of_partitions function says it is *very* fast. Now, there is flint, which is still maintained and much faster:
sage: from sage.misc.sage_timeit import sage_timeit
sage: tf = lambda i : sage_timeit('_=number_of_partitions({})'.format(i), globals() , preparse=True, number=1, seconds=True)
sage: tb = lambda i : sage_timeit('_=number_of_partitions({}, algorithm="bober")'.format(i), globals() , preparse=True, number=1, seconds=True)
sage: [(i,tb(10^i)/tf(10^i)) for i in range(9)]
[(0, 1.076654167643153),
(1, 51.60920756939386),
(2, 51.13886939147276),
(3, 122.66713137190546),
(4, 471.61390056849905),
(5, 2741.5143212699077),
(6, 18715.222483310186),
(7, 170662.0538251338),
(8, 1930629.845842312)]
So, let me propose a backtrack version of Sage's motto: "To maintain the bike, forget your own old wheels, replace them with subsequently invented ones".
Why not just remove that piece of code ?
Note that arb and flint implementations are inspired by Bober Sage implementation, see https://arxiv.org/abs/1205.5991
+1
If everyone agree, I can do the removal (let me CC people involved in #24667).
-1
I think it is good to keep it for testing purposes. In this case, I am pretty sure it is exposing an underlying bug rather than an issue with this code itself as nearly all of the main computations are outsourced. There might be some issue with a rounding mode not being handled correctly (a la point 9 in #24667). I think this warrants further investigation rather than sweeping the issue under the rug (or maybe out the door is the better analogy, but you should check the house to make sure things are getting in some other way).
The bug is clearly in the ad-hoc float precision code in that file! It has ifdefs for the behavior of long double.
It doesn't appear to be ad-hoc but due to architecture differences and the code assuming a particular behavior. So if that is indeed the problem, this could be showing any code within Sage that is using long double. From this, I think it is related to this WSL issue.
cc-ing Fredrik to also get his thoughts on this.
I dont get the argument. I am not suggesting to remove a doctest, but the actual code, so that if there is a bug in that code, well, it will disapear as well. If the proposal is to keep a test for flint implementation (besides self-tests), we should perhaps compare with arb implementation instead, which is proven to be correct (or even Gap implementation, to be sure that it was written by unrelated developers), with explicit doctests. If the issue comes from a known WSL bug, i am not sure that this legacy code should be kept just to show another instance of it.
I won't be surprised if the bug in question also shows up on non-x86(_64) hardware. e.g. on ARM (which lacks that 80-bit extended precision).
+1 for removal of outdated code, which is better served by a maintained external library.
Replying to @tscrim:
... this could be showing any code within Sage that is using
long double.
That's a very short list of places.
./libs/mpfr/__init__.pxd:23: int mpfr_set_ld(mpfr_t rop, long double op, mpfr_rnd_t rnd)
./libs/mpfr/__init__.pxd:42: int mpfr_init_set_ld(mpfr_t rop, long double op, mpfr_rnd_t rnd)
./libs/mpfr/__init__.pxd:50: long double mpfr_get_ld(mpfr_t op, mpfr_rnd_t rnd)
./libs/mpfr/__init__.pxd:53: long double mpfr_get_ld_2exp(long *exp, mpfr_t op, mpfr_rnd_t rnd)
./libs/mpfr/__init__.pxd:120: int mpfr_cmp_ld(mpfr_t op1, long double op2)
./libs/mpc/__init__.pxd:27: int mpc_pow_ld(mpc_ptr, mpc_srcptr, long double, mpc_rnd_t)
./libs/mpc/__init__.pxd:48: int mpc_set_ld(mpc_ptr, long double, mpc_rnd_t)
./libs/mpc/__init__.pxd:49: int mpc_set_ld_ld(mpc_ptr, long double, long double, mpc_rnd_t)
./libs/gsl/math.pxd:36: long double GSL_MAX_LDBL(long double a, long double b)
./libs/gsl/math.pxd:37: long double GSL_MIN_LDBL(long double a, long double b)
./matrix/matrix_integer_dense.pyx:142: 'ld': 'long double',
./matrix/matrix_integer_dense.pyx:2810: - ``'ld'`` -- long doubles (fpLLL only)
./matrix/matrix_integer_dense.pyx:3039: - ``'ld'`` -- long doubles (fpLLL only)
I propose removing the 'bober' implementation and supporting algorithm='pari' instead.
Pari's numbpart() was improved in the recent past and it's now twice as fast as Bober's code on my computer.
I don't think it is fair to call it outdated code. Since Pari's implementation is at least as fast as this (so we can use that to test against each other) and everyone seems intent on just sweeping the problem away, then go ahead and remove it.
Removing sounds hard in my ears, as well. Wouldn't it be enough to print a warning on the first invocation of a functionality that uses long double? That could be done once per session which is running under WSL or some other platforms having similar problems.
Indeed, there are not many of them:
- function
number_of_partitionswith optionalgorithm='bober' - method
BKZof dense integer matrices with optionfp='ld' - method
LLLof dense integer matrices with optionfp='ld'
Such a warning should refer to the WSL issue 830 and make clear that wrong results are possible.
I don't see the point of maintaining obsolete code in Sage when there are multiple superior implementations in its standard libraries. It's one thing to have a 20-line reference implementation written in Sage, but in this case it's 2000 lines of C++ code that will be a future maintenance burden if it's a maintenance burden right now.
Don't get me wrong, this code was a terrific addition to Sage when Jon wrote it (lots of credit to him), but it has already more than served its purpose by inspiring even better implementations :-)
Of course, if Jon himself thinks the code should stay, I wouldn't argue with him.
Replying to @fredrik-johansson:
I don't see the point of maintaining obsolete code in Sage when there are multiple superior implementations in its standard libraries.
In principal I agree! But that's a more general question which applies to many other cases, as well. I think, it's not necessary to decide this in the context of this ticket. At the moment, we are not maintaining the code. We are maintaining a WSL issue.
What I want to say is: Shall we kill the messenger of the bad news, or shall we do something against the bad news?
Anyway, I would agree if everyone would like to take the occasion to get rid of that code.
Nobody has worked on fixing this code, so let's please remove it for 9.2. Any takers?
I can take that one.
I'm a bit late to this conversation.
In principle I agree that there is not much reason to leave this code in Sage given that it is so flaky and there is just about no good reason to use it.
On the other hand, if you actually want to trust WSL, it may still be worth trying to figure out what the actual problem is. The WSL ticket you point to seems to claim that floating point issues were fixed in WSL2, which is what you specifically mention as failing the tests. If they haven't been completely fixed then I would expect issues elsewhere.
If you get rid of the 4 lines starting at line 922, (long double ld_partial_sum = ...) and change level_two_precision to level_five_precision on line 873 (for (k = 1; current_precision > level_two_precision; k++)), I think you'll get a version that should work and doesn't use long doubles. (Maybe you get the same effect by setting level_two_precision = level_five_precision or long_double_precision = double_precision towards the top of the file, which is what appears to be done on some systems.)
I don't think I ever put in any of those ifdefs, and the ternary operator that checks if LDBL_MANT_DIG == 106 looks quite strange. (Also, the whole thing reads rather strangely if you don't know that once upon a time it used quad doubles and double doubles.)
Commit: 2c37f3d
I uploaded a branch that removes the code and redirects its uses to the faster flint implementation. To be complete, i am thinking of an additional refactor commit on top of that branch that merges the code of sage.combinat.partition.Partition_n.cardinality and sage.combinat.partition.number_of_partitions which are kind of duplicated (with some algorithms missing in the later), and adds some cross-algorithm testing (note that we have flint, pari and gap implementations).
Now, regarding whether we should merge that branch into Sage, i still do not understand how maintaining that code could benefit to the rest of Sage, even with respect to WSL2, and i do not understand how that piece of code could improve our trust of WSL2 with respect to the rest of Sage code. If the only point is to understand some floating-point subtleties of WSL2, then i am pretty sure that maintaining this code within Sage is not the right way (people who want to understand can think out of Sage source tree). If someone sees how this codes illustrates some WSL2 bug, it might be better to report upstream.
The ticket could be re-purposed to "Refactor the computation of the number of partitions".
New commits:
f566a3e | #29600 : remove partitions_c.cc C code |
2c37f3d | #29600 : remove sage.combinat.partitions.number_of_partitions and redirect calls to it |
Author: Thierry Monteil
Replying to @sagetrac-tmonteil:
I uploaded a branch that removes the code and redirects its uses to the faster
flintimplementation.
Looks good to me.
To be complete, i am thinking of an additional refactor commit on top of that branch that merges the code of
sage.combinat.partition.Partition_n.cardinalityandsage.combinat.partition.number_of_partitionswhich are kind of duplicated (with some algorithms missing in the later), and adds some cross-algorithm testing (note that we haveflint,pariandgapimplementations).
I would suggest to use a follow-up ticket for this.
Now, regarding whether we should merge that branch into Sage, i still do not understand how maintaining that code could benefit to the rest of Sage [...]
I would think it's definitely out of the scope of the Sage project to investigate hypothetical errors of other upstream libraries on WSL2 because of ABI problems.
Reviewer: Matthias Koeppe
If the patchbot light shows green, you can set this to positive review on my behalf
The patchbot doctest failure seems unrelated.