TEOS-10/GSW-Python

test_check_function[cfcf71] fails on 32bit openSUSE

Closed this issue · 27 comments

mcepl commented

When running the test suite (yes, perhaps you are right in #39 and it is a hopeless effort) on a 32bit machine (see
build log) I get this one test failing:

======================== FAILURES ===================================
_________________________ test_check_function[cfcf71] __________________________

cfcf = ({'n2min_alpha': array([[3.18312465e-04, 3.13145844e-04, 9.01115850e-05],
       [3.18142757e-04, 3.13790954e-04, 6.43...           nan]]), 'Ipt_second_deriv': array([15], dtype=int32)}, <check_functions.FunctionCheck object at 0xaa1a6f4c>)

    def test_check_function(cfcf):
        cv, cf, mfunc = cfcf
        mfunc.run(locals())
        if mfunc.exception is not None or not mfunc.passed:
            print('\n', mfunc.name)
            print('  ', mfunc.runline)
            print('  ', mfunc.testline)
            if mfunc.exception is None:
                mfunc.exception = ValueError('Calculated values are different from the expected matlab results.')
>           raise mfunc.exception
E           ValueError: Calculated values are different from the expected matlab results.

gsw/tests/test_check_functions.py:62: ValueError
----------------------------- Captured stdout call -----------------------------

 pt_second_derivatives
   cf.pt_SA_SA, cf.pt_SA_CT, cf.pt_CT_CT = pt_second_derivatives(cv.SA_chck_cast,cv.CT_chck_cast)
   cf.Ipt_second_deriv = find((abs(cv.pt_SA_SA - cf.pt_SA_SA) >= cv.pt_SA_SA_ca  ) | ( abs(cv.pt_SA_CT - cf.pt_SA_CT) >= cv.pt_SA_CT_ca ) | ( abs(cv.pt_CT_CT - cf.pt_CT_CT) >= cv.pt_CT_CT_ca))
===================== 1 failed, 162 passed in 3.65 seconds =====================
error: Bad exit status from /var/tmp/rpm-tmp.0NSNJz (%check)

We would need to run the tests in a "more verbose" mode to check the values. My guess is that the diff tolerance would need to be adjusted there and that these are not real failures. (I may be wrong though, we never tested on 32 bit!)

What do you get if you build the underlying C library (https://github.com/TEOS-10/GSW-C) and run its test function?

mcepl commented

Sorry, we don't have that packaged, and I don't have time right now to hunt some proper 32bit VMs to play with it.

sebix commented

(I may be wrong though, we never tested on 32 bit!)

So would you say that 32bit is not (officially) supported?

Yes, unless someone steps forward to handle both 32-bit testing and solving any problems that arise, it will not be supported. I'm surprised that even one test failure has turned up, given that the calculations are all explicitly double precision, which should not depend on 32 vs 64 bit architecture. I would be happy to see someone track this down and supply a PR to fix it, if it really is in the GSW code (probably in GSW-C, if it is), and not in some other arcane detail of the test VM on which the failure occurred.

I did some digging into this one and would like to summarize what I found with some options for what we might do.

The Journey

Using the i586 version of openSUSE/Tumbleweed, I was able to reproduce the reported test failure in the python version of GSW. Compiling the GSW-C version directly also produced a failure in gsw_check, specifically in the pt_second_derivatives function. I also attempted to reproduce the error in the following environments:

  • 32 bit ubuntu 14.04 VM (i686)
  • 32 bit ubuntu docker (i686)
  • setting the -m32 flag on the above environments
  • setting the -m32 flag when compiling on my Mac (10.14.3) using llvm

I then started to mess with the compilation flags for GSW-C in the openSUSE VM where I could reproduce the failure. When compiled with -O0, all the check functions pass. In an attempt to figure out which optimization was causing the issue, I manually enabled all the flags listed under -O0 on the Optimize-Options page of GCC, but the built library still passed all the gsw_check functions. I couldn't reproduce the failure even when I used GCC itself to output all the optimization flags it was using.

While googling for how optimization levels might affect doubles, I was led to a Stack Overflow question which suggested the -ffloat-store flag. Enabling this on the 32bit openSUSE VM allowed all check functions to pass. Running a command such as CFLAGS=-ffloat-store pip install . in the GSW-Python source directory also resulted in an installation which passes all tests.

One more bit of information: at the end of this digging, I finally noticed that the failing value was sitting exactly at the GSW_ERROR_LIMIT for this function. If I change the check function (in GSW-C) to test with > instead of >=, all the tests pass.
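The boundary behavior is easy to illustrate with a small Python stand-in for the `find(...)` expression in the captured test output above (a simplified sketch, not the actual GSW check code):

```python
import numpy as np

def failing_indices(expected, computed, ca, strict=False):
    # Simplified stand-in for the check used in the tests above:
    # flag elements whose difference from the Matlab reference
    # reaches the tolerance `ca`.  strict=False mirrors the current
    # `>=` comparison; strict=True is the proposed `>` variant.
    diff = np.abs(expected - computed)
    return np.flatnonzero(diff > ca if strict else diff >= ca)

ca = 1e-14
expected = np.zeros(3)
computed = np.array([0.0, ca, 2 * ca])  # middle element sits exactly at the limit

print(failing_indices(expected, computed, ca))               # [1 2]
print(failing_indices(expected, computed, ca, strict=True))  # [2]
```

A value exactly at the limit passes the strict check but fails the current one, which is exactly the situation observed on the openSUSE VM.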

What should we do?

I see some options:

  1. Enable the -ffloat-store flag upstream.
  2. Change the check functions to flag only differences strictly greater than the allowed limit, rather than greater than or equal to it.
  3. Have the openSUSE maintainer enable the -ffloat-store flag for their i586 build.
  4. Set this as a won't fix.

Option 1 has performance implications I haven't quantified, but some quick searching shows that it will make things slower, and it is unnecessary on most of the systems I can get my hands on. That said, when I hacked gsw_check to always output the diff, having the flag enabled really tightened up the differences (most became 0). But even with the option not enabled, the differences were much smaller than the precision of the instruments, so I favor speed in this case.

I think no matter what happens, we might just want to say that 32bit is not officially supported.

@DocOtak, that is an impressive investigation. Thank you.

It looks like some (at least one) of the most recent matlab package check value tolerances differ from what we are using. Three values are used in the failing test. The current Matlab file:

In [3]: xx = matfile.loadmatbunch('gsw_data_v3_0.mat')

In [5]: xx.gsw_cv['pt_SA_SA_ca']
3.122502256758253e-14

In [6]: xx.gsw_cv['pt_SA_CT_ca']
1.1991276027689679e-14

In [7]: xx.gsw_cv['pt_CT_CT_ca']
4.440892098500626e-14

What we are using:

In [19]: cv['pt_SA_SA_ca'].item()
1.734723475976807e-14

In [20]: cv['pt_SA_CT_ca'].item()
1.1991276027689679e-14

In [21]: cv['pt_CT_CT_ca'].item()
4.440892098500626e-14

Notice that only one of these values ('SA_SA') has changed, and the new value is much larger than the one we are using. Evidently we need to update our check values to match. Maybe this is the only one that has changed--I haven't tried to verify that.

Looks like I failed to say that I could only reproduce this problem on the openSUSE system, none of the other 32-bit systems I tried would have this issue. Also, I didn't dig into any differences in i586 vs i686 and other arch issues.

@efiring sounds like we might just want to update the check data... the 1.73[...]e-14 value was the one being given by the gsw_check tool in GSW-C. My guess would be that this issue would be fixed by doing that.

Unfortunately, it turns out that a wholesale updating of the check values and tolerances breaks 2 other tests.

Does it break them by much? I've been thinking about this and have some questions:

  • How are these check values and tolerances made?
  • Would this need to be fixed in the upstream GSW-C?
  • For this specific issue, is there concern that the library will produce the "wrong answer" for people using the 32bit openSUSE install?

I had some other question... but in the middle of writing this I lost track of what it was... I guess, in general: what do we do with this information?

It fails the same way on aarch64, but the -ffloat-store workaround does not work there.

@ggardet thanks for the report.

I'm wondering what the implications are for the recently announced Apple transition to an ARM architecture. The same goes for a major cloud provider (AWS Graviton).

I'm wondering what the implications are for the recently announced Apple transition to an ARM architecture.

They cannot run Fortran on it yet, so most of the scientific stack won't have SciPy and NumPy. NumPy can be Fortran-free by removing f2py and linking against a non-Fortran BLAS, but that is not the usual setup; most NumPy variants out there use OpenBLAS or MKL, both of which rely on Fortran.

For reference, the aarch64 failure is reproducible in a docker container of the right architecture. I think it uses QEMU to emulate ARM on Intel.

pytest FAILURES
================================================================================================================================ FAILURES ================================================================================================================================
______________________________________________________________________________________________________________________ test_check_function[cfcf72] _______________________________________________________________________________________________________________________

cfcf = ({'n2min_alpha': array([[3.18312465e-04, 3.13145844e-04, 9.01115850e-05],
       [3.18142757e-04, 3.13790954e-04, 6.43...e-05,             nan]]), 'Ipt_second_deriv': array([ 7, 24])}, <check_functions.FunctionCheck object at 0x400f983880>)

    def test_check_function(cfcf):
        cv, cf, mfunc = cfcf
        mfunc.run(locals())
        if mfunc.exception is not None or not mfunc.passed:
            print('\n', mfunc.name)
            print('  ', mfunc.runline)
            print('  ', mfunc.testline)
            if mfunc.exception is None:
                mfunc.exception = ValueError('Calculated values are different from the expected matlab results.')
>           raise mfunc.exception
E           ValueError: Calculated values are different from the expected matlab results.

gsw/tests/test_check_functions.py:62: ValueError
-------------------------------------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------------------------------------

 pt_second_derivatives
   cf.pt_SA_SA, cf.pt_SA_CT, cf.pt_CT_CT = pt_second_derivatives(cv.SA_chck_cast,cv.CT_chck_cast)
   cf.Ipt_second_deriv = find((abs(cv.pt_SA_SA - cf.pt_SA_SA) >= cv.pt_SA_SA_ca  ) | ( abs(cv.pt_SA_CT - cf.pt_SA_CT) >= cv.pt_SA_CT_ca ) | ( abs(cv.pt_CT_CT - cf.pt_CT_CT) >= cv.pt_CT_CT_ca))
======================================================================================================================== short test summary info =========================================================================================================================
FAILED gsw/tests/test_check_functions.py::test_check_function[cfcf72] - ValueError: Calculated values are different from the expected matlab results.
===================================================================================================================== 1 failed, 168 passed in 20.02s =====================================================================================================================

I'm testing both aarch64 and ppc64le in conda-forge/gsw-feedstock#22, they both fail the checks.

BTW, this is a comment from someone in conda-forge who is very knowledgeable on this topic:

there's a good chance that the tests expect a high tolerance
x86_64 has 80-bit floating point registers and some calculations might benefit from it

I'm not sure what the best solution is here. Maybe we are requiring a precision in the comparisons that does not really make sense for the data, and relaxing the tolerance for all tests is fine. Or we could have a separate set of tolerance values for other architectures.
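If the per-architecture route were taken, one minimal sketch would be to scale the Matlab tolerances by a platform-dependent factor. Everything here is hypothetical -- nothing like this exists in GSW-Python, and the scale factor 4 is an arbitrary placeholder, not a measured bound:

```python
import platform

# Hypothetical per-architecture tolerance scaling (illustrative only).
# x86_64 tends to match the Matlab reference closely, in part because
# of its 80-bit x87 registers; other architectures get a looser bound.
TOLERANCE_SCALE = {
    'x86_64': 1.0,
    'AMD64': 1.0,   # Windows reports x86_64 as AMD64
}

def scaled_tolerance(ca):
    """Return the check tolerance `ca`, relaxed on non-x86_64 machines."""
    return ca * TOLERANCE_SCALE.get(platform.machine(), 4.0)

print(scaled_tolerance(1e-14))
```

The alternative -- a single relaxed tolerance everywhere -- is simpler, at the cost of weakening the check on the architectures that could pass the tighter one.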

As noted above (#40 (comment) and subsequent comment), the case that triggered this thread can be closed by updating a tolerance, here and in the C repo, but wholesale updating would require more work.

@PaulMBarker, how are the check-value tolerances calculated?

As noted above (#40 (comment) and subsequent comment), the case that triggered this thread can be closed by updating a tolerance, here and in the C repo, but wholesale updating would require more work.

Yep. I just wanted to bump this so we can get conda-forge/gsw-feedstock#22 merged without ignoring the failures.

OK, I will see if I can come up with a quick fix.

#63 should close this--unless it ends up triggering some new failures on some architectures. Almost everything changed in the test input data, check values, and tolerances between the matlab version we were using as our base before, and what is in the current Matlab zipfile, v3_06_11, which is used in #63.

In my emulated arm environment, #63 moved the failing test to the ICT_second_deriv test; Ipt_second_deriv seems happy now, though.

================================= test session starts ==================================
platform linux -- Python 3.8.2, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /GSW-Python
collected 169 items

gsw/tests/test_check_functions.py .............................................. [ 27%]
......................F......................................................... [ 74%]
...............................                                                  [ 92%]
gsw/tests/test_geostrophy.py ........                                            [ 97%]
gsw/tests/test_utility.py ....                                                   [100%]

======================================= FAILURES =======================================
_____________________________ test_check_function[cfcf68] ______________________________

cfcf = ({'SA_Arctic': array([[27.66974178, 28.53097125, 29.79813634],
       [28.13780824, 28.15201013, 29.69700253],
       ...0517e-05,             nan]]), 'ICT_second_deriv': array([18])}, <check_functions.FunctionCheck object at 0x400fe296d0>)

    def test_check_function(cfcf):
        cv, cf, mfunc = cfcf
        mfunc.run(locals())
        if mfunc.exception is not None or not mfunc.passed:
            print('\n', mfunc.name)
            print('  ', mfunc.runline)
            print('  ', mfunc.testline)
            if mfunc.exception is None:
                mfunc.exception = ValueError('Calculated values are different from the expected matlab results.')
>           raise mfunc.exception
E           ValueError: Calculated values are different from the expected matlab results.

gsw/tests/test_check_functions.py:62: ValueError
--------------------------------- Captured stdout call ---------------------------------

 CT_second_derivatives
   cf.CT_SA_SA, cf.CT_SA_pt, cf.CT_pt_pt = CT_second_derivatives(cv.SA_chck_cast,cf.pt)
   cf.ICT_second_deriv = find((abs(cv.CT_SA_SA - cf.CT_SA_SA) >= cv.CT_SA_SA_ca ) | ( abs(cv.CT_SA_pt - cf.CT_SA_pt) >= cv.CT_SA_pt_ca ) | ( abs(cv.CT_pt_pt - cf.CT_pt_pt) >= cv.CT_pt_pt_ca))
=============================== short test summary info ================================
FAILED gsw/tests/test_check_functions.py::test_check_function[cfcf68] - ValueError: C...
============================ 1 failed, 168 passed in 18.19s ============================

Interesting: it looks like the C version closely follows the Matlab version, and both make a fundamental mistake in their numerics that needlessly loses accuracy. Here is an example from part of the Matlab version:

dpt  = 1e-2;        % increment of potential temperature is 0.01 degrees C.
pt_l = pt - dpt;
pt_u = pt + dpt;

[CT_SA_l, CT_pt_l] = gsw_CT_first_derivatives(SA,pt_l);
[CT_SA_u, CT_pt_u] = gsw_CT_first_derivatives(SA,pt_u);

CT_SA_pt = (CT_SA_u - CT_SA_l)./(pt_u - pt_l);
CT_pt_pt = (CT_pt_u - CT_pt_l)./(pt_u - pt_l);

Notice that those denominators should be just 2 * dpt; (pt_u - pt_l) is a less accurate estimate of the same quantity, throwing away 2-4 significant digits of precision. @PaulMBarker, would you like to fix that in the Matlab version and update the test values?
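The effect is easy to demonstrate with a standalone numerical sketch (not GSW code): 2 * dpt is computed exactly, while (pt_u - pt_l) inherits the rounding errors made when forming pt ± dpt.

```python
import numpy as np

dpt = 1e-2
pt = np.linspace(-2.0, 40.0, 101)   # a typical potential-temperature range

pt_l = pt - dpt
pt_u = pt + dpt

# The spacing should be exactly 2*dpt, but each endpoint was rounded
# when it was formed, so the computed denominator carries that error.
rel_err = np.abs((pt_u - pt_l) - 2 * dpt) / (2 * dpt)
print(rel_err.max())   # small but nonzero: several digits lost vs machine precision
```

Dividing by 2 * dpt instead of (pt_u - pt_l) removes that error from the denominator for free.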

Thanks for bringing this to my attention. Fortunately I have been sidetracked, working on combining all of my observed data into one dataset, and have not yet added the new code to the website.

I have added to the matlab directory the 3 programmes that I use to make the values contained in the gsw_checkfunctions:

gsw_check_values_gsw.m - makes the check values
gsw_check_values_ca.m - makes the tolerances
gsw_check_values_datafile.m - makes the netcdf file

the code is in GSW-Matlab/checkfunctions_code/

Thank you. Now I see how the tolerances are based on perturbations of the check cast values, with the largest resulting perturbation of the function output being used as the tolerance. The perturbations are 1.3e-10 for SA, 6e-10 for t, and 2.3e-8 for p.
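In other words, the scheme can be sketched as follows (a hypothetical Python rendition of the approach; the stand-in function and input values are illustrative only, and the perturbation sizes are the ones quoted above):

```python
import numpy as np

def tolerance_from_perturbations(func, SA, t, p,
                                 dSA=1.3e-10, dt=6e-10, dp=2.3e-8):
    # Perturb each input of `func` by a small amount and take the
    # largest resulting change in the output as the check-value
    # tolerance.  `func` is any function of (SA, t, p).
    base = func(SA, t, p)
    diffs = [np.abs(func(SA + dSA, t, p) - base),
             np.abs(func(SA, t + dt, p) - base),
             np.abs(func(SA, t, p + dp) - base)]
    return np.nanmax(diffs)

# Toy stand-in function (not a real GSW routine), just to show the mechanics:
tol = tolerance_from_perturbations(lambda SA, t, p: SA * t + 0.1 * p,
                                   np.array([35.0]),
                                   np.array([10.0]),
                                   np.array([100.0]))
print(tol)   # dominated by the t perturbation: about 35 * 6e-10 = 2.1e-8
```

This explains why the tolerances are tied to the check-cast values themselves, and why they shift whenever the test data or the algorithms change.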

I think this can be closed. Test casts and check values are all different now, with the move to the v3_05_16
Matlab zip. Maybe new failures will pop up on some architecture, and we will have to investigate again.