TEOS-10/GSW-Python

test_check_function[cfcf71] fails on 32bit openSUSE

Closed this issue · 27 comments

mcepl commented

When running the test suite (yes, perhaps you are right in #39 and it is a hopeless effort) on a 32bit machine (see
build log) I get this one test failing:

======================== FAILURES ===================================
_________________________ test_check_function[cfcf71] __________________________

cfcf = ({'n2min_alpha': array([[3.18312465e-04, 3.13145844e-04, 9.01115850e-05],
       [3.18142757e-04, 3.13790954e-04, 6.43...           nan]]), 'Ipt_second_deriv': array([15], dtype=int32)}, <check_functions.FunctionCheck object at 0xaa1a6f4c>)

    def test_check_function(cfcf):
        cv, cf, mfunc = cfcf
        mfunc.run(locals())
        if mfunc.exception is not None or not mfunc.passed:
            print('\n', mfunc.name)
            print('  ', mfunc.runline)
            print('  ', mfunc.testline)
            if mfunc.exception is None:
                mfunc.exception = ValueError('Calculated values are different from the expected matlab results.')
>           raise mfunc.exception
E           ValueError: Calculated values are different from the expected matlab results.

gsw/tests/test_check_functions.py:62: ValueError
----------------------------- Captured stdout call -----------------------------

 pt_second_derivatives
   cf.pt_SA_SA, cf.pt_SA_CT, cf.pt_CT_CT = pt_second_derivatives(cv.SA_chck_cast,cv.CT_chck_cast)
   cf.Ipt_second_deriv = find((abs(cv.pt_SA_SA - cf.pt_SA_SA) >= cv.pt_SA_SA_ca  ) | ( abs(cv.pt_SA_CT - cf.pt_SA_CT) >= cv.pt_SA_CT_ca ) | ( abs(cv.pt_CT_CT - cf.pt_CT_CT) >= cv.pt_CT_CT_ca))
===================== 1 failed, 162 passed in 3.65 seconds =====================
error: Bad exit status from /var/tmp/rpm-tmp.0NSNJz (%check)

We would need to run the tests in a "more verbose" mode to check the values. My guess is that the diff tolerance would need to be adjusted there and that these are not real failures. (I may be wrong though, we never tested on 32 bit!)

What do you get if you build the underlying C library (https://github.com/TEOS-10/GSW-C) and run its test function?

mcepl commented

Sorry, we don't have that packaged, and I don't have time right now to hunt some proper 32bit VMs to play with it.

sebix commented

(I may be wrong though, we never tested on 32 bit!)

So would you say that 32bit is not (officially) supported?

Yes, unless someone steps forward to handle both 32-bit testing and solving any problems that arise, it will not be supported. I'm surprised that even one test failure has turned up, given that the calculations are all explicitly double precision, which should not depend on 32 vs 64 bit architecture. I would be happy to see someone track this down and supply a PR to fix it, if it really is in the GSW code (probably in GSW-C, if it is), and not in some other arcane detail of the test VM on which the failure occurred.

I did some digging into this one and would like to summarize what I found with some options for what we might do.

The Journey

Using the i586 version of openSUSE/Tumbleweed, I was able to reproduce the reported test failure in the python version of GSW. Compiling the GSW-C version directly also produced a failure in gsw_check, specifically in the pt_second_derivatives function. I also attempted to reproduce the error in the following environments:

  • 32 bit ubuntu 14.04 VM (i686)
  • 32 bit ubuntu docker (i686)
  • setting the -m32 flag on the above environments
  • setting the -m32 flag when compiling on my Mac (10.14.3) using llvm

I then started to mess with the compilation flags for GSW-C in the openSUSE VM where I could reproduce the failure. When compiled with -O0, all the check functions pass. In an attempt to figure out which optimization was causing the issue, I manually enabled all the flags listed under -O0 on the Optimize-Options page of GCC, but the built library still passed all the gsw_check functions. I couldn't reproduce the failure even when I used GCC itself to output all the optimization flags it was using.

While googling for how optimization levels might affect doubles, I was led to a Stack Overflow question which suggested the -ffloat-store flag. Enabling this on the 32bit openSUSE VM allowed all check functions to pass. Running a command such as CFLAGS=-ffloat-store pip install . in the GSW-Python source directory also resulted in an installation which passes all tests.

One more bit of information: at the end of this digging, I finally noticed that the failing value was sitting exactly at the GSW_ERROR_LIMIT for this function. If I change the check function (in GSW-C) to test with > instead of >=, all the tests pass.
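The boundary behavior is easy to illustrate with a small Python stand-in for the `find(...)` expression in the captured test output above (a simplified sketch, not the actual GSW check code):

```python
import numpy as np

def failing_indices(expected, computed, ca, strict=False):
    # Simplified stand-in for the check used in the tests above:
    # flag elements whose difference from the Matlab reference
    # reaches the tolerance `ca`.  strict=False mirrors the current
    # `>=` comparison; strict=True is the proposed `>` variant.
    diff = np.abs(expected - computed)
    return np.flatnonzero(diff > ca if strict else diff >= ca)

ca = 1e-14
expected = np.zeros(3)
computed = np.array([0.0, ca, 2 * ca])  # middle element sits exactly at the limit

print(failing_indices(expected, computed, ca))               # [1 2]
print(failing_indices(expected, computed, ca, strict=True))  # [2]
```

A value exactly at the limit passes the strict check but fails the current one, which is exactly the situation observed on the openSUSE VM.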

What should we do?

I see some options:

  1. Enable the -ffloat-store flag upstream.
  2. Change the check functions to flag only differences strictly greater than the allowed limit, rather than greater than or equal to it.
  3. Have the openSUSE maintainer enable the -ffloat-store flag for their i586 build.
  4. Set this as a won't fix.

Option 1 has performance implications I haven't quantified, but some quick searching shows that it will make things slower, and it is unnecessary on most of the systems I can get my hands on. That said, when I hacked gsw_check to always output the diff, having the flag enabled really tightened up the differences (most became 0). But even with the option not enabled, the differences were much smaller than the precision of the instruments, so I favor speed in this case.

I think no matter what happens, we might just want to say that 32bit is not officially supported.

@DocOtak, that is an impressive investigation. Thank you.

It looks like some (at least one) of the most recent matlab package check value tolerances differ from what we are using. Three values are used in the failing test. The current Matlab file:

In [3]: xx = matfile.loadmatbunch('gsw_data_v3_0.mat')

In [5]: xx.gsw_cv['pt_SA_SA_ca']
3.122502256758253e-14

In [6]: xx.gsw_cv['pt_SA_CT_ca']
1.1991276027689679e-14

In [7]: xx.gsw_cv['pt_CT_CT_ca']
4.440892098500626e-14

What we are using:

In [19]: cv['pt_SA_SA_ca'].item()
1.734723475976807e-14

In [20]: cv['pt_SA_CT_ca'].item()
1.1991276027689679e-14

In [21]: cv['pt_CT_CT_ca'].item()
4.440892098500626e-14

Notice that only one of these values ('SA_SA') has changed, and the new value is much larger than the one we are using. Evidently we need to update our check values to match. Maybe this is the only one that has changed--I haven't tried to verify that.

Looks like I failed to say that I could only reproduce this problem on the openSUSE system, none of the other 32-bit systems I tried would have this issue. Also, I didn't dig into any differences in i586 vs i686 and other arch issues.

@efiring sounds like we might just want to update the check data... the 1.73[...]e-14 value was the one being given by the gsw_check tool in GSW-C. My guess would be that this issue would be fixed by doing that.

Unfortunately, it turns out that a wholesale updating of the check values and tolerances breaks 2 other tests.

Does it break them by much? I've been thinking about this and have some questions:

  • How are these check values and tolerances made?
  • Would this need to be fixed in the upstream GSW-C?
  • For this specific issue, is there concern that the library will produce the "wrong answer" for people using the 32bit openSUSE install?

I had some other question... but in the middle of writing this I lost track of what it was... I guess, in general: what do we do with this information?

It fails the same way on aarch64, but the -ffloat-store workaround does not work there.

@ggardet thanks for the report.

I'm wondering what the implications are for the recently announced Apple transition to an ARM architecture. The same goes for a major cloud provider (AWS Graviton).

I'm wondering what the implications are for the recently announced Apple transition to an ARM architecture.

They cannot run Fortran on it yet, so most of the scientific stack won't have SciPy and NumPy. NumPy can be Fortran-free by removing f2py and linking against a non-Fortran BLAS, but that is not the usual setup; most NumPy variants out there use OpenBLAS or MKL, both of which rely on Fortran.

For reference, the aarch64 failure is reproducible in a docker container of the right architecture. I think it uses QEMU to emulate ARM on Intel.

pytest FAILURES
================================================================================================================================ FAILURES ================================================================================================================================
______________________________________________________________________________________________________________________ test_check_function[cfcf72] _______________________________________________________________________________________________________________________

cfcf = ({'n2min_alpha': array([[3.18312465e-04, 3.13145844e-04, 9.01115850e-05],
       [3.18142757e-04, 3.13790954e-04, 6.43...e-05,             nan]]), 'Ipt_second_deriv': array([ 7, 24])}, <check_functions.FunctionCheck object at 0x400f983880>)

    def test_check_function(cfcf):
        cv, cf, mfunc = cfcf
        mfunc.run(locals())
        if mfunc.exception is not None or not mfunc.passed:
            print('\n', mfunc.name)
            print('  ', mfunc.runline)
            print('  ', mfunc.testline)
            if mfunc.exception is None:
                mfunc.exception = ValueError('Calculated values are different from the expected matlab results.')
>           raise mfunc.exception
E           ValueError: Calculated values are different from the expected matlab results.

gsw/tests/test_check_functions.py:62: ValueError
-------------------------------------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------------------------------------

 pt_second_derivatives
   cf.pt_SA_SA, cf.pt_SA_CT, cf.pt_CT_CT = pt_second_derivatives(cv.SA_chck_cast,cv.CT_chck_cast)
   cf.Ipt_second_deriv = find((abs(cv.pt_SA_SA - cf.pt_SA_SA) >= cv.pt_SA_SA_ca  ) | ( abs(cv.pt_SA_CT - cf.pt_SA_CT) >= cv.pt_SA_CT_ca ) | ( abs(cv.pt_CT_CT - cf.pt_CT_CT) >= cv.pt_CT_CT_ca))
======================================================================================================================== short test summary info =========================================================================================================================
FAILED gsw/tests/test_check_functions.py::test_check_function[cfcf72] - ValueError: Calculated values are different from the expected matlab results.
===================================================================================================================== 1 failed, 168 passed in 20.02s =====================================================================================================================

I'm testing both aarch64 and ppc64le in conda-forge/gsw-feedstock#22, they both fail the checks.

BTW, this is a comment from someone in conda-forge who is very knowledgeable on this topic:

there's a good chance that the tests expect a high tolerance
x86_64 has 80-bit floating point registers and some calculations might benefit from it

I'm not sure what the best solution is here. Maybe we are requiring a precision in the comparisons that does not really make sense for the data, and relaxing the tolerance for all tests is fine. Or we could have a separate set of tolerance values for other architectures.
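If the per-architecture route were taken, one minimal sketch would be to scale the Matlab tolerances by a platform-dependent factor. Everything here is hypothetical -- nothing like this exists in GSW-Python, and the scale factor 4 is an arbitrary placeholder, not a measured bound:

```python
import platform

# Hypothetical per-architecture tolerance scaling (illustrative only).
# x86_64 tends to match the Matlab reference closely, in part because
# of its 80-bit x87 registers; other architectures get a looser bound.
TOLERANCE_SCALE = {
    'x86_64': 1.0,
    'AMD64': 1.0,   # Windows reports x86_64 as AMD64
}

def scaled_tolerance(ca):
    """Return the check tolerance `ca`, relaxed on non-x86_64 machines."""
    return ca * TOLERANCE_SCALE.get(platform.machine(), 4.0)

print(scaled_tolerance(1e-14))
```

The alternative -- a single relaxed tolerance everywhere -- is simpler, at the cost of weakening the check on the architectures that could pass the tighter one.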

As noted above (#40 (comment) and subsequent comment), the case that triggered this thread can be closed by updating a tolerance, here and in the C repo, but wholesale updating would require more work.

@PaulMBarker, how are the check-value tolerances calculated?

As noted above (#40 (comment) and subsequent comment), the case that triggered this thread can be closed by updating a tolerance, here and in the C repo, but wholesale updating would require more work.

Yep. I just wanted to bump this so we can get conda-forge/gsw-feedstock#22 merged without ignoring the failures.

OK, I will see if I can come up with a quick fix.

#63 should close this--unless it ends up triggering some new failures on some architectures. Almost everything changed in the test input data, check values, and tolerances between the matlab version we were using as our base before, and what is in the current Matlab zipfile, v3_06_11, which is used in #63.

In my emulated arm environment, #63 moved the failing test to the ICT_second_deriv test; Ipt_second_deriv seems happy now, though.

================================= test session starts ==================================
platform linux -- Python 3.8.2, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /GSW-Python
collected 169 items

gsw/tests/test_check_functions.py .............................................. [ 27%]
......................F......................................................... [ 74%]
...............................                                                  [ 92%]
gsw/tests/test_geostrophy.py ........                                            [ 97%]
gsw/tests/test_utility.py ....                                                   [100%]

======================================= FAILURES =======================================
_____________________________ test_check_function[cfcf68] ______________________________

cfcf = ({'SA_Arctic': array([[27.66974178, 28.53097125, 29.79813634],
       [28.13780824, 28.15201013, 29.69700253],
       ...0517e-05,             nan]]), 'ICT_second_deriv': array([18])}, <check_functions.FunctionCheck object at 0x400fe296d0>)

    def test_check_function(cfcf):
        cv, cf, mfunc = cfcf
        mfunc.run(locals())
        if mfunc.exception is not None or not mfunc.passed:
            print('\n', mfunc.name)
            print('  ', mfunc.runline)
            print('  ', mfunc.testline)
            if mfunc.exception is None:
                mfunc.exception = ValueError('Calculated values are different from the expected matlab results.')
>           raise mfunc.exception
E           ValueError: Calculated values are different from the expected matlab results.

gsw/tests/test_check_functions.py:62: ValueError
--------------------------------- Captured stdout call ---------------------------------

 CT_second_derivatives
   cf.CT_SA_SA, cf.CT_SA_pt, cf.CT_pt_pt = CT_second_derivatives(cv.SA_chck_cast,cf.pt)
   cf.ICT_second_deriv = find((abs(cv.CT_SA_SA - cf.CT_SA_SA) >= cv.CT_SA_SA_ca ) | ( abs(cv.CT_SA_pt - cf.CT_SA_pt) >= cv.CT_SA_pt_ca ) | ( abs(cv.CT_pt_pt - cf.CT_pt_pt) >= cv.CT_pt_pt_ca))
=============================== short test summary info ================================
FAILED gsw/tests/test_check_functions.py::test_check_function[cfcf68] - ValueError: C...
============================ 1 failed, 168 passed in 18.19s ============================

Interesting: it looks like the C version closely follows the Matlab version, and both make a fundamental mistake in their numerics that needlessly loses accuracy. Here is an example from part of the Matlab version:

dpt  = 1e-2;        % increment of potential temperature is 0.01 degrees C.
pt_l = pt - dpt;
pt_u = pt + dpt;

[CT_SA_l, CT_pt_l] = gsw_CT_first_derivatives(SA,pt_l);
[CT_SA_u, CT_pt_u] = gsw_CT_first_derivatives(SA,pt_u);

CT_SA_pt = (CT_SA_u - CT_SA_l)./(pt_u - pt_l);
CT_pt_pt = (CT_pt_u - CT_pt_l)./(pt_u - pt_l);

Notice that those denominators should be just 2 * dpt; (pt_u - pt_l) is a less accurate estimate of the same quantity, throwing away 2-4 significant digits of precision. @PaulMBarker, would you like to fix that in the Matlab version and update the test values?
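The effect is easy to demonstrate with a standalone numerical sketch (not GSW code): 2 * dpt is computed exactly, while (pt_u - pt_l) inherits the rounding errors made when forming pt ± dpt.

```python
import numpy as np

dpt = 1e-2
pt = np.linspace(-2.0, 40.0, 101)   # a typical potential-temperature range

pt_l = pt - dpt
pt_u = pt + dpt

# The spacing should be exactly 2*dpt, but each endpoint was rounded
# when it was formed, so the computed denominator carries that error.
rel_err = np.abs((pt_u - pt_l) - 2 * dpt) / (2 * dpt)
print(rel_err.max())   # small but nonzero: several digits lost vs machine precision
```

Dividing by 2 * dpt instead of (pt_u - pt_l) removes that error from the denominator for free.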

Thanks for bringing this to my attention. Fortunately I have been sidetracked, working on combining all of my observed data into one dataset, and have not yet added the new code to the website.

I have added to the matlab directory the 3 programmes that I use to make the values contained in the gsw_checkfunctions:

gsw_check_values_gsw.m - makes the check values
gsw_check_values_ca.m - makes the tolerances
gsw_check_values_datafile.m - makes the netcdf file

the code is in GSW-Matlab/checkfunctions_code/

Thank you. Now I see how the tolerances are based on perturbations of the check cast values, with the largest resulting perturbation of the function output being used as the tolerance. The perturbations are 1.3e-10 for SA, 6e-10 for t, and 2.3e-8 for p.
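In other words, the scheme can be sketched as follows (a hypothetical Python rendition of the approach; the stand-in function and input values are illustrative only, and the perturbation sizes are the ones quoted above):

```python
import numpy as np

def tolerance_from_perturbations(func, SA, t, p,
                                 dSA=1.3e-10, dt=6e-10, dp=2.3e-8):
    # Perturb each input of `func` by a small amount and take the
    # largest resulting change in the output as the check-value
    # tolerance.  `func` is any function of (SA, t, p).
    base = func(SA, t, p)
    diffs = [np.abs(func(SA + dSA, t, p) - base),
             np.abs(func(SA, t + dt, p) - base),
             np.abs(func(SA, t, p + dp) - base)]
    return np.nanmax(diffs)

# Toy stand-in function (not a real GSW routine), just to show the mechanics:
tol = tolerance_from_perturbations(lambda SA, t, p: SA * t + 0.1 * p,
                                   np.array([35.0]),
                                   np.array([10.0]),
                                   np.array([100.0]))
print(tol)   # dominated by the t perturbation: about 35 * 6e-10 = 2.1e-8
```

This explains why the tolerances are tied to the check-cast values themselves, and why they shift whenever the test data or the algorithms change.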

I think this can be closed. Test casts and check values are all different now, with the move to the v3_05_16
Matlab zip. Maybe new failures will pop up on some architecture, and we will have to investigate again.