Unit test test_subsurface_floats_opens fails
Closed this issue · 6 comments
Describe the bug
I get a failure when I run the unit test suite.
To Reproduce
git clone https://github.com/cloud-drift/clouddrift
cd clouddrift/
mamba env create -f environment.yml --force
conda activate clouddrift
mamba install -c conda-forge matplotlib cartopy
pip install -e .
python -m unittest tests/*.py
Additional context
ERROR: test_subsurface_floats_opens (tests.datasets_tests.datasets_tests.test_subsurface_floats_opens)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/mattia/MyGit/JOSS/clouddrift/tests/datasets_tests.py", line 47, in test_subsurface_floats_opens
with datasets.subsurface_floats() as ds:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/mattia/MyGit/JOSS/clouddrift/clouddrift/datasets.py", line 484, in subsurface_floats
return _dataset_filecache(
^^^^^^^^^^^^^^^^^^^
File "/Users/mattia/MyGit/JOSS/clouddrift/clouddrift/datasets.py", line 630, in _dataset_filecache
ds = get_ds()
^^^^^^^^
File "/Users/mattia/MyGit/JOSS/clouddrift/clouddrift/adapters/subsurface_floats.py", line 49, in to_xarray
source_data = scipy.io.loadmat(local_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/mattia/miniforge3/envs/clouddrift/lib/python3.12/site-packages/scipy/io/matlab/_mio.py", line 227, in loadmat
matfile_dict = MR.get_variables(variable_names)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/mattia/miniforge3/envs/clouddrift/lib/python3.12/site-packages/scipy/io/matlab/_mio5.py", line 330, in get_variables
res = self.read_var_array(hdr, process)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/mattia/miniforge3/envs/clouddrift/lib/python3.12/site-packages/scipy/io/matlab/_mio5.py", line 290, in read_var_array
return self._matrix_reader.array_from_header(header, process)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "_mio5_utils.pyx", line 665, in scipy.io.matlab._mio5_utils.VarReader5.array_from_header
File "_mio5_utils.pyx", line 710, in scipy.io.matlab._mio5_utils.VarReader5.array_from_header
File "_mio5_utils.pyx", line 885, in scipy.io.matlab._mio5_utils.VarReader5.read_cells
File "_mio5_utils.pyx", line 663, in scipy.io.matlab._mio5_utils.VarReader5.read_mi_matrix
File "_mio5_utils.pyx", line 694, in scipy.io.matlab._mio5_utils.VarReader5.array_from_header
File "_mio5_utils.pyx", line 768, in scipy.io.matlab._mio5_utils.VarReader5.read_real_complex
File "_mio5_utils.pyx", line 445, in scipy.io.matlab._mio5_utils.VarReader5.read_numeric
File "_mio5_utils.pyx", line 350, in scipy.io.matlab._mio5_utils.VarReader5.read_element
File "_streams.pyx", line 171, in scipy.io.matlab._streams.ZlibInputStream.read_string
File "_streams.pyx", line 164, in scipy.io.matlab._streams.ZlibInputStream.read_into
OSError: could not read bytes
----------------------------------------------------------------------
Ran 216 tests in 34.575s
FAILED (errors=1)
I can't reproduce this. Can someone else try?
Side note - not a requirement for JOSS:
Do you perform a checksum validation on the test data?
If not, I suggest looking into pooch. Many SciPy packages use it, and in my experience, it works well out of the box. Additionally, I'm personally not a big fan of test data being downloaded to a hidden directory in the home. By default, pooch uses the standard machine-specific cache directory.
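As an illustration of the kind of validation pooch performs, here is a minimal standard-library sketch of checksum verification; the registry entry below is hypothetical (its value is just the SHA-256 of an empty file, used as a stand-in), not an actual clouddrift hash:

```python
import hashlib
from pathlib import Path

# Hypothetical registry mapping test-data files to known-good SHA-256 hashes.
# The entry below is the hash of an empty file, purely as a placeholder.
REGISTRY = {
    "subsurface_floats.mat":
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def sha256sum(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: Path, expected: str) -> None:
    """Raise OSError if the file does not match its known-good hash.

    This would catch a truncated or corrupted download at fetch time,
    rather than deep inside scipy.io.loadmat as in the traceback above.
    """
    actual = sha256sum(path)
    if actual != expected:
        raise OSError(
            f"checksum mismatch for {path.name}: {actual} != {expected}"
        )
```

With pooch itself, the same idea is a registry dict passed to `pooch.create`, and `fetch()` re-downloads automatically on a hash mismatch.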
- We don't do checksums at the moment. We could, but that would break tests and require more work when the underlying data (which we don't control) is updated.
- I did not implement the download module, but it looks like pooch could be used instead.
- Not a fan of the hidden folder either, personally, haha. We do want to save the data locally, not in a temporary cache folder, to avoid having to redownload the ~20k files of the GDP datasets.
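For what it's worth, the download-once-to-a-local-cache behavior described above can be sketched roughly like this (hypothetical names, not the actual clouddrift implementation); writing to a temporary name and renaming at the end also guards against caching a truncated download like the one in the traceback:

```python
from pathlib import Path
from typing import Callable

def fetch_cached(name: str, download: Callable[[Path], None],
                 cache_dir: Path) -> Path:
    """Return a locally cached copy of `name`, downloading only on first use.

    `download` is any callable that writes the file to the path it is given.
    Because the download is written to a ".part" name and renamed only once
    complete, a partial file is never mistaken for a valid cached copy.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        partial = target.with_name(target.name + ".part")
        download(partial)
        partial.rename(target)
    return target
```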
Sorry, too much focus on pooch in my comment. My main point was, maybe you can't reproduce because we are using different test data? I'll try again tomorrow.
Ah, OK. The tests should always pick up the latest datasets from their respective repos. There are occasional timeout issues, but right now it seems to be fine. Let me know when you give it another go.
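For the occasional timeout, a retry wrapper could look roughly like this; `with_retries` is a hypothetical helper for illustration, not part of clouddrift:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(fn: Callable[[], T], attempts: int = 3,
                 delay: float = 1.0) -> T:
    """Call `fn`, retrying on OSError (e.g. a timeout or truncated read).

    Re-raises the last error once the attempt budget is exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except OSError:
            if attempt == attempts:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")
```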
It worked this morning, same environment. I guess it was either a flake or the downloaded data was corrupted.
> We don't do checksums at the moment. We could, but that would break tests and require more work when the underlying data (that we don't control) is updated.
Maybe the best solution is to store a copy of the data you need for unit testing somewhere you have full control, like a dedicated GitHub repo, Zenodo, or a cloud provider.