SciTools/iris-grib

Incorrectly loaded grib file on Windows

Closed this issue · 10 comments

I have a grib file which loads differently on Windows than on Linux. It seems to be some kind of indexing issue, possibly related to dask. The file contains 37 timesteps so has multiple grib messages, and every other message seems to give unexpected data on Windows (i.e. does not match the array extracted directly using eccodes). Seems to be OK on Linux.

I can provide the file to someone in the Met Office if required (to be clear this behaviour is not unique to this file, but this file suffices to demonstrate the issue).

The following code demonstrates the issue:

import eccodes
import iris
import numpy as np


def callback(cube, field, filename):
    gid = field._raw_message._message_id
    nx = eccodes.codes_get(gid, 'Ni', int)
    ny = eccodes.codes_get(gid, 'Nj', int)
    arr = np.reshape(eccodes.codes_get_values(gid), (ny, nx))
    print(arr.sum(), field.data.sum().compute())
    print("offset", field._recreate_raw.offset)


if __name__ == "__main__":
    cube = iris.load_cube("ukv-dev-soundwave_agl_temperature_10_2021120800.grib", callback=callback)

Linux output:

82342111.64331436 82342111.64331436
offset 0
82341115.14520645 82341115.14520645
offset 442655
82340366.75977707 82340366.75977707
offset 885310
82346012.60280609 82346012.60280609
offset 1327965
82340807.58714676 82340807.58714676
offset 1770620
82337358.89959335 82337358.89959335
offset 2213275
82330093.26195717 82330093.26195717
...

Windows output:

82342111.64331436 82342111.64331436
offset 0
82341115.14520645 82340366.75977707
offset 442656
82340366.75977707 82340366.75977707
offset 885310
82346012.60280609 82340807.58714676
offset 1327966
82340807.58714676 82340807.58714676
offset 1770620
82337358.89959335 82330093.26195717
offset 2213276
82330093.26195717 82330093.26195717

Note the different offsets between both outputs, and the mismatch between array sums in every second message in the Windows output.

@mo-david-stevenson

The offset is calculated at https://github.com/SciTools/iris-grib/blob/main/iris_grib/message.py#L62, i.e.,

        while True:
            offset = grib_fh.tell()
            grib_id = gribapi.grib_new_from_file(grib_fh)
            if grib_id is None:
                break
            raw_message = _RawGribMessage(grib_id)
            recreate_raw = _MessageLocation(filename, offset)
            yield GribMessage(raw_message, recreate_raw, file_ref=file_ref)

Changing this to

        offset = 0
        while True:
            grib_id = gribapi.grib_new_from_file(grib_fh)
            if grib_id is None:
                break
            raw_message = _RawGribMessage(grib_id)
            recreate_raw = _MessageLocation(filename, offset)
            offset += gribapi.grib_get_message_size(grib_id)
            yield GribMessage(raw_message, recreate_raw, file_ref=file_ref)

Fixes the issue. I've obviously set the starting offset to be 0, which I'm not sure is a safe assumption?

Presumably also need a fix at https://github.com/SciTools/iris-grib/blob/main/iris_grib/__init__.py#L148?

Or avoiding that assumption:

        while True:
            grib_id = gribapi.grib_new_from_file(grib_fh)
            if grib_id is None:
                break
            offset = gribapi.grib_get_message_offset(grib_id)
            raw_message = _RawGribMessage(grib_id)
            recreate_raw = _MessageLocation(filename, offset)
            yield GribMessage(raw_message, recreate_raw, file_ref=file_ref)

I'm experiencing a similar problem loading GRIB2 on Windows. Every alternate GRIB message is getting the correct meta data in the cube but the data values are all from the next GRIB message instead. However, I do have at least some files of GRIB2 where this problem does not happen. Can't see any difference between the files.

E.g. if first 3 GRIB messages are land binary mask, model terrain height and geopotential height on a pressure level. All 3 are OK looking grib_dump output and in Panoply. Loading in iris, land binary mask cube looks OK, model terrain height has the expected meta data but the data values from the geopotential height message (e,g. all values > 30000). Geopotential height is correct again. This pattern continues with every alternate message.

Adding a callback, I can see the cube data values are offset to the next GribMessage but the field GribMessage actually has the correct coded values. So I've currently got a workaround that reshapes and replaces the data in the cube.

EREPS ticket: https://metoffice.atlassian.net/browse/DE-639

@SciTools/peloton: @david-bentley, @mo-helendavis sorry for the delay with this. One thing that is important to check is the version(s) of Iris and iris-grib that you are using on Windows and on Linux.

From the command line:

python -c "import iris; print(iris.__version__)";
python -c "import iris_grib; print(iris_grib.__version__)";

@trexfeathers Not sure what versions I would have been using back then, but I've created a new environment with the same versions on both Windows and Linux; versions are:

Linux:

$ python -c "import iris; print(iris.__version__)"
3.6.0
$ python -c "import iris_grib; print(iris_grib.__version__)"
0.18.0
$ python -c "import eccodes; print(eccodes.__version__)"
2.30.2

Windows:

python -c "import iris; print(iris.__version__)"
3.6.0
python -c "import iris_grib; print(iris_grib.__version__)"
0.18.0
python -c "import eccodes; print(eccodes.__version__)"
2.30.2

And here is the output showing the issue is still present using the following code, saved in iris_test_grib.py and run with python iris_test-grib.py | more. I've used a different file because I can't find the original file I used previously, but the issue is not specific to a particular file.

import eccodes
import iris
import numpy as np


def callback(cube, field, filename):
    gid = field._raw_message._message_id
    nx = eccodes.codes_get(gid, 'Ni', int)
    ny = eccodes.codes_get(gid, 'Nj', int)
    arr = np.reshape(eccodes.codes_get_values(gid), (ny, nx))
    print(arr.sum(), field.data.sum().compute())
    print("offset", field._recreate_raw.offset)


if __name__ == "__main__":
    cube = iris.load_cube("2021100806_ukv-dev-soundwave_temperature.grib", callback=callback)

Linux:

669134.5234870911 669134.5234870911
offset 0
669579.7834396362 669579.7834396362
offset 3713
669659.5537185669 669659.5537185669
offset 7426
669386.830329895 669386.830329895
offset 11139
667856.9562911987 667856.9562911987
offset 14852
666605.8734893799 666605.8734893799
offset 18565
666640.6002998352 666640.6002998352
offset 22278
666665.0799751282 666665.0799751282
offset 25991
666563.8387680054 666563.8387680054
offset 29704
666692.4885749817 666692.4885749817
offset 33417
666673.7871170044 666673.7871170044
offset 37130
666599.6193885803 666599.6193885803
offset 40843
...

Windows:

669134.5234870911 669134.5234870911
offset 0
669579.7834396362 669659.5537185669
offset 3714
669659.5537185669 669659.5537185669
offset 7426
669386.830329895 667856.9562911987
offset 11140
667856.9562911987 667856.9562911987
offset 14852
666605.8734893799 666640.6002998352
offset 18566
666640.6002998352 666640.6002998352
offset 22278
666665.0799751282 666563.8387680054
offset 25992
666563.8387680054 666563.8387680054
offset 29704
666692.4885749817 666673.7871170044
offset 33418
666673.7871170044 666673.7871170044
offset 37130
666599.6193885803 666866.8663978577
offset 40844
...

Thanks for the specifics @david-bentley. We suspect that Windows is doing something that forces even numbers; it could possibly be that a binary vs text switch needs to be set for Windows, but Linux is handling it gracefully. Will try to find some time to investigate further.

Thanks for the specifics @david-bentley. We suspect that Windows is doing something that forces even numbers; it could possibly be that a binary vs text switch needs to be set for Windows, but Linux is handling it gracefully. Will try to find some time to investigate further.

Note that the change as described in #286 (comment), and implemented in the associated PR #287 (i.e. calculating the offset using gribapi.grib_get_message_offset instead of grib_fh.tell) fixes the issue.

Thanks for the specifics @david-bentley. We suspect that Windows is doing something that forces even numbers; it could possibly be that a binary vs text switch needs to be set for Windows, but Linux is handling it gracefully. Will try to find some time to investigate further.

Note that the change as described in #286 (comment), and implemented in the associated PR #287 (i.e. calculating the offset using gribapi.grib_get_message_offset instead of grib_fh.tell) fixes the issue.

Thanks, and sorry for missing that! I will instead take a look at #287.

Sorry, I must have missed the github email (I get so many!).

(science-wrapper) C:\workspace\ereps\science>python -c "import iris; print(iris.version)"
3.1.0

(science-wrapper) C:\workspace\ereps\science>python -c "import iris_grib; print(iris_grib.version)"
0.18.0

(science-wrapper) C:\workspace\ereps\science>python -c "import eccodes; print(eccodes.version)"
2.28.0

(science-wrapper) C:\workspace\ereps\science>

Closed by #287