GEOS-ESM/MAPL

Request support for HDF4

Closed this issue · 13 comments

metdyn commented

@bena-nasa and I found that a number of NASA MODIS data files (probably many) are in HDF4 format, for example:
/discover/nobackup/dao_ops/intermediate/flk/modis/MOD04/2017/090/MOD04_L2.A2017090.0010.051.NRT.hdf
Hierarchical Data Format (version 4) data

In order for the swath sampler to work directly with these files, we need the HDF4 library installed in Baselibs, and we need a few rudimentary subroutines added to MAPL to handle HDF4. The implemented code needs to read the dimensions (nx, ny) and the variables LON, LAT, and ScanTime (nx, ny) from the HDF4 file. Some examples are found at
https://support.hdfgroup.org/release4/examples/ug-examples.html

@metdyn HDF4 is in Baselibs. It's annoying at times to build because it's so old, it doesn't like modern OSs every so often, but it is there.

Now, we need to make up some sort of "FindHDF4.cmake" file to let CMake know where it is. I'll look at this.

metdyn commented

@mathomp4 Thank you and good to know it is already there!

tclune commented

Arlindo also suggested just using a tool to convert to netCDF. I forgot what he named, but it was something standard. Not the best path in the long term, but given how far behind the project is, anything that works is acceptable.

metdyn commented

Thanks! I will do some file conversion and bypass this stage.

The one I've heard of is h4tonccf_nc4 (https://hdfeos.org/software/h4cflib.php), but h4toh5 might do it as well.

tclune commented

Arlindo seemed to specify something much more common and portable. I just assumed it was something you would be familiar with or I would have pressed him to repeat the name of the command.

@tclune was right. Thanks to our friend GitHub Copilot, you can do:

nccopy -k 4 MOD04_L2.A2017090.2000.051.NRT.hdf MOD04_L2.A2017090.2000.051.NRT.nc

Note that this has to be done on Linux because of the same weird dimension thing. But once you do that, macOS is happy when it sees the file:

❯ uname -s
Darwin
❯ ncdump -hsc MOD04_L2.A2017090.2000.051.NRT.nc | head
netcdf MOD04_L2.A2017090.2000.051.NRT {
dimensions:
	Cell_Along_Swath\:mod04 = 203 ;
	Cell_Across_Swath\:mod04 = 135 ;
	Solution_3_Land\:mod04 = 3 ;
	Solution_1_Land\:mod04 = 2 ;
	Solution_2_Land\:mod04 = 3 ;
	Solution_4_Land\:mod04 = 4 ;
	MODIS_Band_Land\:mod04 = 7 ;
	QA_Byte_Land\:mod04 = 5 ;
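If many granules need the same treatment, the nccopy call above could be scripted. The following is only a sketch: the helper names `nccopy_command` and `convert_all` are made up for illustration, the output name is assumed to be the input name with `.hdf` swapped for `.nc` (as in the command above), and it assumes the `nccopy` on your PATH comes from a netCDF build with HDF4 read support.

```python
import subprocess
from pathlib import Path


def nccopy_command(hdf_path):
    """Build the argument list for converting one HDF4 file with nccopy.

    Hypothetical helper: mirrors `nccopy -k 4 in.hdf in.nc` from the thread.
    """
    src = Path(hdf_path)
    dst = src.with_suffix(".nc")  # only the trailing ".hdf" is replaced
    return ["nccopy", "-k", "4", str(src), str(dst)]


def convert_all(directory, dry_run=True):
    """Convert every *.hdf file in `directory`; print the commands if dry_run."""
    commands = [nccopy_command(p) for p in sorted(Path(directory).glob("*.hdf"))]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if nccopy fails
            subprocess.run(cmd, check=True)
    return commands
```

Calling `convert_all(".")` only prints the commands; pass `dry_run=False` to actually run them.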

metdyn commented

Thanks, @mathomp4. (I can also use the h4toh5 tools for conversion.) I now get an error when reading fields within groups. Fundamentally, I accessed
file (ncid) --> group: mod04 (ncid1) --> group: Data Fields (ncid2) --> var (Scan_Start_Time). Whether I read it as real4 or real8, I get "NetCDF: Start+count exceeds dimension bound".

group: mod04 {
group: Data\ Fields {
double Scan_Start_Time(Cell_Along_Swath:mod04, Cell_Across_Swath:mod04) ;

ck grp1
ck grp2
pe=00003 FAIL at line=00245 Plain_netCDF_Time.F90 <netCDF error: NetCDF: Start+count exceeds dimension bound>
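For readers hitting the same message: netCDF raises it when, along some axis, the requested start plus count runs past that dimension's length. The check can be sketched as below. This is an illustration, not code from MAPL: `out_of_bounds_axes` is a hypothetical helper, it uses 0-based starts in ncdump (C) order, and the lengths 203 and 135 are taken from the ncdump output earlier in this thread. Note that the Fortran API lists dimensions in the reverse of ncdump order and uses 1-based starts, so swapped along/across extents are an easy way to trigger this.

```python
def out_of_bounds_axes(dim_lengths, start, count):
    """Return the axes where start + count exceeds the dimension length.

    Hypothetical helper; start is 0-based, axes are in ncdump (C) order.
    """
    return [
        axis
        for axis, (n, s, c) in enumerate(zip(dim_lengths, start, count))
        if s + c > n
    ]


# (Cell_Along_Swath, Cell_Across_Swath) as reported by ncdump above
dims = (203, 135)

print(out_of_bounds_axes(dims, (0, 0), (203, 135)))  # [] : request fits
print(out_of_bounds_axes(dims, (0, 0), (135, 203)))  # [1] : extents swapped
```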

Well, that is definitely a 64-bit real:

     3 : unknown  unknown  c instant       1   1     27405   1  F64  : Scan_Start_Time

metdyn commented

@mathomp4 Sorry, when I used your method ('nccopy -k 4') on discover, the ncdump output is different from the content generated using h4toh5: there is no group name associated with the variables. Oops, I should be able to read the file then. Thank you!

@metdyn If nccopy doesn't work, you can try:

ncks -4 input.hdf4 output.nc4

Actually, you might want to do:

ncks -4 -L 1 input.hdf4 output.

that will (re-)compress the output. Not as well as it was before, but better:

-rw-r--r-- 1 mathomp4 g0620 558K Oct 10 13:49 MOD04_L2.A2017090.2000.051.NRT.hdf
-rw-r--r-- 1 mathomp4 g0620  13M Oct 11 15:45 MOD04_L2.A2017090.2000.051.NRT.nc4
-rw-r--r-- 1 mathomp4 g0620 819K Oct 11 15:46 MOD04_L2.A2017090.2000.051.NRT.nc4.Z

So from 558K to 819K, but better than 13M!

metdyn commented

Thank you, @mathomp4, I see the difference. I find that these formats, including h5 and nc4 (nc4.Z), can all be read by the netCDF Fortran API on macOS, and I can read Scan_Start_Time:
this%t_alongtrack(j)= 7.6507264000000000E+08 7.6507264000000000E+08
(I previously had an error in the along- and across-swath indices. I also have no complaints about H5 or the h4toh5 conversion tool.)
This solves my file read-in problem. Thank you!
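As a sanity check on the value above, it can be turned into a calendar date. This sketch assumes Scan_Start_Time follows the usual MODIS convention of seconds since 1993-01-01 00:00:00 (the epoch is an assumption here, so check the variable's attributes in your file), and it ignores leap seconds, so the result can differ from UTC by a few seconds. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# Assumed epoch (MODIS convention); verify against the variable's attributes.
ASSUMED_EPOCH = datetime(1993, 1, 1)


def scan_time_to_datetime(seconds):
    """Convert seconds since the assumed epoch to a naive datetime.

    Leap seconds are ignored, so this is approximate to within seconds.
    """
    return ASSUMED_EPOCH + timedelta(seconds=float(seconds))


t = scan_time_to_datetime(7.6507264e8)
print(t)                      # 2017-03-31 00:10:40
print(t.timetuple().tm_yday)  # 90
```

Day 90 of 2017 agrees with the A2017090 granule date in the file names, which suggests the value was read correctly.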