/hdf5_handler

BES module to read hdf5 files

Primary LanguageC++GNU Lesser General Public License v2.1LGPL-2.1

This file describes the HDF5 handler developed by The HDF Group and OPeNDAP, 
Inc. under a grant from NASA. For information about building the HDF5 handler,
see the INSTALL.


What is the HDF5 handler?
---------------------

  Hierarchical Data Format Version 5 (HDF5) is a general-purpose library and 
file format for storing, managing, archiving, and exchanging scientific data. 
The HDF5 data model includes two primary types of objects, a number of 
supporting object types, and metadata describing how HDF5 files and objects 
are to be organized and accessed. The HDF5 file format is self-describing in 
the sense that the structures of HDF5 objects are described within the file.

 The HDF5 handler is a Hyrax Back-end Server(BES) module that maps HDF5 
objects into OPeNDAP's DAP2 data model. This allows users to access the data 
in remote  HDF5 files using the OPeNDAP clients. There can be many ways to 
serve HDF5 data but this handler differentiates itself by putting the goal of 
following CF conventions to support NASA HDF5/HDF-EOS5 products first. The 
HDF5 handler team strives to achieve what is called "CF-compliant" status of 
all NASA HDF5 data products. This means that OPeNDAP visualization client 
users should feel easy in accessing and visualizing the remote NASA 
HDF5/HDF-EOS5 data products if NASA data centers provide the Hyrax OPeNDAP 
services.

 Since some NASA HDF5/HDF-EOS5 products either do not follow or only partially 
follow CF conventions, the handler developers tried to make them be 
CF-compliant  so that OPeNDAP client tools can visualize these products.
This can be realized by the developers' knowledge and experiences as well
as intensive discussions with developers of the corresponding NASA data 
centers. The output of this handler has been checked carefully with OPeNDAP 
client tools such as IDV, Panoply, GrADS, Ferret, NCL, MATLAB, and IDL.

 A comprehensive list of the improvements since the 2.0.0 release is
available on the next section.

What's new for Hyrax 1.16.5
 - Now based on HDF5-1.12.1
 - Updated libdap headers references.
 - Repaired a problem with the evaluation of "hard links" in dataset dimensions.
 - Added the use of default configuration when the user fails to specify configuration
   values.

What's new for Hyrax 1.16.4
CF option:
1. The DMR response can be directly generated from the hdf5 file rather from DDS and DAS.
   To take advantage of this feature, the H5.EnableCFDMR key needs to be set to true in the configuration 
   file(h5.conf etc.)
   The biggest advantage of this implementation is that the signed 8-bit integer mapping is kept. 
   The DMR generation from DDS and DAS maps the signed 8-bit integer to 16-bit integer because of 
   the limitation in the DAP2 data model.
2. Bug fix: Ensure the path inside the coordinates attribute for TROPOMI AI product to be flattened. 
3. Update the handling of escaping special characters due to NASA request.
   The '\' and the '"' are no longer escaped. 

What's new for Hyrax 1.16.3
Default option:
1. Significantly enhance the handling of netCDF-4 like HDF5 files for the DAP4(DMR) response.
   1) Correctly handle the pure netCDF-4 dimensions.
   2) Remove the pre-defined netCDF-4 attributes.
   3) Update the testsuite that tests NASA DAP4 response.
CF option:
1. ICESat-2 ATL03 and ATL08 support: 
   Add the group path and make the variable names follow the CF naming 
   conventions for "coordinates" attributes of ATL03-like variables.
2. Ensure the unsupported objects are removed in the DMR response.
3. Add the support of the new GPM DPR level 3 version product. 


What's new for Hyrax 1.16.2
CF option:
1. Support the correct generation of DMRPP files. 
   1) Add BES keys to turn off the mapping of 64-bit integer to DMR and 
      not to generate the fullnamepath attribute when the storage size is zero.
   2) Ensure that the fullnamepath and origname attributes are not generated
      when the corresonding BES keys are turned off. 
2. Enhance the support to make general two dimensional latitude/longitude
   HDF5 files follow CF so that NASA OMPS-NPP products can be supported.
3. Enhance the support of netCDF-4 like files. 


What's new for Hyrax 1.16.1

Performance Improvement for both CF and default options:

The Hyrax data access will not build the DAS if not necessary. 

CF option:
1. Correct the unlimited dimension name for an HDF5 file that has both unlimited
   dimensions and group hierarchy.GES DISC GPM level 3 belongs to this category.

2. Remove the netCDF-4 reserved NAME attributes from Hyrax DAS output since these
   attributes will cause the the failure of the fileout netCDF-4 module. 

3. Add a BES key so that the CF global attribute "Conventions" will not prepend
   the HDF-EOS group path.


What's new for Hyrax 1.16.0

CF option:
Fix a small memory leaking when handling OCO2 Lite product. 

What's new for Hyrax 1.15.3
The handler is updated to successfully compile and test with both HDF5 1.8.21 and HDF5 1.10.4.


What's new for Hyrax 1.15.0
CF option:
1. Add the support of the HDF-EOS5 Polar Stereographic(PS) and Lambert Azimuthal Equal Area(LAMAZ) grid projection files. 
   Both projection files can be found in NASA LANCE products.
2. Add the HDF-EOS5 grid latitude and longitude cache support. This is the same as what we did for the HDF-EOS2 grid. 
3. Add the support for TROP-OMI, new OMI level 2 and OMPS-NPP product.
4. Removed the internal reserved netCDF-4 attributes for DAP output.
5. Make the behavior of the drop long string BES key consistent with the current limitation of netCDF Java.

 
What's new for Hyrax 1.14.0
CF option:
1. Add the support of Hybrid HDF-EOS5 products. An example is the NASA ASDC AirMSPI product.
   (1) The HDF-EOS5 specified group path will be removed before flattening the variable name by following 
       the CF. 
   (2) If a hybrid HDF-EOS5 product follows the netCDF-4 data model, the handler will also follow
       the netCDF-4 data model to map the HDF5 objects to DAP2.
   (3) For a product that contains the "coordinate" attribute in a variable, the attribute value will
       be adjusted to reflect the removal of the HDF-EOS5 specified group path.
   (4) For a grid product that has CF grid_mapping information, the related grid_mapping information is 
       also adjusted to reflect the removal of the HDF-EOS5 specified group.
2. Enhance the HDF-EOS5 parser to support new OMI level 2 products. 
   

What's new for Hyrax 1.13.4
----------------------------

CF option: 
1. Add the disk cache support for raw data and DAS. 
   This support aims to improve the performance to access HDF5 data via Hyrax.
   Since this support may vary from case to case, by default we turn it off. 
   (1) Users, who want to use the disk cache for raw data, should read the description of the following 
       BES keys at h5.conf under /etc/bes/modules and change the BES key values to fit for your own use 
       cases.

       H5.DiskCacheComp=false
       H5.DiskCacheFloatOnlyComp=true
       H5.DiskCacheCompThreshold=2.0
       H5.DiskCacheCompVarSize=100

   (2) Users, who want to use the disk cache for DAS, should also go to the h5.conf file and change the 
       BES key H5.EnableDiskMetaDataCache to true. The appropriate path that stores the DAS cache file
       should also be set with the BES key H5.DiskMetaDataCachePath.

2. Add the HDF-EOS5 sinussoidal projection support. 
   For the HDF-EOS5 sinusoidal projection, the latitude and longitude are calculated and CF grid projection 
   information is added.

*****************************************************
********Special note about the version number********
*****************************************************
Since Hyrax 1.13.2, Hyrax just treats each handler as a module. 
So We stop assigning an individual version number for the HDF5 handler.

What's new for version 2.3.3(Released with Hyrax 1.13.2)
----------------------------

- The retrieval of BES key values are moved to the the constructor of the hdf5 handler to improve the
  performance.
- The DAP metadata responses: DDS,DAS and DMR can be cached in memory to improve the performance. 

CF option:

- Add the memory cache support to store  data values of coordinate variables and specific data variables.
- Update the support of the fillvalue and the addition of new coordinate variables in the new GPM products.
- Fix a bug to identify variables latitude and longitude for SMAP-like products.

Default option:
- Add the mapping of root attributes to DAP4. 

Note: 

1) The description of memory cache feature can be found in h5.conf.in under 
   https://github.com/OPENDAP/hdf5_handler.
2) Since Hyrax 1.13.2 is an emergency release, the handler version is not bumped. 

What's new for version 2.3.3(Released with Hyrax 1.13.1)
----------------------------
CF option:
- HDF5 Scalar dataset reading
  HDF5 scalar datasets with all atomic datatypes are supported. In previous versions, only string scalar 
  dataset is supported. 
- Unlimited dimension  
  Now the client that understands the unlimited dimension can correctly retrieve this information. 
- 0-size attribute 
  0-size attribute will be ignored. This case was not considered in the previous versions.
- empty array reading
  Update the way to check if the array index is valid. This will ensure that the empty array reading 
  doesn't fail. 
- _FillValue checking
  Both _FillValue range and datatype are checked. 
  Sometimes the data producers will provide the wrong value and the wrong datatype. 
  In the previous versions,
  the handler only corrects the datatype if the _FillValue type is not the same as the variable type. 


What's new for version 2.3.2(Released with Hyrax 1.13.0)
----------------------------
CF option:
- By default, the leading underscore of a variable path is removed for all files. Although not recommended,
  Users can change the BES key H5.KeepVarLeadingUnderscore at h5.conf.in to be true for 
  backward compatibility if necessary.

- Significantly improve the support of generic HDF5 files that have 2-D lat/lon. This improvement makes 
  some SMAP level 1, level 3 and level 4 products plot-able by CF tools such as Panoply.

- Add the general support of netCDF-4-like HDF5 files that have 2-D lat/lon. This improvement makes TOMS 
  MEaSURES product plot-able by CF tools such as Panoply. It will also support future potential products
  that follow netCDF generic data model.

What's new for version 2.3.1(Released with Hyrax 1.12.2)
----------------------------
There are no new features added in this release. We improve the code quality by fixing a potential resource
leaking issue and other misc. issues.


What's new for version 2.3.0(Released with Hyrax 1.12.1)
----------------------------
Default option:
 -  Add the pure DAP4 support.
    a) HDF5 group is mapped to DAP4 group.
    b) HDF5 dimensions that follow netCDF-4 data model are mapped to DAP4 dimensions.
    c) HDF5 signed 8-bit integer, signed and unsigned 64-bit integers are mapped to correspondng DAP4 
       datatypes.


 -  Re-implement the data access of DAP structure mapped from an HDF5 compound datatype dataset.
    a) The nested compound datatype(array or scalar) and the array type inside a compound datatype are 
       supported.
    b) The base datatype inside an HDF5 compound datatype can contain compound, array datatype and integer,
       float and string(including variable length string)datatypes. Other HDF5 datatypes are not supported. 
    c) The base datatype of an array datatype inside an HDF5 compound datatype can be compound datatype 
       and integer, float and string(including variable length string).
  
 -  Re-implment the data access of a DAP string(array or scalar)mapped from an HDF5 variable length string 
    dataset.
 -  New enforced limitations
    these limitations were not clearly stated in the previous versions. 
    a) We don't support the mapping of HDF5 array datatype to DAP except when the array datatype is used 
       inside an HDF5 compound datatype.
    b) We don't support the mapping of HDF5 compound datatype to DAP when an attribute datatype is an HDF5 
       compound. Such an HDF5 attribute is ignored in the DAP DAS.
CF option: 
 -  Add an option to generate ignored object information from HDF5 to DAP2 mapping.
 -  Add the support of new GPM level 3 products. 
 -  Add the support of OCO-2 products.
 -  Add the support of netCDF-4 classic-like HDF5 files that have 2-D lat/lon. This effectively supports 
    the ASF SeaSat product.
 -  Add the support of generic HDF5 files that have 1-D or 2-D lat/lon. This also generally supports the 
    LP DAAC ASTER GED product.
 -  Fix a few bugs related to _FillValue and duplicate coordiate variables discovered when testing with 
    OMI,GPM and Aquarius products.

[Known Issues]

  - We found that the file netCDF module still fails to generate a netCDF-4 file when a DAP string array
    is mapped to netCDF-4. It can generate the netCDF-3 files correctly. This is not a bug inside the HDF5
    handler or libdap and BES.
    The detailed description of this issue can be found in OPeNDAP's trac ticket
    https://opendap.atlassian.net/browse/TRAC-2185.

  - We also found that "Get as NetCDF 4" function may not work with hdf5_handler especially when
    you download the entire data on big HDF5 files without subsetting. One reason is that
    the current CentOS 6 uses the old NetCDF-4 and HDF5 RPM packages.
    Please contact RedHat directly to speed up the release of new RPM packages through EPEL.


What's new for version 2.2.3(Hyrax 1.11.2, 1.11.1, 1.11.0,1.10.1 1.10.0)
----------------------------
For the CF option: 
 -  Implement an option not to pass HDF5 file ID from DDS/DAS service to data service
    since NcML may not work when the file ID is passed.
 -  Add support for several NASA HDF5 products:
    GES DISC GPM level 1, level 2, level 3 DPR, level 3 GPROF, and level 3 IMERGE products
    GES DISC some netCDF-4 like MEaSUREs products
    OBPG level 3m HDF5 and MOPITT level 3 proudcts
 -  Performanc tuning: Add a BES option not to generate StructMetadata for HDF-EOS5-like files. 
 -  Correct the values for the predefined attribute orig_dimname_list.
 -  Read the description of BES keys in the file h5.conf.in to see if the default values need
    to be changed for your service. 

[Known Issues]

  - We found that the file netCDF module still fails to generate a netCDF-4 file when a DAP string array
    is mapped to netCDF-4. It can generate the netCDF-3 files correctly. This is not a bug inside the HDF5 
    handler or libdap or BES.
    The detailed description of this issue can be found in OPeNDAP's trac ticket
    https://opendap.atlassian.net/browse/TRAC-2185

  - We also found that "Get as NetCDF 4" function may not work with hdf5_handler especially when
    you download the entire data on big HDF5 files without subsetting. One reason is that
    the current CentOS 6 uses the old NetCDF-4 and HDF5 RPM packages.
    Please contact RedHat directly to speed up the release of new RPM packages through EPEL.


What's new for version 2.2.2(Hyrax 1.9.7)
----------------------------
For the CF option: 
 -  Improve file I/O by reducing the number of HDF5 file open/close requests. 
 -  Error handling is greatly improved: resources are released properly when errors occur.

For both CF and default options:
 - Some memory leaks detected by valgrind are fixed.


What's new for version 2.2.1
----------------------------
  Internal code improvements.

What's new for version 2.2.0
----------------------------
  This version supports dimension scale and ICESat/GLAS product. It also fixes
a few bugs. Please see ChangeLog for details about bug fixes.

What's new for version 2.1.1
----------------------------

  This version fixes a few bugs. It handles the concatenation of metadata files
 in a format like "coremetadata.0" and "coremeata.0.1." In previous versions,
it handled only "coremetadata.x" format. It fixes a bug to access GESDISC 
BUV Ozone files.


What's new for version 2.1.0
----------------------------

  This version improves the performance to read HDF5 variables. The previous 
assumption was that NASA files usually don't have many variables. Thus, to save
 time of opening APIs and better coding for error handling, the handler just 
held the API IDs and released them at last. However, GES DISC recently has 
produced a file with more than 1000 objects and want it to be served by 
OPeNDAP. It took much longer than expected. A thorough investigation 
revealed that the retrieval of HDF5 objects was the performance bottle neck. 
The new version addresses this issue by closing the HDF5 object IDs gradually.

  Also, it fixes a bug that one HDF5 object API is not closed and leaked the
 system resources. 

  A new BES key H5.DisableStructMetaAttr is added so the handler can skip
parsing StructMetadata and generating the attribute in DAS output for HDF-EOS5
 files.
  
What's new for version 2.0.0
----------------------------

  This version has significant changes in handling NASA HDF5/HDF-EOS5 data 
products. As the new major version number change indicates, the CF support  
part of the handler is completely re-engineered.

  Since the main effort of this version of the handler is to support the easy
access of most NASA HDF5/HDF-EOS5 data products by following the CF conventions,
the CF option of the HDF5 handler is turned on by default.

  The --enable-cf configuration option is replaced with the BES key called
"H5.EnableCF". You can enable or disable CF feature of the HDF5 handler 
dynamically by modifying the /etc/bes/modules/h5.conf configuration file
first and then restarting the BES server using "besctl restart".

  The "IgnoreUnknownTypes" BES key is removed because the same functionality is
implemented with the new "EnableCF" key. We added three more keys and they
are explained in the NOTES section of INSTALL file.

  The handler is tested with HDF5 version 1.8.8. We believe that the handler 
should work with HDF5 1.8.5 and versions after. To achieve better performance,
 we strongly suggest users to use the latest HDF5 release. See REQUIREMENTS 
section of INSTALL file on how to get the latest RPMs of HDF5.


Supported NASA HDF5/HDF-EOS5 data products in CF Option in the current release
-------------------------------------------------------------------------------

        AURA OMI/HIRDLS/MLS/TES
        MEaSUREs SeaWiFS
        MEaSUREs Ozone
        Aquarius
        GOSAT/acos
        SMAP(simulation)

 Please see the Limitation section below for special notes about OMI L2G and 
GOSAT/acos products. We plan to add new NASA HDF5 and HDF-EOS5 products in the 
future release.


Supported HDF5 data types for both CF and default options
---------------------------------------------------------

  NASA data products do not use all HDF5 datatypes provided by the HDF5 
library. Not all HDF5 datatypes can be mapped to DAP2 datatypes, either. 
Thus, the HDF5 handler team focused on the most common HDF5 datatypes. 
Generally, non-supported data types are ignored. 

        unsigned char, char, 
        unsigned 16-bit integer, 16-bit integer,
        unsigned 32-bit integer, 32-bit integer, 
        32-bit and 64-bit floating data, 
        HDF5 string. 
                  

Supported HDF5 data types for the default option only
-----------------------------------------------------

	Compounds: Compound data types are mapped into DAP2 Structure.

	References: Object or regional references are mapped into URLs.

Other mapping information
-------------------------

CF option:

        Group path: An HDF5 dataset's full path information can be found in 
        "fullnamepath" attribute.  

Default option:

        Group path: An HDF5 dataset's full path information can be found in
        in "HDF5_OBJ_FULLPATH" attribute.

       	Group structure: Group structure, the relation among groups, is 
        mapped into a special attribute called "HDF5_ROOT_GROUP".

	Soft/hard Links: Links are mapped to attributes in DAS.

        Comments: Comments are mapped into DAS attributes.


Implementation details in general
----------------------------------

  The implementation largely follows the design. Please read the following
design note for details at

       http://hdfeos.org/software/hdf5_handler/doc/Reengineering-HDF5-OPeNDAP-handler.pdf

  Here are a few highlights for the implementation.

       o The implementation of the CF option is separated from that of the 
       default option.

        o The HDF5 1.8 APIs are used to retrieve HDF5 object information for 
        both the CF and the default options.

        o The CF option only:
           - HDF5 products are categorized and are separately handled except 
             for the modules that can be shared. One such example is the 
             module that makes the object names follow the CF name conventions.
           - Translating metadata to DAP2 is separated from retrieving the 
             raw data.
           - The handler provides an option to handle object name clashing.
           - BES keys are used to replace the #ifdef macro. This makes the 
             code much cleaner and easier to maintain.
           - The DAP2 variable and attribute names strictly follow the object 
             name conventions in the section 3.2.3 of the design note.

Implementation Details for HDF-EOS5 in CF option 
------------------------------------------------

        Swath: Based on the dimension information specified in the 
        StructMetadata file, fake dimension variables are generated with 
        integer values.

        Zonal Average: The current version only supports the zonal average
        file augmented by the HDF-EOS5 augmentation tool since only the 
        augmented zonal average files are found among NASA HDF-EOS5 zonal 
        average products. Dimension variables are constructed based on the 
        augmentation information stored in the file. For more information 
        about the augmentation, please refer to the BACKGROUND section of 
        HDF-EOS5 augmentation tool page at 

              http://hdfeos.org/software/aug_hdfeos5.php


	Grid: The fake dimension handling is the same as Swath. 
        In addition, based on the projection parameters specified in the 
        StructMetadata file, 1-D latitude and longitude arrays are 
        automatically computed and added in the DAP2 output.

	Metadata: If metadata (e.g., StructMetadata or CoreMetadata) is
        split and stored into multiple attributes (e.g., StructMetadata.0,
        StructMetadata.1, ..., StructMetadata.n), they are merged into one
	string and then parsed so that it can be represented in structured 
        attribute format in DAS output. 


Testing the HDF5 handler
------------------------

  The handler source package has more than 50 test files under data/ directory.
If you build the handler from the source, the 'make check' command will test 
both CF and default options using the test HDF5 files. The full C source codes
for generating the test HDF5 files are also available under data/src directory
 although they are not compiled during the building or testing the handler.  


Limitations for the CF option
-----------------------------

        o Generally the mappings of 64-bit integer, time, enum, bitfield, 
        opaque, compound, array, and reference types are not supported. 
        The mapping of one HDF5 64-bit integer variable into two DAP2 32-bit 
        integers in GOSAT/acos is based on the discussions with the data 
        producers. Except one dimensional variable length string array, 
        the mapping of the variable length datatype is not supported either. 
        The handler simply ignores these unsupported datatypes.
        
        o HDF5 files containing cyclic groups are not supported. 
        If such files are encountered, the handler hangs with infinite loops.

        o The handler ignores soft links, external links and comments. 
        A hardlink is handled as an HDF5 object.

        o For the HDF5 datasets created with the scalar dataspace, the handler 
        can only support the string datatypes. It ignores the datasets created
         with other datatypes. HDF5 allows the size of a dimension to be 0 
        (zero) for a dataspace. The handler also ignores the datasets created 
        with such dataspace. The mapping of any HDF5 datasets with NULL 
        dataspace is also ignored.

        o Currently, GOSAT/acos and OMI level 2G products cannot be visualized 
        by OPeNDAP visualization tools because of the limitations of the 
        current CF conventions and netCDF-Java visualization tools (IDV, 
        Panoply, etc.)
        
        o We found object reference attributes in several NASA products. 
        Since these attributes are only used to generate the DAP2 dimensions 
        and coordinate variables, ignoring the mapping of these attributes 
        doesn't lose any essential information for OPeNDAP users.

        o fileout_netcdf prints H5Fclose() internal error message on CentOS6 
        with Hyrax-1.8.8 and hdf5-1.8.5.patch1-7.el6.x86_64.rpm:

        HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
        #000: ../../src/H5F.c line 1957 in H5Fclose(): invalid file identifier
        major: Invalid arguments to routine
        minor: Inappropriate type

        However, users can still get HDF5 as either NetCDF-3 or NetCDF-4
        successfully. We strongly recommend you to use the latest HDF5 and
        NetCDF RPMs.

Limitations in default option
-----------------------------

	o No support for HDF5 files that have a '.' in a group/dataset
	  name.

	o The mappings of HDF5 64-bit integer, time, enum, bitfield, and 
        opaque datatypes are not supported.

	o Except for one dimensional HDF5 variable length string array, HDF5 
        variable length datatype is not supported either.

        o HDF5 external links are ignored. The mapping of HDF5 objects with 
        NULL dataspace is not supported.


Additional background on the HDF5 handler
-----------------------------------------
  The HDF5 handler is one component of the Hyrax BES; the Hyrax BES
 software is designed to allow any number of handlers to be
configured easily. See the BES Server README and INSTALL files for
information about configuration, including how to use this handler.


Installing the HDF5 handler in Hyrax
------------------------------------

  The Linux RPM package will install h5.conf file with all options true except
 for the H5.EnableCheckNameClashing option. 

  A test HDF5 file is also installed, so after installing this handler, Hyrax
will have a data to serve providing an easy way to test your new
installation and to see how a working handler should look. To use
this, make sure that you first install the BES, and that dap-server
gets installed too. 

  Finally, every time you install or reinstall handlers, make sure to 
restart the BES and OLFS.


Muqun Yang (myang6@hdfgroup.org)
Hyo-Kyung Lee (hyoklee@hdfgroup.org)