rotal/alembic

Alembic files are malformed; h5ls and abcecho show exception stack traces for H5Oget_info_by_name

Closed this issue · 5 comments

see discussion thread:
http://groups.google.com/group/alembic-discussion/browse_thread/thread/a741b413749591e2/3618e96b726e55e4?show_docid=3618e96b726e55e4

HDF5-DIAG: Error detected in HDF5 (1.8.7) thread 139740866865248:
  #000: H5O.c line 657 in H5Oget_info_by_name(): object not found
    major: Symbol table
    minor: Object not found
  #001: H5Gloc.c line 747 in H5G_loc_info(): can't find object
    major: Symbol table
    minor: Object not found
  #002: H5Gtraverse.c line 905 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #003: H5Gtraverse.c line 688 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #004: H5Gloc.c line 702 in H5G_loc_info_cb(): can't get object info
    major: Symbol table
    minor: Can't get value
  #005: H5O.c line 2865 in H5O_get_info(): can't retrieve object's btree & heap info
    major: Object header
    minor: Can't get value
  #006: H5Goh.c line 358 in H5O_group_bh_info(): can't read LINFO message
    major: Symbol table
    minor: Can't get value
  #007: H5Omessage.c line 545 in H5O_msg_read_oh(): unable to decode message
    major: Object header
    minor: Unable to decode value
  #008: H5Olinfo.c line 129 in H5O_linfo_decode(): bad version number for message

Original issue reported on code.google.com by ble...@gmail.com on 18 Nov 2011 at 2:48

valgrind of Kevin's scene didn't turn up any complaints about Alembic.

Original comment by miller.lucas on 18 Nov 2011 at 2:57

I've been hunting this bug where a bad cache is generated. I'm now 70% 
convinced it's an hdf5 bug. On the read side we have an HDF5 group (our 
compound property) and we are asking for the hdf5 group that stores the 
after-the-first sample values of the property childBnds. In my test case this 
is "LarmPvMidOriNUL/.prop/.xform/.childBnds.smpi". The hdf5 code fails to 
resolve the HDF5 hard link to the link's actual destination, so it can't 
open the smpi group.

I've been debugging this on the write side, trying to see how an hdf5 
group's hard links are created. At a certain point hdf5 has code that 
says "ok, you have too many hard links in the parent group, I'll change the 
parent group to use 'dense link storage'". If you're curious you can see this 
in the comments in H5G_obj_insert():
        /* If there's still a small enough number of links, use the 'link' message */
        /* (If the encoded form of the link is too large to fit into an object
         *  header message, convert to using dense link storage instead of link messages)
         */
I don't think this routine itself has a bug... my hunch is that some other bit 
of hdf5 has similar logic for adding other things into the parent group 
(maybe hdf5 attributes or datatypes, I'm not certain), and that other location 
has a bad interaction with it.

What I have done, though, is change my hdf5 so that the switch over to "dense 
link storage" happens sooner (there's a simple link.nlinks < ginfo.max_compact 
check that I tweaked by adding 4). I realize that is a total hack, but in 
effect I'm causing hdf5 to convert to dense link storage sooner, and with this 
change my test case creates a valid cache. I haven't changed anything in 
Alembic at all, so we aren't keeping any hdf5 objects open differently.
This leads me to strongly suspect that some bit of logic related to group 
conversion operations inside hdf5 has a bug.

Original comment by cookingw...@gmail.com on 21 Dec 2011 at 9:24

That is very compelling. Luckily we don't need to hack HDF5; we could use:
http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetAttrPhaseChange

And if this isn't just a problem for attrs we might need to investigate:
http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetLinkPhaseChange
http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetSharedMesgPhaseChange

Original comment by miller.lucas on 22 Dec 2011 at 1:42

This should be at least partially fixed in 1.0.4 with the use of set link phase 
change.

Original comment by miller.lucas on 24 Jan 2012 at 1:36

  • Changed state: PleaseVerify

Original comment by miller.lucas on 24 Jan 2012 at 1:41

  • Changed state: Verified