Reading h5 with `consistency_check = False` fails
Closed this issue · 4 comments
Example pull request: #217
https://github.com/gafusion/omas/actions/runs/3413780388/jobs/5680901424
It seems that `consistency_check` is also what ensures the right type conversions. In my case, reading failed on fields of type str, which are stored in h5 as bytes. This resulted in:
test_load_omas_coil_description.py:16:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../coils/coils_io/pf_coils.py:149: in load_pf_configuration_from_ods_h5
ds.coords["coil_label"] = ods["pf_active.coil.:.name"]
../../../anaconda3/envs/compass/lib/python3.7/site-packages/omas/omas_core.py:1322: in __getitem__
return value.__getitem__(key[1:], cocos_and_coords)
../../../anaconda3/envs/compass/lib/python3.7/site-packages/omas/omas_core.py:1322: in __getitem__
return value.__getitem__(key[1:], cocos_and_coords)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = [{'element': [{'geometry': {'rectangle': {'height': 0.208, 'r': 0.42, 'width': 0.095, 'z': 0.1015}}, 'turns_with_sign'...z': -0.42}}, 'turns_with_sign': 40.0}], 'identifier': b'PF4L', 'mass': 1145.0, 'name': b'PF4L', 'resistance': 21910.0}]
key = [slice(None, None, None), 'name'], cocos_and_coords = True
def __getitem__(self, key, cocos_and_coords=True):
"""
ODS getitem method allows support for different syntaxes to access data
:param key: different syntaxes to access data, for example:
* ods['equilibrium']['time_slice'][0]['profiles_2d'][0]['psi'] # standard Python dictionary syntax
* ods['equilibrium.time_slice[0].profiles_2d[0].psi'] # IMAS hierarchical tree syntax
* ods['equilibrium.time_slice.0.profiles_2d.0.psi'] # dot separated string syntax
* ods[['equilibrium','time_slice',0,'profiles_2d',0,'psi']] # list of nodes syntax
NOTE: Python3.6+ f-strings can be very handy when looping over arrays of structures. For example:
for time_index in range(len(ods[f'equilibrium.time_slice'])):
for grid_index in range(len(ods[f'equilibrium.time_slice.{time_index}.profiles_2d'])):
print(ods[f'equilibrium.time_slice.{time_index}.profiles_2d.{grid_index}.psi'])
:param cocos_and_coords: processing of cocos transforms and coordinates interpolations [True/False/None]
* True: enabled COCOS and enabled interpolation
* False: enabled COCOS and disabled interpolation
* None: disabled COCOS and disabled interpolation
:return: ODS value
"""
# handle pattern match
if isinstance(key, str) and key.startswith('@'):
key = self.search_paths(key, 1, '@')[0]
# handle individual keys as well as full paths
key = p2l(key)
if not len(key):
return self
# negative numbers are used to address arrays of structures from the end
if isinstance(key[0], int) and key[0] < 0:
if self.omas_data is None:
key[0] = 0
elif isinstance(self.omas_data, list):
if not len(self.omas_data):
key[0] = 0
else:
key[0] = len(self.omas_data) + key[0]
# '+' is used to append new entry in array structure
if key[0] == '+':
if self.omas_data is None:
key[0] = 0
elif isinstance(self.omas_data, list):
key[0] = len(self.omas_data)
# slice
elif isinstance(key[0], str) and ':' in key[0]:
key[0] = slice(*map(lambda x: int(x.strip()) if x.strip() else None, key[0].split(':')))
dynamically_created = False
# data slicing
# NOTE: OMAS will try to return numpy arrays if the sliced data can be stacked in a uniform array
# otherwise a list will be returned (that's where we do `return data0` below)
if isinstance(key[0], slice):
data0 = []
for k in self.keys(dynamic=1)[key[0]]:
try:
data0.append(self.__getitem__([k] + key[1:], cocos_and_coords))
except ValueError:
data0.append([])
# raise an error if no data is returned
if not len(data0):
raise ValueError('`%s` has no data' % self.location)
# if they are filled but do not have the same number of dimensions
shapes = [numpy.asarray(item).shape for item in data0 if numpy.asarray(item).size]
if not len(shapes):
return numpy.asarray(data0)
if not all(len(shape) == len(shapes[0]) for shape in shapes[1:]):
return data0
# find maximum shape
max_shape = []
for shape in shapes:
for k, s in enumerate(shape):
if len(max_shape) < k + 1:
max_shape.append(s)
else:
max_shape[k] = max(max_shape[k], s)
max_shape = tuple([len(data0)] + max_shape)
# find types
dtypes = [numpy.asarray(item).dtype for item in data0 if numpy.asarray(item).size]
if not len(dtypes):
return numpy.asarray(data0)
if not all(dtype.char == dtypes[0].char for dtype in dtypes[1:]):
return data0
dtype = dtypes[0]
# array of strings
if dtype.char in 'U':
return numpy.asarray(data0)
# define an empty array of shape max_shape
if dtype.char in 'iIl':
data = numpy.full(max_shape, 0)
elif dtype.char in 'df':
data = numpy.full(max_shape, numpy.nan)
elif dtype.char in 'O':
data = numpy.full(max_shape, object())
else:
> raise ValueError('Not an IMAS data type %s' % dtype.char)
E ValueError: Not an IMAS data type S
The above error is raised on
ods["pf_active.coil.:.name"]
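One possible workaround, sketched here under stated assumptions, is to decode the bytes values before they reach the slicing code. The helper name `decode_bytes` is hypothetical and not part of OMAS. It works on plain nested dicts and lists (the shape shown in the traceback above), not on an ODS object, and it assumes the strings were written as UTF-8.

```python
def decode_bytes(value, encoding="utf-8"):
    """Recursively convert bytes to str inside nested dicts, lists and tuples.

    Hypothetical helper, not part of OMAS. Assumes the bytes are UTF-8 text.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {k: decode_bytes(v, encoding) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(decode_bytes(v, encoding) for v in value)
    # numbers, str and anything else are returned unchanged
    return value


# Coil entry with the same field values as in the traceback above
coils = [
    {"identifier": b"PF4L", "name": b"PF4L", "mass": 1145.0, "resistance": 21910.0},
]

cleaned = decode_bytes(coils)
names = [coil["name"] for coil in cleaned]
```

After decoding, the names are ordinary Python strings, so an array built from them would have a unicode dtype (`U`) rather than the bytes dtype (`S`) that the error message complains about.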
A partial fix would be to split `consistency_check` into two options: one controlling IMAS typing and the other controlling whether all fields follow the IMAS structure. This would be very useful for our use case, where we want some kind of extended IMAS schema for data description. Some data (for example, the mass of a pf_active coil or time-dependent conductivity) are not in the structure, and we would benefit from being able to extend the schema while keeping the benefits of OMAS.
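To make the proposal concrete, here is a minimal sketch of what two independent switches could look like. Everything in it is hypothetical: the `SCHEMA` dictionary, the `set_value` function and the `coerce_types` / `check_structure` flags are illustrations only and do not correspond to the actual OMAS implementation or to the real IMAS data dictionary.

```python
# Hypothetical mini-schema: path -> expected Python type.
# This is NOT the IMAS data dictionary, just an illustration.
SCHEMA = {
    "pf_active.coil.name": str,
    "pf_active.coil.resistance": float,
}


def set_value(path, value, coerce_types=True, check_structure=True):
    """Validate a value against SCHEMA with two independent switches.

    coerce_types:    convert known fields to their schema type
    check_structure: reject paths that are not in the schema
    """
    expected = SCHEMA.get(path)
    if expected is None:
        if check_structure:
            raise LookupError("`%s` is not in the schema" % path)
        # field outside the schema: store it as-is (schema extension)
        return value
    if not coerce_types:
        return value
    if expected is str and isinstance(value, bytes):
        # strings read back from h5 arrive as bytes
        return value.decode("utf-8")
    return expected(value)


# Type conversion stays on, structure check is relaxed
name = set_value("pf_active.coil.name", b"PF4L", check_structure=False)
mass = set_value("pf_active.coil.mass", 1145.0, check_structure=False)
```

With this split, a known field such as `name` is still converted from bytes to str, while an extra field such as `mass` is accepted without raising.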
I looked at this some. I did not make any progress.
@kripnerl to extend the data schema, use the extra_structures feature in OMAS:
https://gafusion.github.io/omas/auto_examples/extra_structures.html#sphx-glr-auto-examples-extra-structures-py
Would this solve your issue?