manual creation of dataset
The following code created a dataset, but only the `z` array is shown.
There are two issues here:
- The `x` and `y` arrays do not show. This is something related to the `.action_indices`.
- According to the documentation https://github.com/qdev-dk/Qcodes/blob/master/qcodes/data/data_set.py#L76 the argument `arrays` should be a `dict`, but that generates an error.
import numpy

import qcodes
from qcodes.tests.data_mocks import *  # provides DataArray and new_data


def DataSet2D(location=None):
    # DataSet with one 2D array with 4 x 6 points
    yy, xx = numpy.meshgrid(range(4), range(6))
    zz = xx**2 + yy**2
    # outer setpoint should be 1D
    xx = xx[:, 0]
    x = DataArray(name='x', array_id='x', label='X', preset_data=xx, is_setpoint=True)
    y = DataArray(name='y', array_id='y', label='Y', preset_data=yy, set_arrays=(x,),
                  is_setpoint=True)
    z = DataArray(name='z', array_id='z', label='Z', preset_data=zz, set_arrays=(x, y))
    # return new_data(arrays={'x': x, 'y': y, 'z': z}, location=location)  # this would fail
    return new_data(arrays=[x, y, z], location=location)


d = DataSet2D()
print(d)
@peendebak @eendebakpt yeah, it must not be a dict, because iterating over a dict yields keys and we clearly want to add the data_array.
But man, the code/docs are misleading.
There is a data_set.arrays, which is indeed a {array_id: data_array} thing.
But actually the data is all there.
Maybe a bug in the repr function of the dataset?
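To illustrate the dict point above: iterating over a Python dict yields its keys, so code that loops over `arrays` expecting array objects would receive strings instead. A minimal sketch, with plain strings as hypothetical stand-ins for the `DataArray` objects:

```python
# Hypothetical stand-ins for DataArray objects; the real objects are not
# needed to see the effect.
arrays_as_dict = {'x': 'array-x', 'y': 'array-y', 'z': 'array-z'}
arrays_as_list = ['array-x', 'array-y', 'array-z']

# Iterating a dict yields its keys, not the stored values.
from_dict = [item for item in arrays_as_dict]
from_list = [item for item in arrays_as_list]

print(from_dict)  # ['x', 'y', 'z']
print(from_list)  # ['array-x', 'array-y', 'array-z']

# Asking for the values explicitly recovers the objects.
from_values = list(arrays_as_dict.values())
```

So a caller who only has a dict could presumably pass `list(arrays.values())` instead of the dict itself.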
The bug is in `_clean_array_ids(self, arrays)`, and precisely in the return statement.
If any array has the same action_indices as another array, then it will not exist in the action_id_map (whose meaning goes beyond my understanding). Maybe @alexcjohnson can shed some light on it?
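A simplified illustration of how such a collision could drop entries. It assumes the map is built by zipping action indices with array ids, and that manually created arrays all carry the same `action_indices` (taken here to be an empty tuple). Both are assumptions for the sake of the sketch, not a copy of the QCoDeS source:

```python
array_ids = ['x', 'y', 'z']

# Assumed: every manually created array has the same action_indices.
colliding_indices = [(), (), ()]

# Duplicate keys overwrite each other, so only the last array survives.
action_id_map = dict(zip(colliding_indices, array_ids))
print(action_id_map)  # {(): 'z'}

# With distinct indices, as a Loop would presumably assign them, nothing is lost.
distinct_indices = [(0,), (1,), (2,)]
full_map = dict(zip(distinct_indices, array_ids))
print(sorted(full_map.values()))  # ['x', 'y', 'z']
```

This would be consistent with only `z` appearing in the printed dataset.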
@eendebakpt @peendebak I guess that any data created from the loop will always have different action ids.
@giulioungaretti @alexcjohnson The following does work. The issue is indeed with the `action_id_map`. Do we want to solve this, or wait until the entire `DataSet` object gets fixed?
def DataSet2D(location=None):
    # DataSet with one 2D array with 4 x 6 points
    yy, xx = numpy.meshgrid(range(4), range(6))
    zz = xx**2 + yy**2
    # outer setpoint should be 1D
    xx = xx[:, 0]
    x = DataArray(name='x', array_id='x', label='X', preset_data=xx, is_setpoint=True)
    y = DataArray(name='y', array_id='y', label='Y', preset_data=yy, set_arrays=(x,),
                  is_setpoint=True)
    z = DataArray(name='z', array_id='z', label='Z', preset_data=zz, set_arrays=(x, y))
    print('new data...')
    # start from an empty DataSet and add the arrays one by one
    dd = new_data(arrays=[], location=location)
    dd.add_array(x)
    dd.add_array(y)
    dd.add_array(z)
    return dd


d = DataSet2D()
print(d)
@peendebak thanks for bringing this up.
The band-aid solution to the `__repr__` problem would be to check whether `action_id_map` actually points to all the arrays before trying to use it. Or I guess to give each array a unique `action_indices` (in the order the arrays were provided!) inside `_clean_array_ids`; that would solve it for this particular case.
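Both band-aids can be sketched in a few lines. The helper names `map_covers_all_arrays` and `with_unique_action_indices` are hypothetical, and the map is assumed to go from action indices to array ids. This is a rough illustration of the idea, not the QCoDeS implementation:

```python
def map_covers_all_arrays(action_id_map, array_ids):
    """Return True if every array id is reachable through the map."""
    return set(action_id_map.values()) == set(array_ids)


def with_unique_action_indices(action_indices):
    """Keep the indices if they are already unique; otherwise number the
    arrays in the order they were provided."""
    if len(set(action_indices)) == len(action_indices):
        return list(action_indices)
    return [(i,) for i in range(len(action_indices))]


array_ids = ['x', 'y', 'z']
shared_indices = [(), (), ()]  # assumed state of manually created arrays

broken_map = dict(zip(shared_indices, array_ids))
fixed_map = dict(zip(with_unique_action_indices(shared_indices), array_ids))

print(map_covers_all_arrays(broken_map, array_ids))  # False
print(map_covers_all_arrays(fixed_map, array_ids))   # True
```

The first helper would let `__repr__` fall back to some other ordering when the map is incomplete; the second avoids the incomplete map in the first place.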
But really, per my TODO we should get `action_id_map` out of `DataSet` entirely; its function is really internal to a `Loop`, so `DataSet` shouldn't know anything about it.
One difficulty with this, and the reason I think @MerlinSmiles used `action_id_map` in `__repr__` in the first place, is that we'd like to maintain the order of arrays within a `DataSet` so that it tells you the order of acquisition. Currently they're unordered because `DataSet.arrays` is a `dict`, so only the `action_id`s give an order. We can change it to an `OrderedDict` or something to solve this. That change would need to be propagated to our storage formats. @AdriaanRol I think we talked about this at some point; I don't know if there's a natural way to do this within HDF5? Right now I believe `action_id_map` is lost when you save and reload a `DataSet`, so the order of `__repr__` entries will be undefined after that even if you made it with a regular `Loop` in the first place.
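A minimal sketch of what an `OrderedDict` would provide here, with strings as hypothetical stand-ins for the arrays. Insertion order is kept, and equality between two `OrderedDict` objects is sensitive to that order:

```python
from collections import OrderedDict

# Arrays are inserted in acquisition order.
arrays = OrderedDict()
for array_id in ['x', 'y', 'z']:
    arrays[array_id] = 'data for ' + array_id

acquisition_order = list(arrays)
print(acquisition_order)  # ['x', 'y', 'z']

# Same content in a different order compares unequal as OrderedDicts,
# but equal once the order is discarded.
reordered = OrderedDict((key, arrays[key]) for key in ['z', 'x', 'y'])
same_order = (arrays == reordered)
same_content = (dict(arrays) == dict(reordered))
print(same_order, same_content)  # False True
```

Note that plain `dict` only guarantees insertion order from Python 3.7 onward, so `OrderedDict` is the explicit choice when the order carries meaning.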
@alexcjohnson
I approve of using `OrderedDict` for the arrays.
From an hdf5/h5py technical perspective, the h5py Group works like a dictionary. The way I would encode this is by adding a list of array_ids that records the order of the arrays. That way it is easy to both store and extract in the proper order (in any case quite natural).
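The idea of storing an explicit order list next to an unordered container can be sketched without h5py. Here JSON with sorted keys stands in for a store that gives no ordering guarantee. The names `save`, `load` and `array_order` are hypothetical, and this is not the HDF5 formatter itself:

```python
import json


def save(arrays_in_order):
    """arrays_in_order: list of (array_id, values) pairs in acquisition order."""
    payload = {
        'array_order': [array_id for array_id, _ in arrays_in_order],
        'arrays': dict(arrays_in_order),
    }
    # sort_keys=True deliberately discards insertion order, mimicking a
    # container that does not remember the order of its members.
    return json.dumps(payload, sort_keys=True)


def load(text):
    """Rebuild the (array_id, values) pairs in the stored order."""
    payload = json.loads(text)
    return [(array_id, payload['arrays'][array_id])
            for array_id in payload['array_order']]


original = [('z', [0, 1]), ('x', [2, 3]), ('y', [4, 5])]
restored = load(save(original))
print([array_id for array_id, _ in restored])  # ['z', 'x', 'y']
```

A round-trip check like `restored == original` is also the kind of test that would show whether writing and reading preserve the dataset.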
Additionally I would like to have a good example dataset and a good test to see if two datasets are identical to see if I correctly write and read. (most importantly this test will tell me what actually defines the dataset)
In microsoft/Qcodes#179 I am currently passing all tests for writing and saving simple data but I have not included things like the action id (which may explain why it does not yet work with the loop). The tests I use are based on the test_format, which tests the gnuplot formatter.
tl;dr
- `OrderedDict`, action_ids
- `OrderedDict` in hdf5 -> easy to implement
- required -> way to test if working correctly
microsoft/Qcodes#162 won't happen if the madness in action_id_map gets fixed, which in turn will probably fix this.