manual creation of dataset

Question

Opened this issue 8 years ago · 8 comments

The following code creates a dataset, but only the z array is shown when it is printed.

There are two issues here:

The x and y arrays do not show. This seems to be related to .action_indices.
According to the documentation https://github.com/qdev-dk/Qcodes/blob/master/qcodes/data/data_set.py#L76 the argument arrays should be a dict, but passing a dict generates an error.

import qcodes
from qcodes.tests.data_mocks import *

def DataSet2D(location=None):
    # DataSet with one 2D array with 4 x 6 points
    yy, xx = numpy.meshgrid(range(4), range(6))
    zz = xx**2+yy**2
    # outer setpoint should be 1D
    xx = xx[:, 0]
    x = DataArray(name='x', array_id='x', label='X', preset_data=xx, is_setpoint=True)
    y = DataArray(name='y', array_id='y',  label='Y', preset_data=yy, set_arrays=(x,),
                  is_setpoint=True)
    z = DataArray(name='z',  array_id='z', label='Z', preset_data=zz, set_arrays=(x, y))
    #return new_data(arrays={'x': x, 'y': y, 'z': z}, location=location) # this would fail
    return new_data(arrays=[x,y,z], location=location)

d=DataSet2D()
print(d)

@alexcjohnson @giulioungaretti

Answer 1 · 2016-08-19T13:20:53.000Z

@peendebak @eendebakpt yeah, it must not be a dict, because iterating over a dict yields keys and we clearly want to add the data_array.
But man the code/docs are misleading.
There is a data_set.arrays, which is indeed a {array_id: data_array} thing.

But actually the data is all there.
Maybe a bug in the repr function of the dataset?
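
To illustrate the dict-versus-list point, here is a minimal sketch in plain Python. It does not use QCoDeS; `FakeArray` is a hypothetical stand-in for a DataArray that only carries an array_id.

```python
class FakeArray:
    """Hypothetical stand-in for a DataArray; only carries an array_id."""
    def __init__(self, array_id):
        self.array_id = array_id

x, y = FakeArray('x'), FakeArray('y')

as_list = [x, y]
as_dict = {'x': x, 'y': y}

# Iterating over a list yields the objects themselves.
ids_from_list = [a.array_id for a in as_list]

# Iterating over a dict yields its keys (plain strings here),
# so attribute access on each item fails.
dict_error = None
try:
    [a.array_id for a in as_dict]
except AttributeError as err:
    dict_error = err

# Iterating over .values() recovers the objects.
ids_from_values = [a.array_id for a in as_dict.values()]
```

Code that loops over `arrays` and expects array objects will therefore only work with a list (or with the dict's values), which would be consistent with the error seen when a dict is passed.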

Answer 3 · 2016-08-19T14:01:52.000Z

The bug is in def _clean_array_ids(self, arrays), and precisely in the return statement.

If two arrays have the same action_indices, then one of them will not exist in the action_id_map (whose meaning goes beyond my understanding). Maybe @alexcjohnson can shed some light on it?
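
A minimal sketch of how such a collision could hide arrays, assuming the map is built as a plain dict keyed by action_indices and that manually created arrays all carry the same default value (taken to be an empty tuple here). The tuples below are stand-ins, not real DataArray objects.

```python
# (array_id, action_indices) pairs standing in for the three arrays.
# Assumption: arrays created by hand all share the same action_indices.
arrays = [('x', ()), ('y', ()), ('z', ())]

action_id_map = {}
for array_id, action_indices in arrays:
    # Same key every time, so each assignment overwrites the previous one.
    action_id_map[action_indices] = array_id

# Only the last array survives in the map.
visible_ids = list(action_id_map.values())
```

Under these assumptions only 'z' remains in the map, which would match a repr that lists only the z array even though x and y are still stored in the dataset.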

Answer 4 · 2016-08-19T14:02:33.000Z

@eendebakpt @peendebak I guess that any data created from a Loop will always have different action ids.

Answer 5 · 2016-08-23T21:01:23.000Z

@giulioungaretti @alexcjohnson The following does work. The issue is indeed with the action_id_map. Do we want to solve this, or wait until the entire DataSet object gets fixed?

def DataSet2D(location=None):
    # DataSet with one 2D array with 4 x 6 points
    yy, xx = numpy.meshgrid(range(4), range(6))
    zz = xx**2+yy**2
    # outer setpoint should be 1D
    xx = xx[:, 0]
    x = DataArray(name='x', array_id='x', label='X', preset_data=xx, is_setpoint=True)
    y = DataArray(name='y', array_id='y',  label='Y', preset_data=yy, set_arrays=(x,),
                  is_setpoint=True)
    z = DataArray(name='z',  array_id='z', label='Z', preset_data=zz, set_arrays=(x, y))

    print('new data...')
    dd =  new_data(arrays=[], location=location)
    dd.add_array(x)
    dd.add_array(y)
    dd.add_array(z)
    return dd
d=DataSet2D()
print(d)

Answer 6 · 2016-08-24T04:04:58.000Z

@peendebak thanks for bringing this up.

The bandaid solution to the __repr__ problem would be to check if action_id_map actually points to all the arrays before trying to use it. Or I guess to give each array a unique action_indices (in the order the arrays were provided!) inside _clean_array_ids, that would solve it for this particular case.
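
A rough sketch of the second option, giving each array a unique action_indices in the order provided. `build_action_id_map` is a hypothetical helper written for illustration, not the existing _clean_array_ids, and `SimpleNamespace` objects stand in for DataArrays.

```python
from types import SimpleNamespace

def build_action_id_map(arrays):
    """Hypothetical helper: map action_indices -> array_id with unique keys."""
    indices = [a.action_indices for a in arrays]
    if len(set(indices)) < len(indices):
        # Collision detected: fall back to the position in the provided list,
        # so the order in which the arrays were given is preserved.
        for i, a in enumerate(arrays):
            a.action_indices = (i,)
    return {a.action_indices: a.array_id for a in arrays}

# Three stand-in arrays that all start with the same action_indices.
arrays = [SimpleNamespace(array_id=n, action_indices=()) for n in 'xyz']

action_id_map = build_action_id_map(arrays)
ordered_ids = [action_id_map[key] for key in sorted(action_id_map)]
```

With unique keys every array shows up in the map, and sorting the keys gives the arrays back in the order they were passed in.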

But really, per my TODO we should get action_id_map out of DataSet entirely. Its function is really internal to a Loop, so DataSet shouldn't know anything about it.

One difficulty with this, and the reason I think @MerlinSmiles used action_id_map in __repr__ in the first place, is that we'd like to maintain the order of arrays within a DataSet so that it tells you the order of acquisition. Currently they're unordered because DataSet.arrays is a dict, so only the action_ids gives an order. We can change it to an OrderedDict or something to solve this. That change would need to be propagated to our storage formats - @AdriaanRol I think we talked about this at some point, I don't know if there's a natural way to do this within HDF5? Right now I believe action_id_map is lost when you save and reload a DataSet, so the order of __repr__ entries will be undefined after that even if you made it with a regular Loop in the first place.
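
As a small illustration of the OrderedDict idea, using only the standard library (the string values are placeholders, not real array data):

```python
from collections import OrderedDict

# Arrays are added in acquisition order; the OrderedDict remembers it.
arrays = OrderedDict()
for array_id in ('x', 'y', 'z'):
    arrays[array_id] = 'data for ' + array_id  # placeholder for a DataArray

acquisition_order = list(arrays)

# Lookup by array_id still works exactly as with a plain dict.
z_entry = arrays['z']
```

A __repr__ could then iterate over the mapping directly instead of going through action_id_map. Note that plain dicts only guarantee insertion order from Python 3.7 onwards, so on older versions OrderedDict is the safe choice.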

Answer 7 · 2016-08-24T12:29:50.000Z

@alexcjohnson
I approve of using OrderedDict for the arrays 👍 .

From an hdf5/h5py technical perspective the h5py Group works like a dictionary. The way I would encode this is by adding a list of array_ids that records the order of the arrays. That way it is easy to both store and extract in the proper order (in any case quite natural).

Additionally I would like to have a good example dataset and a good test to see if two datasets are identical to see if I correctly write and read. (most importantly this test will tell me what actually defines the dataset)
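
A sketch of both ideas under simplifying assumptions: a plain dict stands in for the h5py Group (it is not real h5py code), `ORDER_KEY` is a made-up reserved key, and the `same_dataset` definition of "identical" (same array_ids, same order, equal data) is only one possible choice.

```python
from collections import OrderedDict

ORDER_KEY = '__array_order__'  # hypothetical reserved key for the order list

def write_arrays(arrays):
    """Store arrays in a dict-like 'group' plus a list recording their order."""
    group = {array_id: list(data) for array_id, data in arrays.items()}
    group[ORDER_KEY] = list(arrays)
    return group

def read_arrays(group):
    """Rebuild the arrays in the stored order."""
    return OrderedDict((array_id, group[array_id])
                       for array_id in group[ORDER_KEY])

def same_dataset(a, b):
    """Equal if both hold the same array_ids, in the same order, with equal data."""
    return list(a.items()) == list(b.items())

original = OrderedDict([('z', [0, 1, 4]), ('x', [0, 1, 2])])
restored = read_arrays(write_arrays(original))
reordered = OrderedDict([('x', [0, 1, 2]), ('z', [0, 1, 4])])

round_trip_ok = same_dataset(original, restored)
order_matters = not same_dataset(original, reordered)
```

A round-trip check of this kind (write, read back, compare) is the sort of test that would also pin down what actually defines a dataset.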

In microsoft/Qcodes#179 I am currently passing all tests for writing and saving simple data but I have not included things like the action id (which may explain why it does not yet work with the loop). The tests I use are based on the test_format, which tests the gnuplot formatter.

tl;dr

OrderedDict 👍 , action_ids 👎
OrderedDict in hdf5 -> easy to implement
required -> way to test if working correctly

Answer 8 · 2016-10-11T11:33:56.000Z

microsoft/Qcodes#162 won't happen if the madness in action_id_map gets fixed, which in turn will probably fix this.