signetlabdei/sem

Error with get_results_as_numpy_array

mrequena opened this issue · 6 comments

If I try to use get_results_as_numpy_array instead of get_results_as_xarray in the wifi-plotting-xarray.py example, I get the following error:

Traceback (most recent call last):
  File "my-wifi-plotting-xarray.py", line 123, in <module>
    main()
  File "my-wifi-plotting-xarray.py", line 81, in main
    get_average_throughput)
  File "/mnt/data/sem/git-repos/develop/sem/manager.py", line 335, in get_results_as_numpy_array
    result_parsing_function)))
TypeError: get_space() missing 1 required positional argument: 'runs'

By the way, there is no example using this method, so I don't really know how to use it. It seems the 'runs' parameter is missing from the method declaration and from the call to self.get_space().

The error does seem to point that way, you are right. However, in the current develop version of the code the parameters seem to line up correctly. Can you make sure you are using the current develop branch?

You are right, there is no example using this function, since it's very similar to the one which uses xarray. You can find the documentation for the function in the API reference, here!

You are right. This error is not present in the develop branch.

How can I access/use the returned numpy array?
If I use:
params = {
    'nWifi': [1, 3],
    'distance': [1, 5, 10],
    'useRts': ['false', 'true'],
    'useShortGuardInterval': ['false', 'true'],
    'mcs': list(range(2, 8, 1)),
    'channelWidth': ['20'],
    'simulationTime': [4],
}

I get results.shape = (2, 6, 2, 3, 2, 2), so I don't know how the params are ordered.

This is one of the tricky parts of supporting Python versions lower than 3.6, where dictionaries are not ordered by default. When I originally developed this, the data structure was ordered according to the order of keys in the dictionary passed by the user, but that does not hold in Python 3.4 and 3.5. Thanks for pointing this out, I had missed it!

As I see it, we have a few different ways of dealing with this:

  1. Return, together with the Numpy multidimensional array, a list of the parameters in the order used in the array;
  2. Always order the output params according to some criterion, e.g. alphabetically;
  3. Ask the user to provide the parameters in an ordered data structure (like a list of param-value tuples, or an OrderedDict).

Personally I don't like 3, and would rather go with either 1 or 2. Which one would you find most convenient?

By the way, what do you think about squeezing the array? Currently, if a dimension only has one possible value, we remove that dimension altogether. Did you find this confusing?

I understand, this is a big difference between 3.6 (dicts maintain insertion order) and 3.4/3.5 (dicts are unordered).

"What's new in Python 3.6" says:
https://docs.python.org/3/whatsnew/3.6.html#new-dict-implementation
"The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon (this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5)"

So, I would not rely on insertion order to return the results (even with Python 3.6). The simplest thing is to order alphabetically (2).
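To illustrate option (2), here is a minimal sketch of alphabetical axis ordering, using the parameter space from the question. It is not sem's actual implementation: `axis_names`, `shape`, and the zero-filled `results` array are placeholders introduced only for this example.

```python
import numpy as np

# Parameter space from the question; the insertion order is irrelevant here.
params = {
    'nWifi': [1, 3],
    'distance': [1, 5, 10],
    'useRts': ['false', 'true'],
    'useShortGuardInterval': ['false', 'true'],
    'mcs': list(range(2, 8, 1)),
    'channelWidth': ['20'],
    'simulationTime': [4],
}

# Sorting the keys yields the same axis order on every Python version,
# whether or not dicts preserve insertion order.
axis_names = sorted(params)
shape = tuple(len(params[name]) for name in axis_names)

# Placeholder array standing in for the parsed results (assumption).
results = np.zeros(shape)

for axis, name in enumerate(axis_names):
    print(axis, name, results.shape[axis])
```

With this convention, a reader can recover which axis belongs to which parameter from the sorted key list alone.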

For (1), what order do you mean? The insertion order? Can you get this in Python 3.4/3.5?

For (3), at least, by default, I would not ask the user to provide the parameters in an ordered data structure. It may be confusing for new users. But on the other hand, it may be useful for some experienced users. Maybe as you have limited time, the simplest thing to implement is (2). (3) could be a new feature.

For squeezing the array... Initially, I found it confusing. What is the rationale (advantages/disadvantages) behind this?
Imagine a user who launches a campaign with a few values for the parameters (only 1 value for some parameters) and, once he/she knows it works, launches the full campaign with all the parameter values. The results arrays in the two cases will have different dimensions, right? This may complicate the code to process the results.
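The concern can be reproduced with a small sketch. The helper `result_shape` and the two parameter dictionaries are hypothetical and only mimic the squeezing behaviour discussed in this thread; they do not call sem.

```python
import numpy as np

def result_shape(params, squeeze):
    """Shape of a placeholder results array, with or without squeezing."""
    names = sorted(params)
    array = np.zeros([len(params[name]) for name in names])
    return np.squeeze(array).shape if squeeze else array.shape

# A small pilot campaign and the full campaign over the same parameters.
pilot = {'nWifi': [1], 'distance': [1, 5, 10], 'mcs': [2]}
full = {'nWifi': [1, 3], 'distance': [1, 5, 10], 'mcs': list(range(2, 8))}

# With squeezing, the number of dimensions differs between the two runs.
print(result_shape(pilot, squeeze=True), result_shape(full, squeeze=True))

# Without squeezing, both arrays have one axis per parameter.
print(result_shape(pilot, squeeze=False), result_shape(full, squeeze=False))
```

Code written against the pilot array (one index) would need to change for the full array (three indices) when squeezing is applied, whereas the unsqueezed arrays are indexed the same way in both cases.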

I agree this 'feature' should not be relied upon, at least for now.

For 1, I mean that together with the output numpy array, we can pass a list like [[param1, values], [param2, values], ...], where the parameters have the same order they have in the numpy array. I agree that we can simply return them alphabetically, since it's the easiest solution.
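A rough sketch of what option 1 could look like from the user's side. The function `results_with_axes` is hypothetical and not part of sem; it returns a placeholder array together with a `[[param, values], ...]` list whose order matches the array axes.

```python
import numpy as np

def results_with_axes(params):
    """Hypothetical helper: return (array, axes), with axes in array order."""
    axes = [[name, list(params[name])] for name in sorted(params)]
    array = np.zeros([len(values) for _, values in axes])
    return array, axes

array, axes = results_with_axes({'nWifi': [1, 3], 'distance': [1, 5, 10]})

# Look up an entry by parameter value rather than by position.
wanted = {'nWifi': 3, 'distance': 10}
index = tuple(values.index(wanted[name]) for name, values in axes)

print(axes)
print(index, array[index])
```

Returning the axis list alongside the array lets the caller translate parameter values into indices without knowing the ordering convention in advance.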

Squeezing should help with accessing the data structure: if you query for 10 parameters, of which only one spans a range, it's useless to output a 10-dimensional array where 9 dimensions have size 1. Squeezing lightens the code to access the structure, because you can use results[range] instead of results[0,0,0,0,0,range,0,0,0,0], for instance. However, I agree it can be confusing (hence the request for feedback).

Maybe we can make squeezing optional, or simply rely on the user knowing they can squeeze the data structure we output to access it more easily!

I've decided to remove squeezing in 5dc3e21, since it was more of a confusion point than a strength. Users can perform squeezing on their own with just a line of code, anyway.
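For reference, the one-line squeeze can be done with NumPy's `np.squeeze`. The array below is a placeholder standing in for an unsqueezed result in which two parameters have a single value; it does not come from sem.

```python
import numpy as np

# Placeholder result: axes 0 and 2 belong to single-valued parameters.
results = np.arange(6).reshape(1, 3, 1, 2)

# Drop every length-1 axis in one line.
squeezed = np.squeeze(results)

# Or drop only a chosen length-1 axis and keep the others.
partial = np.squeeze(results, axis=0)

print(squeezed.shape, partial.shape)
```

Passing `axis` keeps the remaining dimensions in place, which helps when the processing code expects a fixed number of axes.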