atmtools/arts

pyarts and unsupported classes


Is your enhancement request related to a problem? Please describe.
When running pyarts, there are several classes that are not supported by the value return of a workspace variable. Specifically:

import pyarts
ws = pyarts.workspace.Workspace()
pyarts.xml.save(ws.sensor_time, "test.xml") # or pyarts.xml.save(ws.sensor_time.value, "test.xml")

Does not work. Instead you have to do:

import pyarts
ws = pyarts.workspace.Workspace()
pyarts.classes.from_workspace(ws.sensor_time).savexml("test.xml")

Describe the solution you'd like
I would like the first option to work, both the active and the commented-out version. The commented-out version is the only one that works today, and then only for some types.

Describe alternatives you've considered
There are two steps needed to make this happen, as far as I can see:

  1. Make pyarts.xml.save accept WorkspaceVariable and make it wrap the behavior above for this type.
  2. Make WorkspaceVariable::value return pyarts.classes.from_workspace(ws.VAR) when the type of VAR is not explicitly supported by the other WorkspaceVariable::value methods. pyarts.xml.save then needs to know whether the value it is given has its own complete type(ws.VAR.value)::savexml method; if it does, that method should be preferred over the previously mapped way of writing XML.

The savexml methods in pyarts.classes all wrap the ARTS versions, which makes them preferable to the rewritten Python versions for writing XML files. I would prefer it if WorkspaceVariable::value always returned pyarts.classes.from_workspace(ws.VAR), but this might break a few scripts? I don't really know what would break. ArrayOfIndex?

There are probably better versions here but I don't know what they would be.

Are you willing to help adding this enhancement?
No. I don't understand all the attribute handling and related Python details, which makes me a bad candidate to implement these changes.

Additional context
The fact that some WorkspaceVariable::value methods don't work has led to user questions, and I think it will lead to more. The pyarts.classes methods will support all of this because the GitHub build process will fail if ARTS developers don't add them.

Totally see your point, and this should not be too difficult to achieve. However, I am currently working in a different direction and planning to refactor the Python interface prior to the 3.0 release.

The problem with the current approaches (pyarts.types and pyarts.classes) is that there is way too much redundancy. Together with the ARTS workspace interface, it essentially means that for all functionality we currently need 2 or 3 additional interfaces. This doesn't scale, and things will get inconsistent.

The solution I am currently working towards is to automate the generation of pybind11-based interfaces. If I get this to work, it would allow us to make all ARTS classes accessible through Python, even those which aren't represented in the workspace. Apart from solving the problem with the interface, this approach has the benefit of greatly increasing the capabilities of ARTS and will allow us to focus on improving ARTS core functionality instead of writing layers and layers of interfaces.

I have a working prototype of this, but it still lacks some functionality to fully automate the generation of interface files. This will likely take a few more months but I am on it and (still) fairly confident that it's doable.
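
To make it concrete, here is a minimal hand-written sketch of the kind of binding such a generator might emit. This is illustrative only, not the prototype's actual output; GridInfo is a made-up stand-in for an ARTS class, while the pybind11 calls are the library's real API:

#include <pybind11/pybind11.h>
#include <string>

namespace py = pybind11;

// Made-up stand-in for an ARTS class that is not represented in the workspace.
struct GridInfo {
    std::string name;
    double spacing{0.0};
};

// The kind of binding code a generator could emit for it.
PYBIND11_MODULE(arts_classes, m) {
    py::class_<GridInfo>(m, "GridInfo")
        .def(py::init<>())
        .def_readwrite("name", &GridInfo::name)
        .def_readwrite("spacing", &GridInfo::spacing);
}

From the Python side this would just be "import arts_classes; g = arts_classes.GridInfo(); g.spacing = 0.25", with no workspace involved.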

tl;dr: If it's really urgent I can do it, but otherwise there will be a more general solution in a few months.

I agree about the redundancies. It is a bit much, especially as we start using more specialized classes. It's not urgent, so a better solution would be nice.

I've been using pybind11 only for getting Python code running in C++, not the other way around. If we add it as a dependency, then I can see it being useful for the stuff you describe. I had a look at the generation of class interfaces with pybind11 before, but it looked like it required quite some code in connection with the class definitions in C++. It would be great if you have some way to automate that.

If pybind11 can do that without much overhead, that would be great. Some time ago I looked at SWIG and abandoned it because it required manually writing interface files in some pseudo-language for all types. I agree with Richard that there is no urgency to have this right now.

@olemke These are the pybind11 docs for simple class interfaces: https://pybind11.readthedocs.io/en/latest/classes.html. You need quite a bit more for virtual classes if you read on, and you will have a lot of pseudo-language. However, I find it much better than the pseudo-language we have today in arts_api_classes.cc, which is just my own way to wrap classes/structs. I trust it works because the opposite direction works flawlessly (though I had to write some 150 lines of weird code in a wrapper to understand what was going on, since pybind11's object class handles everything, and that made debugging a pain without wrappers for the class interface, class instances, functions, and actual values).
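
For reference, the "quite a bit more" needed for virtual classes is the trampoline pattern from the pybind11 docs. Roughly, adapted from their Animal example (so Animal/PyAnimal are the docs' names, not ARTS classes):

#include <pybind11/pybind11.h>
#include <string>

namespace py = pybind11;

// Abstract base class with a pure virtual method.
struct Animal {
    virtual ~Animal() = default;
    virtual std::string go(int n_times) = 0;
};

// "Trampoline" class needed so Python-side subclasses can override go().
struct PyAnimal : Animal {
    using Animal::Animal;
    std::string go(int n_times) override {
        PYBIND11_OVERRIDE_PURE(std::string, Animal, go, n_times);
    }
};

PYBIND11_MODULE(example, m) {
    py::class_<Animal, PyAnimal>(m, "Animal")
        .def(py::init<>())
        .def("go", &Animal::go);
}

So every polymorphic class needs one extra helper class plus one macro call per virtual method.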

@simonpf I would of course like it if this is made to work automatically and covers most of our use cases. However, reading the pybind11 docs again this morning, I think it would be a better idea to go forward with a simple working example as a draft in our list of PRs. The stuff in arts_api_classes.cc + pyarts.classes makes a mockery of pybind11 in terms of style and usability, but it can mimic much of what at least I need inside Python (direct data access, file IO, and some interaction with matplotlib). If you go ahead with the full automation, I think there are many corner cases that risk being missed.

Some of the things where automation might be difficult, off the top of my head:

  1. How do you deal with enums? In C++, exposing Enum::Option interfaces is possible because no code would ever write to them in a bad way. So a class where the enum option needs to be exposed can do so, knowing that any interface with the class's enum will be caught in a code review or protected by a throwing statement of some kind if a user sees it. There are also some special enums in ARTS where the author overrides the default behavior of the enum list for whatever reason; an example of the latter is the particle orientation. Everything using the ENUMCLASS interface is also going to be very different from the pure enums, even from pure enums that are defined as class statics (e.g., QuantumIdentifier). The latter all have to be manually mapped to strings (or moved out of the class and into an ENUMCLASS, but that's even more work for an interface change).

  2. How do you deal with cases where we don't want to expose the internals? This is the case, for instance, for RetrievalQuantity, where several matpack types have been purposely hidden (no file IO, no writing, private).

  3. How do you intend to deal with file IO and screen output? Some classes don't have the latter (e.g., Ppath), and pretty much none of the internal classes do. It is very useful to know that you are calling the same file interaction functions from ARTS and Python, because that way you can properly combine your outputs while debugging things.

All in all, I repeat: I think it would be a better idea to go forward with a simple working example. Mark it as a draft, mark it as experimental and "will break", but go ahead and submit what you have that is already working.

I have a prototype of how that would look on one of my branches, if you want to have a look. But the main work here is related to the refactoring of the scattering data, so I will wait with a PR. Regarding the other points:

  1. I don't see why normal use of enums should pose any problem. There are enums in this file, if that answers your question (see also the sketch at the end of this comment).

  2. The tool I am working on is based on libclang and has full semantic understanding of the source code. So it obviously won't expose anything that's private. For additional customization I am currently using short directives in comments, e.g.

// pxx :: hide
/**
 * ...
 */

but I plan to add options to exclude names and type names using a command line flag. I am not aiming for a full integration into ARTS, i.e., adding a transitive dependency on libclang, but instead to include the generated source files, maybe together with a script that generates them.

  3. I am planning to check if the object has a matching << operator and then just piggyback that one. File IO will of course use the native functions. The general aim is to increase the use of native functions, which will save us time and make the code more robust. I am also planning to expose the doxygen comments as native Python docstrings, which I think would be pretty neat.
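
As a rough sketch of how points 1 and 3 could look in plain pybind11 (Orientation and Dummy are placeholders, not ARTS types; the py::enum_ and __repr__ patterns are standard pybind11, and this is only a guess at what generated code would contain):

#include <pybind11/pybind11.h>
#include <ostream>
#include <sstream>
#include <string>

namespace py = pybind11;

// Placeholder enum, standing in for something like a particle orientation option.
enum class Orientation { Random, AzimuthallyRandom };

// Placeholder class that already has an ARTS-style operator<<.
struct Dummy {
    Orientation orientation{Orientation::Random};
};

std::ostream &operator<<(std::ostream &os, const Dummy &d) {
    return os << "Dummy(orientation="
              << (d.orientation == Orientation::Random ? "Random" : "AzimuthallyRandom")
              << ")";
}

PYBIND11_MODULE(example_sketch, m) {
    // Point 1: plain enums map directly onto py::enum_.
    py::enum_<Orientation>(m, "Orientation")
        .value("Random", Orientation::Random)
        .value("AzimuthallyRandom", Orientation::AzimuthallyRandom);

    // Point 3: piggyback the existing operator<< for the Python __repr__.
    py::class_<Dummy>(m, "Dummy")
        .def(py::init<>())
        .def_readwrite("orientation", &Dummy::orientation)
        .def("__repr__", [](const Dummy &d) {
            std::ostringstream os;
            os << d;
            return os.str();
        });
}

From Python, repr(Dummy()) would then show exactly the text produced by the C++ operator<<.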

Closed with the merge of #400. We leave the other things for the future.