leon-thomm/ryvencore

dtypes default not used

liamhuber opened this issue · 5 comments

Hi @leon-thomm, I've been using ryvencore over on ironflow to build a jupyter-based gui to drive for our project pyiron and I'm absolutely loving it.

I discovered that the default values in the initial input channels were not being used. In addition to the dtypes, where a default is actively solicited in the initializer, I thought it might be nice to allow specifying defaults for non-typed inputs via add_data['val'].

An MWE is below, along with a little fix that could go in NodeInput.__init__ if you'd like. I'd be happy to make a PR, but when I forked the repo I discovered that main here is way out of date compared to the conda-forge installation (v0.3.1.1). NodeInput.__init__ doesn't even take a dtypes: Dtypes = None kwarg. Where is the more up-to-date code coming from? The only branch ahead of main here doesn't have it, and unfortunately I don't have permissions to dig into the conda feedstock source to look there.

MWE:

import ryvencore as rc

class MyNode(rc.Node):
    title = 'mwe'
    init_inputs = [
        rc.NodeInputBP(type_='data', add_data={'val': 'towel'}),  # Without dtype
        rc.NodeInputBP(type_='data', dtype=rc.dtypes.Integer(default=42))
    ]
    
    def place_event(self):
        super().place_event()
        print(f"{self.__class__.__name__} got placed")
    
session = rc.Session()
session.register_node(MyNode)
script = session.create_script(title='mwe')
node = script.flow.create_node(MyNode)
print("Initialized?", node.initialized)
print("Values after initialization:", node.input(0), node.input(1))

def set_defaults(node_input):
    """Something like could be added to NodeInput.__init__"""
    if node_input.dtype is not None:
        node_input.update(node_input.dtype.default)
    elif 'val' in node_input.add_data.keys():
        node_input.update(node_input.add_data['val'])
        
set_defaults(node.inputs[0])
set_defaults(node.inputs[1])
print("Values after setting defaults:", node.input(0), node.input(1))

>>> MyNode got placed
>>> Initialized? True
>>> Values after initialization: None None
>>> Values after setting defaults: towel 42


Hey, thanks for the suggestion! Regarding the out-of-date main branch: I recently made some rapid changes, resulting in ryvencore v0.4 (experimental). They were published as alpha versions on PyPI, but the conda feedstock doesn't consider alpha versions, which is why this main branch differs strongly from the conda version. If you'd like to use v0.4 in an Anaconda environment, you need to install it explicitly from PyPI.

A few words on the changes: semantically, ryvencore hasn't changed much; it has just become much cleaner and simpler internally. One of the more important interface changes is that all data passed between nodes must now be wrapped in a Data object that defines its serialization. The serialization can be defined arbitrarily, so one could also skip implementing it and live with the consequences, but that shouldn't be the usual case. This attempts to solve a fundamental flaw in the original design that could generally break de-/serialization (save and load). It should also be much faster in many cases: theoretically, loading in v0.3 could run in $\mathcal{O}(|V|!)$ in the worst case, while in v0.4 it is $\mathcal{O}(|V|)$.
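To illustrate the idea, here's a minimal sketch in plain Python (not the exact ryvencore API; the class and method names here are my own placeholders): a wrapper that owns its payload and defines how that payload becomes something serializable and back.

```python
import json

class Data:
    """Sketch of a wrapper for values passed between nodes.
    Subclasses define the (de)serialization of their payload."""
    def __init__(self, value=None):
        self.payload = value

    def get_data(self):
        # override: return a serializable representation of the payload
        return self.payload

    def set_data(self, data):
        # override: restore the payload from its serialized representation
        self.payload = data

class ComplexData(Data):
    """Example: a payload json can't handle directly (a complex number)."""
    def get_data(self):
        return {'re': self.payload.real, 'im': self.payload.imag}

    def set_data(self, data):
        self.payload = complex(data['re'], data['im'])

d = ComplexData(3 + 4j)
serialized = json.dumps(d.get_data())   # what would land in the saved flow
restored = ComplexData()
restored.set_data(json.loads(serialized))
print(restored.payload)  # (3+4j)
```

The point is that save/load can no longer silently break on payloads the core doesn't know how to serialize; the responsibility sits with the Data subclass.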

The whole dtypes thing was so far really only used to generate default widgets in Ryven. Because I didn't find a pretty way of implementing them as an add-on with the new add-on system (maybe this calls for adapting that as well?), I decided to outsource the dtypes to Ryven for the time being. But I have yet to adapt Ryven to the new ryvencore, and further changes are possible on both sides.

I also made an effort to add comprehensive documentation; you can currently view it at https://leon-thomm.github.io/ryvencore/. For your integration of ryvencore, I think v0.4 really has some benefits: it solves that save & load issue, is even more lightweight, easier to debug, should already be a bit faster, and has higher potential for optimizations. But there are certainly some bugs as well.

I think defaults for inputs make sense, even as a native feature. In case you look into v0.4, would you consider creating a PR?

Thanks for the fast reply! 0.4 sounds great. We've been thinking a bit about stricter type checking when making new connections between ports, and even about leveraging an ontology to further restrict connections or give hints based on how that data winds up getting used up-/down-stream. Having a centralized Data class that is always produced/consumed by ports should be super helpful for developments in that direction!
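For instance, connection-time checking could be sketched like this (purely hypothetical, not ryvencore's API; Port, data_type, and can_connect are made-up names): each port declares which Data subclass it emits or accepts, and a connection is only allowed when the types are compatible.

```python
# Hypothetical sketch of type-checked connections between ports.

class Data:
    pass

class NumberData(Data):
    pass

class TextData(Data):
    pass

class Port:
    def __init__(self, data_type=Data):
        # the Data subclass this port emits (output) or accepts (input)
        self.data_type = data_type

def can_connect(out_port, in_port):
    # an output may feed an input if its payload type is acceptable there
    return issubclass(out_port.data_type, in_port.data_type)

print(can_connect(Port(NumberData), Port(Data)))      # True: any Data accepted
print(can_connect(Port(TextData), Port(NumberData)))  # False: incompatible
```

An ontology-based check would just swap issubclass for a richer compatibility relation, but the hook point stays the same.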

The docs for 0.4 and other improvements therein also look nice -- $n!$ to $n$ is some wicked speedup. There is a bit of a hiccup using it for ironflow though, as the pyiron project is quite strict about all dependencies coming strictly from conda-forge. I have to confess that my knowledge of conda-forge feedstock maintenance is extremely limited, but it looks like setting up a pre-release is at least possible. What do you think of getting 0.4 out this way?

I think defaults for inputs make sense, even as a native feature. In case you look into v0.4, would you consider creating a PR?

Yep, with or without having it on conda, I'd be happy to make a couple of quick PRs for this and my other issue (if it is still relevant in 0.4). Forgive me if I'm just being super blind, but... where is the source for 0.4? I do see it available on PyPI, but PyPI just points back to this repo, and I don't see "v0.4" or "experimental" branches, and "dev" is only a single commit ahead of "main" with none of the changes mentioned here.

the pyiron project is quite strict about all dependencies coming strictly from conda-forge. [...] it looks like setting up a pre-release is at least possible.

Yes, my knowledge about the feedstocks is very basic as well, and I couldn't quickly figure out how to configure it to automatically publish alpha versions as dev pre-releases on the Anaconda index. In conda-forge/ryvencore-feedstock#5, I agreed with Jan Janssen that alpha versions can be pulled from PyPI instead. But if Anaconda-based projects can restrict the distribution sources, dev versions should of course be carried over as well. Maybe you can comment on the ryvencore-feedstock issue linked above?

where is the source for 0.4? I do see it available on pypi, but pypi just points back to this repo and I don't see "v0.4" or "experimental" branches, and "dev" is only a single commit ahead of "main" with none of the changes mentioned here.

Yes sorry, it's not super clear in the repository itself. Development happens on dev; releases are published from main. Every time something gets merged into main, a GitHub Action produces a new release (generating the wheel, testing on TestPyPI, and on success pushing to PyPI). The version of the release is determined by what's given in setup.cfg at the state of the merge. Hence, main holds the sources of the latest release (including alpha releases), and dev additionally holds all unpublished changes. Maybe I should create dedicated branches for the major versions...

Sidenote: I realized the above publishing workflow is not optimal. I suggest package releases get built from explicit tags instead, which should make the origin of the sources for published versions clearer. I just implemented that on the main branch, but feel free to suggest changes. See #26
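A minimal sketch of what such a tag-driven flow could look like from the command line (the version number is hypothetical, and a throwaway repo stands in for the project repo):

```shell
# Hypothetical tag-driven release flow, demonstrated in a throwaway repo.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
# stand-in for "merge dev into main"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "merge dev into main"
# mark the release state with an annotated tag; a CI workflow triggered on
# tag push would then build the wheel, test it, and publish to PyPI
git -c user.name=demo -c user.email=demo@example.com \
    tag -a v0.4.0a0 -m "release v0.4.0a0"
git tag --list 'v*'   # the sources of any published version stay checkoutable
```

The nice property is that every published version has an exact, immutable commit behind it, regardless of what main looks like later.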

Maybe you can comment on the ryvencore-feedstock issue linked above?

Sure thing, I'll take a look at it next week. I hope to have more to say on this front in the next week or two as I familiarize myself with the feedstock process for ironflow.

Yes sorry, it's not super clear in the repository itself. Development happens on dev; releases are published from main. ... See #26

Ah, ok, I see my mistake now. I was operating under the false assumption that main was the same as the latest conda release, so when I saw dev was only one commit ahead I assumed that there was more work hiding elsewhere. My bad, I should have actually peeked at the code and seen that the Data class had already been introduced in main! I'm happy to take a peek at #26 next week too. I can already say that I like the high-level philosophy of syncing releases with repo tags, though my technical expertise on the best way to implement that strategy is still pretty limited.