leon-thomm/Ryven

Need a system for managing external node libraries

ddevz opened this issue · 3 comments

ddevz commented

I took another look, and you seem to have made substantial progress since the last time I looked at Ryven (i.e. since #74). Hopefully the following comments will be helpful.

This covers several related issues concerning node library management.

Finding Node packages:
Figuring out which node package to import can be an issue.
(Example workflow: a person doesn't remember which package has the "print" node, so they import a node package, search, import another, search, import another, search, and finally find it and use it.)
(Obviously we could just grep through all the node.py files.)

So a way to browse the node libraries before doing File->Add Nodes would be useful.
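A browse/search feature could start from a static index. The sketch below is only an assumption about how that might work: it parses every nodes file under a directory with Python's `ast` module (so no package code is imported or executed) and maps each top-level class name to the package folders that define it. The default file name `nodes.py`, the one-folder-per-package layout, and the helper names `index_node_classes`/`find_node` are all hypothetical; adjust them to the actual package layout.

```python
from __future__ import annotations

import ast
from pathlib import Path


def index_node_classes(root: Path, filename: str = "nodes.py") -> dict[str, list[str]]:
    """Map each top-level class name in `filename` files under `root` to the
    package folders (parent directory names) that define it."""
    index: dict[str, list[str]] = {}
    for nodes_file in sorted(root.rglob(filename)):
        # Parse instead of importing, so package code is never executed.
        tree = ast.parse(nodes_file.read_text(encoding="utf-8"))
        package = nodes_file.parent.name
        for stmt in tree.body:
            if isinstance(stmt, ast.ClassDef):
                index.setdefault(stmt.name, []).append(package)
    return index


def find_node(index: dict[str, list[str]], query: str) -> dict[str, list[str]]:
    """Case-insensitive substring search over the indexed class names."""
    q = query.lower()
    return {name: pkgs for name, pkgs in index.items() if q in name.lower()}
```

Searching for "print" in such an index answers "which package has the print node?" without importing every package first.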

Should encourage contributions to the node libraries:
(Or alternatively, if people's node packages are already being contributed and redistributed, it's not obvious where those packages are.)

The README mentions that people can help by creating node packages and example programs, but does not say what to do once they have written them.
Perhaps add a note on where to upload node packages?

Also:
The README includes: "Now let's check out the small example projects: open a new Ryven window and load one of them. Take a closer look and understand what they do."
Perhaps add something like "User-contributed example programs can be found at {location}".

Need a way to track the quality of node packages
Most node packages will be wrappers for specific python pip packages. For those types of node packages, I think the following information would be useful to track:

  • Who the maintainer is
  • What percentage of the pip package's functions are represented by a node (i.e. coverage)
  • Whether the nodes have been streamlined beyond the autogen.py output (or alternatively, some kind of user-feedback score?)
  • Whether the nodes have been tested and confirmed to work, and if so, which version of the package they were confirmed against
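Of these, coverage is the easiest to compute automatically. A rough sketch, assuming the package can tell us which function names it wraps (the `wrapped` set is a hypothetical input; how a package would declare it is still open): count the public routines in the wrapped module's top-level namespace and report the ratio plus the names that were skipped. `inspect.isroutine` does not match callables such as classes or NumPy ufuncs, so the number is only an approximation.

```python
import inspect
import types


def wrapper_coverage(module: types.ModuleType, wrapped: set) -> tuple:
    """Return (coverage ratio, sorted skipped names) for the public routines
    in `module`'s top-level namespace, given the names that have a node."""
    public = {
        name
        for name, _ in inspect.getmembers(module, inspect.isroutine)
        if not name.startswith("_")
    }
    if not public:
        return 1.0, []
    skipped = sorted(public - wrapped)
    # Covered = public routines that do have a node.
    return (len(public) - len(skipped)) / len(public), skipped
```

The `skipped` list is also the kind of information autogen.py could emit as commented-out entries, so the gap stays visible in the package itself.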

The installed versions of the pip packages are relevant, because for example, doing:

  • pip3 install pandas==1.1.5
  • python3 ~/.local/lib/python3.9/site-packages/ryven/example_nodes/auto_generated/autogen.py 'pandas' '#00aadd' 'pandas'
  • Then doing File->Import Nodes on the generated file works

However, doing File->Import Nodes on the generated files does not work (meaning the nodes do not show up in the pick list on the left) when using version 1.5.1 of pandas (i.e. if you did "pip3 install pandas" instead of "pip3 install pandas==1.1.5").
(I realize that 1.5.1 vs. 1.1.5 is confusing.)
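Since the generated nodes only worked against one pandas release, a wrapper package could record the version it was generated or tested against and warn on a mismatch when it is imported. A minimal sketch under that assumption: `release_tuple`, `version_warning` and `check_tested_version` are hypothetical helpers, `importlib.metadata.version` is the standard-library lookup, and treating an equal major.minor as compatible is an arbitrary policy chosen for illustration (a real tool would rather parse versions with the `packaging` library).

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple[int, ...]:
    """Leading numeric segments: '1.5.1' -> (1, 5, 1), '2.0.0rc1' -> (2, 0, 0).
    Deliberately simplified; this is not full PEP 440 parsing."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # stop after a segment like '0rc1'
    return tuple(parts)


def version_warning(dist_name: str, installed: str | None, tested: str) -> str | None:
    """Return a warning if `installed` differs from `tested` in major.minor, else None."""
    if installed is None:
        return f"{dist_name} is not installed (nodes were tested against {tested})"
    if release_tuple(installed)[:2] != release_tuple(tested)[:2]:
        return f"{dist_name} {installed} is installed, but the nodes were tested against {tested}"
    return None


def check_tested_version(dist_name: str, tested: str) -> str | None:
    """Look up the installed distribution's version and compare it."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        installed = None
    return version_warning(dist_name, installed, tested)
```

With the versions above, pandas 1.5.1 against nodes tested on 1.1.5 produces a warning, while a 1.1.x patch release would pass.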

Solution:
Unfortunately I haven't thought the solution through; I'm just highlighting the issue. Here are some random thoughts on it.

  • One option might be to add a data structure or function call to node.py
    ** one possible way might be to add something like a ryven.NENV.register_node_package(author, pip_package_name, pip_package_version?...
  • When autogen.py gives up and does not make a node for a function, it could add the function's name, commented out, to export_nodes, so you can tell how many functions got wrapped into a node and how many got skipped?
  • Automated generation of pytest files that confirm each thing in the node package can actually load? Then register the test results against the installed version of the package it's wrapping?
  • Perhaps have autogen.py add some kind of hashes to track whether anyone bothered to clean up the node or its description? (Example: the autogenerated pandas "read_csv" node has tons of possible inputs, and there is probably a way to streamline it by hand to be less unwieldy.)
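The hashing idea in the last bullet could look roughly like this, assuming autogen.py stores a digest of each generated class somewhere (a sidecar file, say; where it lives is left open). Comparing the recorded digests against the current source shows which nodes were edited by hand. Trailing whitespace is ignored so trivial reformatting doesn't count. The function names are hypothetical; `ast.get_source_segment` (Python 3.8+) extracts each class's source text.

```python
from __future__ import annotations

import ast
import hashlib


def _digest(text: str) -> str:
    # Ignore trailing whitespace so trivial reformatting is not flagged.
    normalized = "\n".join(line.rstrip() for line in text.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def class_digests(source: str) -> dict[str, str]:
    """SHA-256 digest of each top-level class in a nodes file."""
    tree = ast.parse(source)
    return {
        node.name: _digest(ast.get_source_segment(source, node))
        for node in tree.body
        if isinstance(node, ast.ClassDef)
    }


def hand_edited(source: str, recorded: dict[str, str]) -> list[str]:
    """Classes whose source differs from the digest recorded at generation
    time; classes added by hand (no recorded digest) are reported too."""
    return sorted(
        name for name, d in class_digests(source).items() if recorded.get(name) != d
    )
```

autogen.py would call `class_digests()` on the file it just wrote and save the result; running `hand_edited()` later on the current file lists the nodes someone has streamlined.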

In general, I'm trying to think of simple solutions short of making a full-blown "rypip" node package manager. :)

There is also the issue of what to do with the packages that are not wrappers for python pip packages.

Sorry for making this so long; as I said, I don't have a solution, but hopefully this gets some ideas started.

Thanks a lot for sharing your thoughts. Some comments from my side:

Finding Node packages: [...]

good point, totally agree, shouldn't be too hard to add

Should encourage contributions to the node libraries: [...]

Yes, the readme also states: I would like to open a repository for maintaining particularly useful (frameworks of) nodes, and I would really like to have a central place / repository for all easily accessible node packages, which automatically tests them and tracks their versions. There are just many design choices here, and so far I didn't feel like there was really a need for it, but I'd like to see that happening.

Most node packages will be wrappers for specific python pip packages.

I am not sure about that, but this is a very important point, so let me elaborate in more detail: I think the system loses its purpose quickly when mapping python library functions 1:1 to visual nodes. I think the power of the paradigm lies in much stronger abstraction, as opposed to just wrapping something that's already simple into something much more complex in an attempt to make it a tiny bit more accessible, but inevitably making it much harder to scale and maintain at the same time (which is a more general issue imo). autogen.py was my attempt to aid the process of creating packages that try to wrap specific python libraries by creating a simple template from which one could easily start building an actual package. While this works well for super simple libraries, it completely fails to encode functionality that is not simply a top-level function, and I doubt that the compilation of such functionality can be fully streamlined to create useful node packages.

That being said, this is just what I'd expect, and these points are very much up for discussion. A package system might give more insight into which types of packages work well in practice and which don't. But I'm also afraid of packages suggesting usage of the visual paradigm in a way that doesn't scale. So far I have tried to impose as few constraints and design suggestions on node packages as possible, to let users decide what kinds of node packages are most useful for them. Comments on this are highly appreciated.

ryven.NENV.register_node_package(author, pip_package_name, pip_package_version

True, the export_nodes() function could be replaced by something that serves a similar purpose to the setup() function from setuptools.

  • Automated generation of pytest files that confirm each thing in the node package can actually load? Then register the test results against the installed version of the package it's wrapping?

Yes! One could also write a dynamic (randomized) tester that tries to squeeze any unhandled exceptions out of the packages. We could, for example, require every package to be stable under any valid sequence of ryvencore API actions on its nodes, such as

  • creating, deleting, replacing, and connecting nodes
  • invoking the nodes' actions at any time
  • saving and loading, and verifying that the graph is in the same state after loading

which could be automated with CI. Of course, the package author should extend this with deterministic semantic tests verifying that the outputs are actually correct.
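A rough sketch of such a randomized tester. Reproducing ryvencore's actual API here would be guesswork, so it drives a hypothetical in-memory `ToyFlow` with create/delete/connect/save/load operations; a real tester would call ryvencore through a thin adapter of the same shape. Invoking node actions is omitted because the toy nodes have none. Any exception escaping `fuzz` is a finding, and every save/load step asserts that the graph comes back in the same state. A fixed seed keeps failures reproducible in CI.

```python
import json
import random


class ToyFlow:
    """Hypothetical stand-in for a flow; a real tester would drive ryvencore."""

    def __init__(self):
        self.nodes = {}     # node id -> node type name
        self.edges = set()  # (source id, target id)
        self._next_id = 0

    def create(self, node_type):
        nid = self._next_id
        self._next_id += 1
        self.nodes[nid] = node_type
        return nid

    def delete(self, nid):
        del self.nodes[nid]
        self.edges = {e for e in self.edges if nid not in e}

    def connect(self, a, b):
        self.edges.add((a, b))

    def save(self):
        return json.dumps(
            {"nodes": self.nodes, "edges": sorted(self.edges), "next": self._next_id}
        )

    @classmethod
    def load(cls, data):
        d = json.loads(data)
        flow = cls()
        flow.nodes = {int(k): v for k, v in d["nodes"].items()}  # JSON keys are strings
        flow.edges = {tuple(e) for e in d["edges"]}
        flow._next_id = d["next"]
        return flow

    def state(self):
        return self.nodes, self.edges


def fuzz(node_types, steps=200, seed=0):
    """Apply a random sequence of valid actions; exceptions propagate as findings."""
    rng = random.Random(seed)
    flow = ToyFlow()
    for _ in range(steps):
        action = rng.choice(["create", "delete", "connect", "save_load"])
        ids = list(flow.nodes)
        if action == "create":
            flow.create(rng.choice(node_types))
        elif action == "delete" and ids:
            flow.delete(rng.choice(ids))
        elif action == "connect" and len(ids) >= 2:
            a, b = rng.sample(ids, 2)
            flow.connect(a, b)
        elif action == "save_load":
            restored = ToyFlow.load(flow.save())
            assert restored.state() == flow.state(), "state changed across save/load"
            flow = restored
    return flow
```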

ddevz commented

Yes, the readme also states: I would...

My bad. I end up skipping some parts thinking I know them from the last time I looked at Ryven. :)

I think the system loses its purpose quickly when mapping python library functions 1:1 to visual nodes. I think the power of the paradigm lies in much stronger abstraction, as opposed to just wrapping something that's already simple into something much more complex in an attempt to make it a tiny bit more accessible, but inevitably making it much harder to scale and maintain at the same time

Interesting. I understand abstraction in general, and I understand what the 1-to-1 version would look like. But I do not understand which types of abstraction the "stronger" Ryven-only abstractions would be. Can you give an example description of an abstracted version that could replace a 1-to-1 pip package mapping, to help me understand in what ways they would differ? (And then presumably I'd be able to see how the rest of your argument follows.)

(which is a more general issue imo)
I assume you are saying that scalability and maintenance are the general issues we should be focused on? If so, I agree, including how things are maintained when individual people disappear (for example, when you graduate you may suddenly get busy).

But I'm also afraid of packages suggesting usage of the visual paradigm in a way that doesn't scale. So far I have tried to impose as few constraints and design suggestions on node packages as possible, to let users decide what kinds of node packages are most useful for them. Comments on this are highly appreciated.

I fully understand your intentions here. You want people with lots of Ryven packaging experience to debate the various points, while not influencing the discussion with your own assumptions in order to come up with the best answer. Of course the problem is that no one will have the experience to have those opinions until after you have some kind of packaging system for them to try (and discover what they don't want).
My best advice is to do it twice. Start with a document on what will be done differently in the second version, then do it the quick, wrong way, with the full intention of rewriting the entire thing after experience is gained, updating the document on the second version every time you learn something when dealing with the first version.

Actually the fastest way to start would probably be to say something like, "after you write your node package, please contribute it by forking https://github.com/leon-thomm/ryven-contributed-node-libraries , create your own subdirectory, then put your package in a subdirectory of that, then send a pull request."

While I'm on the topic:
When asking for tutorial contributions, it would probably be good to say something like: If you write a tutorial, please add it to a forked copy of https://github.com/leon-thomm/ryven-website-guide then create a new "Tutorials" page that links to your new page, and then add a new "Tutorials" menu at the top, then send a pull request.
Or are they supposed to add it to https://github.com/leon-thomm/Ryven-Website-2.0 ? Or are they supposed to create it as a document in https://github.com/leon-thomm/Ryven ?

(Also, it would probably be good to mention where people should put any Ryven example programs they create.)

True, the export_nodes() function could be replaced by something that serves a similar purpose to the setup() function from setuptools.

Note: I am not qualified to talk about setuptools, as I have never used it. Also, I have no idea how people upload packages to the repository that pip uses (for all I know, it could be a web form that you log into and upload your packages to). However, whatever the package registration/upload procedure is for pip, you may want to either clone it or hijack it.
By hijacking it, I mean create any node packages that are wrapper packages as just new pip packages with a specific naming convention... like perhaps "ryven-nodes-{package}", as in ryven-nodes-numpy , or ryven-nodes-pandas .
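If wrapper packages were published under such a naming convention, Ryven could discover the installed ones through the standard `importlib.metadata` API. A minimal sketch; the `ryven-nodes-` prefix is just the convention proposed above, and names are normalized as pip does (PEP 503: case-insensitive, with runs of `-`, `_` and `.` treated as a single `-`).

```python
from __future__ import annotations

import re
from importlib.metadata import distributions

PREFIX = "ryven-nodes-"  # hypothetical naming convention


def wrapped_package(dist_name: str) -> str | None:
    """'ryven-nodes-pandas' -> 'pandas'; None if the name doesn't follow the convention."""
    normalized = re.sub(r"[-_.]+", "-", dist_name).lower()  # PEP 503 normalization
    if normalized.startswith(PREFIX) and len(normalized) > len(PREFIX):
        return normalized[len(PREFIX):]
    return None


def installed_node_packages() -> dict[str, str]:
    """Map installed ryven-nodes-* distribution names to the package they wrap."""
    found = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        target = wrapped_package(name) if name else None
        if target:
            found[name] = target
    return found
```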

Also note that I am not confident I fully understand the Ryven flow yet (but I understand more of it this time around than when I investigated it previously).
(Question: The exec node has no inputs and no outputs, so how does it get triggered?)

which could be streamlined with CI. Of course this should be extended by the package author by some deterministic semantic tests verifying that the outputs are actually correct.

Agreed. I imagine that many package authors will not have semantic tests verifying the outputs, especially at first, so a way for end users to detect whether those tests are part of a package before they attempt to use it could be helpful.
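A first approximation of such a check could simply look for test files that follow pytest's default naming patterns (`test_*.py` and `*_test.py`). This is a hypothetical sketch: it only detects that tests exist, not whether they are semantic tests of the outputs, and `describe_tests` is an invented name.

```python
from pathlib import Path

TEST_PATTERNS = ("test_*.py", "*_test.py")  # pytest's default file patterns


def describe_tests(package_dir: Path) -> dict:
    """Report whether a node package directory ships any pytest-style test files."""
    files = {p for pattern in TEST_PATTERNS for p in package_dir.rglob(pattern)}
    rel = sorted(p.relative_to(package_dir).as_posix() for p in files)
    return {"has_tests": bool(rel), "test_files": rel}
```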

Interesting. I understand abstraction in general, and I understand what the 1-to-1 version would look like. But I do not understand which types of abstraction the "stronger" Ryven-only abstractions would be. Can you give an example description of an abstracted version that could replace a 1-to-1 pip package mapping, to help me understand in what ways they would differ? (And then presumably I'd be able to see how the rest of your argument follows.)

Sorry, I phrased it poorly. I totally agree with packages being wrappers for specific Python libraries (say, a 1:n library:ryven-package mapping), but I'm not sure about the automatic conversion part (which suggests a 1:1 interface mapping). The developer of a nodes package should put some thought into how to make the targeted library's functionality available in the framework of flow-based programming, which IMO is quite different from almost one-dimensional, text-based code. By stronger abstraction I just meant nodes doing more, as in declarative vs. explicit. Of course, when using libraries, functions like numpy.linalg.solve might very much qualify as nodes. Generally, though, I am imagining nodes like websocket, FFT, player-state, box, transform, and ExportSTL, as opposed to assignment, condition, if-branch, add, etc. Does that make sense?
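To illustrate the difference in granularity with the FFT example (assuming NumPy): a 1:1 autogenerated mapping would expose `np.fft.rfft`, `np.fft.rfftfreq` and `np.abs` as separate nodes that the user must wire together, whereas a single "FFT" node could take a signal and a sample rate and output the spectrum directly. Only the computation such a node would perform is sketched; how it would be wrapped in a Ryven node class is left out.

```python
import numpy as np


def spectrum(signal, sample_rate):
    """What one 'FFT' node might compute: signal in, (frequencies, magnitudes) out."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)  # bin frequencies in Hz
    mags = np.abs(np.fft.rfft(signal)) / signal.size            # normalized magnitudes
    return freqs, mags
```

The node's user only thinks in terms of "signal in, spectrum out"; the three library calls become an implementation detail of the package.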

the problem is that no one will have the experience to have those opinions until after you have some kind of packaging system for them to try

true

(and discover what they don't want)

good point

My best advice is to do it twice. Start with a document on what will be done differently in the second version, then do it the quick, wrong way, with the full intention of rewriting the entire thing after experience is gained, updating the document on the second version every time you learn something when dealing with the first version.

Interesting!

Actually the fastest way to start would probably be to say something like, "after you write your node package, please contribute it by forking https://github.com/leon-thomm/ryven-contributed-node-libraries , create your own subdirectory, then put your package in a subdirectory of that, then send a pull request."

However, whatever the package registration/upload procedure is for pip you may want to either clone it, or hijack it. By hijacking it, I mean create any node packages that are wrapper packages as just new pip packages with a specific naming convention... like perhaps "ryven-nodes-{package}", as in ryven-nodes-numpy , or ryven-nodes-pandas .

I would probably reserve the hijacking part for the "second approach" and just use a git repository initially. I am not sure a full-blown package manager is necessary at first; one could also literally stick to git itself as the VCS.

the other points regarding the repo are noted, thanks.