pyiron/ironflow

Node-based pyiron-base

Opened this issue · 31 comments


JNmpi commented

Fundamental concepts

Thanks @liamhuber for all the enhancements. At the IPAM workshop at UCLA, I had the chance to discuss with Jan concepts for bringing together various ideas from pyiron, ironflow and other workflow modules. These ideas aim at a more modular and easier-to-maintain pyiron. The basic building block will be nodes, so the framework must provide all the tools to create, manage, and scale them up easily. A brief summary of the concept follows:

        @pyiron.node
        def multiply(x=1, y=2):
            return x * y

The main idea is that a node can be represented by a function with well-defined input, output and execution body. Rather than having to explicitly define the input and output in a class, this structure appears more Pythonic and is used, e.g., by dask. A main advantage would be the low entry barrier: all a new user would have to do is write an ordinary Python function and decorate it with pyiron.node. This approach can also easily be extended to provide type/ontology information via typing:

        from pyiron import onto

        @pyiron.node(register(onto.atomistic), log_input=True)
        def multiply(x: onto.types.atomistic.BulkModule = 1, y: int = 2) -> float:
            return x * y

The idea would be to provide ontological types like the ones you introduced in the latest versions of ironflow. The nodes can be used individually and connected, similar to dask, without having to explicitly define a workflow. A decorated node would be delayed and executed only once the highest-level node is executed (similar to dask's delayed mode):

        c = multiply(2, 3)  # no execution, only a delayed object is created
        d = multiply(c, 4)  # no execution, only a delayed object is created
        d.run()             # only now are c and d evaluated
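To make the delayed-execution idea concrete, here is a minimal sketch of what such a decorator could do internally; the names (Node, node, run) are illustrative assumptions, not the actual pyiron API:

        class Node:
            def __init__(self, func, *args, **kwargs):
                # store the function and its (possibly delayed) inputs, do not execute yet
                self.func, self.args, self.kwargs = func, args, kwargs

            def run(self):
                # recursively evaluate upstream nodes, then execute this node's function
                args = [a.run() if isinstance(a, Node) else a for a in self.args]
                kwargs = {k: v.run() if isinstance(v, Node) else v
                          for k, v in self.kwargs.items()}
                return self.func(*args, **kwargs)

        def node(func):
            def delayed(*args, **kwargs):
                return Node(func, *args, **kwargs)  # only builds the graph
            return delayed

        @node
        def multiply(x=1, y=2):
            return x * y

        d = multiply(multiply(2, 3), 4)  # nothing executed yet
        assert d.run() == 24             # evaluation happens only here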

Pyiron-related concepts

While the node-based pyiron could be used without explicitly defining a workflow, a common simulation should probably have one:

        from pyiron import Workflow

        wf = Workflow(name='test', onto=onto.atomistic)

        Al = wf.create.structure.bulk('Al')
        job = wf.create.job.Lammps(name='Lammps_job', structure=Al)

The new syntax would be very similar to the existing one, so users should adapt to it very easily. A couple of things to note:

  • Workflow provides arguments/functionality that is not in Project (i.e. definition of its core ontology/application)
  • The entire input to Lammps can be provided via function arguments (e.g. structure=Al). For convenience and consistency, we should still keep our existing input notation (i.e. job.input['para'] = para), but the preferred style should be the new functional one.
  • On the one hand, the Workflow object provides autocompletion to easily access and browse all available nodes; on the other hand, this syntax appends the nodes to the workflow.

Serial execution

The following example is an extreme case, and may be a good test case for checking the concept and code:

        structure = wf.create.structure.bulk('Al')
        job_old = None
        for i_step in range(10):   # cannot be captured by the workflow (write as macro code)
            job = wf.create.job.Lammps(name='Lammps_job',
                                       structure=structure if job_old is None
                                                 else job_old.get_structure(-1))

            if job_old is not None:  # write a decorated function to capture it in workflows
                if np.abs(job.output.energy_tot - job_old.output.energy_tot) < eps:
                    break
            job_old = job

Such a construct would fail with dask delayed.

VASP example

Register nodes

In contrast to having a complex module like VASP in a single node, we should have smaller and more flexible ones. This is sketched below, together with ideas regarding node registration:

        @pyiron.node(register(onto.atomistic.code.vasp.exe))
        def VASP_exe(incar: FilePath, poscar: FilePath, potcar: FilePath, kpoints: FilePath):
            work = WorkingDirectory(path='.', 
                                    files=[incar, poscar, potcar, kpoints]
                                   )
            work.run('vasp.exe -f my_mode')
            return work

        @pyiron.node(register(onto.atomistic.code.vasp.parser_outcar))
        def VASP_parser_outcar(outcar: FilePath, select=(), exclude=()):
            out_dict = my_parser(outcar, select)
            return out_dict  # or iodata object


        @pyiron.node(register(onto.atomistic.code.vasp.parser_incar))
        def VASP_parser_incar(incar: DataIO):
            incar_str = my_parser(incar)
            return incar_str


        @pyiron.node(register(onto.atomistic.code.vasp.parser_poscar))
        def VASP_parser_poscar(structure: onto.atomistic.structure):
            poscar_str = my_parser(structure)
            return poscar_str

Create VASP (macro node)

        @pyiron.node(register(onto.atomistic.code.vasp))
        def VASP(incar: DataIO, 
                 structure: onto.atomistic.structure,
                 calculator: onto.node.calculator
                ):
            
            onto_vasp = onto.atomistic.code.vasp
        
            vasp = onto_vasp.VASP_exe(incar=onto_vasp.VASP_parser_incar(incar=incar),
                                      poscar=onto_vasp.VASP_parser_poscar(structure=structure)
                                     )
            out_dict = onto_vasp.VASP_parser_outcar(vasp.outcar, select=['energy_tot'], exclude=[])
            
            return out_dict

This is only a sketch of first ideas. Comments and suggestions are very welcome.

JNmpi commented

A few more ideas and pseudocode regarding node-based pyiron. The examples demonstrate typical workflows we use.

Example workflows

Single

    wf = Workflow(name='test', 
                  onto=onto.atomistic,
                  sql=None, # True - use default, db_connector, ...
                  log_level = 'low', # 'debug', ...
                  store_working_directory = True,
                 )

Note that we could add many options to the workflow creation, e.g. whether to store the nodes in a database, whether and which input and output data to put into HDF5, etc.

    job = wf.create.code.VASP(name='Lammps_job',
                              structure=wf.create.structure.bulk('Al'),
                              calc=wf.create.calculator.MD(T=300)
                             )

Note that the new nodes allow all input to be provided via function parameters.

Parallel

Run over a large number of structures stored in a structure container:

    jobs = wf.create.code.VASP(name='Lammps_job',
                               structure=wf.create.structure_container('Al_data'),
                               calc=wf.create.calculator.MD(T=300)
                               )

or run over a list of temperatures:

    structures = wf.create.structure_container('Al_data')
    calculators = wf.create.calculator.MD(T=np.arange(100, 1000, 100))
    STRUC, CALC = wf.np.meshgrid(structures, calculators)
    
    jobs = wf.create.code.VASP(name='Lammps_job',
                               structure=STRUC,
                               calc=CALC
                               )
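The broadcasting behaviour sketched above could be prototyped with a small helper. The following is only an illustration under assumed names (broadcast_node is hypothetical, not an existing pyiron function), showing how list-valued inputs might be mapped element-wise while scalar inputs are held fixed:

    import numpy as np

    def broadcast_node(node_func, **kwargs):
        # split inputs into list-like ones (to iterate over) and scalars (held fixed)
        list_kwargs = {k: v for k, v in kwargs.items()
                       if isinstance(v, (list, tuple, np.ndarray))}
        scalar_kwargs = {k: v for k, v in kwargs.items() if k not in list_kwargs}
        if not list_kwargs:
            return node_func(**kwargs)
        n = len(next(iter(list_kwargs.values())))
        # one call per element, zipping all list-valued inputs together
        return [node_func(**scalar_kwargs,
                          **{k: v[i] for k, v in list_kwargs.items()})
                for i in range(n)]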

Serial

    @pyiron.node
    def iter_energy(energy_convergence):
        job = wf.create.code.VASP(name='Lammps_job',
                                  structure=wf.create.structure.bulk('Al'),
                                  calc=wf.create.calculator.MD(T=300)
                                 )

        for i in range(10):
            job_new = wf.create.code.VASP(name='Lammps_job',
                                          structure=job.get_structure(-1))
            if np.abs(job.output.energy_tot - job_new.output.energy_tot) < energy_convergence:
                break

            job = job_new

        return job.get_structure(-1)

and apply the node:

    wf.iter_energy(energy_convergence=1e-4)
    wf.run()

Hi Jörg,

Super, yep, I'm on board with this. I'll be on vacation through Wednesday so I won't be able to look at this in depth, but a bunch of the earlier stuff you suggest is already implemented over in contrib, minus the syntactic sugar of doing it with a decorator (which should be easy).

One thing that is notably missing in your examples is labels for outputs. IMO these are absolutely critical for constructing complex graphs where inputs and outputs can be connected with any complexity -- the only way to get around this is to restrict these functions to only return a single output value, which I think is too harsh.

Concretely, look at your example setting a kwarg "structure=wf.create.structure.bulk('Al')" -- in this case "bulk" is a node, and in principle it may return multiple values (although in this case it only returns one). So I really feel extremely strongly that we need something like "structure = wf.create.structure.bulk('Al').structure", and to require specifying labels for the output, e.g. in the decorator like "@pyiron.node('structure')" or "@pyiron.node(('energy_pot', 'forces'))".

If you and Jan haven't yet, please run the contrib workflow notebook(s?) to see what parts of your plans the existing infrastructure already covers. All the starting stuff I think is just a matter of adding decorator syntactic sugar to existing functionality. The later stuff with specifying parallelization, database connections, etc. all still needs to be done.

On a first read through, the absence of output naming is really the only thing that worries me here; the rest of it looks brilliant so far.

Very practically, I want to first prioritize some performance enhancements for ironflow (decoupling port status model logic from the draw call should be sufficient), but once that's done I am excited to go over to contrib and start integrating the ideas here into the graph infrastructure 👍👍👍

Actually I have a second concern: the use of a global variable ('wf') in the final example. I think this is a closely connected concern to my worries about Marvin's contrib work where he allows entire nodes to be passed as input to other nodes. I completely agree that we need this type of macro functionality, I just think we'll need to be a little careful about the implementation.

JNmpi commented

Hi Liam,

Thanks for your super-quick and very positive reply. I am glad that we both see the advantages and potential of these formulations and that so much is already realized and implemented in ironflow. Once you are back from your vacation it would be good to have a Zoom meeting to discuss the next steps in more detail.

With respect to your questions/topics some first thoughts below:

  • ... only return a single output value: This could be resolved by returning several output values and typing them. A simple example is sketched below:

        @pyiron.node()
        def myfunc(A: a, B: b) -> (O1: o1, O2: o2):   # sketch of the desired syntax, not valid Python yet
            O1, O2 = A, B
            return O1, O2
    

The types O1, O2, etc. can be ontologically enriched using concepts from the typing module. In particular, typing.Annotated looks promising for attaching such metadata.
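As an illustration of that idea (the label/onto tags used here are made up, not an agreed-upon schema), typing.Annotated can carry arbitrary metadata alongside the data type, and a node decorator could read it back out:

        from typing import Annotated, get_type_hints

        # hypothetical metadata: an output label plus an ontological tag per return value
        EnergyTot = Annotated[float, "label:energy_tot", "onto:atomistic.energy"]
        Forces = Annotated[list, "label:forces"]

        def myfunc(a: float, b: float) -> tuple[EnergyTot, Forces]:
            return a + b, [a, b]

        # a node decorator could recover the metadata from the type hints
        hints = get_type_hints(myfunc, include_extras=True)
        print(hints["return"].__args__[0].__metadata__)  # ('label:energy_tot', 'onto:atomistic.energy')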

  • the use of a global variable ('wf') in the final example: This can likely be avoided by the following slight extension:

          @pyiron.node
          def iter_energy(wf, energy_convergence):
              job = wf.create.code.VASP(name='Lammps_job',    
              ...  
    

    The application of the node would be unchanged, i.e., wf behaves like self in a class:

          wf.create.node.iter_energy(energy_convergence=1e-4)  # wf is not in the function call but provided by the attached class object 
          wf.run()
    

One final question. In your comment you mentioned 'contrib workflow notebook'. Are these the notebooks in the ironflow repository (or the one in the pyiron_ontology)?

Hi Joerg,

A live chat sounds great!

Re annotations, I think that works nicely for adding onto typing in addition to data typing, although annotations currently do not support kwargs (like "o type=onto.foo"), so we would need to force a fixed ordering in annotations. For this reason I lean towards using a dict, e.g. as a kwarg in the decorator with keys matching the variable names, to provide this data. The big problem I see with using annotations to label output data is that I want output labels to be absolutely mandatory, and forcing people to learn about annotations and use them seems harder to understand than adding a positional arg to the node decorator. We could probably still enforce it as a requirement, but we'd need to add extra guard rails, whereas adding it as an arg to the decorator makes the requirement clear right away.

Re the demo notebook, this is actually not at all integrated with ironflow yet, it's over in contrib in notebooks/workflow_example.ipynb

Missed the global var discussion because of indentation.

It may indeed be possible to provide it by the scope of the class, but then you'd still need something like 'self.wf', and self will (at a minimum) look very strange in the context of a function definition (even if the decorator means we return a class instance). I bet we can find a solution, it will just take some thinking to get it both functional and intuitive.

JNmpi commented

Let's try to schedule a meeting on Thursday or Friday. I am presently at the DPG meeting in Dresden and will be back in Düsseldorf on Friday.

It would definitely be good to explore possible options for providing the labels for the output variables. Decorators may be a good option. We should just make sure that our solution stays as close to standard Python as possible, so that the barrier for users is as low as possible.

Thanks also for the latest developments on ironflow. The ontologic features are really great and it is exciting to play with them. The only issue is the sluggish behavior, which often makes it hard to know whether a click did not work or something is still happening. I am therefore looking forward very much to the next developments on speeding things up. Then one can fully enjoy the really cool features and the great concepts that you have already implemented. Really great work!

JNmpi commented

Hi Liam, I have now had a look at your workflow notebook in pyiron_contrib. Really very nice! I also see the strong links to my thoughts. An important task of the decorator would be to make the following statement more intuitive and Python-like:

      def plus_minus_one(x: int | float = 1) -> tuple[int | float, int | float]:
          return x+1, x-1
      
      node = Node(plus_minus_one, ("p1", "m1"))

With the new formulation this could read like

      @pyiron.node(repository)   # repository where the node would be registered (could be local or global)
      def plus_minus_one(x: int | float = 1) -> tuple[int | float, int | float]:
          return x+1, x-1
      
      plus_minus_one.inputs.x.type_hint

The last line is only an example to show that all constructions in your notebook should still work, i.e., the decorator has converted the function into a node object.

JNmpi commented

One more thought regarding your notebook. For code applications, it may be helpful to offer a lazy mode, i.e., the following statements should just build the workflow but not run it:

    wf.structure = nodes.BulkStructure(repeat=3, cubic=True, element="Al")
    wf.engine = nodes.Lammps(structure=wf.structure.outputs.structure)
    wf.calc = nodes.CalcMD(job=wf.engine.outputs.job)
    wf.plot = nodes.Scatter(
        x=wf.calc.outputs.steps, 
        y=wf.calc.outputs.temperature
    )

To actually run it one would have to call the following line:

  wf.run()

Here one could also specify where to run it, e.g. the queue, the number of cores, etc. It would also be nice to have an option to convert the code into a graph and vice versa:

    wf.visualize() 
    wf.to_graph()   # alternative

While I am currently at the IPAM workshop, I would like to join the meeting to synchronize the discussions, so just keep me in the loop.

JNmpi commented

Hi Jan, great that you will join. We have not yet set up a meeting but a good choice may be Friday afternoon when I will be back at home.

I have a couple of recent developments from the IPAM workshop, which might also be helpful for this discussion:

  • structuretoolkit - move the structure analysis to a separate module so it can be used with ASE Atoms directly. The idea is that if people like this stand-alone module, they might also be more likely to give pyiron a try. The pull request to merge these changes back into pyiron_atomistics is available at pyiron/pyiron_atomistics#994
  • pyiron_lammps - the pure python interface to LAMMPS now supports multiple ASE structures as well as multiple pandas DataFrames for interatomic potentials. Just like numpy, if only a single ASE structure and a single potential are provided, only a single set of elastic constants is calculated. If either multiple ASE structures or multiple potential dataframes are provided, then multiple sets of elastic constants are calculated. Finally, if a list of ASE structures and a list of interatomic potential dataframes of the same length are provided - e.g. as generated by np.meshgrid - then again a list of elastic constants is calculated. pyiron/pyiron_lammps#12
  • pympipool - subtasks can now use multiple MPI ranks. For running LAMMPS simulations this gives the user full flexibility: either a single LAMMPS calculation uses all MPI ranks, or each LAMMPS simulation uses one MPI rank, or any other split of MPI ranks over LAMMPS simulations is possible, as long as all LAMMPS simulations use the same number of MPI ranks and all MPI ranks are used. pyiron/pympipool#28

While these developments focus more on the scalability of pyiron, I could see them being beneficial in simplifying the development of complex workflows and hopefully providing the ideal test bed for the developments discussed above.

I'm just on mobile so my responses are pretty limited in depth, sorry.

Re meeting: Friday sounds good. At present I can be free any time. We should keep @pmrv in the loop here too in case he wants to attend; there is both synergy and some conflict between the graph stuff and the tinybase stuff.

Re standard python/decorators/link: indeed, I think getting the existing graph stuff working with decorators should be super fast, then adding the fancier bits on top can be more iterative. I'm super excited about this direction.

Re lazy evaluation: there is some support for this! My existing node stuff has init flags for turning on/off the initial run and running automatically on update. Definitely not as smooth as your example, but the groundwork exists at least.

Re structuretoolkit, pyiron_lammps, etc.: I am super excited to get this, as the coupling between current pyiron jobs and nodes is a huge pain to manage! Personally I would be happy to only support Lammps forever, but we will need to think carefully about data storage and making sure we can still accommodate more expensive codes like vasp. But for now, getting it all on-the-fly as facilitated by pyiron_lammps is super exciting.

pmrv commented

On Friday I'll still be in the train by the time @liamhuber and @jan-janssen would be able to join, so I'd prefer Thursday.

From my side any time Thursday is also currently fine.

> From my side any time Thursday is also currently fine.

Although if it's going to be first thing Thursday morning (pst) then I'll need to know inside the next five hours, which seems unlikely...

JNmpi commented

Today (Thursday) does not work for me since I have to attend several talks and committee meetings at DPG. Tomorrow afternoon would work for me.

Friday >=1500 CET is good for me.

> Friday >=1500 CET is good for me.

For me as well.

I played around and implemented the decorator so that this now works:

from pyiron_contrib.workflow.node import node

@node("sum")
def adder(x: int|float, y: int|float) -> int|float:
    return x + y

I was thinking a bit about how to handle macros and had two concrete thoughts:

  • Currently node functions are always staticmethods, but nodes should be adapted s.t. the node function can reference self. I think it would be efficient to parse the function to see if the 0th arg is called self, and if so simply not make an input channel for it, and pass self on execution (see the sketch after this list).
  • It should be possible to make an @macro decorator somewhat like the @node decorator, but it would decorate a function that builds a little graph. The returned node class would then have this construction done after __init__, and the node function would just be updating the (sub)graph inputs. The constructor would probably need to rely on self.workflow, which isn't necessarily populated for all nodes, but I think it's a fair requirement of a macro that they exist inside workflows.
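A minimal sketch of the self-detection mentioned in the first bullet, using only the standard library (the helper name is made up):

import inspect

def uses_self(func):
    # Check whether the wrapped function's first positional argument is "self";
    # if so, the node would skip creating an input channel for it and instead
    # pass the node instance itself at execution time.
    params = list(inspect.signature(func).parameters)
    return bool(params) and params[0] == "self"

def my_node_function(self, x: int) -> int:
    return x + 1

assert uses_self(my_node_function)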

And one fuzzy thought:

  • We could consider merging Node and Workflow into the same class. What really pushes me in this direction is that then macro nodes effectively create their own namespace, like my_workflow.my_macro.some_node, because we can nest what are currently workflows. In contrast, if nodes have no true structural hierarchy, then macros are creating new nodes on the parent workflow directly and will need to do some mangling of the "sub"-node labels to avoid conflicts. Merging these two classes is trickier business than the above two points though, and I'm not yet 100% convinced this is wise.
pmrv commented

> Friday >=1500 CET is good for me.

Sorry, I didn't write again yesterday. The earliest I can do tomorrow is 7pm CET; @JNmpi told me he can do earlier. I guess we can use the normal pyiron link.

So what is the actual time then? I will set an alarm for 1445 CET (0545 PST) and check for a concrete reply here, in case the time is 1500 CET... but at that point I would certainly be happy to roll over and go back to sleep until my kids wake me up. 1900 CET is fine for me.

pmrv commented

Let's say 1915 CET then, in case there's a train delay or so.

JNmpi commented

1915 CET works for me.

pmrv commented

I will be late.

JNmpi commented

I tried to summarize and sketch some of the ideas we had over the last few days, particularly with Jan at the IPAM workshop, in schematic graphs. They should serve to sharpen and focus the discussion rather than representing a fixed construction schema.

The first figure below shows the main components of the future node-based implementation of pyiron. An important aspect is the difference between the concepts/terms node and task. The node is the object that has all the information to translate input into a series of tasks. A simple example would be our Murnaghan object, which creates a separate Lammps or VASP job (task) for each fixed volume. Another example could be the Lammps library, which creates a series of jobs for a structure container.

[image: sketch of the main components of the node-based pyiron]

Below is a specification of the node repository (or node store), which locally or globally stores and provides all the information needed to run a node on any computer. For the other parts, we should construct similar sketches and augment them with pseudocode.

[image: sketch of the node repository (node store)]

Some notes from our discussion today:

@JNmpi and I chatted today about graph-based pyiron computations, including taking a look at @pmrv's tiny_base PR.
First, @jan-janssen has some concerns about actually providing a snakemake interface.
That's fine, we can always come back to a @snakemake meta-node decorator in the future; for now we can shelve it.

The current thrust is to make this sort of graph computation super easy to use and sufficiently powerful and performant for code users -- forget an actual graphical representation right now.
To that end there were a few topics:

Rapid access for simple nodes

After talking a bit about pyiron objects knowing their own history, we came around to the idea of storing recipes for objects in the form of simple graphs.
This led to thinking about how nodes might be more easily used in a text context, e.g. without always having to have this trailing .outputs.foo.value to get at things.

We came up with the following ideas for the case of very simple nodes that (a) initialize with valid input for all ports, (b) evaluate quickly, and (c) have a single output:

  • Update these nodes at instantiation and on each input change so that the output is always populated
  • Modify __getattr__ so that as a last resort it tries self.outputs[0].value.__getattr__ (see the sketch after this list)
  • Add syntactic sugar to connections to avoid asking for the full output path
  • Add syntactic sugar to indexing, and maybe assignment to exploit the output too
  • Here's the fun/hard part: with some usages of the output, actually go under the hood and transform the node from a single node to a macro
    • In fact, it might have been a macro of one node all along, so we really just add to the macro
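The __getattr__ fallback from the second bullet could look roughly like this (a sketch with assumed class and attribute names, not the actual node implementation):

class SingleOutputNode:
    def __init__(self, outputs):
        self.outputs = outputs  # assume a list of output channels, each with .value

    def __getattr__(self, item):
        # Only reached when normal attribute lookup fails; delegate to the single
        # output's value, so node.plot3d() == node.outputs[0].value.plot3d()
        return getattr(self.outputs[0].value, item)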

Then we might get an example like this:

from pyiron_contrib import Workflow

wf = Workflow("my_chain_example")

structure = wf.add.node.atomistics.bulk(element="Al", repeat=5)

structure.plot3d()  # == structure.outputs.structure.value.plot3d()

structure.visualize()  # Shows the graph vis for a single node

structure[:5]  # Creates a _new node slice under the hood_ 
structure.visualize()
# Now we see a two-node graph, with an internal connection

structure.plot3d()  # == structure.outputs.slice_sliced.value.plot3d()
# Shows a structure with just the first five atoms
# Note that we still have only one output, so getattr works fine
# The full path to that output, however, is changed to the dynamically-created
# macro path of {node_label}_{output_channel_label}

structure.undo()
# Pops off the last node in our macro chain
structure.plot3d()  # Shows the original, full-size structure

structure[:5] = "Cu" # ***Hard***
# By some magic, this adds a different new node, 
# that changes species and returns a structure, and its input is set
# to match the slice info
structure.plot3d()  # == structure.outputs.change_species_structure.value.plot3d()
# Shows the Al structure with 5 Cu atoms

structure.inputs.bulk_element = "Mg"
structure.inputs.change_species_element = "Ca"
structure.inputs.change_species_i_end = 6
structure.plot3d()
# Now we have an Mg structure with 5 Ca atoms!

Honestly, I'm not sure how we will get the magic line labeled ***Hard*** working, but I think it's a great goal.

What to do about control loops

We can currently make sophisticated graphs on-the-fly in the notebook, but since the python process and notebook are the ones aware of for/while loops, there is no way to serialize them as part of the graph a-priori.
Ultimately, I would like a fully-nodal graph to be the ground truth for workflows.
However, all the node-based for/while solutions we've seen are super ugly.
So, as previously mentioned, it would be nice to offer pythonic for/while loops as syntactic sugar for building these beasts.

Today Joerg shared some snippets from Lego Mindstorms, where these sorts of flow-control objects are offered graphically with a drag-and-drop interface.
This may actually work nicely with the above paradigm: once we have for/while as true nodes, and pythonic sugar built on top of those, we can build graphical sugar on top of that so that GUI users can place a sort of meta-node.
E.g. for a for loop I imagine a sort of mega-node with IO like iterator, index, etc. and outputs like accumulated.
Then inside this you build a sub-graph, and connect, e.g. the lattice input of your bulk_structure node to the internal iterator connection point, and the structure output to the internal connection point of accumulator.
Then you pass in a linspace or whatever to the external connection for iterator, and can get a list of structures at the external connection to accumulator.

| o-iterator -o~~~~~~~|o- lattice : bulk_structure : structure -o|~~~~~~o- accumulator -o|

Or something like that.
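Purely as an illustration of the iterator/accumulator idea (all names here are assumptions, not a planned API), such a meta-node could behave something like:

class ForLoopNode:
    def __init__(self, body, input_label, output_label):
        self.body = body                  # a node/sub-graph run once per item
        self.input_label = input_label    # which body input the iterator feeds
        self.output_label = output_label  # which body output gets accumulated

    def run(self, iterator):
        accumulated = []
        for item in iterator:
            result = self.body(**{self.input_label: item})
            accumulated.append(result[self.output_label])
        return {"accumulated": accumulated}

# e.g. sweep a lattice constant through a (stand-in) bulk_structure body
bulk_structure = lambda lattice: {"structure": f"Al bulk, a={lattice}"}
loop = ForLoopNode(bulk_structure, input_label="lattice", output_label="structure")
structures = loop.run(iterator=[3.9, 4.0, 4.1])["accumulated"]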

At any rate, for now let's just jam loops inside the node functionality itself and keep going.

Key missing pieces

We want a stable and useful solution ASAP, that means prioritizing a few things while letting others fall by the wayside.

Priorities:

  • Node registries -- getting nodes from a python module, live objects from the current interpreter instance, and deserializing a cloudpickle dump (or similar) should be sufficient.
    • In the long run it would be nice to have some sort of global node store, including unique node identifiers and versioning, and even unwrapping URLs to download node collections, but that can all wait
    • Other than cloudpickle, this stuff is already in place in Ironflow so that can be used as a launching point
  • Asynchronous execution -- right now everything just happens on the interpreter head.
    • I think what we need to do is integrate with Marvin's work in tiny_base such that, when their executor is not None, nodes generate task lists from their functionality and work with an executor to get results from those tasks.
    • This is in pretty good agreement with Joerg's sketch of the node-task relationship from earlier.
  • Macros -- this is just a really critical node subclass.

Non-priorities:

  • Solving all the flow-control problems -- All the solutions we've seen that have node definitions of flow control are super ugly, and this is hard. Instead of getting totally hung-up on it, for now let's dump as much for/while/etc stuff into the node functionality itself
    • Maybe there is some nice synergy here with Marvin's more complex tasks, s.t. nodes-as-task-generators can be aware of whether they are generating individual tasks or some list of tasks.
  • Workflow serialization -- Ultimately this is necessary to dethrone the Jupyter notebook as the source of truth for the workflow and replace it with something better defined, but this may change substantially depending on how far we've gotten in representing flows with nodes at the most fundamental level, so just delay it and rely on re-running workflows for now.
JNmpi commented

@liamhuber, thanks for the excellent summary of our discussion. I fully agree with it. Only a few minor points/thoughts:

  • Regarding your code example:
    - To keep the code as close as possible to regular python I would alter the behaviour of a slice as follows:

            structure_5_atoms = structure[:5]  
            structure.visualize()   # visualizes the original structure
            structure_5_atoms.visualize()  # applies the node slice (contains the first 5 atoms)
    
            structure[:5] = "Cu" # ***Hard*** Like in 'normal' python this changes the first 5 elements, i.e., no changes 
    
            structure.inputs.bulk_element = "Mg"
            structure.inputs.change_species_element = "Ca"
            structure.inputs.change_species_i_end = 6
            structure.plot3d()
    
            # should be equivalent to the following code block
            structure[:] = "Mg"
            structure[:6] = "Ca"
    
  • Workflow serialization: Having the code in the nodes, which can be fully serialized, rather than in the Jupyter notebook is already a big step forward. Since the nodes, including their underlying code, are part of the workflow and stored, we would have full serialization. Having the code in the node also expressed as a workflow would be a cool feature, but it is not mandatory to achieve serialization. It is therefore perfectly fine to keep this feature as a non-priority.

Notes from 2023.05.17 meeting with Joerg

Raw notes augmented and polished on the 18th.

@JNmpi, you had a nicely updated version of the sketch in this comment, could you upload it in this thread?

Core pyiron 1.0 features:

  • ease of use
  • logging (i.e. database interaction)
  • submission (especially HPC cluster support)

Executors and restarting workflows

Recently @jan-janssen has been getting into flux; I experimented with the most-primitive "executor" in workflow; @pmrv earlier made Executor classes for tinybase.

Joerg and I talked a bit about how to handle restarting workflows when (a) the python process controlling the workflow gets restarted and/or (b) the process handling the task execution gets restarted.
For (a) we need to be able to reconnect nodes to the task manager they were using beforehand and recover the status of their task.
For (b) we need to restart and recover the status of the task manager itself (if we have permissions to do so) and then recover our tasks.
E.g. you can imagine the case where something like a SLURM manager goes down; it must have data serialized for recovering itself, and in principle we should then be able to have our node reconnect with the restarted and recovered SLURM instance and find out what happened to its task. In this case, though, the workflow runner probably doesn't have permission to restart SLURM -- some sysadmin needs to do this -- so we need to be able to let users know that the task manager is inaccessible and how to handle this (e.g. by cleaning the node state and restarting tasks with a fresh executor/task manager).

Joerg was also excited about the hierarchical approach of Flux, giving us the option to have something like per-workflow or even per-node task management.

Sitting down to write out these notes the day after the meeting, this is my thinking on the topic -- and it may all be "duh" stuff to Marvin and Jan who have been thinking about pyiron's interaction with task scheduling for longer.

I would define a "task manager" as some python-independent and permanent/recoverable process for executing computations, and an "executor" as a python object that executes tasks generated by nodes in our workflow.
The "executor" may wrap/communicate with one of these external "task managers" -- e.g. SLURM, Flux.
When the python process controlling a workflow is killed and restarted, and we de-serialize the workflow state, we'll need to re-instantiate the "executors" too.
In the case that these "executors" are thin wrappers for "task managers", the re-instantiated executor needs to recover all the statuses of its tasks from the "task manager", e.g. by having stored the PID for the "task manager" and recovering a connection from there.
Here it becomes very useful that the "executors" are also just python objects, because now we can cleanly register callbacks from "executor" to node.
In case the background "task manager" process has also died, we may

  • Restart this as well, e.g. by having the "executor" run some bash code to restart the other process and de-serialize the "task manager" state and reconnect as usual, in the case that we have permission to do so (maybe for workflow-specific Flux managers? I don't know)
  • Fail hard, clear, and clean in case we don't have these sort of privileges over the "task manager" process (e.g. a centralized SLURM deployment on some HPC)

When the "executor" very simply runs tasks modally on the main workflow python process, this is all trivial.
When the "executor" packages and submits tasks to a "task manager" the behaviour is fairly clear, as above.
I see one intermediate case, where the "executor" starts background tasks using something like concurrent.futures/multiprocessing, or the node task is running some MPI shell script on local resources; here we probably need to handle resource management (i.e. scheduling multi-core tasks over our finite number of local cores) and "task manager" restarting (i.e. (de)serialization of the manager state) entirely ourselves.

There is obviously some strong overlap with dask's resiliency policies, although in my dream-behaviour above, we would find a way to handle (some things like) scheduler failure more robustly.
There is probably also some stuff to be learned from dask_jobqueue.
Ultimately I'm not too worried about overlap with dask capabilities here, as we are looking to be more tailored to HPC environments from the get-go, and to support possibly cyclic graphs (at the cost of graph-execution optimization).

In all cases, the end user should see an extremely similar interface for their Executor class, regardless of how the executor is handling tasks in the background.
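To make the "same interface, different backends" idea concrete, a sketch of such an interface might look like the following (class and method names are assumptions for illustration, not existing pyiron classes):

from abc import ABC, abstractmethod

class Executor(ABC):
    """Node-facing interface that stays identical whether tasks run modally,
    via multiprocessing, or through an external task manager (SLURM, Flux)."""

    @abstractmethod
    def submit(self, task, callback):
        """Run a node-generated task and invoke callback(result) when done."""

    @abstractmethod
    def recover(self, serialized_state):
        """After the controlling python process restarts, reconnect to the
        backing task manager (e.g. via a stored PID/handle) and restore the
        statuses of previously submitted tasks."""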

GUI stuff

In terms of GUIs, ironflow's dependence on ipycanvas means that more complex features -- like connection lines that automatically bend to flow around objects, or meta-nodes with snappable slots for actual nodes, etc. -- are going to be a huge pain to implement from scratch.
I want a higher level GUI platform, but didn't know of anything that keeps our support for the jupyter environment (which is a nice feature, especially given pyiron's history and current deployment).
Joerg reminded me that in NFDI MatWerk there is some Java library for GUIs that is now accessible inside Jupyter notebooks.
We think we discussed it before, but it's been a while and the details are fuzzy for me already.
This maybe provides a jupyter-compliant route for implementing these more advanced graphical features though!
We did a nice meta-node sketch for loops in the graph, and agreed that under the hood these would probably still construct pure graphs (sort of like Unreal Blueprints flow management, which I like formally, but which are hell to read and write!).

Below is a sketch for what a slottable macro-node might look like.
Note:

  • Mapping is possible from an iterable onto what to iterate over in the slotted node instance
  • We should be able to change the number of iterables in and out
  • We still allow regular connections to the slotted node (these are held constant over the iterations)
  • Under the hood this creates an uglier pure-node macro like seen in Unreal blueprints for-loop
  • A similarly convenient macro-node interface should be available to code-based users when writing their workflows, although obviously there we don't need to worry about the slotted node "snapping" nicely into place.

[image: metanode_sketch]

Maintainability and classes

We want a few interfaces like Executor and Serializer that may be replaced to have different behaviour -- e.g. SLURM vs multiprocess discussed above for the Executor or HDF5 vs S3 for serialization, whatever.
These should look as similar as possible to the end user.
These interfaces should have testing to make sure that what happens under the hood is staying compliant with the thing we're interfacing to!
We want to update the user-facing side of the interface as seldom as possible, but what we do under the hood we can update as much as we want.

In particular, a Workflow instance might hold a collection of Executor instances;
if a Node instance tries to create a new Executor instance, we should check whether this executor interface has already been instantiated and is being held by the workflow and just use that (singleton-esque), otherwise make a new one and add it to the owning workflow's list.
Similarly, we should give convenience methods for updating what type of Serializer is used across an entire workflow at once (although in principle you should still be able to define per-node how serialization is handled).
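A minimal sketch of the executor-sharing logic described above, under assumed names (none of this is the existing Workflow API):

class WorkflowExecutorPool:
    """Holds one executor instance per interface type for a whole workflow."""

    def __init__(self):
        self._executors = {}

    def get_executor(self, executor_class, **kwargs):
        # Reuse the already-instantiated executor of this type if the workflow
        # holds one (singleton-esque); otherwise create and register a new one.
        if executor_class not in self._executors:
            self._executors[executor_class] = executor_class(**kwargs)
        return self._executors[executor_class]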

Language power and usability

Joerg has recently looked into Julia a bit and was particularly keen on how it handles multiple dispatch.
Not that we should switch pyiron1.0 over to Julia, but just that there are some concepts here that we could pull in better to this graph-and-node framework.

E.g., we have talked about having hierarchically defined IO classes, like AtomisticOutput - MDOutput(AtomisticOutput) - LammpsOutput(MDOutput) - VASPOutput(MDOutput, DFTOutput), etc.
We can imagine then a situation where we have a CalcMD node who simply has MDOutput.
Now suppose we had some FiniteTBulkModulus node that took a bunch of temperatures and pressures;
We could empower the node so that it accepts either different arrays to each of these inputs, or just (the same) MDOutput to each and it knows how to extract temperature and pressure from an MDOutput object.
There is also room to synergize with or replace our ontological typing here, as in my made-up example we clearly want NVT MD.
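In Python, functools.singledispatch already gives a taste of this (single dispatch only, but it captures the flavour); a sketch with a made-up MDOutput class, just to illustrate dispatching on the richer IO type:

from dataclasses import dataclass, field
from functools import singledispatch

@dataclass
class MDOutput:  # stand-in for a hierarchical IO class
    temperature: list = field(default_factory=list)
    pressure: list = field(default_factory=list)

@singledispatch
def extract_temperatures(value):
    # default: assume the caller passed an explicit array of temperatures
    return value

@extract_temperatures.register
def _(value: MDOutput):
    # richer input: the node knows how to pull the temperatures out itself
    return value.temperature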

E.g. 2, the Plot3D node could support an Atoms object with NGLView, or automatically provide a different viewer with electron iso-surfaces if you pass data with electron densities, or automatically support animation if you pass a list of structures or an MDOutput.

Node packages

We also talked briefly about version control and node packages.

In principle, each serialized node will need to know which version of its node package it is from, but there is no problem mixing-and-matching nodes in a given workflow from different versions of the same node package -- as long as the IO connections are valid, the workflow shouldn't care if it uses nodes from multiple packages, and different versions of the same package are just equivalent to different packages.
We'll ultimately need to support versioning in our package loading/registration, and get the namespacing right, but this is just an implementation detail and not a fundamental barrier.
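For instance, the per-node record in a serialized workflow might carry fields like the following (a made-up schema, just to illustrate the version bookkeeping):

serialized_node = {
    "label": "engine",
    "package": "pyiron_nodes.atomistics",   # assumed package name
    "package_version": "0.3.2",             # version the node was created with
    "pyiron_version": "1.0.0",              # interface version it was parsed against
    "class": "Lammps",
    "inputs": {"structure": "structure.outputs.structure"},
}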

The one catch is that node package versions will need to be consistent with the version of pyiron being used, e.g. in case we change something like the Executor interface and the node's internal instructions are no longer parseable.
But breaking backwards compatibility between pyiron1.0 and specific nodes should happen much less frequently than updates to a node's behaviour, so that's not a deal breaker.

Practicality

We need to get @niklassiemer more deeply involved in these developments as he is the ideal person for long-term support!

Today I spent some time playing with the idea of macros. Nothing is running yet, but I have some very rough spec ideas and pseudocode.

As we've discussed before, I am thinking of a macro as a sort of crystallized workflow. As such,

  • It should have a nodes attribute just like a workflow, as well as the ability to add/create nodes onto itself
    • At least until the initialization is finished, then we may want to disable this
  • IO should default to the same thing as it does in workflows: unconnected channels from all the child nodes
  • But you should be able to provide a map to give additional convenient channels
    • Maybe to rename something, e.g. suppose some node "foo" has output "y", instead of "foo_y", we can make a map so it's just "y"
    • Maybe to provide synchronized setting for input, e.g. "node1_x" and "node2_x" can be updated simultaneously by updating the macro's "x" channel
    • Maybe to break in and provide special access to connected ports -- this one seems a bit dangerous (at least to allow for input; output is fine) given the assumption that macros are "crystallized"
  • They should be creatable from workflows or a decorator (or from a class instantiation, but that should be a boring and more verbose duplicate of the decorator pathway, so don't worry about it).

Here's the syntax I'm playing around with:

from pyiron_contrib.workflow import Workflow

@Workflow.wrap_as.single_value_node("sum")
def add(x: int = 0, y: int = 0) -> int:
    return x + y

macro = Workflow("plus_minus_one")
macro.p1 = add(y=1)
macro.m1 = add(y=-1)

# Choice 1) Use the default interface
plus_minus_one_default = macro.to_macro()

wf_default = Workflow("double_it_default")
wf_default.start = add()
wf_default.macro = plus_minus_one_default(
    p1_x=wf_default.start, 
    m1_x=wf_default.start
)
wf_default.end = add(
    x=wf_default.macro.outputs.p1_sum, 
    y=wf_default.macro.outputs.m1_sum
)

# Choice 2) Define a new interface
plus_minus_one_custom = macro.to_macro(
    # inputs={
    #     "x": (macro.p1.inputs.x, macro.m1.inputs.x)
    # },  # This way?
    inputs={
        macro.p1.inputs.x: "x",
        macro.m1.inputs.x: "x"
    },  # Or this way? For linking two inputs to a single channel
    outputs={
        "p1": macro.p1,
        "m1": macro.m1
    }
)

wf_custom = Workflow("double_it_custom")
wf_custom.start = add()
wf_custom.macro = plus_minus_one_custom(x=wf_custom.start)
wf_custom.end = add(
    x=wf_custom.macro.outputs.p1, 
    y=wf_custom.macro.outputs.m1
)

# Choice 3) With a decorator
@Workflow.wrap_as.macro()
def plus_minus_one_deco(macro):
    """
    Macro wrapped functions take the macro as the first and only argument 
    (which is a lot like a workflow), create nodes and make connections,
    and return inputs and outputs maps for giving special access
    """
    macro.p1 = add(y=1)
    macro.m1 = add(y=-1)
    return {macro.p1.inputs.x: "x", macro.m1.inputs.x: "x"}, {}
    
wf_deco = Workflow("double_it_deco")
wf_deco.start = add()
wf_deco.macro = plus_minus_one_deco(x=wf_deco.start)
wf_deco.end = add(
    x=wf_deco.macro.outputs.p1_sum, # We didn't map these
    y=wf_deco.macro.outputs.m1_sum # So use the default
)

for wf in [wf_default, wf_custom, wf_deco]:
    for i in range(5):
        # All cases should do the same boring thing, and should do it
        # automatically since the children are SVNodes
        assert 2 * wf.inputs.start_x.value == wf.outputs.end_sum.value

The fact that the macro's child node is defined in the notebook really pushes at the question of how best to store macros. Of course I'd love it if under the hood they just stored class names and connection lists and re-instantiated nodes from known libraries... but perhaps sometimes we will really need to cloudpickle node instances and re-instantiate by unpickling the whole thing.
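The two storage strategies could look roughly like this (the declarative schema is an assumption; only cloudpickle.dumps/loads is real API):

import cloudpickle

# (a) declarative storage: class names + connections, re-instantiated from
#     registered node packages on load (this schema is made up)
macro_spec = {
    "nodes": {"p1": ("my_nodes", "add", {"y": 1}),
              "m1": ("my_nodes", "add", {"y": -1})},
    "connections": [],
    "input_map": {"p1.x": "x", "m1.x": "x"},
}

# (b) fallback: cloudpickle live objects that only exist in the notebook,
#     e.g. a lambda, which the standard pickle module cannot handle
notebook_only_node = lambda x: x + 1
blob = cloudpickle.dumps(notebook_only_node)
restored = cloudpickle.loads(blob)
assert restored(1) == 2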