vatlab/sos

Confusion caused by parameter + multiple execution of global section.

BoPeng opened this issue · 19 comments

A user reported a case on Gitter which boils down to

[global]
parameter: A=[1, 2]
parameter: B=[]
B.extend(A)

[default]
print(B)
input: for_each=dict(i=range(2))
print(B)

and the output is

[1, 2]
[1, 2, 1, 2]
[1, 2, 1, 2]

We should investigate the behavior of parameter in the global section more closely.
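The output is consistent with the following plain-Python model (an assumption inferred from the output above, not actual SoS internals): the global section is executed once at the step level and then re-executed by each substep on a copy of the step's environment, with parameter statements assigning defaults only when the variable is not already defined.

import copy

# hypothetical model of the observed behavior; not SoS internals
def run_global(env):
    env.setdefault('A', [1, 2])   # parameter: A=[1, 2] -- default only if unset
    env.setdefault('B', [])       # parameter: B=[]     -- default only if unset
    env['B'].extend(env['A'])     # ordinary statement, re-run every time

step_env = {}
run_global(step_env)
print(step_env['B'])              # [1, 2]        (step-level print)

for i in range(2):                # for_each=dict(i=range(2))
    sub_env = copy.deepcopy(step_env)
    run_global(sub_env)
    print(sub_env['B'])           # [1, 2, 1, 2]  (substep print)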

gaow commented

I think I've raised the concern a few times in the past that the global section gets executed multiple times. One would expect the global section to be executed only once, but I recall you gave some good reasons why it executes multiple times ... clearly this now causes trouble. Maybe let's revisit those reasons?

The easiest example is

[global]
import time

[1]
input: for_each=dict(i=range(2))
time.sleep(0)

because the time module is not pickleable, so the substep would not have access to time unless the global section is executed by each substep.
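This is easy to verify outside of SoS:

import pickle
import time

# module objects cannot be pickled
pickle.dumps(time)   # TypeError: cannot pickle 'module' object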

I cannot immediately recall the reason for ignoring parameter in some cases, though.

gaow commented

I see, now I recall this import exception. But it looks like this is an exception we can possibly hack around? Are there other exceptions?

Yes, we can hack around it and separate the import statements, but things can get complicated. This document shows what can and what cannot be pickled.

For example

from some_module import some_class
a = some_class()

and a might not be pickleable.
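For a concrete illustration, threading.Lock (chosen here only as a readily available example, not from the original discussion) creates instances that cannot be pickled:

import pickle
import threading

a = threading.Lock()
pickle.dumps(a)   # TypeError: cannot pickle '_thread.lock' object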

We can enforce the pickleable requirement for the global section, though. Basically, we can say that, other than modules, which we can import separately, everything else needs to be pickleable...

gaow commented

Basically, we can say that, other than modules, which we can import separately, everything else needs to be pickleable...

This is a more acceptable behavior than the example above where B gets accumulated. I do not see it as very limiting.

For pickled global objects, are they shared with substeps? In other words, will each substep use the environment loaded by the step rather than loading and unpickling the saved objects on its own? If objects are shared, are they read-only?

Right now we send the global statement to each substep and execute it there, and there is also a background variable obtained from the step level...

Let me revisit this after I complete #1218

Use the following to test the pickleability of things:

import pickle

stmt = '''
from time import sleep

def a():
    print('hi')
'''

bundle = {}

exec(stmt, bundle, bundle)

# exec() inserts a __builtins__ entry into bundle; drop it so the test
# covers only the objects defined by stmt
bundle.pop('__builtins__', None)

print(bundle.keys())

pickle.dumps(bundle)   # fails on the function a defined inside exec

So

  1. import time is not pickleable, but it can easily be separated out.
  2. from time import time is ok, but other imported objects might not be.
  3. def a() cannot be pickled, and this can be a problem.
  4. class definitions cannot, either.

gaow commented

def and class can be huge limitations ...

Can we then instead only pickle parameters and variables, and run the rest of the code -- I guess this is not easy to do?

We can do something complicated but I am not sure how reliable it can be... Basically, we can parse the global section and

  1. Pick out the import, def, and class statements from the section.
  2. Run these statements to get a dictionary A.
  3. Run the entire global section to get another dictionary B.
  4. Remove the elements of A from B, and require the rest to be pickleable.

Now the global section is separated into import/def/class statements and a pickleable dictionary. Wherever the global statement is needed, we then (see the sketch after this list)

  1. Execute the imports and definitions.
  2. Restore from the pickled dictionary.
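A minimal sketch of this split-and-restore idea, assuming the global section is available as a plain source string and using ast.unparse (Python 3.9+); it illustrates the approach only and is not SoS code:

import ast
import pickle

global_section = '''
import time
from collections import OrderedDict

def greet():
    print('hi')

A = [1, 2]
B = list(A)
'''

# 1. pick out the import/def/class statements
tree = ast.parse(global_section)
DEF_TYPES = (ast.Import, ast.ImportFrom, ast.FunctionDef, ast.ClassDef)
definitions = '\n'.join(ast.unparse(node) for node in tree.body
                        if isinstance(node, DEF_TYPES))

# 2. run only the definitions to get dictionary A
dict_a = {}
exec(definitions, dict_a, dict_a)

# 3. run the entire global section to get dictionary B
dict_b = {}
exec(global_section, dict_b, dict_b)

# 4. the remainder (B minus A) must be pickleable
rest = {k: v for k, v in dict_b.items()
        if k not in dict_a and k != '__builtins__'}
pickled = pickle.dumps(rest)   # raises if anything is not pickleable

# wherever the global statement is needed: re-run the definitions,
# then restore the pickled dictionary
substep_env = {}
exec(definitions, substep_env, substep_env)
substep_env.update(pickle.loads(pickled))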

This should work most of the time, but

  1. It assumes that the order of imports and definitions does not matter, which I am sure is not true for some corner cases.
  2. It will prevent users from creating non-pickleable objects with something like
A = function_call()

but I guess we can tolerate such limitations.

Not sure if you are still using them, but I will have to remove the %include and %from magics from the syntax, because they include the global section of another script into the current one and make the handling of this ticket even more troublesome.

gaow commented

I do not use these any more. I think you managed to convince me to stop using them at some point.

Yes, I think I was under the influence of snakemake, which can manage multiple files, but I came to frown upon the complexity of the implementation and especially the use of these features, particularly because they are not in line with our design philosophy ...

I am removing them now.

Done. This also resolves #1155 (execution of the global statement on a remote host), because now the global statement will only be executed once (locally) and be pickled and sent to all steps, substeps, and tasks.

gaow commented

Great! Since it is pickled, how does each substep use this pickled data? Does each load it on its own, or does the step load it and copy it to all substeps -- by reference or by value?

As far as I can tell, each substep will load it because everything is mixed up on the same worker. This could be improved but right now correctness is more important.

The trunk is not ready, as there are still randomly failing tests and I can now reproduce the DAG problem, but I believe the key parts are done.

gaow commented

I can now reproduce the DAG problem

Can this now be reproduced reliably on your end with some example? I do not think I get it every time; at least I did not see it in versions before 0.18.7.

No. Only with the complicated example you have, with 20k+ jobs and many nested workflows. I can possibly try to trim down the workflow to create a test case, though.