Workflow for two-level GLM (nilearn tutorial)

Question


yibeichan opened this issue 3 years ago · 9 comments

Hello, I'm working on a two-level GLM nilearn tutorial for Pydra, but I've been stuck on one node in the workflow for more than two weeks (@djarecka, @htwangtw, and I have discussed it over the past weeks). We've tried a couple of ways to debug it but haven't solved the problem yet. Here is a summary of what's happening:

In this two-level GLM tutorial, the analytical logic is:

1. Download the data (a task function).
2. Run the first level for each subject (a workflow).
3. Start the second-level estimation using the results from step 2 (a task function).
4. Run multiple statistical tests... (other tasks...)

Steps 3 and 4 together can be set up as a workflow, but for the sake of this issue let's focus on the task function first.

I wrote the whole workflow using nilearn and made sure the code itself is error-free. Each Pydra task function runs successfully as a standalone task. When I connect the tasks into a workflow, errors come up.

1. The workflow runs fine through the first-level estimation.
2. The problem happens at secondlevel_estimation (see cell 18 here).
2.1 The input of secondlevel_estimation is a list of first-level models; the outputs are (a) the second-level mask from SecondLevelModel() and (b) the second-level statistical estimates.
2.2 If I run secondlevel_estimation as a task (outside of the workflow), it prints the results fine.
2.3 If I run secondlevel_estimation as a node in the workflow, it runs and can print out the outcome of every step, but it CANNOT return results. The error (last line here) is: 'NoneType' object has no attribute 'errored'
2.4 Since secondlevel_estimation can't return results, it can't be linked to the next node, so we get the error "graph is not empty, but not able to get more tasks - something is wrong (e.g. with the filesystem)" (see the last cell output here).

Now the question is why this secondlevel_estimation node can't return results even though the output of each of its steps can be printed.

Answer 1 · 2022-08-22T01:33:13.000Z

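Some comments on the notebook (before I forget):

I still had issues with the confounds files, and I had to add:

import glob
import os

# fmriprep_path is defined earlier in the notebook; sorting gives a deterministic, subject-ordered list
conf_list = glob.glob(os.path.join(fmriprep_path, '*', 'func', '*_desc-confounds_timeseries.tsv'))
conf_list.sort()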

It might be better to use the same notation for subj_id as BIDS does and start from 1. I got really confused that with subj_id=1 I need the sub-02 files.
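
A minimal sketch of the off-by-one being described, assuming the notebook's subj_id is a 0-based index into the sorted subject list while BIDS labels start at sub-01. `bids_label` is a hypothetical helper, not something in the notebook:

```python
def bids_label(subj_id: int, zero_based: bool = True) -> str:
    """Map a subject index to a two-digit BIDS subject label (hypothetical helper)."""
    # With a 0-based index, subj_id=1 is the second subject, i.e. sub-02
    number = subj_id + 1 if zero_based else subj_id
    return f"sub-{number:02d}"


print(bids_label(1))                    # 0-based indexing: sub-02
print(bids_label(1, zero_based=False))  # BIDS-style 1-based indexing: sub-01
```

Switching the tutorial to 1-based IDs would make the label and the index read the same, which is the point raised above.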

Answer 2 · 2022-08-22T02:35:04.000Z

Thank you, Dorota! Yes, you're right. I just checked load_confounds_strategy. It says:

As long as the image file, confound related tsv and json are in the same directory with BIDS-compliant names, nilearn.interfaces.fmriprep.load_confounds can retrieve the relevant files correctly.

So load_confounds_strategy automatically detects the related files. I downloaded the confound files in a previous version of my code (now removed), and I guess this function found them. Okay, I'll add the confounds back to the code.
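
Because the lookup depends on the confounds files sitting next to each image, a quick pre-flight check can catch the missing-file case before the workflow runs. This is a simplified sketch; the naming rule it applies (keep the entities before the first `space-`/`res-`/`desc-` entity, then append `_desc-confounds_timeseries.tsv`/`.json`) is an assumption based on typical fMRIPrep output, not nilearn's actual lookup code. `expected_confounds_files` and `missing_confounds` are hypothetical helpers:

```python
import os
import re


def expected_confounds_files(bold_path: str) -> tuple:
    """Return the (tsv, json) confounds paths expected next to an fMRIPrep BOLD image (simplified)."""
    directory, filename = os.path.split(bold_path)
    # Keep the entities before the first space-/res-/desc- entity, e.g. "sub-01_task-rest"
    prefix = re.split(r"_(?:space|res|desc)-", filename, maxsplit=1)[0]
    stem = os.path.join(directory, f"{prefix}_desc-confounds_timeseries")
    return stem + ".tsv", stem + ".json"


def missing_confounds(bold_path: str) -> list:
    """List the expected confounds files that are not present on disk."""
    return [p for p in expected_confounds_files(bold_path) if not os.path.exists(p)]


# Hypothetical example path, shaped like fMRIPrep output
bold = os.path.join("derivatives", "sub-01", "func",
                    "sub-01_task-rest_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz")
print(expected_confounds_files(bold))
print(missing_confounds(bold))
```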

Re subj_id, I'll document it better in the tutorial! (I didn't put any documentation in this test notebook... sorry.)

Answer 3 · 2022-08-22T14:14:30.000Z

To use load_confounds correctly, the easiest way is to keep the fMRIPrep output untouched.
If that's not how you set up the workflow, we can simply review the confound regressors you use.

Answer 4 · 2022-08-22T20:03:56.000Z

Hi Haoting, the confound file problem is solved. I had downloaded the confounds through DataLad before using load_confounds_strategy, so it worked fine on my laptop. Dorota was testing a version of my notebook from which I had removed the code for downloading the confounds (I thought it wasn't needed), so she got some errors. She has added that code back and downloaded them now. I'll add the DataLad command for downloading the confounds to the tutorial (for general use).
The error we have now comes not from the confounds but from the workflow itself.

Answer 5 · 2022-08-22T21:39:58.000Z

I see! Glad that's solved!

Answer 6 · 2022-08-23T14:58:43.000Z

I'm still debugging it, but I think something goes wrong when calculating the hash value of the inputs when a pandas DataFrame is involved...
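
One way to test the hashing suspicion is to compute a content-based checksum of the DataFrame yourself and compare it before and after it passes through the workflow. This sketch uses pandas' `pd.util.hash_pandas_object`; it is not how Pydra hashes inputs, and `dataframe_checksum` is a hypothetical helper:

```python
import hashlib

import pandas as pd


def dataframe_checksum(df: pd.DataFrame) -> str:
    """Content-based SHA-256 of a DataFrame's values, index and column labels."""
    h = hashlib.sha256()
    # One uint64 hash per row, covering the row values and the index
    h.update(pd.util.hash_pandas_object(df, index=True).values.tobytes())
    # Add the column labels explicitly so that renaming a column changes the checksum
    h.update(repr(list(df.columns)).encode())
    return h.hexdigest()


df = pd.DataFrame({"trans_x": [0.1, 0.2], "trans_y": [0.0, -0.1]})
print(dataframe_checksum(df) == dataframe_checksum(df.copy()))  # True: same content, same checksum
```

If two runs with identical data give different checksums under the workflow's own hashing but the same value here, the instability is in how the input is hashed rather than in the data.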

Answer 7 · 2022-08-26T05:30:26.000Z

Just some notes:
Tuesday (08/23/22):

Dorota and I found that with the same code and the same data we got different results. Dorota had some errors when setting the first-level contrasts; I don't. She uses Python 3.8, I use 3.7.
Dorota couldn't reproduce my second-level error because her workflow hadn't gotten past the first level yet.
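
Since the two setups differ at least in the Python version, it may help to record the full environment alongside each run. A small sketch using the standard library; the package list is an assumption chosen from the tools mentioned in this thread, and `environment_report` is a hypothetical helper. `importlib.metadata` is stdlib from Python 3.8; on 3.7 the `importlib_metadata` backport offers the same functions:

```python
import sys
from importlib.metadata import PackageNotFoundError, version  # stdlib on Python 3.8+


def environment_report(packages=("pydra", "nilearn", "pandas", "numpy")) -> dict:
    """Collect the Python version and the installed versions of the given packages."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```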

Thursday (08/25/22)

Dorota exported the notebook to .py, and her errors with the first-level contrasts seem to have gone.
So I exported my notebook to .py too, but my error at the second level still exists.
Dorota once mentioned that the checksum can help identify whether something is wrong. So I printed the checksums (I changed some Pydra files on my test branch) and found that secondlevel_estimation, the problematic node, has a different checksum before and after running. (This information is probably not very useful, since we already know this node has problems?)
One thing I don't understand is why only secondlevel_estimation triggers expand_workflow while the other nodes don't. expand_workflow is only reached from Workflow._run_task.

Answer 8 · 2022-08-26T11:33:31.000Z

@yibeichan: expand_workflow should run only when a Workflow is run, not for every node. If you run wf_firstlevel alone, it should expand as well.
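
To make the "who is waiting for whom" question concrete: `await` suspends the calling coroutine until the awaited one finishes. The toy asyncio model below assumes, without checking Pydra's source, that running a workflow means awaiting each of its nodes; `expand_workflow` and `run_task` here are hypothetical stand-ins that only record the order of events. Nested workflows trigger an expansion, plain tasks do not:

```python
import asyncio

events = []


async def run_task(name):
    """Stand-in for running a single task node."""
    events.append(f"run {name}")
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would


async def expand_workflow(name, nodes):
    """Stand-in for running a workflow: it awaits each of its nodes in order."""
    events.append(f"expand {name}")
    for node in nodes:
        if isinstance(node, tuple):       # (name, nodes) -> nested workflow
            await expand_workflow(*node)  # the outer workflow waits for the inner one to finish
        else:                             # plain string -> task
            await run_task(node)
    events.append(f"finish {name}")


wf = ("wf", ["download", ("wf_firstlevel", ["first_level"]), "secondlevel_estimation"])
asyncio.run(expand_workflow(*wf))
print(events)
```

Real Pydra can run independent nodes concurrently; this toy runs them sequentially to keep the waiting order easy to read.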

Regarding the checksum: a node can have a different checksum if its input has not been retrieved yet. Once the full input is set, the checksum should not change.
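
A toy model of this point, assuming inputs are hashed from their current values: while an input is still an unresolved placeholder, the checksum differs from the one computed after the upstream result is filled in, and it stays stable afterwards. `LazyInput`, `checksum` and the `results` cache are hypothetical, not Pydra's classes. The last lines show that a lookup under the other checksum finds nothing; this is an illustration, not a diagnosis of the bug in this thread:

```python
import hashlib
import json


class LazyInput:
    """Placeholder for a value an upstream node has not produced yet (hypothetical)."""

    def __init__(self, source):
        self.source = source

    def __repr__(self):
        return f"LazyInput({self.source!r})"


def checksum(inputs: dict) -> str:
    """Hash the inputs as they currently are; placeholders are hashed by their repr."""
    payload = json.dumps({k: repr(v) for k, v in inputs.items()}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


inputs = {"models": LazyInput("wf_firstlevel.out")}
pre = checksum(inputs)                             # before the upstream output is available
inputs["models"] = ["model-sub-01", "model-sub-02"]
post = checksum(inputs)                            # once the full input is set
print(pre == post)                                 # False: resolving the input changed the checksum
print(post == checksum(inputs))                    # True: once set, the checksum is stable

results = {pre: "result"}                          # toy cache keyed by checksum
print(results.get(post))                           # None: nothing stored under the other checksum
```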

Answer 9 · 2022-08-30T06:15:16.000Z

More notes:
I think the problem lies in the Pydra workflow. The following is a sketch of the problem.
Let's call our problematic node PN:

PN is a task, not a workflow, but is a node in a workflow
PN is the 3rd node in the workflow wf, where the 1st node is a task and the 2nd node is a workflow wf-1.
The inputs of PN are the outputs of wf-1 (wf-1 uses split and combine, which may be important)
PN works fine as a standalone task, producing outputs
PN can also produce outputs/results when it runs in wf at the node level (I can print its results right after it's executed as a node)
PN has problems when (1) passing its outputs to the next node in wf or (2) passing its outputs to wf as the final outputs. (1) and (2) are essentially the same: the results of PN become None at this point.

My guess is that something is wrong with the connections/edges in the workflow graph, so that PN can't properly connect to its next node (or to the final step). This is very likely related to the fact that wf-1 uses split and combine, because:

If I put a test node (TN) right after wf-1 that doesn't use wf-1's outputs as its inputs, TN works fine.
If wf-1 doesn't use split and combine and PN uses wf-1's outputs, PN works fine. (The 6th tutorial, first_level GLM, is an example.)

So my hypothesis is that using split and combine in a node A may cause connection problems for other nodes that use A's outputs. More tests are needed.
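
For reference, the data shape at stake: in Pydra, `split` runs a node once per element of an input, and `combine` gathers those per-element results into a list for the downstream node, so PN receives a list of first-level outputs rather than a single one (without `combine`, the downstream node would instead run once per element). The pure-Python model below mirrors only that shape; it is not Pydra code, and `first_level`, `second_level` and `run_split` are hypothetical stand-ins:

```python
from typing import Callable, Iterable, List


def first_level(subj_id: int) -> str:
    """Stand-in for a per-subject first-level model."""
    return f"model-sub-{subj_id + 1:02d}"


def second_level(models: List[str]) -> dict:
    """Stand-in for secondlevel_estimation: consumes all first-level models at once."""
    return {"n_models": len(models), "models": models}


def run_split(func: Callable, values: Iterable) -> list:
    """Run func once per value (the split) and gather the results in one list (the combine)."""
    return [func(v) for v in values]


combined = run_split(first_level, range(3))
print(combined)                            # one entry per subject
print(second_level(combined)["n_models"])  # the downstream node runs once, on the whole list
```

A minimal test of the hypothesis would replace the stand-ins with trivial Pydra tasks in the same three-node layout (task, split/combine workflow, task) and check whether the last node's result is still None.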

To Dorota:
Re expand_workflow: I guess I haven't fully understood async/await, so I got confused by await expand_workflow. I'm not sure who is waiting for whom...
Re checksum: makes sense. I noticed that PN has two checksums (pre and post) pointing to two folders. However, only the pre-checksum folder exists; the post one doesn't... I guess that's what you meant by the input not being retrieved yet.
See you tomorrow!