Workflow for two-level GLM (nilearn tutorial)
yibeichan opened this issue · 9 comments
Hello, I'm working on a two-level GLM nilearn tutorial for Pydra, but I've been having problems with one node in the workflow for more than two weeks (@djarecka, @htwangtw, and I have discussed it over the past weeks). We've tried a couple of ways to debug it but haven't solved the problem yet. Here is a summary of what's happening:
In this two-level GLM tutorial, the analytical logic is:
1. download data (a task function)
2. run the first level for each subject (a workflow)
3. start the second-level estimation using the results from step 2 (a task function)
4. multiple statistical tests... (other tasks...)

Steps 3 and 4 together can be set up as a workflow, but let's talk about task functions first for the sake of the current issue.
I wrote the whole workflow using nilearn and made sure the code itself is error-free. Each pydra task function runs successfully as a standalone task. When I connect the tasks into a workflow, errors come up.
1. The workflow works fine for the first-level estimation.
2. The problem happens at `secondlevel_estimation` (see here, cell 18).
2.1 The input of `secondlevel_estimation` is a list of first-level models; the outputs are (a) the second-level mask from `SecondLevelModel()` and (b) the second-level stats estimates.
2.2 If I run `secondlevel_estimation` as a task (outside of the workflow), it prints `results` fine.
2.3 If I run `secondlevel_estimation` as a node in the workflow, it runs and can print its outcomes at every step, but it CANNOT `return` results. Here (last line) is the error: `'NoneType' object has no attribute 'errored'`
2.4 Since `secondlevel_estimation` can't return results, it has problems linking to the next node, so we get the error `graph is not empty, but not able to get more tasks - something is wrong (e.g. with the filesystem)` (see the last cell output here).

Now the question is why this `secondlevel_estimation` node can't return results even though the output at each of its steps can be printed.
some comments to the notebook (before I forget):
- I still had an issue with the confounds files, and I had to add:
conf_list = glob.glob(os.path.join(fmriprep_path, '*', 'func', '*_desc-confounds_timeseries.tsv'))
conf_list.sort()
- it might be better to use the same notation for `subj_id` as BIDS does and start from 1; I got really confused that when `subj_id=1` I need the `sub-02` files
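To make the off-by-one between `subj_id` and the BIDS label explicit, here is a minimal sketch. It assumes `subj_id` is a 0-based index and that subject labels are 1-based and zero-padded to two digits, as the `subj_id=1` / `sub-02` case above suggests. The helper names `bids_subject_label` and `find_confounds`, and the `task-demo` file names, are hypothetical and only serve the illustration.

```python
import glob
import os
import tempfile


def bids_subject_label(subj_id: int, width: int = 2) -> str:
    """Map a 0-based index to a BIDS subject label (assumed 1-based, zero-padded)."""
    return f"sub-{subj_id + 1:0{width}d}"


def find_confounds(fmriprep_path: str, subj_id: int) -> list:
    """Return the sorted confounds files for one subject."""
    pattern = os.path.join(
        fmriprep_path,
        bids_subject_label(subj_id),
        "func",
        "*_desc-confounds_timeseries.tsv",
    )
    return sorted(glob.glob(pattern))


# Tiny fake fMRIPrep tree, created only so the sketch can run anywhere.
with tempfile.TemporaryDirectory() as root:
    for label in ("sub-01", "sub-02"):
        func_dir = os.path.join(root, label, "func")
        os.makedirs(func_dir)
        fname = f"{label}_task-demo_desc-confounds_timeseries.tsv"
        open(os.path.join(func_dir, fname), "w").close()
    found = [os.path.basename(p) for p in find_confounds(root, subj_id=1)]

print(bids_subject_label(1))
print(found)
```

Keeping the index-to-label conversion in one small function means the notebook only has one place where the numbering convention lives.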
Thank you Dorota! Yes, you're right. I just checked `load_confounds_strategy`. It says
As long as the image file, confound related tsv and json are in the same directory with BIDS-compliant names, nilearn.interfaces.fmriprep.load_confounds can retrieve the relevant files correctly.
So `load_confounds_strategy` automatically detects the other files. I downloaded the confound files in a previous version of my code (removed now), so I guess this function found them. Okay, I'll add the confounds in the code.
Re `subj_id`, I'll document it better in the tutorial! (I didn't put any documentation in this test notebook... sorry)
To use load confounds correctly, the easiest way is to keep the fmriprep output untouched.
If that's not how you set up the workflow, we can simply review the confound regressors you use.
Hi haoting, the confound file problem is solved. I had downloaded them through datalad before using `load_confounds_strategy`, so it works fine on my laptop. Dorota was testing my notebook where I had removed the code for downloading confounds (I thought it wasn't needed), so she got some errors. She has added it and downloaded them now. I'll add the datalad command for downloading confounds to the tutorial (for general use).
The error we have now is not from the confounds but from the workflow itself.
I see! Glad that's solved!
I'm still debugging it, but I think there is something wrong with calculating the hash value of inputs when pandas.DF is involved...
Just some notes:
Tuesday (08/23/22):
- Dorota and I found that we used the same code, same data, but got different results. Dorota had some errors in the first level set contrasts. I don't have such errors. She uses python 3.8, I use 3.7
- Dorota couldn't get the same error as I do at the second level because her workflow hadn't passed the first level yet
Thursday (08/25/22):
- Dorota exported the notebook to `.py` and her errors for the first-level contrasts seem to have gone.
- So I exported my notebook to `.py` too, but my error at the second level still exists.
- Dorota once mentioned that `checksum` can help identify whether there is something wrong. So I printed `checksum` (I changed some pydra files on my test branch) and found that `secondlevel_estimation`, the problematic node, has a different `checksum` before and after running. (This information is probably not very useful since we already know this node has problems?)
- One thing I don't understand is why only `secondlevel_estimation` triggers `expand_workflow`, while other nodes don't. `expand_workflow` is only reached by `Workflow._run_task`.
@yibeichan - `expand_workflow` should be run only when the `Workflow` is run, not for every node. If you run `wf_firstlevel` only, it should expand as well.
regarding the checksum - it is possible that the node can have a different checksum if the input is not retrieved yet. Once the full input is set the checksum should not change.
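A toy illustration of that point, not Pydra's actual hashing code: if a checksum is derived from a task's name and inputs, it differs while an input is still an unresolved placeholder and becomes stable once the real value is set. The `checksum` function, the `UNRESOLVED` placeholder, and the model names are all hypothetical.

```python
import hashlib
import json

# Hypothetical placeholder for an input that has not been retrieved yet.
UNRESOLVED = "<lazy:wf_firstlevel.out>"


def checksum(name: str, inputs: dict) -> str:
    """Toy checksum: a short SHA-256 digest of the task name and its inputs."""
    payload = json.dumps({"task": name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


# Before the upstream node has run, the input is only a placeholder.
pre = checksum("secondlevel_estimation", {"firstlevel_models": UNRESOLVED})

# After the upstream node has run, the real value is filled in.
resolved = {"firstlevel_models": ["model-a", "model-b"]}
post = checksum("secondlevel_estimation", resolved)
post_again = checksum("secondlevel_estimation", resolved)

print(pre != post)         # placeholder vs. real input give different digests
print(post == post_again)  # same full input gives the same digest
```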
more notes:
I think the problem resides in pydra workflow. The following is a sketch of the problem.
Let's call our problematic node `PN`.
- `PN` is a task, not a workflow, but is a node in a workflow
- `PN` is the 3rd node in the workflow `wf`, where the 1st node is a task and the 2nd node is a workflow `wf-1`
- The inputs of `PN` are the outputs of `wf-1` (`wf-1` used split and combine, this can be important)
- `PN` works fine as a standalone task, producing outputs
- `PN` can also produce outputs/results when it's running in `wf` at the node level (I can print its results right after it's executed as a node)
- `PN` has problems when (1) passing its output to the next node in `wf` or (2) passing its outputs to `wf` as the final outputs. (1) and (2) are essentially the same
- the results of `PN` become `None` at this point
My guess is that something is wrong with the connections/edges in the workflow/graph, so that `PN` can't properly connect to its next node (or the final step). This is highly likely related to the fact that `wf-1` uses `split` and `combine` because:
- if I put a test node, `TN`, which doesn't use outputs from `wf-1` as its inputs, right after `wf-1`, this `TN` works okay, no problem.
- if `wf-1` doesn't use split & combine, and `PN` uses `wf-1`'s outputs, `PN` will work fine. (The 6th tutorial, first_level glm, is an example)
So my hypothesis is that the usage of `split` and `combine` in a node `A` may cause connection problems for other nodes which use `A`'s outputs. Need more tests.
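To make the data flow in this hypothesis concrete, here is a plain-Python emulation of what split and combine do conceptually. It does not use Pydra and says nothing about where the bug is; `first_level`, `second_level`, and the subject labels are hypothetical placeholders. Splitting runs the first-level step once per subject, combining gathers the per-subject outputs back into one list, and that list is what a downstream node like `PN` receives.

```python
from concurrent.futures import ThreadPoolExecutor


def first_level(subject: str) -> dict:
    """Placeholder for a per-subject first-level fit."""
    return {"subject": subject, "effect": float(len(subject))}


def second_level(firstlevel_results: list) -> dict:
    """Placeholder for PN: consumes the combined list of first-level outputs."""
    effects = [r["effect"] for r in firstlevel_results]
    return {"n_subjects": len(effects), "mean_effect": sum(effects) / len(effects)}


subjects = ["sub-01", "sub-02", "sub-03"]

# "split": one independent run per subject.
# "combine": gather the outputs into a single list, in input order.
with ThreadPoolExecutor() as pool:
    combined = list(pool.map(first_level, subjects))

results = second_level(combined)
print(results)
```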
to Dorota:
re `expand_workflow`, I guess I haven't fully understood `async/await`, so I got confused by `await expand_workflow`. I'm not sure who is waiting for whom....
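On the "who is waiting for whom" question, a small standard-library `asyncio` sketch may help. It is not Pydra's implementation; `toy_expand_workflow` and `toy_run_task` are hypothetical names. The coroutine that contains the `await` is the one that waits: it is suspended at that line until the awaited coroutine finishes, and meanwhile the event loop is free to run other work.

```python
import asyncio

events = []


async def toy_run_task(name: str) -> str:
    events.append(f"{name}: started")
    await asyncio.sleep(0.01)  # stands in for real work; yields to the event loop
    events.append(f"{name}: finished")
    return f"{name}-result"


async def toy_expand_workflow(node_names: list) -> list:
    results = []
    for name in node_names:
        # This coroutine pauses here until toy_run_task(name) has completed.
        results.append(await toy_run_task(name))
    events.append("workflow: all nodes done")
    return results


outputs = asyncio.run(
    toy_expand_workflow(["firstlevel", "secondlevel_estimation"])
)
print(outputs)
print(events)
```

So in a call like `await expand_workflow(...)`, the caller is waiting for `expand_workflow` to finish, not the other way around.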
re `checksum`, that makes sense. I noticed that `PN` has two checksums (pre & post), pointing to two folders. However, only the `pre` checksum folder exists. The post one doesn't.... I guess that's what you described: the input is not retrieved yet
See you tomorrow!