Workflow for two-level GLM (nilearn tutorial)
yibeichan opened this issue · 9 comments
Hello, I'm working on a two-level GLM nilearn tutorial for Pydra, but I've been having problems with one node in the workflow for more than two weeks (@djarecka, @htwangtw, and I have discussed it over the past weeks). We've tried a couple of ways to debug it but haven't solved the problem yet. Here is a summary of what's happening:
In this two-level GLM tutorial, the analytical logic is:
1. download data (a task function)
2. run the first level for each subject (a workflow)
3. start the second-level estimation using the results from step 2 (a task function)
4. multiple statistical tests.... (other tasks...)

Steps 3 and 4 together can be set up as a workflow, but let's talk about the task function first for the sake of the current issue.
I wrote the whole workflow using nilearn and made sure the code itself is error-free. Each pydra task function runs successfully as a standalone task. When I connect the tasks into a workflow, errors come up.
1. The workflow works fine up to the first-level estimation.
2. The problem happens at `secondlevel_estimation` (see here, cell 18).

2.1 The input of `secondlevel_estimation` is a list of first-level models; the outputs are (a) the second-level mask from `SecondLevelModel()` and (b) the second-level stats estimations.
2.2 If I run `secondlevel_estimation` as a task (outside of the workflow), it prints `results` fine.
2.3 If I run `secondlevel_estimation` as a node in the workflow, it runs and can print the outcome of every step, but it CANNOT `return` results. Here (last line) is the error: `'NoneType' object has no attribute 'errored'`
2.4 Since `secondlevel_estimation` can't return results, it has problems linking to the next node, so we get the error `graph is not empty, but not able to get more tasks - something is wrong (e.g. with the filesystem)` (see the last cell output here).
Now the question is why this `secondlevel_estimation` node can't return results even though the output of each of its steps can be printed.
Some comments on the notebook (before I forget):
- I still had issues with the confounds files, and I had to add:
conf_list = glob.glob(os.path.join(fmriprep_path, '*', 'func', '*_desc-confounds_timeseries.tsv'))
conf_list.sort()
- it might be better to use the same notation for `subj_id` as BIDS does and start from 1; I really got confused that when `subj_id=1` I need the `sub-02` files
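To make the off-by-one in the `subj_id` comment explicit, here is a minimal sketch of the mapping. The helper `bids_subject_label` is hypothetical (it is not in the notebook), and it assumes two-digit zero-padded labels like `sub-02`:

```python
def bids_subject_label(subj_id: int, zero_based: bool = True) -> str:
    """Return a BIDS-style subject label such as 'sub-02' for an integer index.

    Hypothetical helper: assumes two-digit, zero-padded labels.
    """
    # With a zero-based index, subj_id=1 refers to the second subject.
    number = subj_id + 1 if zero_based else subj_id
    if number < 1:
        raise ValueError("BIDS subject numbers start at 1")
    return f"sub-{number:02d}"


print(bids_subject_label(1))                    # sub-02 (zero-based index)
print(bids_subject_label(1, zero_based=False))  # sub-01 (BIDS-style numbering)
```

Starting `subj_id` from 1 would correspond to `zero_based=False`, so the index and the file name agree.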
Thank you Dorota! Yes, you're right. I just checked load_confounds_strategy. It says
As long as the image file, confound related tsv and json are in the same directory with BIDS-compliant names, nilearn.interfaces.fmriprep.load_confounds can retrieve the relevant files correctly.
So `load_confounds_strategy` automatically detects the other files. I downloaded the confound files in the previous version of my code (removed now), so I guess this function found them. Okay, I'll add the confounds in the code.
Re `subj_id`, I'll document it better in the tutorial! (I didn't put any documentation in this test notebook... sorry)
To use load confounds correctly, the easiest way is to keep the fmriprep output untouched.
If that's not how you set up the workflow, we can simply review the confound regressors you use.
Hi haoting, the confound file problem is solved. I had downloaded them through datalad before I used `load_confounds_strategy`, so it works fine on my laptop. Dorota was testing my notebook where I had removed the code for downloading confounds (I thought it wasn't needed), so she got some errors. She has added and downloaded them now. I'll add the datalad command for downloading confounds to the tutorial (for general use).
The error we have now is not from the confounds but from the workflow itself.
I see! Glad that's solved!
I'm still debugging it, but I think there is something wrong with calculating the hash value of inputs when pandas.DF is involved...
Just some notes:
Tuesday (08/23/22):
- Dorota and I found that we used the same code and the same data, but got different results. Dorota got some errors when setting the first-level contrasts; I don't get such errors. She uses Python 3.8, I use 3.7
- Dorota couldn't get the same error as I do at the second level because her workflow hadn't passed the first level yet
Thursday (08/25/22):
- Dorota exported the notebook to `.py` and her errors for the first-level contrasts seem to have gone.
- So I exported my notebook to `.py` too, but my error at the second level still exists.
- Dorota once mentioned that `checksum` can help identify whether something is wrong. So I printed `checksum` (I changed some pydra files on my test branch) and found that `secondlevel_estimation`, the problematic node, has a different `checksum` before and after running. (This information is probably not very useful since we already know this node has problems?)
- One thing I don't understand is why only `secondlevel_estimation` triggers `expand_workflow`, while other nodes don't. `expand_workflow` is only reached by `Workflow._run_task`
@yibeichan - expand_workflow should be run only when the Workflow is run, not for every node. If you run wf_firstlevel only it should expand as well.
regarding the checksum - it is possible that the node can have a different checksum if the input is not retrieved yet. Once the full input is set the checksum should not change.
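A minimal sketch of why a checksum can differ before and after the inputs are retrieved. This is not pydra's hashing code: the `Placeholder` class, the `checksum` function, and the field name `first_level_model` are all assumptions made up for illustration. The only point is that hashing a placeholder gives a different value than hashing the resolved input, and that the value is stable once the full input is set:

```python
import hashlib
import json


class Placeholder:
    """Hypothetical stand-in for an upstream output that is not retrieved yet."""

    def __init__(self, node, field):
        self.node = node
        self.field = field

    def __repr__(self):
        return f"Placeholder({self.node!r}, {self.field!r})"


def checksum(name, inputs):
    """Hash the task name together with a stable text form of its inputs."""
    payload = json.dumps({k: repr(v) for k, v in inputs.items()}, sort_keys=True)
    return hashlib.sha256(f"{name}:{payload}".encode()).hexdigest()[:16]


# Before running: the input is still a reference to the upstream node.
pre = checksum(
    "secondlevel_estimation",
    {"models": Placeholder("wf_firstlevel", "first_level_model")},
)

# After the upstream results are retrieved: the input is the actual value.
resolved = {"models": ["model-a", "model-b"]}
post = checksum("secondlevel_estimation", resolved)
post_again = checksum("secondlevel_estimation", resolved)

print(pre != post)         # True: placeholder and resolved input hash differently
print(post == post_again)  # True: once the input is set, the checksum is stable
```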
more notes:
I think the problem resides in pydra workflow. The following is a sketch of the problem.
Let's call our problematic node `PN`:
- `PN` is a task, not a workflow, but is a node in a workflow
- `PN` is the 3rd node in the workflow `wf`, where the 1st node is a task and the 2nd node is a workflow `wf-1`.
- The inputs of `PN` are the outputs of `wf-1` (`wf-1` used split and combine, this can be important)
- `PN` works fine as a standalone task, producing outputs
- `PN` can also produce outputs/results when it's running in `wf` at the node level (I can print its results right after it's executed as a node)
- `PN` has problems when (1) passing its output to the next node in `wf` or (2) passing its outputs to `wf` as the final outputs. (1) and (2) are essentially the same: the results of `PN` become None at this point
My guess is that something is wrong with the connections/edges in the workflow/graph that PN can't properly connect to its next node (or the final step). This is highly likely related to the fact that wf-1 uses split and combine because:
- if I put a test node, `TN`, which doesn't use outputs from `wf-1` as its inputs, right after `wf-1`, this `TN` works okay, no problem.
- if `wf-1` doesn't use split & combine, and `PN` uses `wf-1`'s outputs, `PN` will work fine. (The 6th tutorial, first_level glm, is an example)
So my hypothesis is that the usage of split and combine in a node A may cause connection problems for other nodes which use A's outputs. Need more tests.
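For reference, the data flow I expect from split and combine can be pictured in plain Python. This is only a conceptual sketch and does not use pydra: `first_level`, `second_level`, and `run_split_combine` are hypothetical stand-ins for `wf-1`, `PN`, and the outer workflow.

```python
from concurrent.futures import ThreadPoolExecutor


def first_level(subj_id):
    """Stand-in for the per-subject first-level workflow (hypothetical)."""
    return {"subj_id": subj_id, "model": f"model-{subj_id}"}


def second_level(models):
    """Stand-in for the downstream node: it needs the whole combined list."""
    return {"n_models": len(models), "labels": [m["model"] for m in models]}


def run_split_combine(subj_ids):
    # "split": one independent first-level run per subject.
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the inputs.
        per_subject = list(pool.map(first_level, subj_ids))
    # "combine": the per-subject outputs are gathered into a single list,
    # which becomes the one input of the downstream node.
    return second_level(per_subject)


result = run_split_combine([0, 1, 2])
print(result)  # {'n_models': 3, 'labels': ['model-0', 'model-1', 'model-2']}
```

In this picture the downstream node runs exactly once, on the combined list, and its return value is what the workflow should expose. That is the step where `PN`'s results become None in my case.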
to Dorota:
re expand_workflow, I guess I haven't fully understood async/await so I got confused by await expand_workflow. I'm not sure who is waiting for whom....
re checksum, that makes sense. I noticed that `PN` has two checksums (pre & post), pointing to two folders. However, only the pre checksum folder exists. The post one doesn't.... I guess that's what you meant by the input not being retrieved yet
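On the "who is waiting for whom" question, a minimal `asyncio` sketch may help. It is not pydra code: `run_node`, `expand_and_run`, and `main` are hypothetical names. The rule it shows is that the coroutine containing `await` is the one that pauses until the awaited coroutine returns.

```python
import asyncio

events = []  # records the order in which things happen


async def run_node(name):
    events.append(f"start {name}")
    await asyncio.sleep(0)  # hand control back to the event loop briefly
    events.append(f"done {name}")
    return f"{name}-result"


async def expand_and_run(node_names):
    """Hypothetical stand-in for a workflow-level coroutine."""
    results = []
    for name in node_names:
        # This coroutine is suspended here until run_node(name) finishes.
        results.append(await run_node(name))
    return results


async def main():
    events.append("main: waiting for workflow")
    # main is the one waiting; expand_and_run is the one being waited for.
    results = await expand_and_run(["a", "b"])
    events.append("main: workflow finished")
    return results


results = asyncio.run(main())
print(results)  # ['a-result', 'b-result']
print(events)
```

So in `await expand_workflow(...)`, the caller waits for `expand_workflow` to finish, and while it is suspended the event loop is free to run other coroutines.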
See you tomorrow!