jmschrei/tfmodisco-lite

Running tfmodisco-lite for multiple tasks?

ioansarr opened this issue · 4 comments

Hi @jmschrei

Thank you for your great work! I was wondering whether modisco-lite is able to handle multiple tasks (i.e., multiple instances of attribution scores) for the same sequences. This was still possible with the original modisco package and a key motivation for me to shift to the more efficient implementation. However, after browsing the code, it's not clear to me whether this is possible at all with this rewrite. Apologies in advance if I missed something. If so, could you please provide an example of how modisco-lite can be used with multiple tasks?

Thank you again for making the code slimmer and more efficient!

Hi @ioansarr

Unfortunately, I took out that functionality because no one in our group was using it and it significantly complicated the code. Potentially, you could concatenate the attributions along the channel axis, e.g. for two tasks you'll have 8 channels instead of 4. I haven't tried it though.
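A minimal sketch of what that concatenation could look like with NumPy. It is untested against tfmodisco-lite itself. The array names, the shapes, and the `(N, channels, L)` layout are assumptions made for illustration; if your arrays are laid out as `(N, L, channels)`, concatenate along the last axis instead.

```python
import numpy as np

# Hypothetical sizes: 100 sequences of length 400, 4 channels (A, C, G, T).
rng = np.random.default_rng(0)
n_seqs, length = 100, 400

# One-hot sequences shared by both tasks, layout (N, 4, L).
idx = rng.integers(0, 4, size=(n_seqs, length))
one_hot = np.eye(4, dtype=np.float32)[idx].transpose(0, 2, 1)

# Stand-in attribution scores for two tasks (random here; real scores
# would come from your model's attribution method).
attr_task1 = rng.normal(size=(n_seqs, 4, length)).astype(np.float32)
attr_task2 = rng.normal(size=(n_seqs, 4, length)).astype(np.float32)

# Concatenate along the channel axis: 2 tasks x 4 channels = 8 channels.
attr_concat = np.concatenate([attr_task1, attr_task2], axis=1)
one_hot_concat = np.concatenate([one_hot, one_hot], axis=1)

print(attr_concat.shape)     # (100, 8, 400)
print(one_hot_concat.shape)  # (100, 8, 400)
```

Channels 0-3 then hold task 1 and channels 4-7 hold task 2. Whether tfmodisco-lite accepts inputs with more than 4 channels is not confirmed here.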

Hi @jmschrei

Thank you for the very quick reply! It's a pity that this was taken out.

The main motivation for doing a single run for multiple tasks would be to summarise the activity of the same metaclusters and patterns across classes. I need to think more about your suggestion (and potentially give it a go), but wouldn't concatenating the channels lead to different patterns being discovered for each class? (e.g., the same GATA motif would appear as 3,1,4,1 in task1 and 7,5,8,5 in task2).
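To make the channel-index concern concrete, here is a small sketch. It assumes an A, C, G, T channel order, 1-based channel numbering, and an 8-channel layout with task 1 in channels 1-4 and task 2 in channels 5-8; the helper `one_hot_encode` is hypothetical and not part of tfmodisco-lite.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed channel order


def one_hot_encode(seq):
    """Return a (4, len(seq)) one-hot array with rows ordered A, C, G, T."""
    out = np.zeros((4, len(seq)), dtype=np.float32)
    for pos, base in enumerate(seq):
        out[ALPHABET.index(base), pos] = 1.0
    return out


motif = one_hot_encode("GATA")
zeros = np.zeros_like(motif)

# The same motif, active only in task 1 or only in task 2.
in_task1 = np.concatenate([motif, zeros], axis=0)  # shape (8, 4)
in_task2 = np.concatenate([zeros, motif], axis=0)  # shape (8, 4)

# 1-based index of the active channel at each motif position.
channels_task1 = (in_task1.argmax(axis=0) + 1).tolist()
channels_task2 = (in_task2.argmax(axis=0) + 1).tolist()

print(channels_task1)  # [3, 1, 4, 1]
print(channels_task2)  # [7, 5, 8, 5]
```

The two encodings share no active channels, so any similarity measure computed over the 8 channels would treat them as unrelated patterns.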

Sorry for the late reply. Yes, that's true -- my idea might not work in practice because it would turn clusters into combinations of tasks, e.g. if a seqlet was active in both tasks it would belong to one cluster, but if it was only active in one task it might belong to a different cluster. If you want to use tfmodisco-lite, it might be best to independently run modisco for each task and then cross-reference the seqlets. Unfortunately, adding in multi-task support would be too much work for us to do, so I don't think that will be added back in.
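One possible way to do the cross-referencing step, sketched under stated assumptions: each per-task run has already been reduced to a list of seqlet records holding a sequence index, a start, an end, and a pattern label. The `Seqlet` record, the `shared_seqlets` function, and the `min_overlap` threshold are all hypothetical and do not reflect tfmodisco-lite's actual output format.

```python
from collections import defaultdict
from typing import NamedTuple


class Seqlet(NamedTuple):
    example_idx: int  # index of the input sequence
    start: int        # inclusive
    end: int          # exclusive
    pattern: str      # label of the pattern the seqlet was assigned to


def shared_seqlets(task_a, task_b, min_overlap=0.5):
    """Pair seqlets from two runs that lie on the same sequence and overlap
    by at least `min_overlap` of the shorter seqlet's length."""
    by_example = defaultdict(list)
    for s in task_b:
        by_example[s.example_idx].append(s)

    pairs = []
    for a in task_a:
        for b in by_example.get(a.example_idx, []):
            overlap = min(a.end, b.end) - max(a.start, b.start)
            shorter = min(a.end - a.start, b.end - b.start)
            if shorter > 0 and overlap / shorter >= min_overlap:
                pairs.append((a, b))
    return pairs


# Toy example with made-up coordinates and labels.
task1 = [Seqlet(0, 10, 30, "pos_0"), Seqlet(1, 50, 70, "pos_1")]
task2 = [Seqlet(0, 15, 35, "pos_2"), Seqlet(1, 200, 220, "pos_0")]

for a, b in shared_seqlets(task1, task2):
    print(a.pattern, "<->", b.pattern)  # pos_0 <-> pos_2
```

Counting how often each pair of pattern labels co-occurs would then indicate which patterns from the separate runs likely correspond to the same motif.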

Thank you for the reply @jmschrei! Indeed, I am currently trying to run modisco separately for each task and then identify shared seqlets across tasks in a second step. Thank you again for developing a great tool!