jmschrei/tfmodisco-lite

Doesn't work for sequences of variable length

ruizhideng opened this issue · 2 comments

Hi!

Thanks for the amazing work!

I am testing the original notebook from https://github.com/kundajelab/tfmodisco/blob/master/examples/simulated_TAL_GATA_deeplearning/TF_MoDISco_TAL_GATA.ipynb. It seems the original GitHub repository is deprecated.

In the original notebook, you mentioned that the TF-MoDISco pipeline also works for sequences of different lengths. However, when I tested the notebook with input data of shape [100, length, 4], where the length ranges from 500 to 1000 bp, it raised the following error:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (100,) + inhomogeneous part.
If I crop the sequences and contribution scores to the same length, it works again. Is there a version available that supports sequences of different lengths?

Thank you again!

Best wishes,
Ruizhi

Hi Ruizhi

I didn't write that notebook or the original repository. This repo is a rewrite of that code that simplifies things significantly. One of the simplifications is that it only works on fixed-length sequences. Sorry about that.
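
For anyone landing here with the same error, the cropping workaround mentioned above can be sketched roughly as follows. This is only an illustration with NumPy, not code from this repo: `center_crop` is a hypothetical helper, the data is randomly generated, and the `[N, length, 4]` axis order simply follows the shapes in the question, so check which layout your version expects before passing the arrays on.

```python
import numpy as np


def center_crop(arrays, target_len=None):
    """Center-crop a list of (length_i, 4) arrays to one length and stack them.

    Hypothetical helper for illustration; not part of tfmodisco-lite.
    """
    if target_len is None:
        # Default to the shortest sequence so nothing needs padding.
        target_len = min(a.shape[0] for a in arrays)
    cropped = []
    for a in arrays:
        if a.shape[0] < target_len:
            raise ValueError("sequence shorter than target_len")
        start = (a.shape[0] - target_len) // 2
        cropped.append(a[start:start + target_len])
    # All pieces now share a shape, so stacking gives a regular array.
    return np.stack(cropped)


# Made-up data: 100 sequences of 500 to 1000 bp, as in the question.
rng = np.random.default_rng(0)
lengths = rng.integers(500, 1001, size=100)
one_hot = [np.eye(4, dtype=np.float32)[rng.integers(0, 4, size=n)] for n in lengths]
# Fake contribution scores, nonzero only at the observed base.
scores = [rng.normal(size=oh.shape).astype(np.float32) * oh for oh in one_hot]

target = int(lengths.min())
X = center_crop(one_hot, target)  # shape (100, target, 4)
A = center_crop(scores, target)   # same shape, same positions as X
print(X.shape, A.shape)
```

The important detail is that the sequences and the contribution scores are cropped with the same window, so each score still lines up with its base. Center-cropping assumes the region of interest sits near the middle of each sequence; if it does not, a different window choice would be needed.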

Hi Jacob, thank you for the prompt reply!