Doesn't work for sequences of variable length
ruizhideng opened this issue · 2 comments
Hi!
Thanks for the amazing work!
I am testing the original notebook from https://github.com/kundajelab/tfmodisco/blob/master/examples/simulated_TAL_GATA_deeplearning/TF_MoDISco_TAL_GATA.ipynb; it seems the original GitHub repository is deprecated.
In the original notebook, you mentioned that the TF-modisco pipeline also works on sequences of different lengths. However, when I tested the notebook with input data of shape [100, length, 4], where the length ranges from 500 to 1000 bp, it raised `ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (100,) + inhomogeneous part.`
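That error is NumPy refusing to build a single (N, length, 4) array out of per-sequence arrays whose lengths differ. A minimal sketch of the underlying problem, assuming one-hot sequences are held as a list of (length, 4) NumPy arrays; `can_stack` is a hypothetical helper for illustration, not part of TF-MoDISco:

```python
import numpy as np

def can_stack(arrays):
    """Return True if the per-sequence arrays can be combined into one
    (N, length, 4) array, i.e. they all share exactly the same shape."""
    try:
        np.stack(arrays)
        return True
    except ValueError:
        # np.stack raises ValueError when the input shapes differ
        return False

rng = np.random.default_rng(0)
# Hypothetical one-hot inputs: 3 sequences of 500, 750 and 1000 bp.
ragged = [np.eye(4)[rng.integers(0, 4, size=n)] for n in (500, 750, 1000)]
print([a.shape for a in ragged])  # [(500, 4), (750, 4), (1000, 4)]
print(can_stack(ragged))          # False
```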
If I then crop the sequences and contribution scores to the same length, it works again. Is there a version available that supports sequences of different lengths?
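The cropping workaround can be sketched as below, assuming sequences and contribution scores are both lists of (length, 4) NumPy arrays in the same order. `center_crop` is a hypothetical helper, and keeping the central window (defaulting to the shortest sequence's length) is just one reasonable choice: any motif instances in the discarded flanks are lost.

```python
import numpy as np

def center_crop(arrays, length=None):
    """Crop each (L_i, 4) array to a common length by keeping its central
    window, then stack into one (N, length, 4) array. Defaults to the
    length of the shortest sequence."""
    if length is None:
        length = min(a.shape[0] for a in arrays)
    cropped = []
    for a in arrays:
        if a.shape[0] < length:
            raise ValueError(f"sequence of length {a.shape[0]} is shorter than {length}")
        start = (a.shape[0] - length) // 2
        cropped.append(a[start:start + length])
    return np.stack(cropped)

rng = np.random.default_rng(0)
seqs = [np.eye(4)[rng.integers(0, 4, size=n)] for n in (500, 750, 1000)]
# Hypothetical contribution scores: random per-position values on the observed base.
scores = [s * rng.normal(size=(s.shape[0], 1)) for s in seqs]

X = center_crop(seqs)
S = center_crop(scores)
print(X.shape, S.shape)  # (3, 500, 4) (3, 500, 4)
```

Because the crop window depends only on each sequence's length, applying `center_crop` with the same `length` to both lists keeps every score aligned with its base.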
Thank you again!
Best wishes,
Ruizhi
Hi Ruizhi
I didn't write that notebook or the original repository. This repo is a rewrite of that code that simplifies things significantly. One of the simplifications is that it only works on sequences of a fixed length. Sorry about that.
Hi Jacob, thank you for the prompt reply!