instadeepai/nucleotide-transformer

The length of input sequence for SegmentNT


hezt commented

Hello Team,

I'm trying to use your model to predict splice sites on custom sequences. Could you please tell me whether there are any requirements on the input sequence, such as its length or flanking context? For example, SpliceAI needs 5,000 bp of context on each side. Does SegmentNT have a similar requirement?

Thanks
Zitong

Hello @hezt ,

The SegmentNT model we released was trained on sequences of 30,000 bp and evaluated on sequences of 10 kbp, 20 kbp, and so on up to 100 kbp. The resulting performance is shown in Fig. 3, panel d. There is no hard limit on the input sequence length (apart from accelerator memory, which will eventually run out). However, the figure shows that the model reaches its best performance on sequences of 50,000 bp, so I would advise using that length to get optimal results!
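As a rough illustration of the 50,000 bp advice, here is a minimal sketch that splits a longer custom sequence into fixed-length windows before each one is passed to the model. The helper name `tile_windows`, the 25,000 bp stride, and the overlapping-window strategy are hypothetical choices for illustration; they are not part of the SegmentNT codebase, and the thread does not prescribe them.

```python
from typing import List, Tuple


def tile_windows(sequence: str, window: int = 50_000, stride: int = 25_000) -> List[Tuple[int, str]]:
    """Hypothetical helper: split `sequence` into (start, subsequence) windows.

    Windows have length `window` and start every `stride` bases. The last
    window is aligned to the end of the sequence so every base is covered.
    A sequence no longer than `window` is returned as a single window.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    n = len(sequence)
    if n <= window:
        return [(0, sequence)]
    # Regular starts spaced by `stride` that still fit a full window.
    starts = list(range(0, n - window + 1, stride))
    # Add one end-aligned window if the regular starts miss the tail.
    if starts[-1] + window < n:
        starts.append(n - window)
    return [(s, sequence[s:s + window]) for s in starts]
```

For a 120,000 bp sequence this yields windows starting at 0, 25,000, 50,000 and 70,000, each 50,000 bp long. The overlap is an assumption: it gives every position some surrounding context in at least one window, at the cost of running the model on more bases in total.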

Hope this helps,
Hugo