calico/basenji

Question about clip thresholds in basenji/enformer

Closed this issue · 2 comments

Hi, @davek44. I've noticed the clip_soft and clip_extreme options in basenji_data_read.py. Is there a standard way to choose these thresholds? (I'm using ChIP-seq bigWig tracks from a plant.) And does the clipping matter much?

clip_extreme is meant to protect the training process from truly weird genomic regions with super high coverage. I don't touch it much. clip_soft probably doesn't matter a ton for ChIP-seq where the dynamic range isn't as large as RNA abundance statistics. I generally set it somewhere from 32-128, but it depends on how deeply sequenced your samples are. You could try a couple of values and make sure the Spearman correlations are robust.
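For illustration, here is a minimal sketch of what the two kinds of clipping could look like on a coverage vector. The function names `soft_clip` and `hard_clip` are hypothetical, and the square-root compression above the threshold is an assumption about the general approach, not a copy of the code in basenji_data_read.py; check the script itself for the exact formula.

```python
import numpy as np

def soft_clip(cov, clip_soft):
    """Compress values above clip_soft with a square root (assumed form).

    Values at or below the threshold are returned unchanged.
    """
    cov = np.asarray(cov, dtype=np.float64).copy()
    mask = cov > clip_soft
    # Above the threshold, growth slows from linear to sqrt.
    # The -1/+1 offsets keep the curve continuous at the threshold.
    cov[mask] = clip_soft - 1 + np.sqrt(cov[mask] - clip_soft + 1)
    return cov

def hard_clip(cov, clip_extreme):
    """Cap values at +/- clip_extreme to guard against extreme-coverage regions."""
    return np.clip(np.asarray(cov, dtype=np.float64), -clip_extreme, clip_extreme)

# Toy coverage values, not real data.
coverage = np.array([10.0, 32.0, 131.0])
print(soft_clip(coverage, clip_soft=32))
print(hard_clip(np.array([5.0, 1000.0]), clip_extreme=384))
```

With a threshold of 32, a bin with coverage 131 would be compressed to 41 under this assumed formula, while bins at or below 32 pass through untouched. Trying a couple of thresholds in the 32-128 range and comparing the resulting Spearman correlations, as suggested above, is a reasonable way to check that the choice is not driving the results.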

Okay, got it. Thanks for your reply.