possible discrepancy in count filtering for bias model?
cmlakhan opened this issue · 1 comment
cmlakhan commented
In the FAQ, the section on choosing the bias threshold factor states the following:
Non peak regions used in bias model training are filtered based on The bias_threshold_factor which is used as follows.
The regions with total counts greater than 0.1_quantile(total counts in peaks)*bias_threshold_factor are filtered out.
However, when I look at the code for find_bias_hyperparams, I see the following:
# step 2 filtering: filter nonpeaks that have counts less than a threshold_factor (minimum of peak counts)
peak_cnts, _ = param_utils.get_seqs_cts(genome, bw, peaks, args.inputlen, args.outputlen)
nonpeak_cnts, _ = param_utils.get_seqs_cts(genome, bw, nonpeaks, args.inputlen, args.outputlen)
assert(len(peak_cnts) == peaks.shape[0])
assert(len(nonpeak_cnts) == nonpeaks.shape[0])
final_cnts = nonpeak_cnts
counts_threshold = np.quantile(peak_cnts,0.01)*args.bias_threshold_factor
assert(counts_threshold > 0) # counts threshold is 0 - all non peaks will be filtered!
final_cnts = final_cnts[final_cnts < counts_threshold]
Is it possible that there is a bug in the code, or did you mean 0.01 in the FAQ?
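To make the difference concrete, here is a minimal sketch comparing the two quantiles on synthetic counts. Everything in it is an assumption for illustration: the lognormal counts are made up, the `bias_threshold_factor` value of 0.5 is arbitrary, and `filter_nonpeaks` is a hypothetical helper that only mirrors the thresholding lines quoted above, not the repository's actual function.

```python
import numpy as np

# Synthetic, made-up total counts (not data from the repository).
rng = np.random.default_rng(0)
peak_cnts = rng.lognormal(mean=6.0, sigma=1.0, size=10_000)
nonpeak_cnts = rng.lognormal(mean=4.0, sigma=1.0, size=10_000)

bias_threshold_factor = 0.5  # arbitrary value chosen for illustration


def filter_nonpeaks(peak_cnts, nonpeak_cnts, q, factor):
    """Hypothetical helper: keep nonpeak counts strictly below
    quantile(peak_cnts, q) * factor, as in the quoted snippet."""
    counts_threshold = np.quantile(peak_cnts, q) * factor
    assert counts_threshold > 0  # a zero threshold would drop every nonpeak
    kept = nonpeak_cnts[nonpeak_cnts < counts_threshold]
    return kept, counts_threshold


# Quantile used in the code (0.01) versus the one written in the FAQ (0.1).
kept_code, thr_code = filter_nonpeaks(peak_cnts, nonpeak_cnts, 0.01, bias_threshold_factor)
kept_faq, thr_faq = filter_nonpeaks(peak_cnts, nonpeak_cnts, 0.1, bias_threshold_factor)

print(f"q=0.01: threshold={thr_code:.1f}, nonpeaks kept={len(kept_code)}")
print(f"q=0.1:  threshold={thr_faq:.1f}, nonpeaks kept={len(kept_faq)}")
```

Because a quantile is non-decreasing in q, the 0.1 quantile gives a threshold at least as high as the 0.01 quantile, so the FAQ wording would keep at least as many nonpeak regions as the code does for the same factor.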
panushri25 commented
Ah, that's a typo in the FAQ. Thanks, will fix this.