kundajelab/chrombpnet

possible discrepancy in count filtering for bias model?

cmlakhan opened this issue · 1 comments

In the FAQ, the entry on choosing the bias threshold factor states the following:

Non peak regions used in bias model training are filtered based on The bias_threshold_factor which is used as follows. 
The regions with total counts greater than 0.1_quantile(total counts in peaks)*bias_threshold_factor are filtered out.

However, when I look at the code for find_bias_hyperparams, I see the following:

    # step 2 filtering: filter nonpeaks that have counts less than a threshold_factor (minimum of peak counts)
    peak_cnts, _ = param_utils.get_seqs_cts(genome, bw, peaks, args.inputlen, args.outputlen)
    nonpeak_cnts, _ = param_utils.get_seqs_cts(genome, bw, nonpeaks, args.inputlen, args.outputlen)    
    assert(len(peak_cnts) == peaks.shape[0])
    assert(len(nonpeak_cnts) == nonpeaks.shape[0])

    final_cnts = nonpeak_cnts
    counts_threshold = np.quantile(peak_cnts,0.01)*args.bias_threshold_factor
    assert(counts_threshold > 0) # counts threshold is 0 - all non peaks will be filtered!
   
    final_cnts = final_cnts[final_cnts < counts_threshold]




Is it possible that there is a bug in the code or that you meant 0.01 in the FAQ?
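
To show how much the two readings can differ, here is a minimal sketch with made-up counts. The helper names, the data, and the factor of 0.5 are all hypothetical and are not taken from chrombpnet. `linear_quantile` is a pure-Python stand-in that follows the linear interpolation rule `np.quantile` uses by default.

```python
import math


def linear_quantile(values, q):
    """Quantile with linear interpolation between order statistics
    (the rule np.quantile applies by default)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)   # fractional index into the sorted data
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def nonpeak_threshold(peak_cnts, q, bias_threshold_factor):
    """Hypothetical helper: quantile of peak counts scaled by the factor."""
    return linear_quantile(peak_cnts, q) * bias_threshold_factor


# Made-up data for illustration only.
peak_cnts = list(range(100, 1101, 10))    # 101 values: 100, 110, ..., 1100
nonpeak_cnts = [20, 60, 90, 120, 150, 400]
factor = 0.5                              # assumed bias_threshold_factor

code_threshold = nonpeak_threshold(peak_cnts, 0.01, factor)  # as in the code
faq_threshold = nonpeak_threshold(peak_cnts, 0.1, factor)    # as in the FAQ

# Same comparison as the quoted snippet: keep nonpeaks below the threshold.
kept_by_code = [c for c in nonpeak_cnts if c < code_threshold]
kept_by_faq = [c for c in nonpeak_cnts if c < faq_threshold]

print(kept_by_code)  # [20]
print(kept_by_faq)   # [20, 60, 90]
```

On this toy data the 0.1 quantile nearly doubles the threshold and keeps three nonpeak regions instead of one, so the two readings are not interchangeable.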

Ah, that's a typo in the FAQ. Thanks, will fix this.