kundajelab/chrombpnet

Input for training cell type-specific models


Hi! I would like to train a model for each cell type in my scATAC-seq dataset, and I am unsure which peaks and non-peaks to use as input. I have created a fragment file for each cell type. Should I call peaks and generate non-peaks separately from each of these files, or should I use a single set of peaks called on the whole dataset? I trained models both ways, and the second option (the common peak set) gives slightly better results (R = 0.735 vs. R = 0.714). Is that the right way to go, or is it more appropriate to call peaks on each cell type's fragment file separately?

Hey, for a cell type-specific model, call peaks on each cell type's file separately.
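
For anyone scripting this, here is a rough sketch of building one MACS2 peak-calling command per cell type. It only assembles the command lines and does not run anything. The file names, the output directory, the `hs` genome size, the `-p 0.01` threshold, and the helper name `build_peak_calling_commands` are all assumptions for illustration, not settings confirmed in this thread. It also assumes your fragment files can be read by MACS2 as `BEDPE`; you may need to trim them to the first three columns first.

```python
from pathlib import Path


def build_peak_calling_commands(fragment_files, outdir="peaks", genome_size="hs"):
    """Return one MACS2 argv list per cell type (nothing is executed here).

    fragment_files maps a cell type name to its fragment file path.
    All parameter values below are illustrative assumptions.
    """
    commands = {}
    for cell_type, fragments in sorted(fragment_files.items()):
        commands[cell_type] = [
            "macs2", "callpeak",
            "-t", str(fragments),        # per-cell-type fragment file
            "-f", "BEDPE",               # assumption: fragments readable as BEDPE
            "-g", genome_size,           # assumption: human genome
            "-n", cell_type,             # output prefix = cell type name
            "--outdir", str(Path(outdir) / cell_type),
            "-p", "0.01",                # assumption: relaxed p-value threshold
            "--keep-dup", "all",
        ]
    return commands


# Hypothetical cell types and file names.
fragment_files = {
    "B_cell": "fragments/B_cell.tsv",
    "T_cell": "fragments/T_cell.tsv",
}

commands = build_peak_calling_commands(fragment_files)
for cell_type, argv in commands.items():
    print(cell_type, "->", " ".join(argv))
```

Each command can then be run with `subprocess.run(argv, check=True)`, so that every cell type ends up with its own peak set.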

Please reopen this if you have any more questions!