aertslab/cisTopic

The choice of the bed file

ytang0831 opened this issue · 5 comments

Hi!
The tutorial says, for initializing the cisTopic object:
"Starting from the bam files and predefined regions [Reference running time: 0.4 sec/cell]
pathToBams <- 'data/bamfiles/'
bamFiles <- paste(pathToBams, list.files(pathToBams), sep='')
regions <- 'data/regions.bed'"
and your paper describes the input as "a BED file with candidate regulatory regions (for example, from peak calling on the aggregate or the bulk profile)."

So, if my single-cell data is for H3K36me3, should I call peaks on bulk H3K36me3 WT data to build the region BED file, or on the aggregated single-cell data?
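A side note on the tutorial snippet quoted above: paste(pathToBams, list.files(pathToBams), sep='') picks up every file in the directory, so if it also holds .bai index files they end up in bamFiles too. A minimal base-R sketch of a stricter listing, using a throwaway directory with hypothetical file names in place of 'data/bamfiles/':

```r
# Throwaway directory standing in for 'data/bamfiles/' (file names are hypothetical).
pathToBams <- file.path(tempdir(), "bamfiles")
dir.create(pathToBams, showWarnings = FALSE)
invisible(file.create(file.path(pathToBams,
                                c("cell1.bam", "cell1.bam.bai", "cell2.bam"))))

# Keep only files ending in ".bam"; full.names = TRUE prepends the directory,
# so pathToBams does not need a trailing slash.
bamFiles <- list.files(pathToBams, pattern = "\\.bam$", full.names = TRUE)
basename(bamFiles)
```

Here the .bai index is skipped and only cell1.bam and cell2.bam are kept.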

Hi @ytang0831 !

You can use either; if you have bulk data, I would use that. Another option, which we have seen gives more resolution, is to use a set of predefined cis-regulatory regions, such as SCREEN regions (https://screen.encodeproject.org) or cisTarget regions (available in this package for hg19, dm3, dm6 and mm9, e.g. data(hg19_CtxRegions)).
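The regions argument in the tutorial is a path to a BED file. If your predefined regions are in an R table instead, a minimal base-R sketch of writing them out as a tab-separated, headerless BED file might look like this. The table and its column names are hypothetical placeholders, not the layout of hg19_CtxRegions or any other built-in dataset:

```r
# Hypothetical region table (chromosome, start, end); replace with your own regions.
# Integer coordinates avoid scientific notation such as "3e+06" in the output.
regions_df <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(3000000L, 3005000L, 1000000L),
  end   = c(3000500L, 3005800L, 1000400L)
)

# BED is plain tab-separated text with no header, quotes, or row names.
bed_path <- file.path(tempdir(), "regions.bed")
write.table(regions_df, file = bed_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

readLines(bed_path)
```

The resulting path can then be passed wherever the tutorial uses 'data/regions.bed'.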

Hope this helps!

C

Thanks for your quick reply!
By the way, when I use the createcisTopicObjectFromBAM function with bulk peaks as the region BED file, I get a very low rate of successfully assigned alignments, e.g.:

|| Annotation : R data.frame ||
|| Dir for temp files : . ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Paired-end : yes ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| Read reduction : to 5' end ||
|| Chimeric reads : counted ||
|| Both ends mapped : not required ||

|| Total alignments : 18289051 ||
|| Successfully assigned alignments : 105211 (0.6%) ||
|| Running time : 0.88 minutes ||

Is this normal?
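The percentage in the log is the assigned alignments divided by the total alignments. Per the log, each read is reduced to its 5' end and needs at least one overlapping base with a region to count. A quick check of the reported figure in R, using only the two numbers from the summary above:

```r
# Numbers copied from the featureCounts summary above.
total_alignments    <- 18289051
assigned_alignments <- 105211

# Percentage of alignments whose 5' end fell inside at least one region.
assigned_pct <- 100 * assigned_alignments / total_alignments
round(assigned_pct, 2)
```

This gives about 0.58%, which the log rounds to 0.6%, i.e. roughly 1 alignment in 174.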

mmmm not really... What is the correlation between your single cell aggregate and your bulk?
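One way to put a number on that question, sketched in base R under stated assumptions: take read counts over the same regions, in the same order, for the aggregated single-cell profile (for instance, the single-cell counts summed over cells) and for the bulk profile, then correlate them. The count vectors below are made-up placeholders, not data from this issue:

```r
# Made-up per-region read counts; both vectors must cover the same regions in the same order.
aggregate_counts <- c(120, 5, 0, 340, 60, 15)
bulk_counts      <- c(900, 40, 10, 2100, 500, 80)

# Pearson correlation on log1p-transformed counts damps the pull of a few very high regions.
pearson_log <- cor(log1p(aggregate_counts), log1p(bulk_counts))

# Spearman correlation is rank-based, so it needs no transformation.
spearman <- cor(aggregate_counts, bulk_counts, method = "spearman")

c(pearson_log = pearson_log, spearman = spearman)
```

In this toy example the two profiles rank the regions identically, so the Spearman correlation is exactly 1. A low value on real data would suggest the bulk peaks do not reflect where the single-cell reads fall.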

C

Hmm, the low correlation between the single-cell data and the bulk data must be the reason. After switching to the built-in mm9_CtxRegions, the rate improved to 7%.
Thanks a lot for your help!

I'm also confused about the command path_to_signatures <- 'data/ChIP-seq_signatures/'.
Which files should the signatures be? Histone mark BED files (e.g. peaks called on ENCODE H3K36me3 data)? And do you have built-in signature data? 😂

Never mind, I've figured out what the signatures mean; it was a very basic question!