estimating promoter activity from given bam files
ninjaxfy opened this issue · 4 comments
Hi,
I've ran the whole workflow with the given junction files. And I was quiet confused that when I use the given bam file and the given code to run the flow, why I just got the null result(eg. 0 , na and null).
Hi @ninjaxfy thanks for your interest in proActiv! Could you provide a minimal reproducible example / the code you used, along with session information, so that we can see what the problem is?
@jleechung thx a lot for reply!!!
I am a student and just began to learn bioinformatics systematically, hence I might made mistakes, and I hope it won't take you too much time.
I have another question in the function preparePromoterAnnotation, there is a argument "species", does it means that ProActiv can only run those 12 species? and whether it can run other animals' rna-seq data?(eg. chicken, chimp ...)
And the end of the letter is my code in R(mostly copy in the homepage of proactiv ).
##test workflow with bamfile in package
##A complete workflow to identify alternative promoter usage
library(proActiv)
From BAM files - genome parameter must be provided
files <- list.files(system.file('extdata/testdata/bam', package = 'proActiv'), full.names = TRUE)
##Preparing promoter annotations from a gtf file or txdb file
From GTF file path
gtf.file <- system.file('extdata/vignette/annotations/gencode.v34.annotation.subset.gtf.gz',
package = 'proActiv')
promoterAnnotation.gencode.v34.subset <- preparePromoterAnnotation(file = gtf.file,
species = 'Homo_sapiens')
From TxDb object
txdb.file <- system.file('extdata/vignette/annotations/gencode.v34.annotation.subset.sqlite',
package = 'proActiv')
txdb <- loadDb(txdb.file)
promoterAnnotation.gencode.v34.subset <- preparePromoterAnnotation(txdb = txdb,
species = 'Homo_sapiens')
result <- proActiv(files = files,
promoterAnnotation = promoterAnnotation.gencode.v34.subset,
genome = 'hg38')
filter result
Removes single-exon transcripts / promoters by eliminating promoter counts that are NA
result <- result[complete.cases(assays(result)$promoterCounts),
bamtestcode.txt
]
Hi @ninjaxfy, no worries.
For preparePromoterAnnotation
for other species, please see issue #25.
For the bam result: the bam files provided with the package are really for testing, and so we subset the bam files to very small regions (for file size reasons). Most of the promoters are thus not quantified. You can extract those that are expressed as follows, and find that there are seven promoters found active in at least one of the samples.
> files <- list.files(system.file('extdata/testdata/bam',
+ package = 'proActiv'),
+ full.names = TRUE)
>
> annotation <- promoterAnnotation.gencode.v34.subset
>
> result <- proActiv(files = files,
+ promoterAnnotation = annotation,
+ genome = 'hg38')
> result <- result[complete.cases(assays(result)$promoterCounts), ]
> promoterCounts <- assays(result)$promoterCounts
> promoterCounts <- promoterCounts[rowSums(promoterCounts) > 0, ]
> promoterCounts
SGNEx_A549_Illumina_replicate1.run1_genome_subset
136 157
137 50
138 20
139 3
141 2
177 109
709 0
SGNEx_MCF7_Illumina_replicate2.run1_genome_subset
136 35
137 7
138 2
139 0
141 1
177 231
709 2
session info:
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=English_Singapore.1252 LC_CTYPE=English_Singapore.1252
[3] LC_MONETARY=English_Singapore.1252 LC_NUMERIC=C
[5] LC_TIME=English_Singapore.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] BSgenome.Hsapiens.UCSC.hg38_1.4.3 BSgenome_1.58.0
[3] rtracklayer_1.49.5 Biostrings_2.58.0
[5] XVector_0.30.0 GenomicRanges_1.42.0
[7] GenomeInfoDb_1.26.7 IRanges_2.24.1
[9] S4Vectors_0.28.1 BiocGenerics_0.36.1
[11] proActiv_1.3.4 testthat_3.0.4
loaded via a namespace (and not attached):
[1] bitops_1.0-7 matrixStats_0.60.0
[3] fs_1.5.0 usethis_2.0.1
[5] devtools_2.4.2 bit64_4.0.5
[7] RColorBrewer_1.1-2 progress_1.2.2
[9] httr_1.4.2 rprojroot_2.0.2
[11] tools_4.0.3 utf8_1.2.2
[13] R6_2.5.1 KernSmooth_2.23-17
[15] DBI_1.1.1 colorspace_2.0-2
[17] withr_2.4.2 tidyselect_1.1.1
[19] prettyunits_1.1.1 processx_3.5.2
[21] DESeq2_1.30.1 curl_4.3.2
[23] bit_4.0.4 compiler_4.0.3
[25] cli_3.0.1 Biobase_2.50.0
[27] xml2_1.3.2 desc_1.3.0
[29] DelayedArray_0.16.3 caTools_1.18.2
[31] scales_1.1.1 genefilter_1.72.1
[33] callr_3.7.0 askpass_1.1
[35] rappdirs_0.3.3 Rsamtools_2.6.0
[37] stringr_1.4.0 pkgconfig_2.0.3
[39] sessioninfo_1.1.1 MatrixGenerics_1.2.1
[41] dbplyr_2.1.1 fastmap_1.1.0
[43] rlang_0.4.11 rstudioapi_0.13
[45] RSQLite_2.2.7 generics_0.1.0
[47] gtools_3.9.2 BiocParallel_1.24.1
[49] dplyr_1.0.7 RCurl_1.98-1.3
[51] magrittr_2.0.1 GenomeInfoDbData_1.2.4
[53] Matrix_1.3-4 Rcpp_1.0.7
[55] munsell_0.5.0 fansi_0.5.0
[57] lifecycle_1.0.0 stringi_1.7.3
[59] SummarizedExperiment_1.20.0 zlibbioc_1.36.0
[61] gplots_3.1.1 pkgbuild_1.2.0
[63] BiocFileCache_1.14.0 grid_4.0.3
[65] blob_1.2.2 crayon_1.4.1
[67] lattice_0.20-41 splines_4.0.3
[69] GenomicFeatures_1.42.3 annotate_1.68.0
[71] hms_1.1.0 locfit_1.5-9.4
[73] ps_1.6.0 pillar_1.6.2
[75] geneplotter_1.68.0 biomaRt_2.46.3
[77] pkgload_1.2.1 XML_3.99-0.6
[79] glue_1.4.2 data.table_1.14.0
[81] remotes_2.4.0 vctrs_0.3.8
[83] gtable_0.3.0 openssl_1.4.4
[85] purrr_0.3.4 assertthat_0.2.1
[87] cachem_1.0.5 ggplot2_3.3.5
[89] xtable_1.8-4 survival_3.2-7
[91] tibble_3.1.3 GenomicAlignments_1.26.0
[93] AnnotationDbi_1.52.0 memoise_2.0.0
[95] ellipsis_0.3.2
>
@jleechung sorry to reply late. your advice indeed do me a great favor! thx again!
Best wishs!