why multiple .bam files(>=7) cannot work
Opened this issue · 14 comments
I'm putting multiple .bam files as inputs for reads, but I'm getting an error. When I take less than 10 or so files as readings for reads, it runs successfully, don't know why, A little anxious, very eager and looking forward to your reply!
my code:se1 <- bambu(reads = list.files(pattern="GV"), annotations = bambuAnnotations, genome = fa.file)
error:Error in checkInputs(annotations, reads, readClass.outputDir = rcOutDir, :
Reads should either be: a vector of paths to .bam files, a vector of paths to Bambu RCfile .rds files, or a list of loaded Bambu RCfiles
my code:se1 <- bambu(reads = "/public/home/ymwang/PacBio/6_bambu/bam/.bam", annotations = bambuAnnotations, genome = fa.file)
error:Error: BiocParallel errors
1 remote errors, element index: 1
0 unevaluated and other errors
first remote error:
Error in value[3L]: failed to open BamFile: file(s) do not exist:
‘/public/home/ymwang/PacBio/6_bambu/bam/.bam’
my code:se1 <- bambu(reads = c("GV10O_10.align.bam","GV11O_11.align.bam","GV12O_12.align.bam","GV13O_13.align.bam","GV14O_14.align.bam","GV15O_15.align.bam","GV16O_16.align.bam","GV1O_1.align.bam","GV20O_20.align.bam","GV22O_22.align.bam","GV23O_23.align.bam","GV2O_2.align.bam","GV3O_3.align.bam","GV4O_4.align.bam","GV5O_5.align.bam","GV7O_7.align.bam","GV8O_8.align.bam","GV9O_9.align.bam"), annotations = bambuAnnotations, genome = fa.file)
error:
Hi,
For the first code could you show me what list.files(pattern="GV") outputs, is it possible you have another file in your working directory with GV in the name?
The second code is an expected error because I assume there is no file called .bam
The third looks fine to me, but unfortunately the error is missing in your post. Would you be able to add that in so I might see what could be happening?
Kind Regards,
Andre Sim
Hi,
Unfortunately I cannot see it. Was it in your last comment or did you send it another way. Is it possible some syntax might be formatting it in a way that makes in invisible on github? Perhaps a screenshot of the error might work?
Below is what I see (the "..." in the final comment makes the email you replied to appear)
Hi,
I have not seen this issue before so I am not exactly sure how to solve it, however here are somethings to try that might circumvent the problem, or provide me a clue on if/where a bug might be in the code.
- add rcOutDir = "/path/somewhere/" to bambu()
The only difference when running with 10 files in the code is that it outputs to a tmp directory. By setting a manual output directory that might skip that issue - Try a different combination of 11 rc files and see if that works. One issue cause be that 1 of the bam files from the 18 is not compatible with bambu for some reason. Use the 10 that worked + 1 that was missing.
- Run each of the bam files individually for read class construction (set quant = FALSE, discovery = FALSE). Then combine them for the subsequent steps.
i.e (fill in the ... with the rest of the files)
se1 <- bambu(reads = "GV10O_10.align.bam", annotations = bambuAnnotations, genome = fa.file, quant = FALSE, discovery = FALSE)
se2 <- bambu(reads = "GV11O_11.align.bam"), annotations = bambuAnnotations, genome = fa.file, quant = FALSE, discovery = FALSE)
...
se_final = bambu(reads = c(se1, se2, se3, ....), annotations = bambuAnnotations, genome = fa.file)
I also notice some strange warnings in the run that did work which might be a hint. Are you expecting a very low number < 50 known transcripts to be in your bam files? Are you able to check that the chromosome names in your gtf file match that of your fasta file, and that you are using the same genome fasta file as the bambu input that use used to align your reads to. A common issue is a gtf file has "Chr1" as a name and the genome is "1".
For the run with <10 bam files that did work. Could you share the output from metadata(se1)$warnings
I hope to solve this for you and future users so please let me know what from the above works and doesn't work.
Hi,
All the gene counts being named bambu means it has predicted a lot of novel genes. I believe this is related to the warning I flagged above. As in standard runs of Bambu this warning should not appear.
"I also notice some strange warnings in the run that did work which might be a hint. Are you expecting a very low number < 50 known transcripts to be in your bam files? Are you able to check that the chromosome names in your gtf file match that of your fasta file, and that you are using the same genome fasta file as the bambu input that use used to align your reads to. A common issue is a gtf file has "Chr1" as a name and the genome is "1"."
Double check that the gtf file you are using matches the fasta file as this is the most common cause, and that the fasta file you are using is the exact same file used for the genome (not transcriptome) alignments.
If neither of the above is the case, I will need to look at some of the outputs to try and figure out what is happening.
So that I can be of the most help please include:
- The script you used to run bambu including the prepareAnnotations step
- The first 10 lines of the gtf file you are using
- The warnings and output from one of these lines
se1 <- bambu(reads = "GV10O_10.align.bam", annotations = bambuAnnotations, genome = fa.file, quant = FALSE, discovery = FALSE, verbose = TRUE)
print(metadata(se1)$warnings)
- Bonus would be if you could attach the output of this line.
saveRDS(se1.rds, "outputPath/se1.rds")
Kind Regards,
Andre Sim
Hi,
The formatting of your last comment is a bit hard for me to parse (attached picture).
From what I gather you are surprised at the results that all the marker genes you are detecting are novel genes. As I mentioned in some of my earlier comments I think there may be something going wrong with you run as there were some concerning warnings messages which is the likely reason only bambu genes are being identified. For me to be able to help you please attach the following.
- The script you used to run bambu including the prepareAnnotations step
- The first 10 lines of the gtf file you are using
- The warnings and output from one of these lines
se1 <- bambu(reads = "GV10O_10.align.bam", annotations = bambuAnnotations, genome = fa.file, quant = FALSE, discovery = FALSE, verbose = TRUE)
print(metadata(se1)$warnings)
- Bonus would be if you could attach the output of this line.
saveRDS(se1.rds, "outputPath/se1.rds")
Kind Regards,
Andre Sim
Hi,
I am not sure I understand your question. Are you able to attach in image or an example of what you mean to better describe it?