unknown base "N" error.
Hello,
Wonderful tool, it's been working very well with some of my subsetted FASTQ files.
I tried running it on my NGS data. However, I'm getting this error:
Error: BiocParallel errors
1 remote errors, element index: 1
0 unevaluated and other errors
first remote error:
Error in eval(expr, envir, enclos): unknown base 'N'
I'm thinking it has to do with some N bases in my sequences. Interestingly, a smaller FASTQ file with similar sequences works.
Let me know of any thoughts on how to fix this. Thanks for developing this tool.
William
Hm. I thought I fixed this in the last release cycle:
library(screenCounter)
# Creating an example dual barcode sequencing experiment.
known.pool <- c("AGAGAGAGA", "CTCTCTCTC",
    "GTGTGTGTG", "CACACACAC")
# Adding some N's to the sequence data.
N <- 1000
barcodes <- sprintf("CAGCTANNCGTACG%sCCAGCTCGANNTCG",
    sample(known.pool, N, replace=TRUE))
names(barcodes) <- seq_len(N)
library(Biostrings)
tmp <- tempfile(fileext=".fastq")
writeXStringSet(DNAStringSet(barcodes), filepath=tmp, format="fastq")
# Counting the combinations.
countSingleBarcodes(tmp, choices=known.pool,
    template="CGTACGNNNNNNNNNCCAGCTC")
## DataFrame with 4 rows and 2 columns
##       choices    counts
##   <character> <integer>
## 1   AGAGAGAGA       270
## 2   CTCTCTCTC       224
## 3   GTGTGTGTG       262
## 4   CACACACAC       244
Make sure you're running the latest version (1.2.0) from Bioconductor.
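For anyone landing here with the same error, a quick way to check whether the installed copy is older than 1.2.0 might look like the sketch below. The helper `needs_update()` is a hypothetical name introduced for illustration and is not part of screenCounter; the update itself assumes the BiocManager package is available and only runs in an interactive session.

```r
# Hypothetical helper (not part of screenCounter): TRUE if 'pkg' is missing
# or its installed version is older than 'minimum'.
needs_update <- function(pkg, minimum) {
    if (!requireNamespace(pkg, quietly=TRUE)) {
        return(TRUE)
    }
    packageVersion(pkg) < minimum
}

# Only attempt the update when run interactively.
# Assumes BiocManager is already installed.
if (interactive() && needs_update("screenCounter", "1.2.0")) {
    BiocManager::install("screenCounter")
}
```

Note that the version BiocManager installs depends on the Bioconductor release matched to your R version, so an older R installation may need upgrading first.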
One last question: is it possible to extract the read ID for each barcode?
Currently not, it's all aggregated in the underlying C++ libraries.
I suppose we could report the read names associated with each barcode, but that could use an awful lot of memory for a deeply sequenced experiment. There may or may not be a better way to do what you actually want to do.
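As a rough illustration of what reporting read names per barcode could involve, here is a minimal base R sketch. It is not how screenCounter works internally: it matches the constant flanks from the template above exactly, with no mismatches and no reverse-strand search, and it holds every read in memory, which is the cost mentioned above. The read names and sequences are made up for the example.

```r
# Toy reads, named by (made-up) read ID. In practice these would come from a FASTQ file.
reads <- c(
    r1="CAGCTANNCGTACGAGAGAGAGACCAGCTCGANNTCG",
    r2="CAGCTANNCGTACGCTCTCTCTCCCAGCTCGANNTCG",
    r3="CAGCTANNCGTACGAGAGAGAGACCAGCTCGANNTCG",
    r4="CAGCTANNCGTACGTTTTTTTTTCCAGCTCGANNTCG"  # barcode not in the pool
)
known.pool <- c("AGAGAGAGA", "CTCTCTCTC", "GTGTGTGTG", "CACACACAC")

# Constant flanks taken from the template, with a 9-base variable region between them.
pattern <- "CGTACG([ACGTN]{9})CCAGCTC"

# Keep reads containing the template, then pull out the variable region.
hit <- grepl(pattern, reads)
found <- sub(paste0(".*", pattern, ".*"), "\\1", reads[hit])

# Retain only exact matches to the known pool and group read IDs by barcode.
keep <- found %in% known.pool
ids <- split(names(reads)[hit][keep], found[keep])
print(ids)
```

Here `ids` is a list with one character vector of read IDs per observed barcode; for a deeply sequenced experiment that list grows with the number of reads.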
Hello,
Just to let you know that this fixed it for me.
Works like a charm.
William
Ok, great.
As for the other question: when you have more clarity on the nature of the problem, make another issue and we can see what we can do. It may be possible to adapt the C++ code underlying the countCombinatorialBarcodes function so that it captures the combination of genotype with a random barcode (assuming that we're dealing with a simple SNP).