addGeneAnnotation module can not annotate the genes on the minus strand
Closed this issue · 3 comments
Hi authors,
My studied organism is Schizosaccharomyces pombe . The latest GFF annotation file is from the Pombase website, not from Ensembl. So I construct the TxDb object by giving the GFF file. The BSgenome is also self constructed by using the BSgenome package. I want to find the spacer sequences targeting a specific gene guided by the website Design_CRISPRko_Cas9.
It is strange that the program worked very well for all genes on the plus strand, but it promotes an error for genes on the minus strand ( 'start' or 'end' cannot contain NAs).
The detailed procession is shown below.
> pombe_gff_file <- "./DY47073.gff3"
> pombe_txdb <- getTxDb(pombe_gff_file, organism=NA)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
> pombe_grList <- TxDb2GRangesList(pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> library(BSgenome.Spombe.DY47073.4CrisprVerse)
> mybsgenome <- BSgenome.Spombe.DY47073.4CrisprVerse
> # gene on the plus strand
> pombe_ade6_gr <- queryTxObject(txObject=pombe_txdb,
+ featureType="cds",
+ queryColumn="gene_id",
+ queryValue="SPCC1322.13")
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> pombe_ade6_guideSet <- findSpacers(pombe_ade6_gr,
+ bsgenome=mybsgenome,
+ crisprNuclease=SpCas9)
>
> pombe_ade6_guideSet <- addGeneAnnotation(pombe_ade6_guideSet,
+ txObject=pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> # gene on the minus strand
> pombe_vtc4_gr <- queryTxObject(txObject=pombe_txdb,
+ featureType="cds",
+ queryColumn="gene_id",
+ queryValue="SPCC1322.14c")
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> pombe_vtc4_guideSet <- findSpacers(pombe_vtc4_gr,
+ bsgenome=mybsgenome,
+ crisprNuclease=SpCas9)
>
> pombe_vtc4_guideSet <- addGeneAnnotation(pombe_vtc4_guideSet,
+ txObject=pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
Error in.new_IRanges_from_start_end(start, end):
'start' or 'end' cannot contain NAs
Below are the part of two genes from gff file.
chrIII Liftoff gene 1363196 1364854 . + . ID=SPCC1322.13;Name=ade6;so_term_name=protein_coding_gene;coverage=1.0;sequence_ID=1.0;valid_ORFs=1;extra_copy_number=0;copy_num_ID=SPCC1322.13_0
chrIII Liftoff mRNA 1363196 1364854 . + . ID=SPCC1322.13.1;Parent=SPCC1322.13;product=phosphoribosylaminoimidazole carboxylase Ade6;matches_ref_protein=True;valid_ORF=True;extra_copy_number=0
chrIII Liftoff exon 1363196 1364854 . + . ID=SPCC1322.13.1:exon:1;Parent=SPCC1322.13.1;product=phosphoribosylaminoimidazole carboxylase Ade6;extra_copy_number=0
chrIII Liftoff CDS 1363196 1364854 . + 0 ID=SPCC1322.13.1:CDS:1;Parent=SPCC1322.13.1;product=phosphoribosylaminoimidazole carboxylase Ade6;extra_copy_number=0
chrIII Liftoff gene 1365284 1367449 . - . ID=SPCC1322.14c;Name=vtc4;so_term_name=protein_coding_gene;coverage=1.0;sequence_ID=1.0;valid_ORFs=1;extra_copy_number=0;copy_num_ID=SPCC1322.14c_0
chrIII Liftoff mRNA 1365284 1367449 . - . ID=SPCC1322.14c.1;Parent=SPCC1322.14c;product=vacuolar transporter chaperone (VTC) complex subunit;matches_ref_protein=True;valid_ORF=True;extra_copy_number=0
chrIII Liftoff exon 1365284 1367449 . - . ID=SPCC1322.14c.1:exon:1;Parent=SPCC1322.14c.1;product=vacuolar transporter chaperone (VTC) complex subunit;extra_copy_number=0
chrIII Liftoff CDS 1365284 1367449 . - 0 ID=SPCC1322.14c.1:CDS:1;Parent=SPCC1322.14c.1;product=vacuolar transporter chaperone (VTC) complex subunit;extra_copy_number=0
I don't know how to debug this. I am looking forward to your reply.
Hi all,
With the help of my colleague,this bug is now fixed. When constructing the TxDb from the GFF file, I forgot to include the chrominfo parameter, which includes the chromosome length. This is necessary for the genes on the minus strand. All in all, it is a powerful tool.
Hi all,
I also want to show my script for others to use.
> library(BSgenome.Spombe.DY47073.4CrisprVerse)
> mybsgenome <- BSgenome.Spombe.DY47073.4CrisprVerse
> data(SpCas9, package="crisprBase")
> gfffile <- "./DY47073.gff3"
> chrominfo <- data.frame(chrom=c("chrI","chrII","chrIII","chrrDNA_distal_contig1","chrrDNA_distal_contig2","chrMT"),
length=c(5623850,4644548,2508823,16591,8860,19433),
is_circular=c(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE))
> pombe_txdb <- getTxDb(file=gfffile,organism=NA,chrominfo=chrominfo)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
>
> pombe_grList <- TxDb2GRangesList(pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> GenomeInfoDb::genome(pombe_grList) <- "Spombe"
> GenomeInfoDb::seqlengths(pombe_grList)
chrI chrII chrIII
5623850 4644548 2508823
chrrDNA_distal_contig1 chrrDNA_distal_contig2 chrM
16591 8860 19433
###### test gene on the minus strand
> pombe_gene_gr <- queryTxObject(txObject=pombe_txdb,
+ featureType="transcript",
+ queryColumn="gene_id",
+ queryValue="SPCC1322.14c")
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> pombe_gene_guideSet <- findSpacers(pombe_gene_gr,
+ bsgenome=mybsgenome,
+ crisprNuclease=SpCas9)
Warning:
In .merge_two_Seqinfo_objects(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chrMT
- in 'y': chrM
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
> pombe_gene_guideSet <- addGeneAnnotation(pombe_gene_guideSet,
+ txObject=pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns