addGeneAnnotation module can not annotate the genes on the minus strand

Question

addGeneAnnotation module can not annotate the genes on the minus strand

Closed this issue 4 months ago · 3 comments

Hi authors,

My studied organism is Schizosaccharomyces pombe . The latest GFF annotation file is from the Pombase website, not from Ensembl. So I construct the TxDb object by giving the GFF file. The BSgenome is also self constructed by using the BSgenome package. I want to find the spacer sequences targeting a specific gene guided by the website Design_CRISPRko_Cas9.

It is strange that the program worked very well for all genes on the plus strand, but it promotes an error for genes on the minus strand ( 'start' or 'end' cannot contain NAs).

The detailed procession is shown below.

> pombe_gff_file <- "./DY47073.gff3"
> pombe_txdb <- getTxDb(pombe_gff_file, organism=NA)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
> pombe_grList <- TxDb2GRangesList(pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> library(BSgenome.Spombe.DY47073.4CrisprVerse)
> mybsgenome <- BSgenome.Spombe.DY47073.4CrisprVerse



> # gene on the plus strand
> pombe_ade6_gr <- queryTxObject(txObject=pombe_txdb,
+                                featureType="cds",
+                                queryColumn="gene_id",
+                                queryValue="SPCC1322.13")
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> pombe_ade6_guideSet <- findSpacers(pombe_ade6_gr,
+                              bsgenome=mybsgenome,
+                              crisprNuclease=SpCas9)
> 
> pombe_ade6_guideSet <- addGeneAnnotation(pombe_ade6_guideSet,
+                                    txObject=pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns


> # gene on the minus strand
> pombe_vtc4_gr <- queryTxObject(txObject=pombe_txdb,
+                                featureType="cds",
+                                queryColumn="gene_id",
+                                queryValue="SPCC1322.14c")
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> pombe_vtc4_guideSet <- findSpacers(pombe_vtc4_gr,
+                                    bsgenome=mybsgenome,
+                                    crisprNuclease=SpCas9)
> 
> pombe_vtc4_guideSet <- addGeneAnnotation(pombe_vtc4_guideSet,
+                                          txObject=pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
Error in.new_IRanges_from_start_end(start, end): 
  'start' or 'end' cannot contain NAs

Below are the part of two genes from gff file.

chrIII	Liftoff	gene	1363196	1364854	.	+	.	ID=SPCC1322.13;Name=ade6;so_term_name=protein_coding_gene;coverage=1.0;sequence_ID=1.0;valid_ORFs=1;extra_copy_number=0;copy_num_ID=SPCC1322.13_0
chrIII	Liftoff	mRNA	1363196	1364854	.	+	.	ID=SPCC1322.13.1;Parent=SPCC1322.13;product=phosphoribosylaminoimidazole carboxylase Ade6;matches_ref_protein=True;valid_ORF=True;extra_copy_number=0
chrIII	Liftoff	exon	1363196	1364854	.	+	.	ID=SPCC1322.13.1:exon:1;Parent=SPCC1322.13.1;product=phosphoribosylaminoimidazole carboxylase Ade6;extra_copy_number=0
chrIII	Liftoff	CDS	1363196	1364854	.	+	0	ID=SPCC1322.13.1:CDS:1;Parent=SPCC1322.13.1;product=phosphoribosylaminoimidazole carboxylase Ade6;extra_copy_number=0

chrIII	Liftoff	gene	1365284	1367449	.	-	.	ID=SPCC1322.14c;Name=vtc4;so_term_name=protein_coding_gene;coverage=1.0;sequence_ID=1.0;valid_ORFs=1;extra_copy_number=0;copy_num_ID=SPCC1322.14c_0
chrIII	Liftoff	mRNA	1365284	1367449	.	-	.	ID=SPCC1322.14c.1;Parent=SPCC1322.14c;product=vacuolar transporter chaperone (VTC) complex subunit;matches_ref_protein=True;valid_ORF=True;extra_copy_number=0
chrIII	Liftoff	exon	1365284	1367449	.	-	.	ID=SPCC1322.14c.1:exon:1;Parent=SPCC1322.14c.1;product=vacuolar transporter chaperone (VTC) complex subunit;extra_copy_number=0
chrIII	Liftoff	CDS	1365284	1367449	.	-	0	ID=SPCC1322.14c.1:CDS:1;Parent=SPCC1322.14c.1;product=vacuolar transporter chaperone (VTC) complex subunit;extra_copy_number=0

I don't know how to debug this. I am looking forward to your reply.

Answer 1 · 2024-06-13T01:05:53.000Z

Hi all,

With the help of my colleague，this bug is now fixed. When constructing the TxDb from the GFF file, I forgot to include the chrominfo parameter, which includes the chromosome length. This is necessary for the genes on the minus strand. All in all, it is a powerful tool.

Answer 2 · 2024-06-13T01:58:41.000Z

Hi all,

I also want to show my script for others to use.

> library(BSgenome.Spombe.DY47073.4CrisprVerse)
> mybsgenome <- BSgenome.Spombe.DY47073.4CrisprVerse
> data(SpCas9, package="crisprBase")

> gfffile <- "./DY47073.gff3"
> chrominfo <- data.frame(chrom=c("chrI","chrII","chrIII","chrrDNA_distal_contig1","chrrDNA_distal_contig2","chrMT"),
                        length=c(5623850,4644548,2508823,16591,8860,19433),
                        is_circular=c(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE))
> pombe_txdb <- getTxDb(file=gfffile,organism=NA,chrominfo=chrominfo)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
> 
> pombe_grList <- TxDb2GRangesList(pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> GenomeInfoDb::genome(pombe_grList) <- "Spombe"
> GenomeInfoDb::seqlengths(pombe_grList)
                  chrI                  chrII                 chrIII 
               5623850                4644548                2508823 
chrrDNA_distal_contig1 chrrDNA_distal_contig2                   chrM 
                 16591                   8860                  19433 


###### test gene on the minus strand
> pombe_gene_gr <- queryTxObject(txObject=pombe_txdb,
+                                featureType="transcript",
+                                queryColumn="gene_id",
+                                queryValue="SPCC1322.14c")
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> pombe_gene_guideSet <- findSpacers(pombe_gene_gr,
+                                    bsgenome=mybsgenome,
+                                    crisprNuclease=SpCas9)
Warning:
In .merge_two_Seqinfo_objects(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chrMT
  - in 'y': chrM
  Make sure to always combine/compare objects based on the same reference
  genome (use suppressWarnings() to suppress this warning).
> pombe_gene_guideSet <- addGeneAnnotation(pombe_gene_guideSet,
+                                          txObject=pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns

Answer 3 · 2024-07-17T18:34:41.000Z

Hi @fsnibs10, glad you could figure it out!