Using several databases

Question

Closed this issue 3 years ago · 1 comments

We are using TransPi to annotate plant transcriptomes, but we also want to retrieve mitochondrial and plastome (chloroplast) genes/sequences from them. Those sequences are present in the assemblies made by all 5 programs but not in the TransDecoder results ".combined.okay.fa.transdecoder.cds". Is this because the database (UniProt) does not include the genes encoded in the organelles, or because TransDecoder does not recognize genes (ORFs) encoded in the mitochondria and chloroplast?
How could we include the genes encoded in the chloroplast and mitochondria so that they are annotated in the TransPi pipeline?
Thanks,
Joan

Answer 1 · 2021-09-12T09:39:37.000Z

Hello @dbajpp0,

TransDecoder uses the universal genetic code by default. This could explain why mitochondrial ORFs are not found in the output. There is an option for changing the genetic code for a run, and I just added it to the pipeline (see 5ca851b). To change from the default, use the --genCode option (e.g. --genCode Mitochondrial-Invertebrates).

Options available from the TransDecoder documentation:

Acetabularia
Candida
Ciliate
Dasycladacean
Euplotid
Hexamita
Mesodinium
Mitochondrial-Ascidian
Mitochondrial-Chlorophycean
Mitochondrial-Echinoderm
Mitochondrial-Flatworm
Mitochondrial-Invertebrates
Mitochondrial-Protozoan
Mitochondrial-Pterobranchia
Mitochondrial-Scenedesmus_obliquus
Mitochondrial-Thraustochytrium
Mitochondrial-Trematode
Mitochondrial-Vertebrates
Mitochondrial-Yeast
Pachysolen_tannophilus
Peritrich
SR1_Gracilibacteria
Tetrahymena
Universal
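
The names above are case-sensitive strings, so a typo is easy to make. As a minimal sketch (not part of TransPi or TransDecoder), the value intended for --genCode can be checked against this list before launching a run. The helper name gencode_args is hypothetical, and the set below is simply copied from the list above; if the TransDecoder documentation changes, the set would need updating.

```python
from __future__ import annotations

# Genetic code names copied verbatim from the list above.
GENETIC_CODES = frozenset({
    "Acetabularia", "Candida", "Ciliate", "Dasycladacean", "Euplotid",
    "Hexamita", "Mesodinium", "Mitochondrial-Ascidian",
    "Mitochondrial-Chlorophycean", "Mitochondrial-Echinoderm",
    "Mitochondrial-Flatworm", "Mitochondrial-Invertebrates",
    "Mitochondrial-Protozoan", "Mitochondrial-Pterobranchia",
    "Mitochondrial-Scenedesmus_obliquus", "Mitochondrial-Thraustochytrium",
    "Mitochondrial-Trematode", "Mitochondrial-Vertebrates",
    "Mitochondrial-Yeast", "Pachysolen_tannophilus", "Peritrich",
    "SR1_Gracilibacteria", "Tetrahymena", "Universal",
})


def gencode_args(name: str = "Universal") -> list[str]:
    """Return ['--genCode', name], or raise ValueError for an unknown name.

    Hypothetical helper: it only validates the spelling of the name.
    """
    if name not in GENETIC_CODES:
        # Offer a suggestion when only the letter case is wrong.
        by_lower = {code.lower(): code for code in GENETIC_CODES}
        hint = by_lower.get(name.lower())
        message = f"Unknown genetic code {name!r}"
        if hint is not None:
            message += f"; did you mean {hint!r}?"
        raise ValueError(message)
    return ["--genCode", name]


if __name__ == "__main__":
    # Prints the fragment to append to the pipeline command line.
    print(" ".join(gencode_args("Mitochondrial-Invertebrates")))
```

The returned pair can be appended to whatever command you normally use to launch the pipeline.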

Also, if you run the long TransDecoder (i.e. with homology search against UniProt and PFAM), you may want to use the RefSeq mitochondrial sequences as the custom UniProt database. Just add this database to the DBs/uniprot_db/ directory and select it when running the precheck.
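
Adding the database could look something like the sketch below. It assumes you have already downloaded an uncompressed protein FASTA file yourself; the function name stage_custom_db and the file name in the usage example are hypothetical, and only the DBs/uniprot_db/ location comes from the pipeline layout mentioned above. The script only copies the file. It does not run the precheck.

```python
import shutil
from pathlib import Path


def stage_custom_db(fasta, transpi_dir):
    """Copy a FASTA file into <transpi_dir>/DBs/uniprot_db/ and return the new path.

    Hypothetical helper; assumes an uncompressed, plain-text FASTA file.
    """
    fasta = Path(fasta)
    if not fasta.is_file():
        raise FileNotFoundError(f"No such file: {fasta}")

    # Crude sanity check: a FASTA file starts with a '>' header line.
    with fasta.open() as handle:
        if not handle.readline().startswith(">"):
            raise ValueError(f"{fasta} does not look like a FASTA file")

    dest_dir = Path(transpi_dir) / "DBs" / "uniprot_db"
    if not dest_dir.is_dir():
        # Do not create it silently: a missing directory usually means
        # the wrong TransPi path was given.
        raise NotADirectoryError(f"Expected directory not found: {dest_dir}")

    return Path(shutil.copy2(fasta, dest_dir / fasta.name))


if __name__ == "__main__":
    # Hypothetical file name and TransPi location; adjust both.
    print(stage_custom_db("mito_refseq_proteins.fasta", "."))
```

After copying, the file should be available to select the next time the precheck is run.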

Remember to run the precheck again to generate the new config file with the new option.

If you need anything else, let me know.

Best,
Ramon