WGLab/doc-ANNOVAR

Issue in creating Database using ANNOVAR

BiodeB opened this issue · 7 comments

Dear Sir,

This is my first time using this software to annotate SNPs in COVID genomes. I started by creating a database following the ANNOVAR documentation (https://github.com/WGLab/Workshop_Annotation and https://annovar.openbioinformatics.org/en/latest/user-guide/gene/#what-about-gff3-file-for-new-species). I used the SARS-CoV-2 reference genome (NC_045512.2) and downloaded the GTF file from Ensembl (https://covid-19.ensembl.org/Sars_cov_2/Info/Index).

The commands are:

./gtfToGenePred -genePredExt Sars_cov_2.ASM985889v3.101.gtf Cov19_refGene3.txt

perl /home/imdeb/Softs/annovar/retrieve_seq_from_fasta.pl --format refGene --seqfile Cov19_Ref_NC_045512.2.fasta Cov19_refGene3.txt --out Cov19_refGene.fa

The second command gives me the following messages, and Cov19_refGene.fa is empty:

NOTICE: Reading region file Cov19_refGene3.txt ... Done with 12 regions from 1 chromosomes
WARNING: Unable to retrieve regions at MN908947.3 due to lack of sequence information
WARNING: Cannot identify sequence for ENSSAST00005000002 (starting from MN908947.3:265)
WARNING: Cannot identify sequence for ENSSAST00005000003 (starting from MN908947.3:265)
WARNING: Cannot identify sequence for ENSSAST00005000004 (starting from MN908947.3:21562)
WARNING: Cannot identify sequence for ENSSAST00005000006 (starting from MN908947.3:25392)
WARNING: Cannot identify sequence for ENSSAST00005000010 (starting from MN908947.3:26244)
WARNING: Cannot identify sequence for ENSSAST00005000007 (starting from MN908947.3:26522)
WARNING: Cannot identify sequence for ENSSAST00005000011 (starting from MN908947.3:27201)
WARNING: Cannot identify sequence for ENSSAST00005000009 (starting from MN908947.3:27393)
WARNING: Cannot identify sequence for ENSSAST00005000012 (starting from MN908947.3:27755)
WARNING: Cannot identify sequence for ENSSAST00005000008 (starting from MN908947.3:27893)
WARNING: Cannot identify sequence for ENSSAST00005000005 (starting from MN908947.3:28273)
WARNING: Cannot identify sequence for ENSSAST00005000013 (starting from MN908947.3:29557)
NOTICE: Finished writting FASTA for 0 genomic regions to Cov19_refGene.fa

I would be grateful for any suggestions on how to overcome this issue.
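
One possible cause, judging from the first warning: the annotation refers to the sequence as MN908947.3, while the FASTA file is named after NC_045512.2, so its header line may carry a different name than the one in the genePred file. The sketch below is only a way to check that assumption. The function names are hypothetical helpers, not part of ANNOVAR, and the genePred file is assumed to be tab-separated with the chromosome name in column 2 and no leading bin column.

```shell
#!/bin/sh
# Hypothetical helpers for comparing sequence names; none of these are ANNOVAR tools.

# Print the sequence names in a FASTA file: the first whitespace-delimited
# token of each header line, without the leading ">".
fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

# Print the chromosome names in a genePred file
# (assumption: tab-separated, chromosome in column 2, no bin column).
genepred_names() {
    cut -f 2 "$1" | sort -u
}

# Print names that the annotation uses but the FASTA file does not contain.
# usage: missing_names <genepred> <fasta>
missing_names() {
    names=$(mktemp)
    fasta_names "$2" > "$names"
    # grep exits non-zero when nothing is missing, so ignore its status.
    genepred_names "$1" | grep -vxF -f "$names" || true
    rm -f "$names"
}

# Write a copy of a FASTA file with one header name replaced.
# Only sensible if both names really refer to the same sequence.
# usage: rename_sequence <old_name> <new_name> <in.fasta> <out.fasta>
rename_sequence() {
    awk -v old="$1" -v new="$2" \
        '/^>/ { if ($1 == ">" old) $1 = ">" new } { print }' "$3" > "$4"
}
```

With the files from this post, `missing_names Cov19_refGene3.txt Cov19_Ref_NC_045512.2.fasta` would list any name that appears only in the annotation. If it prints MN908947.3, then making the two names agree (for example with `rename_sequence`, after confirming that both accessions describe the same sequence) and rerunning retrieve_seq_from_fasta.pl on the renamed file may be worth trying.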

The problem may have been solved.

Thank you

Thank you.

Hi @kaichop
I am having issues preparing an ANNOVAR database for the SARS-CoV-2 genome. May I ask how I can access the reference file that you prepared for this purpose? Thanks a lot.
Best,
gc

@BiodeB
I have now encountered the same problem as you. May I ask how you solved it? If you can find time in your busy schedule to reply, I would be very grateful. Thank you very much!