vanheeringen-lab/genomepy

Error: no exon in id-IGHV3-64D-2 contains CDS 555851-556197

schmucr1 opened this issue · 2 comments

Hello!

When I try to install the human genome GRCh38.p12 or GRCh38.p11 from NCBI with annotations, I get the following "no exon" errors from the tool "gff3ToGenePred":

genomepy install --provider NCBI --threads 4 --annotation  --genomes_dir /projects/site/pred/beda/genomepy_genomes/NCBI/  GRCh38.p12
09:44:04 | INFO | Downloading assembly summaries from NCBI, this will take a while...
genbank_historical: 31.1k genomes [00:00, 33.6k genomes/s]
refseq_historical: 39.7k genomes [00:01, 38.0k genomes/s]
genbank: 1.07M genomes [00:11, 96.3k genomes/s]
refseq: 237k genomes [00:02, 81.0k genomes/s] 
09:44:38 | INFO | Downloading genome from NCBI. Target URL: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.38_GRCh38.p12/GCF_000001405.38_GRCh38.p12_genomic.fna.gz...
09:45:09 | INFO | Genome download successful, starting post processing...
Processing NCBI Fasta: 40.7M lines [00:18, 2.18M lines/s]
09:45:44 | INFO | name: GRCh38.p12
09:45:44 | INFO | local name: GRCh38.p12
09:45:44 | INFO | fasta: /projects/site/pred/beda/genomepy_genomes/NCBI/GRCh38.p12/GRCh38.p12.fa
Filtering Fasta: 40.7M lines [00:16, 2.45M lines/s]
09:46:59 | INFO | Downloading annotation from NCBI. Target URL: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.38_GRCh38.p12/GCF_000001405.38_GRCh38.p12_genomic.gff.gz...
Error: no exon in id-IGHV3-64D-2 contains CDS 555851-556197
Error: no exon in id-IGLC7 contains CDS 22922593-22922913
Error: no exon in id-IGLC3 contains CDS 22906341-22906661
Error: no exon in id-IGLV2-8 contains CDS 22822981-22823289
Error: no exon in id-IGHV3-64D contains CDS 106088082-106088428
5 errors converting GFF3 file: /projects/site/pred/beda/genomepy_genomes/NCBI/GRCh38.p12/tmpqaoyzmbm/GRCh38.p12.annotation.gff
Traceback (most recent call last):
  File "/projects/site/pred/beda/envs/genomepy/lib/python3.6/site-packages/genomepy/providers/base.py", line 290, in download_annotation
    download_annotation(genomes_dir, link, localname)
  File "/projects/site/pred/beda/envs/genomepy/lib/python3.6/site-packages/genomepy/providers/base.py", line 447, in download_annotation
    sp.check_call(cmd.format(annot_file, pred_file), shell=True)
  File "/projects/site/pred/beda/envs/genomepy/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'gff3ToGenePred -geneNameAttr=gene /projects/site/pred/beda/genomepy_genomes/NCBI/GRCh38.p12/tmpqaoyzmbm/GRCh38.p12.annotation.gff /projects/site/pred/beda/genomepy_genomes/NCBI/GRCh38.p12/tmpqaoyzmbm/GRCh38.p12.annotation.gp' returned non-zero exit status 255.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/projects/site/pred/beda/envs/genomepy/bin/genomepy", line 10, in <module>
    sys.exit(cli())
  File "/projects/site/pred/beda/envs/genomepy/lib/python3.6/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/projects/site/pred/beda/envs/genomepy/lib/python3.6/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/projects/site/pred/beda/envs/genomepy/lib/python3.6/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/projects/site/pred/beda/envs/genomepy/lib/python3.6/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/projects/site/pred/beda/envs/genomepy/lib/python3.6/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/projects/site/pred/beda/envs/genomepy/lib/python3.6/site-packages/genomepy/cli.py", line 278, in install
    **kwargs,
  File "/projects/site/pred/beda/envs/genomepy/lib/python3.6/site-packages/genomepy/functions.py", line 254, in install_genome
    provider.download_annotation(name, genomes_dir, localname=localname, **kwargs)
  File "/projects/site/pred/beda/envs/genomepy/lib/python3.6/site-packages/genomepy/providers/base.py", line 294, in download_annotation
    f"An error occured while installing the gene annotation for {name} from {self.name}.\n"
genomepy.exceptions.GenomeDownloadError: An error occured while installing the gene annotation for GRCh38.p12 from NCBI.
If you think the annotation should be there, please file a bug report at: https://github.com/vanheeringen-lab/genomepy/issues

Error: 255

I am using this version of genomepy:

genomepy, version 0.10.0

Other NCBI human genomes and annotations, e.g. GRCh38.p10, work fine.

The tools were installed with Anaconda and the following recipe:

name=genomepy
envdir=/projects/site/pred/beda/envs
conda create -y -p ${envdir}/${name}
conda activate ${envdir}/${name}
conda config --set use_only_tar_bz2 True
conda install -y -c bioconda genomepy
conda install -y -c bioconda tabix
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda install -y ucsc-genepredtogtf 
conda install -y ucsc-genepredtobed
conda install -y ucsc-bedtogenepred
conda install -y ucsc-gtftogenepred
conda install -y ucsc-gff3togenepred
conda clean --all

Thanks for any help and suggestions in advance!

Roland

hey Roland,

It seems something is wrong with the 5 entries in the annotation file that are named in the errors. I've added a flag to skip these.
You can get this code by running:

pip install git+https://github.com/vanheeringen-lab/genomepy.git@develop
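To see which entries in a GFF3 file would trigger this kind of error, the check can be approximated in Python. This is only a minimal sketch, not what gff3ToGenePred or genomepy actually does: it assumes exon and CDS features are linked by a shared `Parent` attribute, and the function names `parse_attributes` and `find_uncovered_cds` are made up for illustration.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def find_uncovered_cds(gff_lines):
    """Return (parent, start, end) for each CDS not inside any exon of the same parent.

    Assumption: exon and CDS features share a Parent attribute. The real
    converter may resolve the feature hierarchy differently.
    """
    exons = defaultdict(list)
    cds = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in ("exon", "CDS"):
            continue
        start, end = int(cols[3]), int(cols[4])
        target = exons if cols[2] == "exon" else cds
        # A feature may list several parents, separated by commas.
        for parent in parse_attributes(cols[8]).get("Parent", "").split(","):
            if parent:
                target[parent].append((start, end))

    problems = []
    for parent, segments in cds.items():
        for start, end in segments:
            covered = any(
                e_start <= start and end <= e_end
                for e_start, e_end in exons.get(parent, [])
            )
            if not covered:
                problems.append((parent, start, end))
    return problems


# Toy input: tx1 is fine, the CDS of tx2 runs past the end of its exon.
demo = [
    "chr1\tdemo\texon\t100\t200\t.\t+\t.\tID=e1;Parent=tx1",
    "chr1\tdemo\tCDS\t120\t180\t.\t+\t0\tID=c1;Parent=tx1",
    "chr1\tdemo\texon\t500\t600\t.\t+\t.\tID=e2;Parent=tx2",
    "chr1\tdemo\tCDS\t550\t650\t.\t+\t0\tID=c2;Parent=tx2",
]
print(find_uncovered_cds(demo))  # [('tx2', 550, 650)]
```

For a real annotation, the lines could be read from the downloaded file, for example with `gzip.open(path, "rt")` for a `.gff.gz` file, and passed to `find_uncovered_cds` in the same way.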

Dear Siebren,

Thank you for the rapid workaround. Generating the BED and GTF annotation files now works.

Best wishes,
R.