lmdu/pyfastx

SystemError: <class 'Fasta'> returned a result with an error set; on loading gzipped fasta


When loading a gzipped FASTA file, e.g. the one below:
hgdownload.cse.ucsc.edu/goldenPath/dp3/bigZips/dp3.fa.gz

I get:

fasta = pyfastx.Fasta('./data/genomes/dp3.fa.gz')
RuntimeError: get seq count and length error

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  Cell In[61], line 1
    fasta = pyfastx.Fasta('./data/genomes/dp3.fa.gz')

SystemError: <class 'Fasta'> returned a result with an error set

I am in a conda environment, and installed pyfastx 0.9.1 using pip.

  • Ubuntu 22.04.2 LTS
  • conda 23.1.0
  • python 3.8.16
  • pyfastx 0.9.1
lmdu commented

First, delete the previously generated index file dp3.fa.gz.fxi, and then use pyfastx.Fasta to reindex it. If this does not work, please let me know.
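The delete-and-reindex step above can be sketched as follows. This is only a minimal illustration: the helper names `remove_stale_index` and `reindex` are hypothetical, and the sketch assumes the index sits next to the FASTA file as `<file>.fxi`, as with the `dp3.fa.gz.fxi` file named above.

```python
import os


def remove_stale_index(fasta_path):
    """Delete '<fasta_path>.fxi' if it exists; return True if a file was removed.

    Hypothetical helper; assumes the index is stored next to the FASTA file.
    """
    index_path = fasta_path + ".fxi"
    if os.path.exists(index_path):
        os.remove(index_path)
        return True
    return False


def reindex(fasta_path):
    """Drop any old index, then let pyfastx.Fasta build a fresh one."""
    import pyfastx  # imported lazily so the helper above works without pyfastx

    remove_stale_index(fasta_path)
    return pyfastx.Fasta(fasta_path)
```

For example, `reindex('./data/genomes/dp3.fa.gz')` would remove `dp3.fa.gz.fxi` if present and then attempt to rebuild it.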

Still fails.

The issue persists even after deleting the index file (the .fxi file). Test file:
curl https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0.fa.gz -O
It seems to be related to the index though:

import pyfastx; pyfastx.Fasta("chm13v2.0.fa.gz", build_index=False)

works, but

import pyfastx; pyfastx.Fasta("chm13v2.0.fa.gz")

does not.

I am using pyfastx version 1.1.0
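
To make the comparison above easier to repeat on other files, here is a rough sketch that tries both calls and records which one fails. The function name `compare_index_modes` and its `loader` parameter are hypothetical (the parameter exists only so the logic can be exercised without pyfastx installed); by default it calls `pyfastx.Fasta` exactly as in the two one-liners above.

```python
def compare_index_modes(path, loader=None):
    """Open `path` with and without index building and report each outcome.

    Hypothetical helper; `loader` defaults to pyfastx.Fasta.
    """
    if loader is None:
        import pyfastx  # only needed when no loader is supplied

        loader = pyfastx.Fasta

    results = {}
    # First skip index building, then use the default (index is built).
    for label, kwargs in (("no_index", {"build_index": False}), ("with_index", {})):
        try:
            loader(path, **kwargs)
            results[label] = "ok"
        except Exception as exc:  # also covers the SystemError seen above
            results[label] = f"{type(exc).__name__}: {exc}"
    return results
```

Calling `compare_index_modes("chm13v2.0.fa.gz")` returns a dict with one entry per mode, which can be pasted into the report alongside the pyfastx version.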