sacdallago/bio_embeddings

Protocol prottrans_bert_bfd: BadZipFile: File is not a zip file

pskvins opened this issue · 2 comments

Hi,
I am trying to get embeddings of proteins, but I'm encountering the following error:

## Metadata
|key|value|
|--|--|
|**version**|0.2.2|
|**cuda**|False|

## Parameter
|key|value|
|--|--|
|type|embed|
|protocol|prottrans_bert_bfd|
|reduce|True|

## Traceback
Traceback (most recent call last):
  File "/home/sukhwan/.local/lib/python3.7/site-packages/bio_embeddings/utilities/pipeline.py", line 284, in execute_pipeline_from_config
    stage_output_parameters = stage_runnable(**stage_parameters)
  File "/home/sukhwan/.local/lib/python3.7/site-packages/bio_embeddings/embed/pipeline.py", line 400, in run
    embedder: EmbedderInterface = embedder_class(**result_kwargs)
  File "/home/sukhwan/.local/lib/python3.7/site-packages/bio_embeddings/embed/prottrans_bert_bfd_embedder.py", line 30, in __init__
    super().__init__(**kwargs)
  File "/home/sukhwan/.local/lib/python3.7/site-packages/bio_embeddings/embed/embedder_interfaces.py", line 60, in __init__
    model=self.name, directory=directory
  File "/home/sukhwan/.local/lib/python3.7/site-packages/bio_embeddings/utilities/remote_file_retriever.py", line 93, in get_model_directories_from_zip
    with zipfile.ZipFile(file_name, "r") as zip_ref:
  File "/home/sukhwan/miniconda3/lib/python3.7/zipfile.py", line 1258, in __init__
    self._RealGetContents()
  File "/home/sukhwan/miniconda3/lib/python3.7/zipfile.py", line 1325, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
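
The last frames show Python's `zipfile` module rejecting the downloaded model file. A quick way to see what is actually on disk is sketched below. It uses only the standard library; the function name `describe_download` and the example path are hypothetical, and the real location of the cached file depends on your setup.

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Return a short diagnosis of a file that is supposed to be a ZIP archive."""
    path = Path(path)
    if not path.is_file():
        return "missing"
    if path.stat().st_size == 0:
        return "empty"
    if zipfile.is_zipfile(path):
        # testzip() returns the name of the first corrupt member,
        # or None if every member passes its CRC check.
        with zipfile.ZipFile(path) as archive:
            bad_member = archive.testzip()
        return "ok" if bad_member is None else f"corrupt member: {bad_member}"
    # Not a ZIP at all: the first bytes often reveal an HTML error page
    # or some other unexpected payload.
    with path.open("rb") as handle:
        head = handle.read(16)
    return f"not a zip, starts with {head!r}"


if __name__ == "__main__":
    # Hypothetical path; point this at the cached model file.
    print(describe_download("/path/to/cached/model.zip"))
```

If the result starts with `not a zip`, the download most likely failed or was interrupted, and fetching the file again is the next thing to try.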

## More info

I installed bio_embeddings with `pip install bio-embeddings[all]` and ran `bio_embeddings --overwrite embed.yml`.
The content of `embed.yml` is as follows:

    global:
      sequences_file: /home/sukhwan/cluster_idr/sample.fasta
      prefix: sample_result
      simple_remapping: True

    prottrans_t5_bfd_embeddings:
      type: embed
      protocol: prottrans_bert_bfd
      reduce: True

I tried again after removing the bio_embeddings cache, but got the same error message.
Do you know what might be causing this problem?

Hey, I'm not sure 🤔 Maybe a corrupted ZIP download? You can download the ZIP manually from here: http://data.bioembeddings.com/public/embeddings/embedding_models/bert/

Then unzip it using whatever system software you have.
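
If no unzip tool is at hand, Python's standard `zipfile` module can do the same job. This is a minimal sketch, not part of bio_embeddings; `unpack_model` and both paths are hypothetical placeholders.

```python
import zipfile
from pathlib import Path


def unpack_model(zip_path, target_dir):
    """Extract zip_path into target_dir and return the archive's member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # ZipFile raises zipfile.BadZipFile here if the download is still broken.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical paths; adjust to where you saved the manual download.
    zip_path = Path("/path/to/downloaded/model.zip")
    if zip_path.is_file():
        for name in unpack_model(zip_path, "/path/to/unzipped/model"):
            print(name)
    else:
        print(f"{zip_path} not found")
```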

Finally, in the config, you just need to add a `model_directory` parameter to the `prottrans_t5_bfd_embeddings` stage: https://github.com/sacdallago/bio_embeddings/blob/develop/examples/parameters_blueprint.yml#L95
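
As a rough illustration, the adjusted config could look like the text produced by the sketch below. The stage and parameter names come from this thread. The indentation, the helper `render_config`, and the model path are assumptions, so check the linked blueprint for the exact form.

```python
# Hypothetical location of the unzipped model; replace with your own path.
MODEL_DIRECTORY = "/path/to/unzipped/model"

# The original embed.yml from this issue with one extra line: model_directory.
CONFIG_TEMPLATE = """\
global:
  sequences_file: /home/sukhwan/cluster_idr/sample.fasta
  prefix: sample_result
  simple_remapping: True

prottrans_t5_bfd_embeddings:
  type: embed
  protocol: prottrans_bert_bfd
  reduce: True
  model_directory: {model_directory}
"""


def render_config(model_directory):
    """Return the config text with model_directory filled in."""
    return CONFIG_TEMPLATE.format(model_directory=model_directory)


if __name__ == "__main__":
    # Print the config; redirect the output to embed.yml once it looks right.
    print(render_config(MODEL_DIRECTORY))
```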

Let me know if this works