Protocol prottrans_bert_bfd: BadZipFile: File is not a zip file
pskvins opened this issue · 2 comments
Hi,
I am trying to compute protein embeddings, but I'm encountering the following error:
## Metadata
|key|value|
|--|--|
|**version**|0.2.2|
|**cuda**|False|
## Parameter
|key|value|
|--|--|
|**type**|embed|
|**protocol**|prottrans_bert_bfd|
|**reduce**|True|
## Traceback
Traceback (most recent call last):
  File "/home/sukhwan/.local/lib/python3.7/site-packages/bio_embeddings/utilities/pipeline.py", line 284, in execute_pipeline_from_config
    stage_output_parameters = stage_runnable(**stage_parameters)
  File "/home/sukhwan/.local/lib/python3.7/site-packages/bio_embeddings/embed/pipeline.py", line 400, in run
    embedder: EmbedderInterface = embedder_class(**result_kwargs)
  File "/home/sukhwan/.local/lib/python3.7/site-packages/bio_embeddings/embed/prottrans_bert_bfd_embedder.py", line 30, in __init__
    super().__init__(**kwargs)
  File "/home/sukhwan/.local/lib/python3.7/site-packages/bio_embeddings/embed/embedder_interfaces.py", line 60, in __init__
    model=self.name, directory=directory
  File "/home/sukhwan/.local/lib/python3.7/site-packages/bio_embeddings/utilities/remote_file_retriever.py", line 93, in get_model_directories_from_zip
    with zipfile.ZipFile(file_name, "r") as zip_ref:
  File "/home/sukhwan/miniconda3/lib/python3.7/zipfile.py", line 1258, in __init__
    self._RealGetContents()
  File "/home/sukhwan/miniconda3/lib/python3.7/zipfile.py", line 1325, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
## More info
I installed bio_embeddings with `pip install bio-embeddings[all]` and ran `bio_embeddings --overwrite embed.yml`.
The content of embed.yml is as follows:
`global:
  sequences_file: /home/sukhwan/cluster_idr/sample.fasta
  prefix: sample_result
  simple_remapping: True
prottrans_t5_bfd_embeddings:
  type: embed
  protocol: prottrans_bert_bfd
  reduce: True`
I tried again after removing the bio_embeddings cache, but got the same error message.
Do you know what could be causing this problem?
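A quick way to narrow this down is to check whether the downloaded file is a ZIP archive at all. The sketch below uses only the Python standard library. The helper name `describe_download` is hypothetical, and the location of the cached file is not stated in this thread, so the path has to be passed in explicitly.

```python
import sys
import zipfile
from pathlib import Path


def describe_download(path):
    """Report whether `path` looks like a ZIP archive and show its first bytes."""
    path = Path(path)
    with path.open("rb") as handle:
        head = handle.read(64)
    return {
        "size_bytes": path.stat().st_size,
        # True only if the file has a readable ZIP end-of-central-directory record
        "is_zip": zipfile.is_zipfile(path),
        # Non-empty ZIP files normally start with the bytes b"PK"
        "starts_with_zip_magic": head.startswith(b"PK"),
        # An HTML error page or a truncated download usually shows up here
        "head": head,
    }


if __name__ == "__main__":
    # Pass the path(s) of the downloaded file(s) on the command line.
    for name in sys.argv[1:]:
        print(name, describe_download(name))
```

If `is_zip` is `False` and `head` contains readable text such as HTML, the download most likely returned an error page rather than the model archive.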
Hey, I'm not sure 🤔 Could it be a corrupted ZIP download? You can download the ZIP manually from here: http://data.bioembeddings.com/public/embeddings/embedding_models/bert/
Then unzip it using whatever system software you have.
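If you would rather unzip from Python, here is a minimal sketch using only the standard library. The function name `extract_model` and both paths in the usage note are hypothetical. It checks the archive before extracting, so a broken download fails with a clear message instead of `BadZipFile`.

```python
import zipfile
from pathlib import Path


def extract_model(zip_path, target_dir):
    """Extract a manually downloaded model archive and return the target directory."""
    zip_path = Path(zip_path)
    target_dir = Path(target_dir)

    # Fail early if the file is not a ZIP archive (e.g. a truncated download)
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a valid ZIP archive; try downloading it again")

    with zipfile.ZipFile(zip_path, "r") as archive:
        # testzip() returns the name of the first corrupt member, or None
        bad_member = archive.testzip()
        if bad_member is not None:
            raise ValueError(f"{zip_path} contains a corrupt member: {bad_member}")
        target_dir.mkdir(parents=True, exist_ok=True)
        archive.extractall(target_dir)

    return target_dir.resolve()
```

For example, `extract_model("bert.zip", "models/bert")` (placeholder names) returns the absolute path of the extraction directory.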
Lastly, in the config you just need to add a `model_directory` parameter to the `prottrans_t5_bfd_embeddings` stage: https://github.com/sacdallago/bio_embeddings/blob/develop/examples/parameters_blueprint.yml#L95
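As a rough sketch, the config from the original post would then look something like this. The `model_directory` value is a placeholder for wherever the archive was unzipped. Check the linked blueprint for the exact directory layout it expects.

```yaml
global:
  sequences_file: /home/sukhwan/cluster_idr/sample.fasta
  prefix: sample_result
  simple_remapping: True
prottrans_t5_bfd_embeddings:
  type: embed
  protocol: prottrans_bert_bfd
  reduce: True
  # Placeholder path (assumption): replace with the directory you unzipped the model into
  model_directory: /path/to/unzipped/bert
```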
Let me know if this works
P.S.: why ProtBert? It's not the best performing model! ProtT5 is: https://github.com/agemagician/ProtTrans/blob/master/README.md#-comparison-to-other-protein-language-models-plms