something wrong when doing embedding with my sequences rather than UniRef90

Question

daisykuma22 opened this issue 3 months ago · 1 comments

Dear Developers,

I hope you're doing well. I want to use the do_embedding.py script to generate embeddings for my own sequences instead of UniRef90. To test this, I used the provided file (input_test.tsv) with the following command:

python3 do_embedding.py trainer.ur90_path=$SEQDB_PATH model.ckpt_path=$MODEL_PATH hydra.run.dir=$OUTPUT_PATH

However, I encountered the following error related to Hydra and pyarrow:
`Global seed set to 1234
/public/home/software/Dense-Homolog-Retrieval/checkpoint
[2024-09-11 01:03:15,427][torch.distributed.nn.jit.instantiator][INFO] - Created a temporary directory at /tmp/tmp9j7rhfrz
[2024-09-11 01:03:15,428][torch.distributed.nn.jit.instantiator][INFO] - Writing /tmp/tmp9j7rhfrz/_remote_module_non_sriptable.py
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Error executing job with overrides: ['trainer.ur90_path=/public/home/software/Dense-Homolog-Retrieval/input_test.tsv', 'model.ckpt_path=/public/home/software/Dense-Homolog-Retrieval/checkpoint']
An error occurred during Hydra's exception formatting:
AssertionError()
Traceback (most recent call last):
File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/hydra/_internal/utils.py", line 252, in run_and_report
assert mdl is not None
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "do_embedding.py", line 73, in <module>
main()
File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/hydra/main.py", line 48, in decorated_main
_run_hydra(
File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/hydra/_internal/utils.py", line 377, in _run_hydra
run_and_report(
File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/hydra/_internal/utils.py", line 294, in run_and_report
raise ex
File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/hydra/_internal/utils.py", line 378, in <lambda>
lambda: hydra.run(
File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 111, in run
_ = ret.return_value
File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "do_embedding.py", line 70, in main
trainer.predict(model, datamodule=dm)
File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 993, in predict
return self._call_and_handle_interrupt(
File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1037, in _predict_impl
results = self._run(model, ckpt_path=self.predicted_ckpt_path)
File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1138, in _run
self._call_setup_hook() # allow user to setup lightning_module in accelerator environment
File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1438, in _call_setup_hook
self.datamodule.setup(stage=fn)
File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 474, in wrapped_fn
fn(*args, **kwargs)
File "/public/home/software/Dense-Homolog-Retrieval/mydpr/dataset/cath35.py", line 95, in setup
self.pd_set = ArrowDataset(self.path)
File "/public/home/software/Dense-Homolog-Retrieval/mydpr/dataset/cath35.py", line 72, in __init__
self.records = csv.read_csv(data_path,
File "pyarrow/_csv.pyx", line 798, in pyarrow._csv.read_csv
File "pyarrow/_csv.pyx", line 807, in pyarrow._csv.read_csv
File "pyarrow/error.pxi", line 141, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 97, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: CSV parse error: Expected 2 columns, got 3
`
Could you kindly provide some guidance on resolving this issue?

Thank you for your time, and I look forward to your assistance.

Answer 1 · 2024-09-12T05:37:38.000Z

Hello, it seems that your TSV file has more than two columns: the last line of the traceback shows pyarrow expected 2 columns but found 3 in at least one row. Please check your file format.
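
One way to find the offending rows is to count the tab-separated fields on every line. The following is a minimal sketch, not part of the repository: `find_bad_rows` is a hypothetical helper, and it assumes the file is tab-delimited with two fields per row (as the error message suggests). It splits on raw tabs and ignores quoting, so its result may differ from pyarrow's parser for quoted fields.

```python
from typing import List, Tuple


def find_bad_rows(path: str, expected: int = 2, delimiter: str = "\t") -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for every non-blank row
    whose number of delimiter-separated fields differs from `expected`.

    Assumption: plain splitting on the delimiter, with no quote handling.
    """
    bad = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            row = line.rstrip("\r\n")  # strip the newline only, keep trailing tabs
            if not row:
                continue  # blank lines are skipped, not reported
            n_fields = len(row.split(delimiter))
            if n_fields != expected:
                bad.append((line_number, n_fields))
    return bad
```

Calling `find_bad_rows("input_test.tsv")` returns a list of 1-based line numbers with their field counts. A trailing tab at the end of a line counts as an extra (empty) field, and a tab inside a description or sequence field splits it in two.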