Train problem

Question

Closed this issue a year ago · 15 comments

Hello! I encountered the following issues during training. Could you give me some guidance?

Answer 1 · 2023-06-26T15:32:02.000Z

It seems that you haven't correctly linked the dataset to the required place. You can create a link as suggested in the readme.md.

Answer 2 · 2023-06-27T02:47:40.000Z

Thank you for your reply! The dataset soft link seems to be successful because I can preprocess the dataset.

Answer 3 · 2023-06-27T02:57:59.000Z

According to the information in your screenshot, the model fails to read the sign language images and raises the error "list index out of range". To locate the issue, you can check the type of the input data in dataloader_video.py by adding 'print(type(video))' after line 107. You will most likely get None as the output. Let me know if you have further problems.
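
As a rough illustration of the check suggested above, the sketch below verifies that a frame pattern actually matches files before anything is indexed, so an empty folder fails with a clear message rather than "list index out of range". The function name find_frames and the pattern passed to it are hypothetical and are not taken from dataloader_video.py.

```python
import glob
import os


def find_frames(frame_glob):
    """Return the sorted frame paths matching frame_glob; fail loudly if there are none."""
    paths = sorted(glob.glob(frame_glob))
    if not paths:
        # An empty match is the typical cause of a later "list index out of range".
        raise FileNotFoundError(
            f"No frames matched {frame_glob!r}; check the dataset link and "
            f"whether {os.path.dirname(frame_glob)!r} exists."
        )
    return paths
```

Calling find_frames with the glob pattern the dataloader builds for one sample (whatever that pattern is in your setup) should show quickly whether the linked dataset path resolves to real image files.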

Answer 4 · 2023-06-28T02:52:45.000Z

Thank you for your reply! The problem has been resolved. I am training on an RTX 3080 with 10 GB of memory, and to prevent OOM I set the batch size to 1. However, after the first epoch ended, the following error occurred, and I suspect it was OOM again.

Answer 5 · 2023-06-28T02:56:51.000Z

It seems that you don't have enough space on the disk.

Answer 6 · 2023-06-29T09:27:02.000Z

Thank you for your reply! It was not due to insufficient disk space. The problem was resolved by following this issue:
https://github.com/parlance/ctcdecode/issues/124

Answer 7 · 2023-06-29T09:45:48.000Z

Many thanks. Hope you can enjoy the work!

Answer 8 · 2023-07-19T10:42:43.000Z

Hello! I am here again. I have switched to a new device to run this project, but I have encountered some inexplicable errors. The error message is as follows.

Could you give me some guidance?

Answer 9 · 2023-07-19T13:13:04.000Z

I don't have an exact answer, but I suspect this happens because the number of classes you set differs from that of the target dataset. This issue may be related to the number of classes.

Answer 10 · 2023-07-21T06:22:54.000Z

Thank you for your reply. The problem has been resolved; it was a problem with sclite. I have another question: if my training process terminates unexpectedly, can I run the following command to continue training from the saved weights?

python main.py --load-weights work_dir/baseline_res18/dev_23.60_epoch15_model.pt

Answer 11 · 2023-07-21T06:25:38.000Z

You may use "python main.py --load-checkpoints work_dir/baseline_res18/dev_23.60_epoch15_model.pt" instead.
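
If several checkpoints have accumulated in the work directory, a small helper can pick the most recent one to pass to --load-checkpoints. This is only a sketch: the filename pattern dev_<score>_epoch<N>_model.pt is inferred from the single filename quoted in this thread, and latest_checkpoint and resume_command are hypothetical helpers, not part of the repository.

```python
import re
from pathlib import Path

# Assumed pattern, inferred from "dev_23.60_epoch15_model.pt" in this thread.
CKPT_RE = re.compile(r"dev_(?P<score>\d+(?:\.\d+)?)_epoch(?P<epoch>\d+)_model\.pt$")


def latest_checkpoint(work_dir):
    """Return the checkpoint path with the highest epoch number, or None if none match."""
    best, best_epoch = None, -1
    for path in Path(work_dir).glob("*.pt"):
        match = CKPT_RE.search(path.name)
        if match and int(match.group("epoch")) > best_epoch:
            best_epoch = int(match.group("epoch"))
            best = path
    return best


def resume_command(work_dir):
    """Build the resume command quoted in the reply above for the latest checkpoint."""
    ckpt = latest_checkpoint(work_dir)
    if ckpt is None:
        return None
    return f"python main.py --load-checkpoints {ckpt.as_posix()}"
```

For example, resume_command("work_dir/baseline_res18") returns a command string for the highest-epoch file that matches the assumed pattern, or None when nothing matches.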

Answer 12 · 2023-07-21T06:38:20.000Z

Haha. Thank you. I used the wrong command.

Answer 13 · 2023-11-30T09:27:35.000Z

Thank you for your reply! Not due to insufficient disk space. The problem has been resolved as follows: https://github.com/parlance/ctcdecode/issues/124

Hello, I have also encountered this problem. The link is invalid. How was it resolved?

Answer 14 · 2023-11-30T13:26:17.000Z

You may refer to this issue and this.

Answer 15 · 2023-11-30T13:53:58.000Z

Thank you very much for taking the time to reply to me. I will try it out.