Open3DA/LL3DA

training details

kuaileqipaoshui opened this issue

RuntimeError: probability tensor contains either inf, nan or element < 0

There was an error during evaluation. I suspect there is a problem with the versions of the installed packages. Could you provide the versions of the packages you installed?
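For context (this is not LL3DA-specific): the message comes from torch.multinomial during sampling-based decoding, which validates its input distribution and fails when the softmax over the logits contains NaN/Inf values, e.g. after an fp16 overflow or with an incompatible package version. A minimal sketch that reproduces the message:

import torch

# torch.multinomial rejects invalid distributions; NaN/Inf or negative
# entries (e.g. produced by overflowing fp16 logits) raise exactly:
#   RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
probs = torch.tensor([0.5, float("nan"), 0.5])
torch.multinomial(probs, num_samples=1)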

Please try building the environment in the following order:

  1. Set up the conda environment:
conda create -n ll3da python=3.8
conda activate ll3da
  2. Install PyTorch:
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
  3. Install the other packages (note the quotes, which keep the shell from interpreting the version constraints):
pip install h5py scipy cython plyfile 'trimesh>=2.35.39,<2.35.40' 'transformers>=4.37.0'
  4. Build the pointnet++ and gIoU support:
cd third_party/pointnet2
python setup.py install
cd ../../utils
python cython_compile.py build_ext --inplace
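After installing, a quick version sanity check (my own addition, not from the repo) can rule out a mismatched environment:

import torch, transformers

# Expect 1.13.1+cu116 / 11.6 / True and transformers >= 4.37.0
# per the steps above.
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
print(transformers.__version__)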

If the issue still appears, please let me know.

Thanks, I will try. I have a question: looking at the script, evaluation is also based only on the trained generalist checkpoint (--test_ckpt ./ckpts/opt-1.3b/ll3da-generalist/checkpoint.pth); the tuned checkpoints are not used. What are the tuned checkpoints for?

We train our model on the combination of Nr3D and ScanRefer for describing objects. However, these two datasets are annotated in different styles, so the model needs to be fine-tuned on each dataset separately.

I'm sorry, I don't understand. Can you tell me more about it, like how it's done? I don't see any difference from ScanQA.

Since LL3DA is a 3D generalist, it can distinguish different tasks given human interactions. You can directly evaluate on ScanQA with the generalist checkpoint, or try fine-tuning it.
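To illustrate the point (these exact prompt strings are hypothetical; the real templates live in the LL3DA data loaders), the same generalist checkpoint is steered to different tasks purely through the textual instruction:

# Hypothetical instructions -- check the repo for the exact wording.
instructions = {
    "dense_captioning": "Describe the object at the given location.",
    "scanqa": "Answer the question: what is next to the sofa?",
    "scene_description": "Describe the whole room in detail.",
}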

----------------------Evaluation-----------------------
INFO: iou@0.5 matched proposals: [1525 / 2068],
[BLEU-1] Mean: 0.6246, Max: 1.0000, Min: 0.0000
[BLEU-2] Mean: 0.5269, Max: 1.0000, Min: 0.0000
[BLEU-3] Mean: 0.4311, Max: 1.0000, Min: 0.0000
[BLEU-4] Mean: 0.3519, Max: 1.0000, Min: 0.0000
[CIDEr] Mean: 0.5911, Max: 5.4976, Min: 0.0000
[ROUGE-L] Mean: 0.5407, Max: 1.0000, Min: 0.1015
[METEOR] Mean: 0.2519, Max: 1.0000, Min: 0.0448

When I directly evaluate on ScanQA with the generalist checkpoint, I get the above result. I found that the C@0.5 result is very different from the one in the paper, while the other metrics are similar to those in the paper. Why is this?

It seems the result you listed comes from the ScanRefer dataset for 3D dense captioning.
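For reference, IoU-thresholded captioning metrics such as C@0.5 conventionally follow Scan2Cap's m@kIoU: a caption's score only counts if its predicted box matches a ground-truth box with IoU >= k, and unmatched instances contribute zero. A minimal sketch under that assumption (I have not checked LL3DA's exact implementation):

def metric_at_iou(scores, ious, k=0.5):
    # m@kIoU: average the caption metric over all instances,
    # zeroing out captions whose box IoU with the GT is below k.
    gated = [s if iou >= k else 0.0 for s, iou in zip(scores, ious)]
    return sum(gated) / len(gated)

# A high mean score over matched proposals can still give a much lower
# C@0.5 once unmatched instances are counted as zeros:
print(metric_at_iou([0.8, 0.9, 0.7, 0.6], [0.6, 0.4, 0.7, 0.3]))  # 0.375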

The results differ mainly because of (1) randomness in data pre-processing (point down-sampling), (2) different PyTorch versions, and (3) randomness in training.

Please refer to: ch3cook-fdu/Vote2Cap-DETR#12 for more information.
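As an illustration of point (1) (a generic sketch, not the repo's actual pre-processing; the sample count n is made up), random point down-sampling yields a different subset of the cloud on every run unless the RNG is seeded:

import numpy as np

def downsample(points, n=40000, seed=None):
    # Randomly keep n points from an (N, 3+) point cloud. Without a fixed
    # seed, every run (and thus every evaluation) sees a slightly different
    # version of each scene, which shifts the metrics a little.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=n, replace=len(points) < n)
    return points[idx]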

Also, you are encouraged to check out the training log to see whether the performance aligns.

Additionally, the performance of 3D dense captioning might differ a little, since we do not distinguish ScanRefer from Nr3D during training. Maybe you should tune the model on each dataset for 3D dense captioning only.

Hi, I tried train.generalist.sh, but I can't reproduce performance close to what is reported in the paper. The only change is a batch size of 24 instead of 4 to speed up training.

Here are the eval logs on ScanQA, Nr3D, and ScanRefer at the 20th epoch:
----------------------Evaluation-----------------------

[BLEU-1] Mean: 0.3028, Max: 1.0000, Min: 0.0000
[BLEU-2] Mean: 0.1904, Max: 1.0000, Min: 0.0000
[BLEU-3] Mean: 0.1283, Max: 1.0000, Min: 0.0000
[BLEU-4] Mean: 0.0875, Max: 1.0000, Min: 0.0000
[CIDEr] Mean: 0.4818, Max: 8.0511, Min: 0.0000
[ROUGE-L] Mean: 0.2636, Max: 1.0000, Min: 0.0000
[METEOR] Mean: 0.1058, Max: 1.0000, Min: 0.0000
Evaluate [19/32]; Batch [0/1]; Evaluating on iter: 12999; Iter time 261.13; Mem 70618.97MB

----------------------Evaluation-----------------------
INFO: iou@0.5 matched proposals: [712 / 1214],
[BLEU-1] Mean: 0.5626, Max: 1.0000, Min: 0.0006
[BLEU-2] Mean: 0.3753, Max: 0.8165, Min: 0.0000
[BLEU-3] Mean: 0.2223, Max: 0.6583, Min: 0.0000
[BLEU-4] Mean: 0.1339, Max: 0.5756, Min: 0.0000
[CIDEr] Mean: 0.0945, Max: 1.2465, Min: 0.0000
[ROUGE-L] Mean: 0.4495, Max: 0.8299, Min: 0.1843
[METEOR] Mean: 0.2157, Max: 0.5162, Min: 0.0783
Evaluate [19/32]; Batch [0/1]; Evaluating on iter: 12999; Iter time 262.18; Mem 70618.97MB

----------------------Evaluation-----------------------
INFO: iou@0.5 matched proposals: [1506 / 2068],
[BLEU-1] Mean: 0.6056, Max: 1.0000, Min: 0.0000
[BLEU-2] Mean: 0.4881, Max: 1.0000, Min: 0.0000
[BLEU-3] Mean: 0.3775, Max: 0.9410, Min: 0.0000
[BLEU-4] Mean: 0.2926, Max: 0.8654, Min: 0.0000
[CIDEr] Mean: 0.3024, Max: 3.1209, Min: 0.0000
[ROUGE-L] Mean: 0.4990, Max: 0.9412, Min: 0.1015
[METEOR] Mean: 0.2349, Max: 0.5416, Min: 0.0448

The training log is here.

It would be nice if the pre-trained checkpoints and pre-processed point clouds could be made available for download, to minimize the randomness.

The actual batch size of our original configuration is 4 x 8 GPUs = 32 per iteration. To reproduce our results, we encourage you to train with the exact same config as we provided.

Please track the training process on the number of iterations rather than epoch numbers. Based on our experience, training LL3DA with only 13k iterations is far from convergence.
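To make the bookkeeping concrete (a back-of-the-envelope sketch; the ~128k samples per epoch is inferred from the training log later in this thread, not an official figure), the number of optimizer steps scales inversely with the effective batch size, so the same epoch count can mean far fewer iterations:

def iters_for(samples_per_epoch, batch_per_gpu, num_gpus, epochs):
    # One iteration consumes one effective batch of
    # batch_per_gpu * num_gpus samples.
    return samples_per_epoch * epochs // (batch_per_gpu * num_gpus)

print(iters_for(127936, 4, 8, 32))   # ~127,936 iterations, matching the log below
print(iters_for(127936, 24, 8, 20))  # ~13,326 iterations -- roughly a tenth
                                     # (assuming the 24-per-GPU run also used 8 GPUs)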

We are actively working on packing the pre-trained weights, please stay tuned.

When I use the actual batch size of the original configuration (4 x 8 GPUs = 32 per iteration), I find this in the training log:
Epoch [2/32]; Iter [11990/127936]; Loss 1.51; LR 9.79e-05; Iter time 0.46; ETA 14:48:34; Mem 18615.49MB
Loss in not finite. Skip this training step.
Loss in not finite. Skip this training step.
Loss in not finite. Skip this training step.
Loss in not finite. Skip this training step.
Loss in not finite. Skip this training step.
Loss in not finite. Skip this training step.
Loss in not finite. Skip this training step.
Loss in not finite. Skip this training step.
Epoch [3/32]; Iter [12000/127936]; Loss 1.51; LR 9.79e-05; Iter time 0.48; ETA 15:23:59; Mem 18615.49MB
What happened?

Because of the mixed-precision training, the training process might not be entirely stable. As long as the model keeps training, you can safely ignore this message.
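For reference, the skipped steps correspond to a common guard in mixed-precision training loops (a generic sketch, not LL3DA's exact code): when the loss comes back NaN/Inf, the step is dropped rather than letting bad gradients corrupt the weights.

import math
import torch

def train_step(model, batch, optimizer, scaler):
    # Generic AMP step; the model, optimizer, and torch.cuda.amp.GradScaler
    # are assumed to be set up elsewhere.
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)
    if not math.isfinite(loss.item()):
        print("Loss in not finite. Skip this training step.")
        return  # skip the update; training continues with the next batch
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()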