fgnt/padertorch

Segmentation Fault when Training Model

samly97 opened this issue · 2 comments

I ran into an issue with training a model using Padertorch ([Ham radio](https://github.com/fgnt/ham_radio/issues/1)). I get a segmentation fault in the training loop function. When I place a pdb debug statement in `train_step` I can see the loss, summary, etc., but the segmentation fault occurs upon return. Location:

  • `padertorch/train/trainer.py` > `Trainer.step`, in the `if` block where `len(device) == 1`

I am using a single RTX 4090 on Python 3.10 with PyTorch 2.1.0.
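Since pdb only catches Python-level exceptions, a native segfault gives no traceback on return from `train_step`. One way to get more than a silent crash would be to enable Python's built-in `faulthandler` before training starts (sketch only; the `trainer.train(...)` call is a placeholder for however the ham_radio recipe actually launches training):

```python
import faulthandler
import sys

# Print a traceback for every thread when a fatal signal such as
# SIGSEGV is raised, instead of the process dying silently.
faulthandler.enable(file=sys.stderr, all_threads=True)

# ... build the model/trainer as usual, then launch training.
# trainer.train(train_iterator)  # placeholder for the actual call
```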

Thank you.

Hmm. If I remember correctly, there is only Python code from us involved there. Could you try to find which operation causes this segmentation fault? The train step moves the data to the GPU, executes the model's forward and review, and runs the optimiser step. I guess one torch operation causes the segmentation fault.
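Something like the following might narrow it down. It is a simplified, hypothetical stand-in for the step, not the actual trainer code (the batch-dict layout and names are assumptions); the `torch.cuda.synchronize()` after each stage forces queued CUDA kernels to finish, so the last marker printed points at the stage that actually triggered the fault:

```python
import torch

def instrumented_step(model, optimizer, batch, device="cuda:0"):
    """Hypothetical, instrumented version of a single train step."""
    def mark(stage):
        torch.cuda.synchronize()          # flush pending CUDA work
        print(f"completed: {stage}", flush=True)

    # Move the example to the GPU.
    batch = {k: v.to(device) if torch.is_tensor(v) else v
             for k, v in batch.items()}
    mark("to_device")

    output = model(batch)                 # model forward
    mark("forward")

    summary = model.review(batch, output) # review computes the loss
    mark("review")

    summary["loss"].backward()
    mark("backward")

    optimizer.step()
    optimizer.zero_grad()
    mark("optimizer_step")
    return summary
```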

Thanks for the reply. I uninstalled PyTorch 2.1.0 and installed PyTorch 1.13.0 instead, and the model started training. I'll close this issue.