taichi-dev/difftaichi

Mass spring loss is `nan` after repeatedly running

Opened this issue · 2 comments

I am running Taichi 1.2.1 on arm64 with Python 3.10.8. I've written some code that repeatedly calls the mass_spring simulation with different mass-spring layouts, and at some point the mass_spring loss becomes `nan`. The behavior is not deterministic: sometimes it happens after 1 iteration, other times after 4-5, and it does not fail on any particular mass_spring layout.
Before each call to mass_spring's `main`, I reload the mass_spring file to re-initialize the variables, and after each call I tear down with `ti.reset()`.
Could someone please shed some light on this error?

erizmr commented

Hi @alansun17904, would it be possible to share a minimal repro?

Thanks for the quick response. I modified `main` by adding the following lines:

clear()
forward(None, visualize=False)
clear()
return loss[None]

Then I call `main` from another module like this:

from importlib import reload

import taichi as ti
import mass_spring

for _ in range(n):
    reload(mass_spring)
    mass_spring.main()
    ti.reset()
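
To narrow down when the `nan` first appears, the loop above could be wrapped in a small harness that records the first failing iteration. This is only a sketch: `first_nan_iteration`, `run_once`, and `make_fake_run` are hypothetical names, and the fake run stands in for the real simulation so the snippet runs without Taichi.

```python
import math


def first_nan_iteration(run_once, n):
    """Call run_once() up to n times; return the index of the first NaN loss, or None."""
    for i in range(n):
        loss = float(run_once())
        if math.isnan(loss):
            return i
    return None


def make_fake_run(nan_on_call):
    """Hypothetical stand-in for the simulation: returns NaN on one chosen call."""
    calls = {"count": 0}

    def fake_run():
        i = calls["count"]
        calls["count"] += 1
        return float("nan") if i == nan_on_call else 0.5

    return fake_run


if __name__ == "__main__":
    # The fake run fails on its third call (index 2).
    print(first_nan_iteration(make_fake_run(nan_on_call=2), n=5))
```

In the actual setup, `run_once` would presumably be a function that calls `reload(mass_spring)`, stores the value returned by `mass_spring.main()`, calls `ti.reset()`, and returns the stored loss.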