taichi-dev/difftaichi

Why is the GPU much slower than the CPU in the diffmpm.py example?


On my Mac M1, diffmpm runs at 14 FPS on the CPU (arm), but on the GPU (Metal) it runs much slower, at less than 2 FPS. I see the same slowdown on an RTX 3080 (CUDA). Is there a problem with the optimizations the compiler does at the IR level?

Also reproduced on my Intel + NVIDIA GPU workstation.

CPU: i9-11900K
GPU: RTX 3080

with ti.cpu: 13 FPS
with ti.cuda: 10 FPS

Script: examples/diffmpm.py
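
To make FPS numbers like the ones above comparable across backends, it can help to time only steady-state frames and to wait for queued GPU work before stopping the clock. Below is a minimal sketch of such a timer. `measure_fps`, `step`, and `sync` are hypothetical names used for illustration and are not part of diffmpm.py; with Taichi, `sync` could be `ti.sync`, and `step` would be whatever function advances one frame in the script.

```python
import time


def measure_fps(step, sync=None, warmup=3, frames=20):
    """Return the average frames per second of `step`, excluding warm-up.

    `step` advances the simulation by one frame (hypothetical callable).
    `sync`, if given, blocks until previously launched work has finished.
    """
    # Warm-up frames absorb one-time costs such as JIT compilation.
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(frames):
        step()
    # GPU backends may launch kernels asynchronously, so wait for them
    # to finish before reading the clock.
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start

    return frames / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without Taichi installed.
    def dummy_step():
        sum(i * i for i in range(10_000))

    print(f"{measure_fps(dummy_step):.1f} FPS")
```

Running the same measurement once with `ti.init(arch=ti.cpu)` and once with `ti.init(arch=ti.cuda)` would show whether the gap persists after warm-up or comes mostly from the first few frames.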