feizc/DiT-MoE

numerical precision

Closed this issue · 2 comments

Thanks for your nice work! What's the numerical precision used during training? I didn't find it in the paper. Is it bf16, fp16 or fp32? Or some mixed precision approach?

Hi, I'm very sorry for omitting this important information.
We trained the XL and G versions in FP16, and the other variants in FP32.
This can also be checked in sample.py.
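For illustration, the precision choice described above could be sketched like this (the variant-name strings and the helper itself are assumptions for clarity, not the repository's actual code):

```python
def training_precision(model_variant: str) -> str:
    """Return the numerical precision used to train a DiT-MoE variant.

    Per the maintainer's reply: the XL and G versions were trained in
    FP16, while all other variants used FP32. The variant-name format
    (e.g. "XL-2") is illustrative, not taken from the repository.
    """
    size = model_variant.split("-")[0].upper()
    return "fp16" if size in {"XL", "G"} else "fp32"
```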

Thanks for the information!