[BUG] inference ops unit tests are failing
oelayan7 opened this issue
oelayan7 commented
We noticed that the tests under `unit/ops/transformer/inference` are not run by any CI job.
Several tests in that directory are failing (examples below). I talked to @loadams about it, and he ran them on a V100 setup. The result for those tests was 440 failed, 2598 passed, 8 skipped.
Examples of the failing tests:
- `unit/ops/transformer/inference/test_bias_geglu.py::test_gated_silu`: the output does not match the reference result.
- `unit/ops/transformer/inference/test_layer_norm.py::test_layer_norm`: fails with `Feature '.bf16' requires .target sm_80 or higher`
A hint that may help: these tests are parametrized over the supported dtypes, and the failures always occur in `dtype2` (I assume this is bf16).
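V100 GPUs have CUDA compute capability 7.0 (sm_70), so the `.bf16` error above suggests the bf16 variants cannot even be built on that hardware. A minimal sketch of one possible mitigation, assuming the dtype parametrization can be filtered by the device's compute capability before the tests are generated. `supports_bf16`, `dtypes_to_test`, and `BF16_MIN_CAPABILITY` are hypothetical names, and dtypes are plain strings only to keep the sketch self-contained. This would only address the sm_80 compile failures, not result mismatches such as the one in `test_gated_silu`.

```python
# Hypothetical helpers (not part of DeepSpeed): choose which dtypes a kernel
# test should be parametrized over, given the GPU's CUDA compute capability.

# Assumption: the bf16 kernels need sm_80 (Ampere) or newer, as the error
# "Feature '.bf16' requires .target sm_80 or higher" indicates.
BF16_MIN_CAPABILITY = (8, 0)


def supports_bf16(capability):
    """Return True if a device with this (major, minor) capability meets sm_80."""
    # Tuples compare element-wise, so (7, 5) < (8, 0) <= (8, 6).
    return tuple(capability) >= BF16_MIN_CAPABILITY


def dtypes_to_test(capability, dtypes=("fp16", "fp32", "bf16")):
    """Drop bf16 from the parametrization on devices older than sm_80."""
    return [d for d in dtypes if d != "bf16" or supports_bf16(capability)]


# V100 is compute capability 7.0, A100 is 8.0.
print(dtypes_to_test((7, 0)))
print(dtypes_to_test((8, 0)))
```

In an actual test module, the capability tuple could come from `torch.cuda.get_device_capability()`, and the filtered list could be passed to `pytest.mark.parametrize`, so bf16 cases are skipped on V100 instead of failing.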
oelayan7 commented
Other failing tests: `test_layer_norm_residual`, `test_residual_add`, `test_bias_geglu`, `test_moe_residual_matmul`, `test_pre_norm`, `test_rms_norm`