astramind-ai/BitMat

Tests and Performance Tests

Closed this issue · 1 comment

Some ideas for testing:

  • It would be great to add some functional tests, e.g. generating random matrices of various shapes with entries in {-1, 0, 1} and verifying that the output is correct. I'd assume that, due to the packing, the range of shapes might have some limitations (e.g. a dimension must be a multiple of 4 before packing).
  • It would be great to check whether it is actually faster than torch.tensor(A) @ B for all of these cases. That might require some additional options for @triton.autotune.
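To make the first idea concrete, here is a minimal sketch of such a functional test in plain Python. It is not BitMat's code: the 2-bit encoding (four ternary values per byte), the helper names `pack_ternary`, `unpack_ternary`, `packed_matmul` and `check_random_shapes`, and the choice of packing along K are all assumptions made for illustration. In a real test, `packed_matmul` would call the Triton kernel instead.

```python
import random

VALUES_PER_BYTE = 4  # assumption: each ternary value takes 2 bits


def pack_ternary(row):
    """Pack a list of values in {-1, 0, 1} into bytes, 4 values per byte."""
    if len(row) % VALUES_PER_BYTE != 0:
        raise ValueError("length must be a multiple of 4 before packing")
    packed = bytearray()
    for i in range(0, len(row), VALUES_PER_BYTE):
        byte = 0
        for j, v in enumerate(row[i:i + VALUES_PER_BYTE]):
            if v not in (-1, 0, 1):
                raise ValueError("values must be in {-1, 0, 1}")
            byte |= (v + 1) << (2 * j)  # map -1, 0, 1 to codes 0, 1, 2
        packed.append(byte)
    return bytes(packed)


def unpack_ternary(packed):
    """Inverse of pack_ternary."""
    return [((byte >> (2 * j)) & 0b11) - 1
            for byte in packed for j in range(VALUES_PER_BYTE)]


def reference_matmul(x, w):
    """Plain x @ w.T on lists of lists: x is (M, K), w is (N, K)."""
    return [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]


def packed_matmul(x, packed_w):
    """Stand-in for the kernel under test: unpack, then multiply."""
    return reference_matmul(x, [unpack_ternary(row) for row in packed_w])


def check_random_shapes(trials=50, seed=0):
    """Compare packed and reference results over random shapes."""
    rng = random.Random(seed)
    for _ in range(trials):
        m, n = rng.randint(1, 8), rng.randint(1, 8)
        k = VALUES_PER_BYTE * rng.randint(1, 8)  # K must be a multiple of 4
        x = [[rng.randint(-128, 127) for _ in range(k)] for _ in range(m)]
        w = [[rng.choice((-1, 0, 1)) for _ in range(k)] for _ in range(n)]
        packed = [pack_ternary(row) for row in w]
        assert packed_matmul(x, packed) == reference_matmul(x, w)
    return trials
```

Calling `check_random_shapes()` runs the comparison over 50 random shapes, and `pack_ternary` raises `ValueError` for a length that is not a multiple of 4, which is the kind of shape limitation mentioned above.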

Bonus question: I assume the L2 cache optimizations work like in this example?
https://triton-lang.org/main/getting-started/tutorials/03-matrix-multiplication.html#l2-cache-optimizations

We have a test file we're planning to release pretty soon.

At the moment we're still in a beta phase: we're running benchmarks during development to find the best configurations.

As for the L2 hit rate optimization, you're right!
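
For readers unfamiliar with the linked tutorial, the grouped program ordering it describes can be sketched in plain Python. This follows the index arithmetic from the Triton matrix multiplication tutorial, not BitMat's kernel source; the function name `grouped_tile_order` and its parameter names are chosen here for illustration.

```python
def grouped_tile_order(M, N, block_m, block_n, group_m):
    """Return the (pid_m, pid_n) output tile visited by each program id.

    Programs are grouped so that `group_m` rows of tiles are finished
    before moving on, which lets nearby programs reuse the same blocks
    of the inputs while they are still in L2.
    """
    num_pid_m = -(-M // block_m)  # ceiling division
    num_pid_n = -(-N // block_n)
    num_pid_in_group = group_m * num_pid_n
    order = []
    for pid in range(num_pid_m * num_pid_n):
        group_id = pid // num_pid_in_group
        first_pid_m = group_id * group_m
        # The last group may have fewer rows than group_m.
        group_size_m = min(num_pid_m - first_pid_m, group_m)
        pid_m = first_pid_m + (pid % num_pid_in_group) % group_size_m
        pid_n = (pid % num_pid_in_group) // group_size_m
        order.append((pid_m, pid_n))
    return order
```

For example, `grouped_tile_order(256, 256, 64, 64, 2)` starts with the tiles (0, 0), (1, 0), (0, 1), (1, 1), so consecutive programs walk down a two-row band of tiles instead of sweeping a full row first.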