mirage-project/mirage

Are the paper's benchmarks in this repo?


The benchmarks are available in cpp_examples. We are working on the Python version of the benchmarks.

Does the Python version of the benchmarks include how the baselines are run?

Also, is there example generated code for any of the kernels somewhere?

@Chillee Mirage's generated code can be executed using two backends: (1) a native CUTLASS-based backend, and (2) a Triton backend. While some of the evaluation results are measured with the CUTLASS backend (since Triton does not support some of the operators), the Triton backend is easier to try. If you are looking for generated code, you can use the Docker image and run `python demo/demo_group_query_attention_spec_decode.py --checkpoint demo/checkpoint_group_query_attn_spec_decode.json`, which will generate all the Triton programs discovered by Mirage for group-query attention.
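For readers who have not seen Triton output before, here is a minimal hand-written sketch of the general shape of such a program. This is not Mirage's actual output (the kernels the demo emits for group-query attention fuse matmuls with the softmax and are considerably more involved); it is just a fused row-wise softmax, assuming a contiguous input:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def softmax_kernel(out_ptr, in_ptr, n_cols, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one row of the (contiguous) input matrix.
    row = tl.program_id(0)
    offsets = tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_cols
    x = tl.load(in_ptr + row * n_cols + offsets, mask=mask, other=-float("inf"))
    # Numerically stable softmax, fused into a single kernel:
    # subtract the row max, exponentiate, normalize by the row sum.
    x = x - tl.max(x, axis=0)
    num = tl.exp(x)
    denom = tl.sum(num, axis=0)
    tl.store(out_ptr + row * n_cols + offsets, num / denom, mask=mask)

def softmax(x: torch.Tensor) -> torch.Tensor:
    n_rows, n_cols = x.shape
    out = torch.empty_like(x)
    # The block must cover a full row, so round up to a power of two.
    BLOCK_SIZE = triton.next_power_of_2(n_cols)
    softmax_kernel[(n_rows,)](out, x, n_cols, BLOCK_SIZE=BLOCK_SIZE)
    return out
```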

IMO it would significantly help understanding of the paper if you included some of the generated Triton programs in the GitHub repo. The paper reports that each benchmark can take up to 6 hours, so running it just to see what kind of kernels it generates is a nontrivial task :)

Thanks for the suggestion! We will consider including some Triton examples in the repo to help users understand Mirage and know what to expect from it. One challenge we need to address is that the search procedure itself may be non-deterministic: users may get different Triton programs from each search run. Thanks again for the suggestion.
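For context, a minimal sketch of driving the superoptimizer from Python, based on the attention example in the project README. The API names (`new_kernel_graph`, `new_input`, `superoptimize`, etc.) and the tensor dimensions are taken from my reading of that README and may have changed; treat this as illustrative, not authoritative:

```python
import mirage as mi

# Build the computation graph for attention; dims are illustrative.
graph = mi.new_kernel_graph()
Q = graph.new_input(dims=(64, 1, 128), dtype=mi.float16)
K = graph.new_input(dims=(64, 128, 4096), dtype=mi.float16)
V = graph.new_input(dims=(64, 4096, 128), dtype=mi.float16)
A = graph.matmul(Q, K)
E = graph.exp(A)
S = graph.reduction(E, 2)  # sum over the last dimension
D = graph.div(E, S)
O = graph.matmul(D, V)

# Search for optimized kernels. As noted above, the search may be
# non-deterministic, so repeated runs can return different Triton programs.
optimized_graph = graph.superoptimize()
```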