thuml/depyf

[help wanted] Why does `torch.compile` dump each Triton kernel?

imShZh opened this issue · 1 comment

Everything in depyf works fine.

After I ran the README example with depyf, the target directory contained multiple files:

├── __compiled_fn_1 AFTER POST GRAD 0.py
├── __compiled_fn_1 Captured Graph 0.py
├── __compiled_fn_1 Forward graph 0.py
├── __compiled_fn_1 kernel 0.py
├── __compiled_fn_1 kernel 1.py
├── __compiled_fn_1 kernel 2.py
├── __compiled_fn_5 AFTER POST GRAD 0.py
├── __compiled_fn_5 Captured Graph 0.py
├── __compiled_fn_5 Forward graph 0.py
├── __compiled_fn_5 kernel 0.py
├── __compiled_fn_5 kernel 1.py
├── full_code_for_toy_example_0.py
├── __transformed_code_0_for_torch_dynamo_resume_in_toy_example_at_9.py
└── __transformed_code_0_for_toy_example.py

Why does torch.compile dump `__compiled_fn_1 kernel 1.py` and `__compiled_fn_1 kernel 2.py` in addition to `__compiled_fn_1 kernel 0.py`, when the latter already contains the source of those two Triton kernels as strings?

Thanks for your interest!

These are intermediate steps of torch.compile. Possibly torch.compile generates the two kernels first and then merges them into a single file 🤔
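To check the question's claim directly on a dump directory, a small script can group the `kernel N.py` files per compiled function and test whether each `kernel 1.py`, `kernel 2.py`, … appears inside the matching `kernel 0.py`. This is a minimal sketch: the file-name pattern is inferred from the listing above rather than from any depyf specification, the helper names (`group_kernels`, `kernels_embedded_in_first`, `load_dump_dir`) are hypothetical, and the exact-substring test is a crude heuristic that reports `False` if `kernel 0.py` escapes or reformats the embedded source.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming pattern, inferred from the listing (e.g. "__compiled_fn_1 kernel 2.py").
KERNEL_RE = re.compile(r"^(__compiled_fn_\d+) kernel (\d+)\.py$")


def group_kernels(names):
    """Map each compiled function name to its sorted list of kernel indices."""
    groups = defaultdict(list)
    for name in names:
        m = KERNEL_RE.match(name)
        if m:
            groups[m.group(1)].append(int(m.group(2)))
    return {fn: sorted(indices) for fn, indices in groups.items()}


def kernels_embedded_in_first(sources):
    """For each compiled function, report whether each kernel file with index > 0
    appears verbatim (after stripping surrounding whitespace) inside its 'kernel 0.py'.

    `sources` maps file name -> file text.
    """
    result = {}
    for fn, indices in group_kernels(sources).items():
        if 0 not in indices:
            continue  # nothing to compare against
        main_text = sources[f"{fn} kernel 0.py"]
        result[fn] = {
            i: sources[f"{fn} kernel {i}.py"].strip() in main_text
            for i in indices
            if i != 0
        }
    return result


def load_dump_dir(path):
    """Read every .py file in a dump directory into a {file name: text} dict."""
    return {p.name: p.read_text() for p in Path(path).glob("*.py")}
```

For example, `print(kernels_embedded_in_first(load_dump_dir("./dump_src_dir")))`, where `./dump_src_dir` stands in for whatever directory was passed to depyf, would print one `{index: bool}` mapping per compiled function.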