hidet-org/hidet

Is there any way for users to inspect the connections between cuda kernels compiled from operators?

VincentXWD opened this issue · 8 comments

I would like to see the connections between all the CUDA kernels generated by hidet in the .cache folder. The directory appears to be organized by operator category, with each category containing hash-named directories for the compiled CUDA kernel instances, but there is no information about how the kernels connect to each other.

I wonder if I could get enough information (some config option, or a place in the code where I can add prints) to build a map from hash-named kernels to operators, so that I can construct a neural network myself using these kernels.

Hi @VincentXWD ,

There is a text file named task.txt in each operator cache directory indicating which task that directory stores (you can think of a task as the computation of the operator). The task defines the interface of the compiled kernel.

You can learn more about how we hash the task or load the cache in this function: https://github.com/hidet-org/hidet/blob/main/python/hidet/drivers/build_task.py#L213
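
If you want to script this, here is a minimal sketch that walks the operator cache and prints the hash-to-task mapping. The cache root path below is an assumption; adjust it to your configured hidet cache directory (see hidet.option):

```python
import os

# Assumed cache root; the actual location depends on your hidet configuration.
cache_root = os.path.expanduser('~/.cache/hidet/ops')

# Each operator-category directory holds hash-named kernel directories,
# and each of those contains a task.txt describing the compiled task.
for category in sorted(os.listdir(cache_root)):
    category_dir = os.path.join(cache_root, category)
    if not os.path.isdir(category_dir):
        continue
    for task_hash in sorted(os.listdir(category_dir)):
        task_file = os.path.join(category_dir, task_hash, 'task.txt')
        if os.path.isfile(task_file):
            with open(task_file) as f:
                print(f'{task_hash}: {f.readline().strip()}')
```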

Thank you @yaoyaoding for your reply and the reference .py code. Yes, this is the way to get the task-operator mapping, but I still cannot determine the positions of the compiled kernels in the network.

I would like to know if I can get the connections between these kernels.

For example, suppose I have some tasks:

dde7be8d40f119f8 : fused(y=float32(2304,), x=float32(1, 40, 2304), y=float32(3, 12, 40, 64), fused_ops='reshape add reshape rearrange', anchor='rearrange')

4742a24e5f0cf34e : fused(b=float32(8, 96, 2304), data=float32(40, 768), y=float32(1, 8, 40, 2304), fused_ops='broadcast reshape rearrange batch_matmul reshape', anchor='batch_matmul')

8315bfe1649dc27f : fused(b=float32(1, 768, 50257), data=float32(40, 768), y=float32(40, 50257), fused_ops='broadcast batch_matmul reshape', anchor='batch_matmul')

003b329f9684cc83 : fused(b=float32(8, 96, 3072), data=float32(40, 768), y=float32(1, 8, 40, 3072), fused_ops='broadcast reshape rearrange batch_matmul reshape', anchor='batch_matmul')

69b07ec79b8aaf99 : fused(y=float32(40, 40), x=float32(12, 40, 40), y=float32(96, 40, 5), fused_ops='reshape divs add softmax rearrange reshape rearrange', anchor='softmax')

289abe506ef9afa1 : fused(y=float32(40, 768), x=float32(40, 768), x=float32(1, 40, 768), z=float32(40, 768), fused_ops='reshape add add', anchor='add')

I would like to get a DAG-like description so that I can construct the network from the hash strings of the generated kernels (or from the function names and parameters):

dde7be8d40f119f8 -> 4742a24e5f0cf34e
         |                  └---------------------------┐
         v                                              v
003b329f9684cc83 -> 69b07ec79b8aaf99 -> 289abe506ef9afa1 -> 8315bfe1649dc27f

Can I get such a graph description during the task compilation process?

Hi @VincentXWD,

I see. You can dump the compiled model to a single file as in https://github.com/hidet-org/hidet/blob/main/tests/unit_tests/test_compiled_model.py. After you run the demo code, you will get a model.hidet file, which is a zip file you can open directly with a zip file browser.
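
A condensed sketch of that flow, with the model replaced by a trivial relu; the API names follow the referenced test file and may differ slightly across hidet versions:

```python
import hidet

# Trivial stand-in model: trace a single relu, compile it, and dump it.
x = hidet.symbol([1, 3, 224, 224], dtype='float32', device='cuda')
y = hidet.ops.relu(x)
graph = hidet.trace_from(y, inputs=[x])   # FlowGraph of the model
compiled = graph.build()                  # compile all kernels
compiled.save('model.hidet')              # single-file dump: kernels, graph metadata, weights
```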

It looks like this:
[screenshot of the model.hidet archive contents]

All the kernels are stored in the /kernels directory. graph_string.txt stores the textual representation of the model, and graph_execution.json describes the relationship between the operators and the compiled kernels (via the task_idx in each instructions entry).
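
As an illustration, here is a sketch of recovering kernel-to-kernel edges from graph_execution.json. The instructions and task_idx fields are mentioned above, but the exact names of the input/output tensor-id fields are assumptions you should check against your own file:

```python
import json

with open('graph_execution.json') as f:
    execution = json.load(f)

instructions = execution['instructions']

# Map every tensor id to the instruction that produced it, then connect
# each consumer to its producers; shared tensor ids define the DAG edges.
producer = {}
for i, inst in enumerate(instructions):
    for t in inst['outputs']:          # assumed field name
        producer[t] = i
for i, inst in enumerate(instructions):
    for t in inst['inputs']:           # assumed field name
        if t in producer:
            src = producer[t]
            print(f"task {instructions[src]['task_idx']} -> task {inst['task_idx']}")
```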

See https://github.com/hidet-org/hidet/blob/main/python/hidet/runtime/compiled_graph.py#L437 to know how we load and run the model stored in model.hidet.
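
A minimal loading sketch; load_compiled_graph is the loader in the file linked above, while the run call is an assumption about its interface:

```python
import hidet
from hidet.runtime import load_compiled_graph

# Loading also extracts the archive contents (except weights.npz)
# into hidet's cache directory, as noted below.
compiled = load_compiled_graph('model.hidet')
x = hidet.randn([1, 3, 224, 224], device='cuda')  # must match the traced input shape
outputs = compiled.run_async([x])                 # assumed entry point
```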

When you load a model.hidet, we will also extract all the contents (except weights.npz) to hidet's cache directory.

If you are using the pytorch frontend, you might not be able to get the compiled model dumped to disk. (We can add an option to support this functionality, though.)

Thanks @yaoyaoding. I found the cache/graphs/ directory and learned about the graph_execution.json file. It describes all the information I want.

It seems that hidet finishes the load process in the Python frontend. Actually, I want to try optimizing the inference process of networks using another heterogeneous computing framework, so the weights are quite important for me, too. It would be great if you could provide a dump method to save these parameters to disk, but I think I can dump them myself for now.

Hidet is a great project with a clear architecture and high performance, and it gives users many ways to reach the intermediate data. It's also a really good choice as a compiler backend for pytorch. Looking forward to more powerful features!

> It would be great if you could provide a dump method to save these parameters to disk

If you use the save method of the compiled model (e.g., https://github.com/hidet-org/hidet/blob/main/tests/unit_tests/test_compiled_model.py#L35), the weights will be stored in the dumped file (the weights.npz file in the listing above). You can load it using numpy, and graph_execution.json also describes how to use these weights.
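
For example, extracting and inspecting the weights with plain numpy (assuming weights.npz sits at the top level of the archive, as in the listing above):

```python
import zipfile
import numpy as np

# Pull weights.npz out of the model.hidet zip archive and open it with numpy.
with zipfile.ZipFile('model.hidet') as zf:
    zf.extract('weights.npz')

weights = np.load('weights.npz')
for name in weights.files:
    print(name, weights[name].shape, weights[name].dtype)
```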

> Hidet is a great project with a clear architecture and high performance, and it gives users many ways to reach the intermediate data. It's also a really good choice as a compiler backend for pytorch. Looking forward to more powerful features!

Thanks for your kind words, and we will try to make hidet better!

If you want to dump the weights to disk, you can add the relevant logic to this function: https://github.com/hidet-org/hidet/blob/main/python/hidet/drivers/build_graph.py#L285
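
For instance, a hypothetical snippet for that function; the weights variable and its element type are assumptions about the surrounding code (hidet tensors expose cpu() and numpy()):

```python
import numpy as np

# Hypothetical: given the graph's weight tensors, dump them to an .npz
# archive keyed by index so they can be reloaded outside of hidet.
arrays = {f'weight_{i}': w.cpu().numpy() for i, w in enumerate(weights)}
np.savez('weights.npz', **arrays)
```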

Thank you! Your response to this question is clear and timely. I have another question, but it's not related to this issue, so I will open another one :-)