Question

MalaJeans opened this issue a year ago · 1 comments

Hi @yaoyaoding,
I recently tested matrix multiplication using hidet and have a question about the compilation output, which is shown below:
Compling cuda task batch_matmul (a=float32(1,4352,4096),b=float32(1,4096,4096),c=float32(1,4352,4096),batch_size=1,m_size=4352,n_size=4096,k_size=4096,mma='mma')...
Compling cpu task cast(x=float64(1,4352,4096),y=float32(1,4352,4096))...
Compling cpu task cast(x=float64(1,4096,4096),y=float32(1,4096,4096))...
Compling:100%...

My question is:
In hidet, the CUDA and CPU tasks were compiled separately. Does each of them perform the whole matrix multiplication on its own, or does each handle part of it?

Thank you very much again!

Answer 1 · 2023-06-06T05:41:52.000Z

Hi @MalaJeans,

The "cast" task is not part of the matrix multiplication. When we create a randn tensor, we use numpy to create the random tensor on the CPU, which yields float64 data, and then cast it to float32 with a "cast" operator. That is why the log shows two CPU cast tasks, one for each input tensor.

After that, we move the tensor from cpu to gpu for the matrix multiplication. Thus, the tasks you see are separate, and "cast" is not related to the matrix multiplication.
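
To illustrate the steps described above, here is a minimal NumPy-only sketch. It is not hidet's actual implementation: `make_randn_float32` is a hypothetical helper, the shapes are shrunk for readability, and `np.matmul` on the CPU merely stands in for the `batch_matmul` task that hidet runs on the GPU.

```python
import numpy as np


def make_randn_float32(shape, seed=0):
    """Hypothetical helper mimicking the CPU-side steps described above."""
    rng = np.random.default_rng(seed)
    x64 = rng.standard_normal(shape)  # NumPy produces float64 by default
    x32 = x64.astype(np.float32)      # the "cast" step: float64 -> float32
    return x64, x32


# Small stand-ins for the (1, 4352, 4096) and (1, 4096, 4096) inputs.
a64, a32 = make_randn_float32((1, 8, 4), seed=0)
b64, b32 = make_randn_float32((1, 4, 4), seed=1)

# Stand-in for the batch_matmul task; in hidet this step runs on the GPU
# after the float32 tensors have been moved there.
c = np.matmul(a32, b32)

print(a64.dtype, a32.dtype, c.dtype, c.shape)
```

The two casts only prepare the inputs; all of the multiplication happens in the single matmul step.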