Question
MalaJeans opened this issue · 1 comments
Hi @yaoyaoding,
I recently tested matrix multiplication using hidet, and I have a question about the compilation output, which is shown below:
Compling cuda task batch_matmul (a=float32(1,4352,4096),b=float32(1,4096,4096),c=float32(1,4352,4096),batch_size=1,m_size=4352,n_size=4096,k_size=4096,mma='mma')...
Compling cpu task cast(x=float64(1,4352,4096),y=float32(1,4352,4096))...
Compling cpu task cast(x=float64(1,4096,4096),y=float32(1,4096,4096))...
Compling:100%...
My question is:
In hidet, the CUDA and CPU tasks are compiled separately. Does each of them perform the whole matrix multiplication on its own, or does each one perform part of it?
Thank you very much again!
Hi @MalaJeans,
The "cast" task is not part of the matrix multiplication. When we create a randn tensor, we use numpy to generate the random tensor on the CPU (as float64) and then cast it to float32, which is done by a "cast" operator.
After that, we move the tensor from the CPU to the GPU for the matrix multiplication. So the tasks you see are separate: the matrix multiplication is done by the CUDA batch_matmul task, and "cast" is unrelated to it.
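To illustrate where the float64-to-float32 cast comes from, here is a minimal sketch using only NumPy. It is not hidet's actual implementation: the helper name `randn_float32` and the small shape are made up for the example, and the copy to the GPU is only mentioned in a comment.

```python
import numpy as np


def randn_float32(shape):
    """Hypothetical helper: sample on the CPU, then cast to float32."""
    # np.random.randn samples from a standard normal distribution and
    # returns float64 by default, which matches the float64 input
    # shown in the "cast" task of the compilation log.
    x64 = np.random.randn(*shape)
    # The explicit cast below plays the role of the CPU "cast" task.
    x32 = x64.astype(np.float32)
    return x64, x32


# A small shape keeps the example cheap; the log above uses (1, 4352, 4096).
x64, x32 = randn_float32((1, 4, 8))
print(x64.dtype, x32.dtype)

# In hidet, the float32 tensor would then be copied to the GPU, and only
# there would the CUDA batch_matmul task run. That step is omitted here.
```

The cast happens once per input tensor, which would explain why the log shows two "cast" tasks (one for each operand shape) next to a single batch_matmul task.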