ptillet/torch-blocksparse

"RuntimeError: could not compile kernel" when using MultiHeadattention


Hi, I think I have configured the environment correctly and have run several of the test demos in test/ successfully.
However, test_attention.py (to which I added a call to test_op() as the last line) fails with a compilation error:

(torch_blocksparse) $ python test_attention.py 
/home3/liyz/my_torch_blocksparse/torch-blocksparse/torch_blocksparse/matmul.py:403: UserWarning: This overload of nonzero is deprecated:
        nonzero()
Consider using one of the following signatures instead:
        nonzero(*, bool as_tuple) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
  nnz = layout.nonzero()
Traceback (most recent call last):
  File "test_attention.py", line 64, in <module>
    test_op()
  File "test_attention.py", line 53, in test_op
    sparse_out, _ = sparse_mha(query, key, value, key_padding_mask=add_mask, need_weights=False)
  File "/home3/liyz/miniconda3/envs/torch_blocksparse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home3/liyz/my_torch_blocksparse/torch-blocksparse/torch_blocksparse/attention.py", line 319, in forward
    key_padding_mask_mode=self.key_padding_mask_mode, attn_mask_mode=self.attn_mask_mode)
  File "/home3/liyz/my_torch_blocksparse/torch-blocksparse/torch_blocksparse/attention.py", line 194, in multi_head_attention_forward
    attn_output_weights = sparse_dot_sdd_nt(q, k)
  File "/home3/liyz/my_torch_blocksparse/torch-blocksparse/torch_blocksparse/matmul.py", line 685, in __call__
    db_lut, db_num_locks, db_width, db_packs, self.bench, time_db)
  File "/home3/liyz/my_torch_blocksparse/torch-blocksparse/torch_blocksparse/matmul.py", line 564, in forward
    c_lut, c_num_locks, c_width, c_packs, c_bench, c_time)
  File "/home3/liyz/my_torch_blocksparse/torch-blocksparse/torch_blocksparse/matmul.py", line 359, in _sdd_matmul
    bench = bench)
  File "/home3/liyz/my_torch_blocksparse/src/triton/python/triton/kernel.py", line 194, in __call__
    self.fw_op(self.op_id, device, bench, bench_id, *args)
RuntimeError: could not compile kernel
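
For context, the failing call in test_op() corresponds roughly to the usage sketched below. This is a minimal sketch based on the traceback and the README; the constructor arguments, sparsity layout, block size, and tensor shapes are illustrative assumptions and may differ from what test_attention.py actually uses.

import torch
import torch_blocksparse

# illustrative sizes; the real test may use different values
block      = 16
n_ctx      = 256          # sequence length
embed_dim  = 512
num_heads  = 8
batch_size = 2

# dense block mask for illustration: one (n_ctx/block) x (n_ctx/block) layout per head
layout = torch.ones((num_heads, n_ctx // block, n_ctx // block), dtype=torch.int64)

# constructor arguments are an assumption based on the README; check test_attention.py
sparse_mha = torch_blocksparse.MultiheadAttention(embed_dim, num_heads, layout, block).cuda()

query = torch.rand((n_ctx, batch_size, embed_dim)).cuda()
key   = torch.rand((n_ctx, batch_size, embed_dim)).cuda()
value = torch.rand((n_ctx, batch_size, embed_dim)).cuda()
add_mask = torch.zeros((batch_size, n_ctx)).cuda()   # additive key-padding mask

# this is the call from the traceback that dies inside sparse_dot_sdd_nt
# when the Triton kernel fails to compile
sparse_out, _ = sparse_mha(query, key, value, key_padding_mask=add_mask, need_weights=False)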

This is a bit confusing; could you help me figure out what is going wrong?

This is odd. The test seems to pass on my machine. What GPU do you have?

PS: I am very busy right now (wrapping up my PhD), but I will look into it more thoroughly as soon as I have more time.

Hi, the GPU I am using is a 1080 Ti.
I think the cause of this bug is that I didn't install the LLVM dependency with `sudo apt-get install llvm-9-dev`. Instead, I just downloaded an LLVM release manually and added it to PATH.
I ended up solving the problem by using the Docker environment.
Thanks.