derivative for aten::_scaled_dot_product_efficient_attention_backward is not implemented
Darius888 opened this issue · 3 comments
Hello,
When trying to apply the Sine Wave example approach to a transformer-based model, I get the following output:
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py", line 767, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: derivative for aten::_scaled_dot_product_efficient_attention_backward is not implemented
This is a regression task setup with multiple sequences.
Is it possible to work around this somehow?
Thank you,
I think this happens when you set first_order = False, so the simplest way is to set first_order = True.
If you really want second order, check pytorch/pytorch#117974
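To illustrate why first_order = True avoids the error, here is a minimal first-order MAML-style sketch. It is not this repo's code: the model, data, and learning rates are hypothetical placeholders. With first-order updates the inner-loop gradient is taken with create_graph=False, so autograd never needs the (unimplemented) second derivative of the efficient-attention backward kernel.

```python
# Minimal first-order MAML-style inner/outer loop sketch (hypothetical model and data).
import torch
import torch.nn as nn

model = nn.Linear(8, 1)  # placeholder for the transformer-based model
inner_lr = 0.01
outer_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake support/query data for one task.
x_support, y_support = torch.randn(16, 8), torch.randn(16, 1)
x_query, y_query = torch.randn(16, 8), torch.randn(16, 1)

# Inner step: first-order means no graph is built through the gradient itself.
support_loss = nn.functional.mse_loss(model(x_support), y_support)
grads = torch.autograd.grad(support_loss, model.parameters(), create_graph=False)
adapted = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]

# Outer step: evaluate the adapted weights on the query set and update the meta-parameters.
query_pred = nn.functional.linear(x_query, adapted[0], adapted[1])
query_loss = nn.functional.mse_loss(query_pred, y_query)
outer_opt.zero_grad()
query_loss.backward()
outer_opt.step()
```

The only difference from full second-order MAML is create_graph=False in the inner torch.autograd.grad call, which is what a first_order = True flag typically toggles.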
This was exactly it, thank you so much! @JingminSun
How do I modify it specifically?
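If second-order gradients are genuinely needed, the workaround discussed in pytorch/pytorch#117974 is to fall back to PyTorch's math SDPA backend, which does support double backward, unlike the flash and memory-efficient kernels. A minimal sketch, assuming a CUDA device and synthetic tensors rather than the actual model:

```python
# Forcing the math SDPA backend so double backward (second-order gradients) works.
# Assumes a CUDA device; shapes are arbitrary placeholders.
import torch
import torch.nn.functional as F

q = torch.randn(2, 4, 8, 16, device="cuda", requires_grad=True)
k = torch.randn(2, 4, 8, 16, device="cuda", requires_grad=True)
v = torch.randn(2, 4, 8, 16, device="cuda", requires_grad=True)

# Disable the flash and mem-efficient kernels; they lack a double-backward implementation.
with torch.backends.cuda.sdp_kernel(
    enable_flash=False, enable_math=True, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v)
    loss = out.sum()
    # create_graph=True keeps the graph so a gradient-of-gradient can be taken.
    (grad_q,) = torch.autograd.grad(loss, q, create_graph=True)
    grad_q.sum().backward()  # second backward succeeds on the math kernel
```

On recent PyTorch versions, torch.nn.attention.sdpa_kernel(SDPBackend.MATH) is the newer replacement for this context manager; the math backend is slower, so restrict it to the code paths that actually need second-order gradients.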