Question / clarification regarding heap allocations
emchristiansen opened this issue · 2 comments
Asking here because I didn't see this explicitly covered in the docs.
Under what conditions can a user know that a given piece of dfdx code won't perform any heap allocations?
E.g., will I be fine if I simply ensure all my tensors have static shapes?
I don't have a perfect apples-to-apples comparison, but I have the impression my dfdx
code is maybe 2x to 4x slower than equivalent code written in JAX and compiled to XLA.
In the XLA version, the full data flow graph including the tensor shapes is statically known, so there's no need for dynamic allocation, and I'm wondering if that might be the source of the apparent difference in speed.
Yeah, it's likely due to allocations. The only thing dfdx does to reduce allocations applies to unary/binary operations where one of the inputs has only one owner and the output has the same shape/strides as that input. In this case dfdx will reuse the input allocation for the output. But if the input has multiple owners or a different shape, it can't do that. In every other case there will be an allocation.
If you want to add details about this to the docs, feel free to open a PR! The module-level tensor documentation is probably a good place for it.
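
For anyone documenting this, the reuse rule described above (single owner and same shape means the input buffer is reused, anything else allocates) can be sketched in plain Rust with `std::sync::Arc`. This is only an illustration of the idea, not dfdx's actual implementation: `Buf` and `map_unary` are hypothetical names, and the sketch assumes a flat `f32` buffer where the output always has the same length as the input.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a tensor's storage; not a dfdx type.
#[derive(Clone)]
struct Buf {
    data: Arc<Vec<f32>>,
}

/// Applies `f` to every element. Returns the output buffer and a flag
/// saying whether the input allocation was reused.
fn map_unary(mut input: Buf, f: impl Fn(f32) -> f32) -> (Buf, bool) {
    // `Arc::get_mut` only succeeds when this is the sole owner of the data.
    let reused = if let Some(values) = Arc::get_mut(&mut input.data) {
        // Single owner: overwrite in place, no new heap allocation.
        for x in values.iter_mut() {
            *x = f(*x);
        }
        true
    } else {
        false
    };
    if reused {
        return (input, true);
    }
    // Multiple owners: the shared data must stay intact, so allocate.
    let out: Vec<f32> = input.data.iter().map(|&x| f(x)).collect();
    (Buf { data: Arc::new(out) }, false)
}
```

In this sketch, holding on to a clone of the input (for example, because it is needed again later) is enough to force the allocating path, which mirrors the "multiple owners" case described above.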