Dobiasd/frugally-deep

Consider having different convolution implementations available and choosing the fastest one at runtime

Dobiasd opened this issue · 1 comment

Different convolution implementations might perform differently depending on the convolution settings (input size/depth, kernel size/count) and depending on the hardware (mostly CPU/memory) used.

Right now, for example, we have a special implementation used for 2D convolutions in case strides = (1, 1) (which is used not only by the Conv2D layer, but also by DepthwiseConv2D and SeparableConv2D).

I wonder if it would make sense to provide a function that, when called on a model, tries out the different implementations and remembers which one performed best for future calls of model.predict. (Maybe in some settings, even a naive non-im2col convolution is the fastest one.)
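The idea could be sketched roughly like this (hypothetical names, not frugally-deep's actual API): wrap each candidate convolution implementation as a callable, time each one once on a representative input, and cache the index of the fastest for later predictions.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch: a candidate convolution implementation, modeled as a
// function from the input tensor's flat data to the output's flat data.
using ConvImpl = std::function<std::vector<float>(const std::vector<float>&)>;

// Run each implementation once on a representative input and return the index
// of the fastest one. The caller would store this index and dispatch to the
// winning implementation in subsequent calls of model.predict.
std::size_t pick_fastest(const std::vector<ConvImpl>& impls,
                         const std::vector<float>& sample_input)
{
    std::size_t best = 0;
    auto best_time = std::chrono::steady_clock::duration::max();
    for (std::size_t i = 0; i < impls.size(); ++i) {
        const auto start = std::chrono::steady_clock::now();
        impls[i](sample_input); // run once; the result itself is discarded
        const auto elapsed = std::chrono::steady_clock::now() - start;
        if (elapsed < best_time) {
            best_time = elapsed;
            best = i;
        }
    }
    return best;
}
```

A single timed run per implementation is the simplest possible version; averaging over several runs would make the choice more robust against background load.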

Pros:

  • potentially faster forward passes

Cons:

  • increased code complexity
  • potentially suboptimal choices in case the background load on the user's machine varies too much during the evaluation

Closing this one because of the cons listed above.