d-li14/involution

Fast and generic implementation using OpenMP and CUDA

shikishima-TasakiLab opened this issue · 3 comments

I have implemented a module using OpenMP and CUDA that runs faster than your CuPy implementation while maintaining its memory efficiency.

shikishima-TasakiLab/Involution-PyTorch
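For context, involution generates a kernel per spatial location that is shared across the channels within a group, rather than a single kernel shared across all locations as in convolution. Below is a minimal NumPy sketch of that forward pass for intuition only; the function name, argument layout, and loop structure are illustrative assumptions, not the API of the linked repo or of the CuPy implementation.

```python
import numpy as np

def involution2d(x, kernel):
    """Illustrative involution forward pass (not the repo's API).

    x:      input feature map, shape (C, H, W)
    kernel: per-pixel kernels, shape (G, K, K, H, W) -- one K x K kernel
            per group per spatial location, shared within each group
    """
    C, H, W = x.shape
    G, K, _, _, _ = kernel.shape
    pad = K // 2
    # Zero-pad spatial dims so the output keeps the input resolution.
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    cg = C // G  # channels per group
    for g in range(G):
        for u in range(K):
            for v in range(K):
                # Shifted view of the padded input for kernel tap (u, v).
                patch = xp[g * cg:(g + 1) * cg, u:u + H, v:v + W]
                # kernel[g, u, v] has shape (H, W): a distinct weight
                # at every spatial position, broadcast over the group's channels.
                out[g * cg:(g + 1) * cg] += kernel[g, u, v] * patch
    return out
```

A fast implementation avoids these Python loops by unfolding the input and reducing over the kernel taps in parallel, which is where the OpenMP/CUDA kernels come in.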

It also supports TorchScript and 16-bit float.

shikishima-TasakiLab/Involution-PyTorch#1

Great work! It will help a lot in practice!
As mentioned in the README, would you please open a PR to contribute it to this repo? Just to be on the safe side, I will run some experiments to double-check the reimplementation's correctness before merging it into the main branch. Thanks.

I have opened a PR.
I did not resolve the conflicting parts of the README, so please add the module descriptions accordingly.

OK, I will verify and merge it as soon as I can.