nevinbaiju/transformer_cpp_ITCS-5182
Optimization of attention layers for efficient inference on the CPU and GPU. It covers AVX and CUDA optimizations as well as efficient memory processing techniques.
C++
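As an illustration of the kind of CPU-side optimization the description refers to, the following is a minimal sketch of scaled dot-product attention weights for a single query, with the inner dot product vectorized using AVX intrinsics when the compiler enables them. It is not taken from this repository: the names `dot` and `attention_weights`, the row-major key layout, and the single-query setup are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: dot product of two float arrays of length n.
// Uses 8-wide AVX lanes when compiled with AVX support (e.g. -mavx),
// and falls back to a plain scalar loop otherwise.
inline float dot(const float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    float sum = 0.0f;
#if defined(__AVX__)
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  // unaligned loads: no alignment assumed
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);            // horizontal sum of the 8 partial sums
    for (float v : lanes) sum += v;
#endif
    for (; i < n; ++i) sum += a[i] * b[i];   // scalar tail, or the whole loop without AVX
    return sum;
}

// Hypothetical helper: softmax(q . K^T / sqrt(d)) for one query vector q
// against `rows` key vectors stored contiguously in row-major order.
inline std::vector<float> attention_weights(const std::vector<float>& q,
                                            const std::vector<float>& keys,
                                            std::size_t rows) {
    const std::size_t d = q.size();
    std::vector<float> w(rows);
    if (rows == 0 || d == 0) return w;

    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    for (std::size_t r = 0; r < rows; ++r)
        w[r] = dot(q.data(), keys.data() + r * d, d) * scale;

    // Subtract the maximum score before exponentiating for numerical stability.
    const float max_score = *std::max_element(w.begin(), w.end());
    float denom = 0.0f;
    for (float& x : w) {
        x = std::exp(x - max_score);
        denom += x;
    }
    for (float& x : w) x /= denom;
    return w;
}
```

Keeping each key row contiguous means the AVX loop reads memory sequentially, which is one simple example of a memory-friendly layout. A CUDA version would typically assign rows to threads or blocks instead, and is not sketched here.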