/transformer_cpp_ITCS-5182

Optimization of attention layers for efficient inference on the CPU and GPU. It covers AVX and CUDA optimizations as well as efficient memory processing techniques.
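For orientation, here is a minimal scalar sketch of the computation that such optimizations target: single-head scaled dot-product attention, out = softmax(Q K^T / sqrt(d)) V. It is not taken from this repository. The function name `attention_reference`, the row-major `[n x d]` layout, and the use of `std::vector<float>` are all assumptions made for illustration. A baseline like this is typically what an AVX or CUDA version would be checked against.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reference helper (assumed name and layout, not the repo's API).
// Computes out = softmax(Q K^T / sqrt(d)) V for a single attention head.
// q, k, v and the returned matrix are row-major with shape [n x d].
std::vector<float> attention_reference(const std::vector<float>& q,
                                       const std::vector<float>& k,
                                       const std::vector<float>& v,
                                       std::size_t n, std::size_t d) {
    assert(d > 0);
    assert(q.size() == n * d && k.size() == n * d && v.size() == n * d);

    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    std::vector<float> out(n * d, 0.0f);
    std::vector<float> scores(n);

    for (std::size_t i = 0; i < n; ++i) {
        // Scaled dot product of query row i with every key row.
        float max_score = -std::numeric_limits<float>::infinity();
        for (std::size_t j = 0; j < n; ++j) {
            float dot = 0.0f;
            for (std::size_t c = 0; c < d; ++c) {
                dot += q[i * d + c] * k[j * d + c];
            }
            scores[j] = dot * scale;
            max_score = std::max(max_score, scores[j]);
        }

        // Softmax over the row; subtracting the max keeps exp() from overflowing.
        float denom = 0.0f;
        for (std::size_t j = 0; j < n; ++j) {
            scores[j] = std::exp(scores[j] - max_score);
            denom += scores[j];
        }

        // Weighted sum of value rows.
        for (std::size_t j = 0; j < n; ++j) {
            const float w = scores[j] / denom;
            for (std::size_t c = 0; c < d; ++c) {
                out[i * d + c] += w * v[j * d + c];
            }
        }
    }
    return out;
}
```

The innermost loops over `c` walk contiguous memory in this layout, which is the kind of access pattern that vectorizes well. That is a general property of row-major storage, not a statement about how this repository is implemented.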

Primary language: C++