Performance analysis of FlashAttention2 and PagedAttention against a baseline of TorchSDPA (math backend), measured by running inference on llama-2-7b across various sequence lengths.
- Metrics (a minimal measurement sketch follows this list):
  - Latency: wall-clock time per inference run, in seconds
  - MaxMemoryAllocated: peak GPU memory allocated, reported via the Torch profiler
  - MaxMemoryReserved: peak GPU memory reserved by the caching allocator, reported via the Torch profiler
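
A minimal sketch of how these three metrics could be collected for one backend. It assumes the Hugging Face checkpoint name `meta-llama/Llama-2-7b-hf`, a single CUDA device, and a hypothetical set of sequence lengths; for simplicity it reads the peak-memory counters from `torch.cuda` rather than the full Torch profiler used in the actual runs.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="cuda"
)

def benchmark(prompt: str, max_new_tokens: int = 128) -> dict:
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    # Reset peak-memory counters so the stats cover only this run.
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()

    start = time.perf_counter()
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    latency_sec = time.perf_counter() - start

    return {
        "Latency (sec)": latency_sec,
        "MaxMemoryAllocated (GiB)": torch.cuda.max_memory_allocated() / 2**30,
        "MaxMemoryReserved (GiB)": torch.cuda.max_memory_reserved() / 2**30,
    }

if __name__ == "__main__":
    # Hypothetical sequence lengths; the real sweep may differ.
    for seq_len in (128, 512, 1024, 2048):
        prompt = "hello " * seq_len
        print(seq_len, benchmark(prompt))
```

Swapping the attention implementation (FlashAttention2, PagedAttention, or the SDPA math backend) would be done at model-load or serving-engine level; the measurement loop itself stays the same for each variant.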