
# Attention Performance Analysis

Performance analysis of FlashAttention2 and PagedAttention against a TorchSDPA-Math baseline, measured by running inference with llama-2-7b across a range of sequence lengths.
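As a minimal sketch of how the two SDPA paths can be pinned (assuming PyTorch >= 2.3, where `torch.nn.attention.sdpa_kernel` is available): the math backend serves as the unfused baseline, and the flash backend dispatches to FlashAttention2. The PagedAttention path is typically exercised through a serving engine such as vLLM and is not shown here; the tensor shapes below are only illustrative of llama-2-7b's attention layout.

```python
# Sketch of pinning SDPA backends; assumes PyTorch >= 2.3.
# Shapes mirror llama-2-7b attention (32 heads, head dim 128)
# with an illustrative sequence length of 4096.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(1, 32, 4096, 128, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Baseline: force the unfused "math" implementation of SDPA.
with sdpa_kernel(SDPBackend.MATH):
    out_math = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# FlashAttention2: force the fused flash kernel.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out_flash = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```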

- Metrics (a measurement sketch follows this list):
  - Latency: seconds (wall-clock)
  - MaxMemoryAllocated: peak GPU memory allocated, reported via the Torch profiler
  - MaxMemoryReserved: peak GPU memory reserved, reported via the Torch profiler
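The memory metrics above are attributed to the Torch profiler; as a sketch, equivalent peak statistics can be read from PyTorch's CUDA allocator counters (`torch.cuda.max_memory_allocated` / `torch.cuda.max_memory_reserved`), which is an assumed mapping rather than the repo's exact method. `run_step` is a hypothetical callable wrapping one forward pass.

```python
import time
import torch

def measure(run_step, warmup: int = 3, iters: int = 10) -> dict:
    """Report per-iteration latency (sec) and peak GPU memory (bytes)."""
    for _ in range(warmup):           # warm up kernels and the allocator
        run_step()
    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    for _ in range(iters):
        run_step()
    torch.cuda.synchronize()          # wait for queued CUDA work to finish
    return {
        "latency_sec": (time.perf_counter() - start) / iters,
        "max_memory_allocated": torch.cuda.max_memory_allocated(),
        "max_memory_reserved": torch.cuda.max_memory_reserved(),
    }

# Example: stats = measure(lambda: model(input_ids))  # model/input_ids are hypothetical
```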