NVIDIA/FasterTransformer

Incomplete explanation

lix19937 opened this issue · 1 comment

Branch/Tag/Commit

v5.3_tag

Docker Image Version

22.08

GPU name

RTX 3070

CUDA Driver

470.129.06

Reproduced Steps

https://github.com/NVIDIA/FasterTransformer/blob/release/v5.3_tag/src/fastertransformer/layers/attention_layers/FusedAttentionLayer.h#L28

// This class is only used when we satisfy the following conditions:
// 1. FP16
// 2. Temporally add seqlen <= 512 limitation because the
template<typename T>

Incomplete explanation: the comment breaks off mid-sentence after "because the", so the reason for the seqlen <= 512 limitation is never stated.

dup