Issues
How to implement a GEMM with FP16 and INT4 using the kernel in FasterTransformer/src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm
#794 opened by AkatsukiChiri (0 comments)
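For readers with the same question as #794: a minimal sketch of what a call into that kernel might look like. The runner name, template parameters, and the gemm()/getWorkspaceSize() signatures below are assumptions read off fpA_intB_gemm.h, and the INT4 weights must first be packed and reordered with the repo's weight-preprocessing utilities; treat this as a starting point, not a verified recipe.

```cpp
#include <cuda_fp16.h>
#include "cutlass/numeric_types.h"
#include "src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm.h"

// Hedged sketch: C = A(FP16, m x k) * dequant(B(INT4, k x n), scales).
// Assumes B was already preprocessed into the interleaved layout the
// kernel expects (see the cutlass_kernels preprocessing utilities).
void fp16_int4_gemm(const half*              A,       // device ptr, [m, k] activations
                    const cutlass::uint4b_t* B,       // device ptr, [k, n] packed INT4 weights
                    const half*              scales,  // device ptr, [n] per-column dequant scales
                    half*                    C,       // device ptr, [m, n] output
                    int m, int n, int k,
                    cudaStream_t stream)
{
    fastertransformer::CutlassFpAIntBGemmRunner<half, cutlass::uint4b_t> runner;

    // Scratch space for split-k reductions inside the CUTLASS kernel.
    size_t ws_bytes = runner.getWorkspaceSize(m, n, k);
    char*  ws       = nullptr;
    cudaMallocAsync(reinterpret_cast<void**>(&ws), ws_bytes, stream);

    // Dequantization of B is fused into the GEMM main loop.
    runner.gemm(A, B, scales, C, m, n, k, ws, ws_bytes, stream);

    cudaFreeAsync(ws, stream);
}
```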
An error occurred with a specific CUDA version
#793 opened by CSLiuPeng (0 comments)
Can this be used in diffusion models, like SD and SDXL? How? Where are the demos? Thanks
#792 opened by henbucuoshanghai (0 comments)
error: ‘CUDNN_DATA_BFLOAT16’ was not declared in this scope; did you mean ‘CUDNN_DATA_FLOAT’
#789 opened by johnson-magic (0 comments)
How to find the correspondence between nvcr.io/nvidia/pytorch:xx.xx-py3 image versions and PyTorch versions?
#788 opened by johnson-magic (0 comments)
What is the meaning of EFF-FT?
#787 opened by johnson-magic (0 comments)
How to serve multi-GPU inference?
#772 opened by Alone-wl (2 comments)
multi_block_mode performance issue
#782 opened by akhoroshev (0 comments)
Error: You need C++17 to compile PyTorch
#779 opened by ranggihwang (0 comments)
Core dump with the Swin model
#746 opened by chiemon (1 comment)
repetition_penalty logic in FT has a bug
#777 opened by hezeli123 (3 comments)
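For context on #777: the widely used CTRL-style repetition penalty divides a previously generated token's logit when it is positive and multiplies it when it is negative, so the token always becomes less likely; a plain division would instead raise negative logits. A simplified CPU sketch of that reference behavior (illustrative only, not FT's actual CUDA kernel):

```cpp
#include <vector>
#include <unordered_set>

// Illustrative CTRL-style repetition penalty (Keskar et al., 2019).
// Simplified CPU sketch, not FasterTransformer's kernel.
void apply_repetition_penalty(std::vector<float>&     logits,      // [vocab_size]
                              const std::vector<int>& output_ids,  // tokens generated so far
                              float                   penalty)     // > 1.0 discourages repeats
{
    std::unordered_set<int> seen(output_ids.begin(), output_ids.end());
    for (int token : seen) {
        float& logit = logits[token];
        // Dividing a positive logit and multiplying a negative one both
        // lower the token's score; dividing a negative logit would raise it.
        logit = logit > 0.0f ? logit / penalty : logit * penalty;
    }
}
```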
Sparsity support
#775 opened by zhang662817 (0 comments)
How to get started?
#774 opened by turbobuilt (1 comment)
FlashAttention only enabled for GPT-style models
#754 opened by flexwang (14 comments)
[Long seq length] GPT sequence length constraint
#752 opened by zhen-jia (2 comments)
Does FasterTransformer support GPT-2 classification models, such as GPT2ForSequenceClassification?
#768 opened by cabbagetalk (1 comment)
Is cuSPARSELt slower?
#767 opened by BDHU (1 comment)
OSError: lib/libth_transformer.so: cannot open shared object file: No such file or directory
#735 opened by arnabmanna619 (0 comments)
Incorrect inline PTX device assembly code usage
#766 opened by zhiweij1 (0 comments)
Enabling the macro definition at src/fastertransformer/kernels/decoder_masked_multihead_attention/decoder_masked_multihead_attention_template.hpp:36 reveals a build error
#763 opened by pengl (0 comments)
CUDA code compile error with clang: function template partial specialization is not allowed
#765 opened by zhiweij1 (0 comments)
How to calculate local batch size?
#764 opened by fotstrt (1 comment)
Does FasterTransformer use FlashAttention?
#758 opened by niyunsheng (0 comments)
Which part should I modify to implement an inference pipeline schedule (like micro-batching)?
#757 opened by dannyxiaocn (0 comments)
Specify the recognition language for Whisper
#751 opened by echodjx (0 comments)
[Feature request] Transformer on Orin
#748 opened by superpigforever (0 comments)
Can't run translate_example.py from the T5 example
#745 opened by EmanElrefai12 (0 comments)
GPT-MoE support for expert parallelism
#743 opened by YJHMITWEB (0 comments)
Decoupled model with non-streaming mode
#741 opened by flexwang (0 comments)
Decoupled mode not working when beam_width > 1
#740 opened by flexwang (0 comments)
Limit CUDA memory growth
#739 opened by coderchem (0 comments)