Issues
How to implement a GEMM with FP16 and INT4 using the kernel in FasterTransformer/src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm
#794 opened by AkatsukiChiri (0 comments)
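For readers with the same question as #794: a minimal sketch of what a call into that kernel might look like. The runner name, template parameters, and the gemm()/getWorkspaceSize() signatures below are assumptions read off fpA_intB_gemm.h, and the INT4 weights must first be packed and reordered with the repo's weight-preprocessing utilities; treat this as a starting point, not a verified recipe.

```cpp
#include <cuda_fp16.h>
#include "cutlass/numeric_types.h"
#include "src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm.h"

// Hedged sketch: C = A(FP16, m x k) * dequant(B(INT4, k x n), scales).
// Assumes B was already preprocessed into the interleaved layout the
// kernel expects (see the cutlass_kernels preprocessing utilities).
void fp16_int4_gemm(const half*              A,       // device ptr, [m, k] activations
                    const cutlass::uint4b_t* B,       // device ptr, [k, n] packed INT4 weights
                    const half*              scales,  // device ptr, [n] per-column dequant scales
                    half*                    C,       // device ptr, [m, n] output
                    int m, int n, int k,
                    cudaStream_t stream)
{
    fastertransformer::CutlassFpAIntBGemmRunner<half, cutlass::uint4b_t> runner;

    // Scratch space for split-k reductions inside the CUTLASS kernel.
    size_t ws_bytes = runner.getWorkspaceSize(m, n, k);
    char*  ws       = nullptr;
    cudaMallocAsync(reinterpret_cast<void**>(&ws), ws_bytes, stream);

    // Dequantization of B is fused into the GEMM main loop.
    runner.gemm(A, B, scales, C, m, n, k, ws, ws_bytes, stream);

    cudaFreeAsync(ws, stream);
}
```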
An error occurred with a specific CUDA version
#793 opened by CSLiuPeng (0 comments)
Can this be used in diffusion models, like SD and SDXL? How? Where are the demos? Thanks
#792 opened by henbucuoshanghai (0 comments)
error: ‘CUDNN_DATA_BFLOAT16’ was not declared in this scope; did you mean ‘CUDNN_DATA_FLOAT’
#789 opened by johnson-magic (0 comments)
How to find the correspondence between nvcr.io/nvidia/pytorch:xx.xx-py3 image versions and PyTorch versions?
#788 opened by johnson-magic (0 comments)
What is the meaning of EFF-FT?
#787 opened by johnson-magic (0 comments)
How to serve multi-GPU inference?
#772 opened by Alone-wl (2 comments)
multi_block_mode performance issue
#782 opened by akhoroshev (0 comments)
Error: You need C++17 to compile PyTorch
#779 opened by ranggihwang (0 comments)
Core dump with the Swin model
#746 opened by chiemon (1 comment)
repetition_penalty logic in FT has a bug
#777 opened by hezeli123 (3 comments)
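For context on #777: the widely used CTRL-style repetition penalty divides a previously generated token's logit when it is positive and multiplies it when it is negative, so the token always becomes less likely; a plain division would instead raise negative logits. A simplified CPU sketch of that reference behavior (illustrative only, not FT's actual CUDA kernel):

```cpp
#include <vector>
#include <unordered_set>

// Illustrative CTRL-style repetition penalty (Keskar et al., 2019).
// Simplified CPU sketch, not FasterTransformer's kernel.
void apply_repetition_penalty(std::vector<float>&     logits,      // [vocab_size]
                              const std::vector<int>& output_ids,  // tokens generated so far
                              float                   penalty)     // > 1.0 discourages repeats
{
    std::unordered_set<int> seen(output_ids.begin(), output_ids.end());
    for (int token : seen) {
        float& logit = logits[token];
        // Dividing a positive logit and multiplying a negative one both
        // lower the token's score; dividing a negative logit would raise it.
        logit = logit > 0.0f ? logit / penalty : logit * penalty;
    }
}
```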
Sparsity support
#775 opened by zhang662817 (0 comments)
How to get started?
#774 opened by turbobuilt (1 comment)
FlashAttention only enabled for GPT-style models
#754 opened by flexwang (14 comments)
[Long seq length] GPT sequence length constraint
#752 opened by zhen-jia (2 comments)
Does FasterTransformer support GPT-2 classification models, such as GPT2ForSequenceClassification?
#768 opened by cabbagetalk (1 comment)
Is cuSPARSELt slower?
#767 opened by BDHU (1 comment)
OSError: lib/libth_transformer.so: cannot open shared object file: No such file or directory
#735 opened by arnabmanna619 (0 comments)
Incorrect inline PTX device assembly code usage
#766 opened by zhiweij1 (0 comments)
Enabling the macro definition at src/fastertransformer/kernels/decoder_masked_multihead_attention/decoder_masked_multihead_attention_template.hpp:36 reveals a build error
#763 opened by pengl (0 comments)
CUDA code compile error with clang: function template partial specialization is not allowed
#765 opened by zhiweij1 (0 comments)
How to calculate local batch size?
#764 opened by fotstrt (1 comment)
Does FasterTransformer use FlashAttention?
#758 opened by niyunsheng (0 comments)
Which part should I modify to implement an inference pipeline schedule (like micro-batching)?
#757 opened by dannyxiaocn (0 comments)
Specify the recognition language for Whisper
#751 opened by echodjx (0 comments)
[Feature request] Transformer on Orin
#748 opened by superpigforever (0 comments)
Can't run translate_example.py from the T5 example
#745 opened by EmanElrefai12 (0 comments)
GPT-MoE support for expert parallelism
#743 opened by YJHMITWEB (0 comments)
Decoupled model with non-streaming mode
#741 opened by flexwang (0 comments)
Decoupled mode not working when beam_width > 1
#740 opened by flexwang (0 comments)
Limit CUDA memory growth
#739 opened by coderchem (0 comments)