Issues
- Compatibility issue with CUDA 12.2 (#730, opened by MinghaoYan, 1 comment)
- build failed with tf-op (#701, opened by jackzhou121, 3 comments)
- GPTNeox decoding argumentation (#713, opened by w775739733, 1 comment)
- Support for Falcon models (#669, opened by ankit201, 1 comment)
- GPT-Q 4-bit support [feature request] (#715, opened by Xingxiangrui, 8 comments)
- Are MQA and GQA in development? (#727, opened by ljayx, 2 comments)
- llama support inference? (#729, opened by double-vin, 2 comments)
- TP=2, Loss of accuracy (#734, opened by coderchem, 3 comments)
- docker/Dockerfile.torch occurs errors (#720, opened by b3y0nd, 2 comments)
- infer_visiontransformer_op.py error (#680, opened by macrocredit, 4 comments)
- How to transfer glm2 model to fastertransformer (#726, opened by AndreWanga, 1 comment)
- Serve Deberta using FasterTransformer in Triton (#691, opened by sfc-gh-zhwang, 0 comments)
- Which version of cutlass was adopted? (#723, opened by Liu-xiandong, 0 comments)
- I tried your way, but still get this error (#721, opened by pangr, 0 comments)
- ParallelGPT stop_words_list (#677, opened by cpm0722, 1 comment)
- Incomplete explanation (#719, opened by lix19937, 0 comments)
- Incomplete explanation (#718, opened by lix19937, 5 comments)
- undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE (#708, opened by sfc-gh-zhwang, 1 comment)
- GPTNeox decoding argumentation (#712, opened by w775739733, 0 comments)
- Can FasterTransformer only deploy via Triton server? (#711, opened by jxcomeon, 0 comments)
- opt-6.7b smoothquant result error (#709, opened by sitabulaixizawaluduo, 2 comments)
- Running deberta gives me different result for fastertransformer vs huggingface (#707, opened by sfc-gh-zhwang, 0 comments)
- Where can I find config.json for GPT (#706, opened by htang2012, 5 comments)
- [bug] CustomAllReduceComm swapInternalBuffer is not safe (modifying const pointer) (#671, opened by rkindi, 1 comment)
- [bug] gptneox decoupled wrong output length (#704, opened by RobotGF, 0 comments)
- Is FT thread-safe? (#700, opened by sleepwalker2017, 1 comment)
- Could NOT find NCCL (#676, opened by arnavdixit, 0 comments)
- LongT5 Support (#694, opened by kjtaed, 0 comments)
- Support for MSVC on windows? (#692, opened by FdyCN, 0 comments)
- GptOp interface for pytorch need update (#690, opened by lygztq, 0 comments)
- Why does Fused MultiHeadAttention only exist for FP16 but not FP32 for ViT (#689, opened by macrocredit, 1 comment)
- OSError: libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory (#686, opened by Quang-elec44, 0 comments)
- support for no_repeat_ngram_size parameter? (#684, opened by parinaya-007, 1 comment)
- gptneox_example error (#670, opened by DesperadoDQY, 0 comments)
- New repo ownership? (#681, opened by Chris113113, 0 comments)