Issues
- Compatibility issue with CUDA 12.2 (#730, opened by MinghaoYan, 1 comment)
- build failed with tf-op (#701, opened by jackzhou121, 3 comments)
- GPTNeox decoding argumentation (#713, opened by w775739733, 1 comment)
- Support for Falcon models (#669, opened by ankit201, 1 comment)
- GPT-Q 4-bit support [feature request] (#715, opened by Xingxiangrui, 8 comments)
- Are MQA and GQA in development? (#727, opened by ljayx, 2 comments)
- llama support inference? (#729, opened by double-vin, 2 comments)
- TP=2, Loss of accuracy (#734, opened by coderchem, 3 comments)
- docker/Dockerfile.torch occurs errors (#720, opened by b3y0nd, 2 comments)
- infer_visiontransformer_op.py error (#680, opened by macrocredit, 4 comments)
- How to transfer glm2 model to fastertransformer (#726, opened by AndreWanga, 1 comment)
- Serve Deberta using FasterTransformer in Triton (#691, opened by sfc-gh-zhwang, 0 comments)
- Which version of cutlass was adopted? (#723, opened by Liu-xiandong, 0 comments)
- I tried your way, but still get this error (#721, opened by pangr, 0 comments)
- ParallelGPT stop_words_list (#677, opened by cpm0722, 1 comment)
- Incomplete explanation (#719, opened by lix19937, 0 comments)
- Incomplete explanation (#718, opened by lix19937, 5 comments)
- undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE (#708, opened by sfc-gh-zhwang, 1 comment)
- GPTNeox decoding argumentation (#712, opened by w775739733, 0 comments)
- Can FasterTransformer only deploy via Triton server? (#711, opened by jxcomeon, 0 comments)
- opt-6.7b smoothquant result error (#709, opened by sitabulaixizawaluduo, 2 comments)
- Running deberta gives me different result for fastertransformer vs huggingface (#707, opened by sfc-gh-zhwang, 0 comments)
- Where can I find config.json for GPT (#706, opened by htang2012, 5 comments)
- [bug] CustomAllReduceComm swapInternalBuffer is not safe (modifying const pointer) (#671, opened by rkindi, 1 comment)
- [bug] gptneox decoupled wrong output length (#704, opened by RobotGF, 0 comments)
- Is FT thread-safe? (#700, opened by sleepwalker2017, 1 comment)
- Could NOT find NCCL (#676, opened by arnavdixit, 0 comments)
- LongT5 Support (#694, opened by kjtaed, 0 comments)
- Support for MSVC on windows? (#692, opened by FdyCN, 0 comments)
- GptOp interface for pytorch need update (#690, opened by lygztq, 0 comments)
- Why does Fused MultiHeadAttention only exist for FP16 but not FP32 for ViT (#689, opened by macrocredit, 1 comment)
- OSError: libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory (#686, opened by Quang-elec44, 0 comments)
- support for no_repeat_ngram_size parameter? (#684, opened by parinaya-007, 1 comment)
- gptneox_example error (#670, opened by DesperadoDQY, 0 comments)
- New repo ownership? (#681, opened by Chris113113, 0 comments)