pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Python · BSD-3-Clause
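As context for the issues below: the core loop a generator like this implements is autoregressive decoding, one forward pass per emitted token. A minimal framework-free sketch of that loop (the stub model and function names here are illustrative placeholders, not gpt-fast's actual API):

```python
def stub_model(tokens):
    """Placeholder for a transformer forward pass: returns fake logits
    over a tiny vocabulary, favouring (last_token + 1) mod vocab_size."""
    vocab_size = 8
    last = tokens[-1]
    return [1.0 if t == (last + 1) % vocab_size else 0.0
            for t in range(vocab_size)]

def generate(model, prompt, max_new_tokens):
    """Greedy autoregressive decoding: each new token requires one
    model call on the sequence produced so far."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = model(tokens)  # one forward pass per new token
        next_token = max(range(len(logits)), key=logits.__getitem__)  # argmax
        tokens.append(next_token)
    return tokens

print(generate(stub_model, [0, 1], 4))  # → [0, 1, 2, 3, 4, 5]
```

gpt-fast's speedups come from optimizing exactly this loop (compilation, KV caching, quantization, speculative decoding, tensor parallelism), which is why so many of the issues below concern batching, quantized kernels, and multi-GPU behavior.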
Issues
Throughput Benchmark Scripts
#173 opened by HanGuo97 - 0
Missing Keys in state_dict
#172 opened by bjohn22 - 0
Tensor Parallel Inside notebook
#167 opened by nivibilla - 0
mmap issue in bf16 of gpt-fast
#165 opened by yanbing-j - 1
INT4 quantization not working on MI210
#154 opened by yafehlis - 2
Input token length question
#160 opened by kaizizzzzzz - 0
Naming: n_local_heads -> n_kv_heads
#162 opened by ad8e - 2
Tiny Llamas Not Found
#150 opened by guihao-liang - 4
On the memory usage of `ConditionalFeedForward`
#149 opened by carmocca - 2
AMD RX 7900 XTX Wrong outputs
#120 opened by makaveli10 - 3
What happens to bias during int8 quantization?
#108 opened by gchhablani - 2
int4/int4-gptq support in Mixtral 8x7B
#129 opened by yanbing-j - 1
Reducing Latency in Application with Torch Compilation: Initialization and Inference Optimization
#127 opened by daniyal214 - 0
Int4 perplexity
#125 opened by SinanAkkoyun - 2
Bug converting HF model
#54 opened by vinhtran2611 - 5
Can GPT-Fast support larger batch sizes?
#90 opened by yetingqiaqia - 2
pass@1 score extremely low using GPT-fast API
#94 opened by yafehlis - 1
Does `gpt-fast` work on V100 GPUs?
#72 opened by RomanKoshkin - 8
Tried tensor parallel on a server with two V100s linked by NVLink, but got a performance degradation
#111 opened by duanzhaol - 1
batching/dynamic batching
#112 opened by nivibilla - 5
Understanding why TorchInductor cannot speed-up huggingface transformer inference
#59 opened by learning-chip - 3
`eval.py` uses older version of lm_eval
#89 opened by nairbv - 1
Code is extremely slow!
#78 opened by yafehlis - 1
Error when running convert_hf_checkpoint.py for TinyLlama-1.1B-intermediate-step-480k-1T
#75 opened by yafehlis - 0
Device-side assertion errors when speculative decoding with prompts of different lengths
#69 opened by ZipECHO - 6
On 8 (or 2 or more) A100 GPUs, model output is garbled and the program fails to terminate properly (one GPU is correct)
#64 opened by qianghuangwhu - 0
Tensor parallel hangs on call to model
#55 opened by briandw - 0
Can blip2 be supported?
#61 opened by wangjing60755 - 1