Issues
Can the Volta/Tesla architectures be supported?
#242 opened by alexngng - 0
Shared-prefix RoPE issue
#194 opened by lkc1997 - 3
Support torch 2.3
#227 opened by rkooo567 - 1
TypeError: get_cu_file_str() missing 1 required positional argument: 'idtype'
#222 opened by xuzhenqi - 0
[Install] Build error on main branch
#195 opened by esmeetu - 1
[LoRA] Roadmap of LoRA operators
#199 opened by yzh119 - 4
[Feature Request] Versatile head dimension
#142 opened by yzh119 - 0
vLLM support
#202 opened by MikeChenfu - 5
Faster compilation times
#154 opened by skrider - 5
[Roadmap] FlashInfer v0.1.0 release checklist
#19 opened by yzh119 - 7
Make flashinfer kernels cuda graphs friendly
#187 opened by AgrawalAmey - 2
Compare Append Kernel's Results with Xformers
#192 opened by LiuXiaoxuanPKU - 6
How to use low-bit KV Cache in flashinfer?
#125 opened by zhaoyang-star - 0
Does flashinfer support float datatype?
#191 opened by ZSL98 - 1
QUESTION: Does the C++ API support Ragged Tensors now?
#189 opened by yz-tang - 3
Basic inference example for LLama/Mistral
#108 opened by vgoklani - 5
How was the data in the blog measured?
#188 opened by cloudhan - 1
flashinfer build error
#186 opened by yz-tang - 4
Support for Volta / Turing architectures
#160 opened by tgaddair - 1
[BUG] Yi-34B model compatibility
#181 opened by Qubitium - 0
[Tracking Issue] PyTorch bindings
#64 opened by yzh119 - 0
Could you release a wheel for Python 3.8 as well?
#129 opened by WoosukKwon - 0
[Roadmap] 0.0.3 Release Checklist
#138 opened by yzh119 - 5
0.0.3 wheels not in flashinfer.ai/whl/
#168 opened by Qubitium - 1
Wheels version bumping
#175 opened by hnyls2002 - 0
JIT compilation
#170 opened by yzh119 - 2
Google Gemma runtime error with half dtype
#157 opened by hnyls2002 - 1
[Performance] Support strides in attention kernels
#163 opened by yzh119 - 1
Quantization support
#150 opened by zhyncs - 2
Sliding window attention
#159 opened by WoosukKwon - 5
Downloadable Package in PyPI
#153 opened by WoosukKwon - 1
Still looking forward to an e2e example!
#149 opened by ZSL98 - 2
Float8 cache usage
#155 opened by YLGH - 4
Where can I find end-to-end examples?
#51 opened by WoosukKwon - 0
[Feature request] Interleaved RoPE support
#151 opened by guocuimi - 3
Could you support AliBi attention bias?
#137 opened by WoosukKwon - 0
[Feature Request] More versatile GQA group sizes
#140 opened by yzh119 - 0
Can I profile only the dense or attention layer in flashinfer, rather than the whole kernel?
#139 opened by yintao-he - 1
Support Gemma model shape
#130 opened by yzh119 - 2
[Compiling Issue] error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapper" matches the argument list
#134 opened by yintao-he - 1
Please release pre-built wheels for python 3.9
#112 opened by merrymercy - 0
Compilation error on A100 + cuda 12 + python3.9
#113 opened by merrymercy - 0
[Tracking Issue] Documentation and Examples
#67 opened by yzh119 - 1
[Tracking Issue] Prebuilt PyPI wheels
#66 opened by yzh119 - 0
[Roadmap] FlashInfer v0.0.1 release checklist
#11 opened by yzh119 - 0