Issues
Mismatch output with INT1 × float16
#276 opened by 1773226512 - 2
[Feature Request] Default backend should be changed to `tilelang` instead of `tvm` before the v0.0.1 release
#252 opened by LeiWang1999 - 3
[Feature Request] Enhance Database to support reloading scheduled tilelang operators
#269 opened by LeiWang1999 - 2
[Feature Request] Enhance TileLang to auto-inject ifthenelse guards for out-of-bounds memory access
#265 opened by LeiWang1999 - 1
[Feature Request] LayoutInference pass should be enhanced to analyze the vectorize factor across indices
#266 opened by LeiWang1999 - 3
[Feature Request] Lazy TVM tensor intrin registration is required to save import time
#256 opened by LeiWang1999 - 0
[Feature Request] Flash Attention Op should be enhanced with our Scheduler Abstraction
#264 opened by LeiWang1999 - 9
performance of float16 with fast tuning
#173 opened by klxy0304 - 4
InternalError: Check failed: func->buffer_map.size() == 0 (3 vs. 0) : This pass must be called after MakePackedAPI
#259 opened by Cunxiao2002 - 2
Does BitBLAS support P40's DP4A for matrix multiplication?
#251 opened by sorasoras - 2
Issue integrating with AutoGPTQ
#257 opened by Steindox - 2
[Feature Request] Lower Vectorized Loop Pass should be enhanced to adapt layout inference
#258 opened by LeiWang1999 - 2
Any example of using `float16xfp4_e2m1` matmul?
#253 opened by yaoyaoding - 5
Release Plan of BitBLAS 0.0.1
#150 opened by LeiWang1999 - 2
CUDA error: an illegal memory access was encountered when using BitBLAS on multiple GPUs
#154 opened by mobicham - 4
[Feature Request][TL] May need an annotation to disable part of the AST from being translated into TL
#221 opened by LeiWang1999 - 3
[Feature Request] Enhance Simplification to remove unused function arguments
#215 opened by LeiWang1999 - 17
Different outputs based on weight shape
#157 opened by MekkCyber - 0
[Feature Request] Parallel primitive should be enhanced to improve performance for irregular shapes
#209 opened by LeiWang1999 - 0
'
#206 opened by LeiWang1999 - 3
Installation Failure on CUDA 11.7
#203 opened by rustic-snob - 5
How to achieve vLLM Row Parallelism correctly?
#186 opened by KeremTurgutlu - 1
Matmul Output Hardware Mismatch
#187 opened by KeremTurgutlu - 1
[Bug] Incorrect output for bitnet model
#184 opened by LeiWang1999 - 4
Segmentation fault when integrated with Ray
#181 opened by mobicham - 11
Speed Comparison: BitLinear and nn.Linear
#118 opened by ZiqingChang - 0
performance of float16 with fast tuning
#172 opened by klxy0304 - 4
"nan" in the outputs
#169 opened by dataparameters - 2
benchmark error
#151 opened by MichoChan - 4
Error in the example
#156 opened by chenjy11223 - 4
Speed problem of bitblas.Matmul
#122 opened by Chenglin-Yang - 3
Loading BitBlasLinear takes a lot of time
#152 opened by MekkCyber - 27
running issues
#131 opened by brisker - 6
[Feature Request] 3-bit support
#117 opened by mobicham - 6
Potential BitBLAS compatibility with CUDA 12.5
#120 opened by Qubitium - 2
Extracting Operator or CUDA Kernel for TensorRT/ONNX Custom Op Registration
#123 opened by areddy2022 - 1
ValueError: Check failed: lanes <= 4 (8 vs. 4) : Ramp of more than 4 lanes is not allowed.
#112 opened by XA23i