Issues
Mismatch output with INT1 × float16
#276 opened by 1773226512 - 2
[Feature Request] Default backend should be changed to `tilelang` instead of `tvm` before the v0.0.1 release
#252 opened by LeiWang1999 - 3
[Feature Request] Enhance Database to support reloading scheduled tilelang operators
#269 opened by LeiWang1999 - 2
[Feature Request] Enhance TileLang to auto-inject ifthenelse guards for out-of-bounds memory access
#265 opened by LeiWang1999 - 1
[Feature Request] LayoutInference pass should be enhanced to analyze the vectorize factor across indices
#266 opened by LeiWang1999 - 3
[Feature Request] Lazy TVM tensor intrin registration is required to save import time
#256 opened by LeiWang1999 - 0
[Feature Request] Flash Attention Op should be enhanced with our Scheduler Abstraction
#264 opened by LeiWang1999 - 9
performance of float16 with fast tuning
#173 opened by klxy0304 - 4
InternalError: Check failed: func->buffer_map.size() == 0 (3 vs. 0) : This pass must be called after MakePackedAPI
#259 opened by Cunxiao2002 - 2
Does BitBLAS support P40's DP4A for matrix multiplication?
#251 opened by sorasoras - 2
Issue integrating with AutoGPTQ
#257 opened by Steindox - 2
[Feature Request] Lower Vectorized Loop Pass should be enhanced to adapt layout inference
#258 opened by LeiWang1999 - 2
Any example of using `float16xfp4_e2m1` matmul?
#253 opened by yaoyaoding - 5
Release Plan of BitBLAS 0.0.1
#150 opened by LeiWang1999 - 2
CUDA error: an illegal memory access was encountered when using BitBLAS on multiple GPUs
#154 opened by mobicham - 4
[Feature Request][TL] May need an annotation to disable part of the AST from being translated into TL
#221 opened by LeiWang1999 - 3
[Feature Request] Enhance Simplification to remove unused function arguments
#215 opened by LeiWang1999 - 17
Different outputs based on weight shape
#157 opened by MekkCyber - 0
[Feature Request] Parallel primitive should be enhanced to improve performance for irregular shapes
#209 opened by LeiWang1999 - 0
'
#206 opened by LeiWang1999 - 3
Installation Failure on CUDA 11.7
#203 opened by rustic-snob - 5
How to achieve vLLM Row Parallelism correctly?
#186 opened by KeremTurgutlu - 1
Matmul Output Hardware Mismatch
#187 opened by KeremTurgutlu - 1
[Bug] Incorrect output for bitnet model
#184 opened by LeiWang1999 - 4
Segmentation fault when integrated with Ray
#181 opened by mobicham - 11
Speed Comparison: BitLinear and nn.Linear
#118 opened by ZiqingChang - 0
performance of float16 with fast tuning
#172 opened by klxy0304 - 4
"nan" in the outputs
#169 opened by dataparameters - 2
benchmark error
#151 opened by MichoChan - 4
Error in the example
#156 opened by chenjy11223 - 4
Speed problem of bitblas.Matmul
#122 opened by Chenglin-Yang - 3
Loading BitBlasLinear takes a lot of time
#152 opened by MekkCyber - 27
running issues
#131 opened by brisker - 6
[Feature Request] 3-bit support
#117 opened by mobicham - 6
Potential BitBLAS compatibility with CUDA 12.5
#120 opened by Qubitium - 2
Extracting Operator or CUDA Kernel for TensorRT/ONNX Custom Op Registration
#123 opened by areddy2022 - 1
ValueError: Check failed: lanes <= 4 (8 vs. 4) : Ramp of more than 4 lanes is not allowed.
#112 opened by XA23i