Issues
The convergence test `test_mini_models_with_logits` is failing with the latest transformers
#543 opened by Tcc0403 - 0
Model shows incorrect module names if a monkey patch is applied to the model instance.
#625 opened by Tcc0403 - 1
[transformers] Support Gemma3 model family
#622 opened by yundai424 - 4
Support Gemma 3 Arch
#604 opened by fizzAI - 2
Fine-tuned qwen2.5-7b reported a backward error
#480 opened by chenchen0611 - 0
[ROCm]: State of Liger Kernel CI on AMD on ROCm 6.3
#624 opened by tjtanaa - 0
[transformers] support DeepSeek V3
#623 opened by yundai424 - 2
When enabling naive model parallelism using `device_map`, liger-kernel does not work.
#593 opened by Songjw133 - 2
Support DAPO Chunked loss
#620 opened by qingquansong - 2
How can multiplying grad weights by grad output work?
#584 opened by jinchen89 - 2
How is the Triton Kernel Used in DPO Loss
#601 opened by noc-turne - 1
GKD trainer + chunked JSD loss + FSDP
#615 opened by benyaminjami - 2
Support Dynamic Tanh (DyT)
#607 opened by shivam15s - 2
[QST] Kernel Compilation times
#600 opened by yigithanyigit - 0
Expose `chunk_size` settings from module as well as function in chunked losses
#594 opened by shivam15s - 0
[transformers][FLCE] make compatible with latest (>=4.49.0) `XXXForCausalLM.forward` APIs
#573 opened by yundai424 - 2
Megatron Support
#513 opened by huyiwen - 4
TypeError: 'NoneType' object is not subscriptable. With trl==0.15.0 and later.
#568 opened by BenasdTW - 2
For better numerical accuracy in LayerNorm
#518 opened by nhamanasu - 2
cannot import name 'hip' from 'torch'
#586 opened by qgallouedec - 1
[RFC] Liger FlexChunkLoss: Grouping Loss
#548 opened by austin362667 - 1
Support IBM Granite 3.0 and 3.1 models
#557 opened by JamesKunstle - 2
Support the new Solar architecture
#537 opened by arnavgarg1 - 1
Incorrect import path
#565 opened by Jokeren - 3
error when running `sh run_qwen.sh`
#487 opened by CharlesJhonson - 4
Issue while building from source on ROCm
#538 opened by agunapal - 0
`transformers` is meant to be an optional dep but import fails without it
#569 opened by tyler-romero - 7
`revert_liger_kernel_to_xxx` can't revert LigerCrossEntropyLoss for transformers>=4.46.1
#542 opened by Tcc0403 - 3
RMSNorm & SwiGLU activation recomputation
#555 opened by huyiwen - 12
Any plans to add models from the LLaVA series?
#514 opened by jp1924 - 8
Batched Text Generation Causes Degraded Results
#544 opened by JC-LMCO - 1
Broken links in README
#545 opened by Tcc0403 - 1
Qwen2-VL breaks with transformers version 4.47.0+: `TypeError: lce_forward() got an unexpected keyword argument 'cache_position'`
#528 opened by BenasdTW - 0
What's the difference with FlagGems?
#478 opened by CharlesJhonson - 4
Gradient checkpointing for `grad_weight` in LFCE
#533 opened by cassanof - 2
NVIDIA CI failing due to transformers v4.48.0 refactor
#520 opened by Tcc0403 - 0
`return_z_loss` is not supported for `LigerFusedLinearCrossEntropyFunction` and `LigerFusedLinearCrossEntropyLoss`
#527 opened by apaz-cli - 2
IndexError: The shape of the mask [7387] at index 0 does not match the shape of the indexed tensor [1] at index 0
#515 opened by 14H034160212 - 3
Memory Optimization with Liger Kernel Shows Limited Effect on Larger Models (more than 7B)
#517 opened by dyyoungg - 0
`LigerFusedLinearCrossEntropyLoss` Causes Training Loss to Diverge After Reaching ~8
#512 opened by penghui-yang - 6
result of LigerCrossEntropyLoss is always 0
#507 opened by wa008 - 1
Consider supporting Liger Kernel for the InternLM model
#505 opened by 14H034160212 - 0
Dtype Mismatch in `torch.addmm` within `ops/fused_linear_cross_entropy.py` in AMP training
#501 opened by DandinPower - 1
error when running kernel tests
#474 opened by CharlesJhonson