Issues
The convergence test `test_mini_models_with_logits` is failing with the latest transformers
#543 opened by Tcc0403 - 0
Model shows incorrect module names if a monkey patch is applied to the model instance.
#625 opened by Tcc0403 - 1
[transformers] Support Gemma3 model family
#622 opened by yundai424 - 4
Support Gemma 3 Arch
#604 opened by fizzAI - 2
Fine-tuned qwen2.5-7b reported a backward error
#480 opened by chenchen0611 - 0
[ROCm]: State of Liger Kernel CI on AMD on ROCm 6.3
#624 opened by tjtanaa - 0
[transformers] support DeepSeek V3
#623 opened by yundai424 - 2
When enabling naive model parallelism using `device_map`, liger-kernel does not work.
#593 opened by Songjw133 - 2
Support DAPO Chunked loss
#620 opened by qingquansong - 2
How can multiplying grad weights by grad output work?
#584 opened by jinchen89 - 2
How is the Triton Kernel Used in DPO Loss
#601 opened by noc-turne - 1
GKD trainer + chunked JSD loss + FSDP
#615 opened by benyaminjami - 2
Support Dynamic Tanh (DyT)
#607 opened by shivam15s - 2
[QST] Kernel Compilation times
#600 opened by yigithanyigit - 0
Expose `chunk_size` settings from module as well as function in chunked losses
#594 opened by shivam15s - 0
[transformers][FLCE] make compatible with latest (>=4.49.0) `XXXForCausalLM.forward` APIs
#573 opened by yundai424 - 2
Megatron Support
#513 opened by huyiwen - 4
TypeError: 'NoneType' object is not subscriptable. With trl==0.15.0 and later.
#568 opened by BenasdTW - 2
For better numerical accuracy in LayerNorm
#518 opened by nhamanasu - 2
cannot import name 'hip' from 'torch'
#586 opened by qgallouedec - 1
[RFC] Liger FlexChunkLoss: Grouping Loss
#548 opened by austin362667 - 1
Support IBM Granite 3.0 and 3.1 models
#557 opened by JamesKunstle - 2
Support the new Solar architecture
#537 opened by arnavgarg1 - 1
Incorrect import path
#565 opened by Jokeren - 3
error when running `sh run_qwen.sh`
#487 opened by CharlesJhonson - 4
Issue while building from source on ROCm
#538 opened by agunapal - 0
`transformers` is meant to be an optional dep but import fails without it
#569 opened by tyler-romero - 7
`revert_liger_kernel_to_xxx` can't revert LigerCrossEntropyLoss for transformers>=4.46.1
#542 opened by Tcc0403 - 3
RMSNorm & SwiGLU activation recomputation
#555 opened by huyiwen - 12
Any plans to add models from the LLaVA series?
#514 opened by jp1924 - 8
Batched Text Generation Causes Degraded Results
#544 opened by JC-LMCO - 1
Broken links in README
#545 opened by Tcc0403 - 1
Qwen2-VL breaks with transformers version 4.47.0+: `TypeError: lce_forward() got an unexpected keyword argument 'cache_position'`
#528 opened by BenasdTW - 0
What's the difference with FlagGems?
#478 opened by CharlesJhonson - 4
Gradient checkpointing for `grad_weight` in LFCE
#533 opened by cassanof - 2
NVIDIA CI failing due to transformers v4.48.0 refactor
#520 opened by Tcc0403 - 0
`return_z_loss` is not supported for `LigerFusedLinearCrossEntropyFunction` and `LigerFusedLinearCrossEntropyLoss`
#527 opened by apaz-cli - 2
IndexError: The shape of the mask [7387] at index 0 does not match the shape of the indexed tensor [1] at index 0
#515 opened by 14H034160212 - 3
Memory Optimization with Liger Kernel Shows Limited Effect on Larger Models (more than 7B)
#517 opened by dyyoungg - 0
`LigerFusedLinearCrossEntropyLoss` Causes Training Loss to Diverge After Reaching ~8
#512 opened by penghui-yang - 6
result of LigerCrossEntropyLoss is always 0
#507 opened by wa008 - 1
Consider supporting Liger Kernel for the InternLM model
#505 opened by 14H034160212 - 0
Dtype Mismatch in `torch.addmm` within `ops/fused_linear_cross_entropy.py` in AMP training
#501 opened by DandinPower - 1
error when running kernel tests
#474 opened by CharlesJhonson