Releases: linkedin/Liger-Kernel
v0.6.2
What's Changed
- Automate Benchmarking - fixing issue. by @Manan17 in #836
- Make path variable global by @Manan17 in #840
- Adding support for apo losses, sppo_hard and nca_pair by @Manan17 in #841
- Add `accum_dtype` option for `FusedLinearCrossEntropy` by @Tcc0403 in #830 (see the usage sketch after this list)
- CI tests fix by @Manan17 in #847
- docs(README): fix intel ci link by @Tcc0403 in #842
- Llama4 rope implementation by @Manan17 in #843
- fix(phi3): update monkey patch for `Phi3ForCausalLM` by @Tcc0403 in #837
- feat(FLCE): expose `accum_dtype` for hf model monkey patch by @Tcc0403 in #851
- Fix ci by @Manan17 in #853
- Fix missing low-level api imports by @Kirill-Kravtsov in #856
- Add glm4.1v model support by @vvvdwbvvv in #858
- Update pyproject.toml version to 0.6.2 by @vaibhavjindal in #861
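As a quick illustration of the new option, below is a minimal sketch of setting `accum_dtype` on the fused linear cross-entropy module. The keyword name comes from the PR titles (#830, #851); its exact placement and defaults are assumptions, so check the docstring in your installed version.

```python
# Hedged sketch only: assumes accum_dtype is a constructor kwarg of
# LigerFusedLinearCrossEntropyLoss, as suggested by #830/#851.
import torch
from liger_kernel.transformers import LigerFusedLinearCrossEntropyLoss

hidden_size, vocab_size = 2048, 32000
lm_head = torch.nn.Linear(hidden_size, vocab_size, bias=False, dtype=torch.bfloat16, device="cuda")
flce = LigerFusedLinearCrossEntropyLoss(
    accum_dtype=torch.float32,  # accumulate in fp32 even when weights/activations are bf16 (assumed semantics)
)

hidden = torch.randn(8, hidden_size, dtype=torch.bfloat16, device="cuda", requires_grad=True)
target = torch.randint(0, vocab_size, (8,), device="cuda")

# FLCE fuses the lm_head projection with cross entropy, so the full
# (tokens x vocab) logits tensor is never materialized.
loss = flce(lm_head.weight, hidden, target)
loss.backward()
```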
New Contributors
- @Kirill-Kravtsov made their first contribution in #856
Full Changelog: v0.6.1...v0.6.2
v0.6.1
What's Changed
- Fix gemma3 forward with skip_logits by @BitPhinix in #795
- Update README.md by @PKUWZP in #808
- Fix minor typo by @hugoabonizio in #809
- Update README.md by @PKUWZP in #811
- Fix embedding benchmarks for backward pass by @Manan17 in #799
- Giving an option to update benchmark results for previous commits. by @Manan17 in #791
- [Model] Liger support for SmolLM3 by @edbeeching in #798
- FusedAddRMSNorm: Fused residual addition and RMS Norm by @vaibhavjindal in #812 (reference sketch after this list)
- Skip smollm3 tests in tests-bwd by @vaibhavjindal in #821
- Layernorm enhancement by @Manan17 in #815
- Update README.md by @PKUWZP in #823
- Update index.md by @PKUWZP in #824
- Remove smollm3 import at top of file by @vaibhavjindal in #825
- Fix illegal memory access in Triton RMSNorm kernel by casting program_id to int64 by @vvvdwbvvv in #804
- fix(benchmark): move chunked loss module init out of measurements by @Tcc0403 in #643
- [XPU] Fixed the issue with multiple num_warps parameters being passed in. by @YangKai0616 in #831
- Automate benchmarking - for every release by @Manan17 in #828
- Revert "Bug Fix: name patching for modules" by @vaibhavjindal in #833
- Bug fixes in patching module by @vaibhavjindal in #834
- docs(README): fix gpumode discord badge by @Tcc0403 in #835
- Update pyproject.toml version to 0.6.1 by @shimizust in #838
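For readers unfamiliar with the FusedAddRMSNorm operator added in #812, the snippet below is an unfused plain-PyTorch reference of the semantics it fuses: the residual addition and the RMS normalization that follows it. It illustrates the math only, not Liger's kernel or its exact module API.

```python
# Unfused reference for what FusedAddRMSNorm (#812) computes in one kernel:
# add the residual stream, then RMS-normalize the sum. Names are illustrative.
import torch

def add_rms_norm_reference(hidden, residual, weight, eps=1e-6):
    x = hidden + residual                                         # residual addition
    variance = x.pow(2).to(torch.float32).mean(-1, keepdim=True)  # mean of squares in fp32
    y = (x * torch.rsqrt(variance + eps)).to(x.dtype) * weight    # RMS norm with learned scale
    # Fused implementations typically also return the summed activations so the
    # next layer can reuse them as its residual input.
    return y, x
```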
New Contributors
- @BitPhinix made their first contribution in #795
- @PKUWZP made their first contribution in #808
- @hugoabonizio made their first contribution in #809
- @edbeeching made their first contribution in #798
Full Changelog: v0.6.0...v0.6.1
v0.6.0: New Attention Operators, Cosine Similarity Loss, Llama 4, and VLM Patching Updates
Highlights
This release introduces significant improvements to Liger-Kernel, including new operators, support for Llama 4 models, more robust benchmarking automation, and key fixes to vision-language model (VLM) patching required by recent transformers refactoring.
Key Changes
New Features & Improvements
- Multi-Token Attention by @AndreSlavescu (#689)
- Fused Neighborhood Attention by @AndreSlavescu (#732)
- Cosine Similarity Loss for Distillation by @Dexterai (#780) (reference sketch after this list)
- Support for Llama 4 by @Manan17 (#740)
- Option to choose fused LCE/CE loss by @connermanuel (#704)
- Add block_rms_norm for QK norm by @mdy666 (#731)
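As a rough illustration of the cosine-similarity distillation loss (#780), the snippet below shows the quantity such a loss typically minimizes. It is a semantic sketch in plain PyTorch, not Liger's chunked/fused implementation or its API.

```python
# Reference-only sketch of a cosine-similarity distillation objective (#780):
# penalize the angular distance between student and teacher hidden states.
import torch
import torch.nn.functional as F

def cosine_distillation_loss(student_hidden: torch.Tensor, teacher_hidden: torch.Tensor) -> torch.Tensor:
    cos = F.cosine_similarity(student_hidden, teacher_hidden, dim=-1)  # per-token similarity in [-1, 1]
    return (1.0 - cos).mean()  # 0 when student and teacher directions match exactly
```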
Bug Fixes
- Vision-language model patching in recent transformers versions (>=4.52.0):
- RMS Norm patching by @vaibhavjindal, @BenasdTW (#741, #765)
- Hugging Face forward kwargs fix by @llllvvuu (#708)
- Fix import tanh by @jue-jue-zi (#762)
- Apply monkey patch to instances by @YangKai0616 (#772)
Documentation & CI Fixes
- Deploy MkDocs to GitHub Pages by @ParagEkbote (#724)
- Robust doc updates by @ParagEkbote (#726, #727)
- .idea ignored by @Tcc0403 (#784)
- ReadMe, MTA + softmax docs by @AndreSlavescu (#730)
- Relax DyT tol, XPU skip MTA by @Tcc0403 (#778)
- Paligemma test fixes by @vvvdwbvvv (#785)
- Style & test fixes by @Tcc0403, @vaibhavjindal (#736, #794)
- Add torchvision for multimodal test by @Tcc0403 (#755)
Benchmarking & Automation
- Automated benchmarking and visualization UI in GitHub pages by @Manan17 (#744, #747, #749, #752, #753, #756, #759, #760, #770, #779)
New Contributors
- @connermanuel made their first contribution in #704
- @llllvvuu made their first contribution in #708
- @jue-jue-zi made their first contribution in #762
- @YangKai0616 made their first contribution in #772
- @Dexterai made their first contribution in #780
- @vvvdwbvvv made their first contribution in #785
Full Changelog: v0.5.10...v0.6.0
v0.5.10: Qwen3 MOE support, Sparsemax kernel, bug fixes
What's Changed
- fix zip bug by @KareemMusleh in #702
- [dpo] set default average_log_prob to False by @cyr0930 in #693
- Rank build status lower by @momochen in #707
- Add support for Qwen3 MoE models by @chiwanpark in #706
- Fix qwen3_moe flaky convergence test by @vaibhavjindal in #710
- Fix empty Medusa head tensors by @chiwanpark in #698
- Sparsemax by @AndreSlavescu in #687 (reference implementation after this list)
- fix: remove docstring imports in transformer patches by @NanoCode012 in #712
- Increase tests timeout to 45 mins by @vaibhavjindal in #718
- fix modal tests by @shivam15s in #719
- Visualizer Update by @AndreSlavescu in #717
- Sparsemax Documentation by @AndreSlavescu in #716
- Element-wise DyT faster than the original LigerDyT by @mdy666 in #673
- GRPO loss kernel written fully in Triton, reducing memory by 46 GB by @mdy666 in #672
- Make FLCE compatible with FSDP and PEFT by @astefanutti in #674
- Fix incorrect module patching when using LoRA with modules_to_save by @BenasdTW in #632
- [XPU] Changed how XPU discovery works during `setup.py` by @Egor-Krivov in #720
- Fix to publish docs on pushes to main branch by @shimizust in #722
- Release 0.5.10 by @shimizust in #725
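For context on the Sparsemax kernel (#687), here is a plain-PyTorch reference of the sparsemax projection (Martins & Astudillo, 2016) that the Triton kernel implements; it illustrates the math, not Liger's kernel or API.

```python
# Reference sparsemax: Euclidean projection of logits onto the probability
# simplex, which yields sparse probability vectors (exact zeros).
import torch

def sparsemax_reference(logits: torch.Tensor, dim: int = -1) -> torch.Tensor:
    z, _ = torch.sort(logits, dim=dim, descending=True)
    cumsum = z.cumsum(dim)
    k = torch.arange(1, logits.size(dim) + 1, device=logits.device, dtype=logits.dtype)
    view = [1] * logits.dim()
    view[dim] = -1
    k = k.view(view)
    support = (1 + k * z) > cumsum                          # True for the top-k(z) sorted entries
    k_z = support.sum(dim=dim, keepdim=True).to(logits.dtype)
    tau = (cumsum.gather(dim, k_z.long() - 1) - 1) / k_z    # threshold subtracted from all logits
    return torch.clamp(logits - tau, min=0)
```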
New Contributors
- @KareemMusleh made their first contribution in #702
- @cyr0930 made their first contribution in #693
- @NanoCode012 made their first contribution in #712
- @mdy666 made their first contribution in #673
- @astefanutti made their first contribution in #674
- @Egor-Krivov made their first contribution in #720
Full Changelog: v0.5.9...v0.5.10
v0.5.9: Adds XPU Setup, GLM-4 & Qwen3 Model Support, Key Bugfixes
What's Changed
- update setup.py for installation on xpu by @faaany in #668
- update XPU CI yaml file to use docker container by @faaany in #669
- Add average_log_prob as an init param for LigerFusedLinearDPOLoss by @vaibhavjindal in #676 (see the sketch after this list)
- add shift label change by @shivam15s in #683
- remove tests that can pass on XPU by @faaany in #686
- Update mkdocs.yml by @shivam15s in #691
- Fix LigerCrossEntropy reduction='none' by @Tcc0403 in #680
- Support GLM-4 models by @intervitens in #685
- Import glm4_lce_forward locally in function by @vaibhavjindal in #695
- Qwen3 model support by @vaibhavjindal in #692
- Use logits_to_keep logic for training runs by @vaibhavjindal in #696
- increase gemma3 multimodal convergence test loss atol by @shivam15s in #697
- Update pyproject.toml by @shivam15s in #700
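To show where the new `average_log_prob` init parameter (#676) fits, here is a minimal, hedged sketch; the import path and the other keyword arguments are assumptions, so verify them against the class docstring in your installed version.

```python
# Hedged sketch only: assumes LigerFusedLinearDPOLoss is importable from
# liger_kernel.chunked_loss and accepts average_log_prob at init (#676).
from liger_kernel.chunked_loss import LigerFusedLinearDPOLoss

dpo_loss = LigerFusedLinearDPOLoss(
    beta=0.1,                # DPO temperature (assumed kwarg name)
    average_log_prob=False,  # average (True) vs. sum (False) of per-token log-probs; assumed semantics
)
```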
New Contributors
- @intervitens made their first contribution in #685
Full Changelog: v0.5.8...v0.5.9
v0.5.8: Backward-Compatible Fix
What's Changed
- backward compatible initialization by @shivam15s in #666
- Update pyproject.toml by @shivam15s in #667
Full Changelog: v0.5.7...v0.5.8
v0.5.7: Gemma3 Support, XPU Tuning Enhancements, GRPO Improvements, and API Compatibility Fixes
What's Changed
- Gemma3 (Text and Multimodal) by @eljandoubi in #621
- Make FLCE compatible with latest `XXXForCausalLM.forward()` APIs by @Tcc0403 in #596
- do bias addition in tests in float32 to make testing code similar to torch compile by @shivam15s in #655
- [CI] fix siglip dummy config by @yundai424 in #658
- add XPU tuning to JSD by @rmukhopa in #649
- add XPU tuning to Rmsnorm and Layernorm by @Tarakarevu1 in #653
- Fix imports without transformers by @vaibhavjindal in #659
- Use TYPE_CHECKING to fix static-only imports in IDEs etc by @vaibhavjindal in #660
- [kl_div] Modified block and warp sizes for improved performance by @jgtong in #654
- [GRPO] add support for different loss types by @kashif in #662
- Remove unexpected kwargs passing to flce by @Tcc0403 in #651
- reduce number of tests for grpo by @shivam15s in #663
- Update pyproject.toml by @shivam15s in #665
New Contributors
- @rmukhopa made their first contribution in #649
- @Tarakarevu1 made their first contribution in #653
- @jgtong made their first contribution in #654
Full Changelog: v0.5.6...v0.5.7
v0.5.6: Enhancements, Fixes, and Expanded Support (Paligemma, DyT, XPU, Llava, GRPO, and More!)
What's Changed
- [JSD] JSD fixes by @kashif in #609
- Paligemma support by @eljandoubi in #608
- Fix hidden size by @eljandoubi in #612
- Add loss_utils for rewriting lce_forward methods by @Tcc0403 in #614
- Update Star History URL by @ryankert01 in #616
- Update README.md by @shivam15s in #617
- Language model of PaliGemma 1 is Gemma 1 by @eljandoubi in #613
- Update README to reflect recent changes by @helloworld1 in #619
- Support Dynamic Tanh (DyT) by @Tcc0403 in #618 (reference sketch after this list)
- Fix incorrect module name when monkey_patch applied to instantiated model by @vaibhavjindal in #629
- [chunked loss] align teacher and student logit shape by @yundai424 in #634
- Fix incorrect condition comment in log_target calculation by @p81sunshine in #633
- Add huggingface llava by @jp1924 in #524
- fix Llava test-bwd failure by @jp1924 in #639
- Fix GRPO to conform with TRL: Fix loss, make tests accurate, correct metrics computation by @shivam15s and @mRSun15 in #628
- add xpu tuning to CE by @mgrabban in #645
- add xpu tuning to FLJSD by @mgrabban in #647
- Change tests to use rocm 6.3 version and tol changes to make liger run on amd by @shivam15s in #646
- Update pyproject.toml by @shivam15s in #648
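For readers unfamiliar with Dynamic Tanh (DyT, #618), the snippet below is a plain-PyTorch reference of the layer's math, `y = weight * tanh(alpha * x) + bias` with a learnable scalar `alpha`; it sketches the semantics only, not Liger's `LigerDyT` implementation.

```python
# Reference DyT layer: a normalization-free replacement for LayerNorm/RMSNorm
# that squashes activations element-wise with a learnable scale alpha.
import torch
import torch.nn as nn

class DyTReference(nn.Module):
    def __init__(self, hidden_size: int, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), init_alpha))
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * torch.tanh(self.alpha * x) + self.bias
```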
New Contributors
- @eljandoubi made their first contribution in #608
- @p81sunshine made their first contribution in #633
Full Changelog: v0.5.5...v0.5.6
v0.5.5: Chunk size fixes for JSD; KTO speed fixes; better metrics tests
What's Changed
- Infer correct device for AMD HIP device by @helloworld1 in #587
- add out of bounds check to cross entropy by @shivam15s in #588
- Monkeypatch for Qwen2.5-VL by @BenasdTW in #552
- KTO changes to return aux outputs by @vaibhavjindal in #589
- [KTO] Only return summed metrics by @vaibhavjindal in #591
- increase chunk size for distillation and add bias to jsd by @shivam15s in #590
- [CI] Add ROCm 6.3 CI by @tjtanaa in #506
- Fix KTO speed issue by @vaibhavjindal in #592
- Compare means of aggregated outputs in KTO tests by @vaibhavjindal in #595
- Fix means of logps and rewards by @vaibhavjindal in #597
- Add chunk_size param to chunked losses by @RichhLi in #599 (see the sketch after this list)
- Fix DPO/ORPO typo in readme by @tyler-romero in #602
- version bump by @shivam15s in #605
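A minimal sketch of the new `chunk_size` knob (#599), which bounds peak memory by processing the batch in smaller chunks; the class chosen here and the kwarg semantics are assumptions for illustration only.

```python
# Hedged sketch: assumes the chunked preference losses accept chunk_size at init (#599).
from liger_kernel.chunked_loss import LigerFusedLinearORPOLoss

orpo_loss = LigerFusedLinearORPOLoss(
    chunk_size=1,  # rows processed per chunk (assumed semantics); smaller values trade speed for lower peak memory
)
```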
Full Changelog: v0.5.4...v0.5.5
v0.5.4: Granite 3.0 & 3.1, OLMo2, GRPO, TVD loss, and minor fixes
What's Changed
- add GitHub CI for Intel GPU by @faaany in #536
- Add Intel GPU CI to README.md by @hebiao064 in #562
- test split to 16, 32 by @jp1924 in #564
- Clean up workaround introduced in PR #564 by @austin362667 in #566
- Update README.md by @momochen in #567
- Grpo loss by @kashif in #553
- Update Readme with ROCM installation instruction by @zcnrex in #570
- fix qwen2vl and mllama test to pass failing tests by @shivam15s in #571
- KTO: Minor fix and documentation update by @vaibhavjindal in #574
- Add TVD Loss Kernel by @saurabhkoshatwar in #324 (reference sketch after this list)
- Add KTO Benchmark Data into README by @hebiao064 in #575
- Support Granite 3.0 and 3.1 models by @JamesKunstle in #558
- Improve Hugging Face SFT Script by @ParagEkbote in #539
- Add unit tests for shared prefix masked attention with `torch.FlexAttention` by @austin362667 in #504
- update project readme to include Granite support by @JamesKunstle in #576
- Revert "Improve Hugging Face SFT Script (#539)" and Fix TVD Test for Intel #580 by @shivam15s in #578
- Fix Rope Test by @hebiao064 in #577
- Fix layer norm kernels by @lancerts in #582
- Add OLMO2 model support by @yundai424 in #581
- bump version to 0.5.4 by @yundai424 in #585
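For context on the TVD loss kernel (#324), here is a plain-PyTorch reference of the total variation distance objective it computes, `TVD(P, Q) = 0.5 * sum_i |p_i - q_i|`; this illustrates the math only, not Liger's kernel or API.

```python
# Reference TVD distillation loss: half the L1 distance between the student
# and teacher token distributions, averaged over positions.
import torch
import torch.nn.functional as F

def tvd_loss_reference(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    p = F.softmax(student_logits, dim=-1)
    q = F.softmax(teacher_logits, dim=-1)
    return 0.5 * (p - q).abs().sum(dim=-1).mean()
```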
New Contributors
- @jp1924 made their first contribution in #564
- @zcnrex made their first contribution in #570
- @vaibhavjindal made their first contribution in #574
- @saurabhkoshatwar made their first contribution in #324
- @JamesKunstle made their first contribution in #558
Full Changelog: v0.5.3...v0.5.4