Releases: linkedin/Liger-Kernel
v0.6.2
What's Changed
- Automate Benchmarking - fixing issue. by @Manan17 in #836
- Make path variable global by @Manan17 in #840
- Adding support for apo losses, sppo_hard and nca_pair by @Manan17 in #841
- Add `accum_dtype` option for `FusedLinearCrossEntropy` by @Tcc0403 in #830 (see the usage sketch after this list)
- CI tests fix by @Manan17 in #847
- docs(README): fix intel ci link by @Tcc0403 in #842
- Llama4 rope implementation by @Manan17 in #843
- fix(phi3): update monkey patch for `Phi3ForCausalLM` by @Tcc0403 in #837
- feat(FLCE): expose `accum_dtype` for hf model monkey patch by @Tcc0403 in #851
- Fix ci by @Manan17 in #853
- Fix missing low-level api imports by @Kirill-Kravtsov in #856
- Add glm4.1v model support by @vvvdwbvvv in #858
- Update pyproject.toml version to 0.6.2 by @vaibhavjindal in #861
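As a quick illustration of the new option, below is a minimal sketch of setting `accum_dtype` on the fused linear cross-entropy module. The keyword name comes from the PR titles (#830, #851); its exact placement and defaults are assumptions, so check the docstring in your installed version.

```python
# Hedged sketch only: assumes accum_dtype is a constructor kwarg of
# LigerFusedLinearCrossEntropyLoss, as suggested by #830/#851.
import torch
from liger_kernel.transformers import LigerFusedLinearCrossEntropyLoss

hidden_size, vocab_size = 2048, 32000
lm_head = torch.nn.Linear(hidden_size, vocab_size, bias=False, dtype=torch.bfloat16, device="cuda")
flce = LigerFusedLinearCrossEntropyLoss(
    accum_dtype=torch.float32,  # accumulate in fp32 even when weights/activations are bf16 (assumed semantics)
)

hidden = torch.randn(8, hidden_size, dtype=torch.bfloat16, device="cuda", requires_grad=True)
target = torch.randint(0, vocab_size, (8,), device="cuda")

# FLCE fuses the lm_head projection with cross entropy, so the full
# (tokens x vocab) logits tensor is never materialized.
loss = flce(lm_head.weight, hidden, target)
loss.backward()
```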
New Contributors
- @Kirill-Kravtsov made their first contribution in #856
Full Changelog: v0.6.1...v0.6.2
v0.6.1
What's Changed
- Fix gemma3 forward with skip_logits by @BitPhinix in #795
- Update README.md by @PKUWZP in #808
- Fix minor typo by @hugoabonizio in #809
- Update README.md by @PKUWZP in #811
- Fix embedding benchmarks for backward pass by @Manan17 in #799
- Giving an option to update benchmark results for previous commits. by @Manan17 in #791
- [Model] Liger support for SmolLM3 by @edbeeching in #798
- FusedAddRMSNorm: Fused residual addition and RMS Norm by @vaibhavjindal in #812 (reference sketch after this list)
- Skip smollm3 tests in tests-bwd by @vaibhavjindal in #821
- Layernorm enhancement by @Manan17 in #815
- Update README.md by @PKUWZP in #823
- Update index.md by @PKUWZP in #824
- Remove smollm3 import at top of file by @vaibhavjindal in #825
- Fix illegal memory access in Triton RMSNorm kernel by casting program_id to int64 by @vvvdwbvvv in #804
- fix(benchmark): move chunked loss module init out of measurements by @Tcc0403 in #643
- [XPU] Fixed the issue with multiple num_warps parameters being passed in. by @YangKai0616 in #831
- Automate benchmarking - for every release by @Manan17 in #828
- Revert "Bug Fix: name patching for modules" by @vaibhavjindal in #833
- Bug fixes in patching module by @vaibhavjindal in #834
- docs(README): fix gpumode discord badge by @Tcc0403 in #835
- Update pyproject.toml version to 0.6.1 by @shimizust in #838
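For readers unfamiliar with the FusedAddRMSNorm operator added in #812, the snippet below is an unfused plain-PyTorch reference of the semantics it fuses: the residual addition and the RMS normalization that follows it. It illustrates the math only, not Liger's kernel or its exact module API.

```python
# Unfused reference for what FusedAddRMSNorm (#812) computes in one kernel:
# add the residual stream, then RMS-normalize the sum. Names are illustrative.
import torch

def add_rms_norm_reference(hidden, residual, weight, eps=1e-6):
    x = hidden + residual                                         # residual addition
    variance = x.pow(2).to(torch.float32).mean(-1, keepdim=True)  # mean of squares in fp32
    y = (x * torch.rsqrt(variance + eps)).to(x.dtype) * weight    # RMS norm with learned scale
    # Fused implementations typically also return the summed activations so the
    # next layer can reuse them as its residual input.
    return y, x
```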
New Contributors
- @BitPhinix made their first contribution in #795
- @PKUWZP made their first contribution in #808
- @hugoabonizio made their first contribution in #809
- @edbeeching made their first contribution in #798
Full Changelog: v0.6.0...v0.6.1
v0.6.0: New Attention Operators, Cosine Similarity Loss, Llama 4, and VLM Patching Updates
Highlights
This release introduces significant improvements to Liger-Kernel, including new operators, support for Llama 4 models, more robust benchmarking automation, and key fixes to vision-language model (VLM) patching required by recent transformers refactoring.
Key Changes
New Features & Improvements
- Multi-Token Attention by @AndreSlavescu (#689)
- Fused Neighborhood Attention by @AndreSlavescu (#732)
- Cosine Similarity Loss for Distillation by @Dexterai (#780) (reference sketch after this list)
- Support for Llama 4 by @Manan17 (#740)
- Option to choose fused LCE/CE loss by @connermanuel (#704)
- Add block_rms_norm for QK norm by @mdy666 (#731)
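As a rough illustration of the cosine-similarity distillation loss (#780), the snippet below shows the quantity such a loss typically minimizes. It is a semantic sketch in plain PyTorch, not Liger's chunked/fused implementation or its API.

```python
# Reference-only sketch of a cosine-similarity distillation objective (#780):
# penalize the angular distance between student and teacher hidden states.
import torch
import torch.nn.functional as F

def cosine_distillation_loss(student_hidden: torch.Tensor, teacher_hidden: torch.Tensor) -> torch.Tensor:
    cos = F.cosine_similarity(student_hidden, teacher_hidden, dim=-1)  # per-token similarity in [-1, 1]
    return (1.0 - cos).mean()  # 0 when student and teacher directions match exactly
```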
Bug Fixes
- Vision-language model patching in recent transformers versions (>=4.52.0):
- RMS Norm patching by @vaibhavjindal, @BenasdTW (#741, #765)
- Hugging Face forward kwargs fix by @llllvvuu (#708)
- Fix import tanh by @jue-jue-zi (#762)
- Apply monkey patch to instances by @YangKai0616 (#772)
Documentation & CI Fixes
- Deploy MkDocs to GitHub Pages by @ParagEkbote (#724)
- Robust doc updates by @ParagEkbote (#726, #727)
- .idea ignored by @Tcc0403 (#784)
- ReadMe, MTA + softmax docs by @AndreSlavescu (#730)
- Relax DyT tol, XPU skip MTA by @Tcc0403 (#778)
- Paligemma test fixes by @vvvdwbvvv (#785)
- Style & test fixes by @Tcc0403, @vaibhavjindal (#736, #794)
- Add torchvision for multimodal test by @Tcc0403 (#755)
Benchmarking & Automation
- Automated benchmarking and visualization UI in GitHub pages by @Manan17 (#744, #747, #749, #752, #753, #756, #759, #760, #770, #779)
New Contributors
- @connermanuel made their first contribution in #704
- @llllvvuu made their first contribution in #708
- @jue-jue-zi made their first contribution in #762
- @YangKai0616 made their first contribution in #772
- @Dexterai made their first contribution in #780
- @vvvdwbvvv made their first contribution in #785
Full Changelog: v0.5.10...v0.6.0
v0.5.10: Qwen3 MOE support, Sparsemax kernel, bug fixes
What's Changed
- fix zip bug by @KareemMusleh in #702
- [dpo] set default average_log_prob to False by @cyr0930 in #693
- Rank build status lower by @momochen in #707
- Add support for Qwen3 MoE models by @chiwanpark in #706
- Fix qwen3_moe flaky convergence test by @vaibhavjindal in #710
- Fix empty Medusa head tensors by @chiwanpark in #698
- Sparsemax by @AndreSlavescu in #687 (reference implementation after this list)
- fix: remove docstring imports in transformer patches by @NanoCode012 in #712
- Increase tests timeout to 45 mins by @vaibhavjindal in #718
- fix modal tests by @shivam15s in #719
- Visualizer Update by @AndreSlavescu in #717
- Sparsemax Documentation by @AndreSlavescu in #716
- Element-wise DyT faster than the original LigerDyT by @mdy666 in #673
- GRPO loss kernel written fully in Triton, reducing memory by 46 GB by @mdy666 in #672
- Make FLCE compatible with FSDP and PEFT by @astefanutti in #674
- Fix incorrect module patching when using LoRA with modules_to_save by @BenasdTW in #632
- [XPU] Changed how XPU discovery works during `setup.py` by @Egor-Krivov in #720
- Fix to publish docs on pushes to main branch by @shimizust in #722
- Release 0.5.10 by @shimizust in #725
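For context on the Sparsemax kernel (#687), here is a plain-PyTorch reference of the sparsemax projection (Martins & Astudillo, 2016) that the Triton kernel implements; it illustrates the math, not Liger's kernel or API.

```python
# Reference sparsemax: Euclidean projection of logits onto the probability
# simplex, which yields sparse probability vectors (exact zeros).
import torch

def sparsemax_reference(logits: torch.Tensor, dim: int = -1) -> torch.Tensor:
    z, _ = torch.sort(logits, dim=dim, descending=True)
    cumsum = z.cumsum(dim)
    k = torch.arange(1, logits.size(dim) + 1, device=logits.device, dtype=logits.dtype)
    view = [1] * logits.dim()
    view[dim] = -1
    k = k.view(view)
    support = (1 + k * z) > cumsum                          # True for the top-k(z) sorted entries
    k_z = support.sum(dim=dim, keepdim=True).to(logits.dtype)
    tau = (cumsum.gather(dim, k_z.long() - 1) - 1) / k_z    # threshold subtracted from all logits
    return torch.clamp(logits - tau, min=0)
```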
New Contributors
- @KareemMusleh made their first contribution in #702
- @cyr0930 made their first contribution in #693
- @NanoCode012 made their first contribution in #712
- @mdy666 made their first contribution in #673
- @astefanutti made their first contribution in #674
- @Egor-Krivov made their first contribution in #720
Full Changelog: v0.5.9...v0.5.10
v0.5.9: Adds XPU Setup, GLM-4 & Qwen3 Model Support, Key Bugfixes
What's Changed
- update setup.py for installation on xpu by @faaany in #668
- update XPU CI yaml file to use docker container by @faaany in #669
- Add average_log_prob as an init param for LigerFusedLinearDPOLoss by @vaibhavjindal in #676 (see the sketch after this list)
- add shift label change by @shivam15s in #683
- remove tests that can pass on XPU by @faaany in #686
- Update mkdocs.yml by @shivam15s in #691
- Fix LigerCrossEntropy reduction='none' by @Tcc0403 in #680
- Support GLM-4 models by @intervitens in #685
- Import glm4_lce_forward locally in function by @vaibhavjindal in #695
- Qwen3 model support by @vaibhavjindal in #692
- Use logits_to_keep logic for training runs by @vaibhavjindal in #696
- increase gemma3 multimodal convergence test loss atol by @shivam15s in #697
- Update pyproject.toml by @shivam15s in #700
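To show where the new `average_log_prob` init parameter (#676) fits, here is a minimal, hedged sketch; the import path and the other keyword arguments are assumptions, so verify them against the class docstring in your installed version.

```python
# Hedged sketch only: assumes LigerFusedLinearDPOLoss is importable from
# liger_kernel.chunked_loss and accepts average_log_prob at init (#676).
from liger_kernel.chunked_loss import LigerFusedLinearDPOLoss

dpo_loss = LigerFusedLinearDPOLoss(
    beta=0.1,                # DPO temperature (assumed kwarg name)
    average_log_prob=False,  # average (True) vs. sum (False) of per-token log-probs; assumed semantics
)
```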
New Contributors
- @intervitens made their first contribution in #685
Full Changelog: v0.5.8...v0.5.9
v0.5.8: Backward-Compatible Fix
What's Changed
- backward compatible initialization by @shivam15s in #666
- Update pyproject.toml by @shivam15s in #667
Full Changelog: v0.5.7...v0.5.8
v0.5.7: Gemma3 Support, XPU Tuning Enhancements, GRPO Improvements, and API Compatibility Fixes
What's Changed
- Gemma3 (Text and Multimodal) by @eljandoubi in #621
- Make FLCE compatible with latest `XXXForCausalLM.forward()` APIs by @Tcc0403 in #596
- do bias addition in tests in float32 to make testing code similar to torch compile by @shivam15s in #655
- [CI] fix siglip dummy config by @yundai424 in #658
- add XPU tuning to JSD by @rmukhopa in #649
- add XPU tuning to Rmsnorm and Layernorm by @Tarakarevu1 in #653
- Fix imports without transformers by @vaibhavjindal in #659
- Use TYPE_CHECKING to fix static-only imports in IDEs etc by @vaibhavjindal in #660
- [kl_div] Modified block and warp sizes for improved performance by @jgtong in #654
- [GRPO] add support for different loss types by @kashif in #662
- Remove unexpected kwargs passing to flce by @Tcc0403 in #651
- reduce number of tests for grpo by @shivam15s in #663
- Update pyproject.toml by @shivam15s in #665
New Contributors
- @rmukhopa made their first contribution in #649
- @Tarakarevu1 made their first contribution in #653
- @jgtong made their first contribution in #654
Full Changelog: v0.5.6...v0.5.7
v0.5.6: Enhancements, Fixes, and Expanded Support (Paligemma, DyT, XPU, Llava, GRPO, and More!)
What's Changed
- [JSD] JSD fixes by @kashif in #609
- Paligemma support by @eljandoubi in #608
- Fix hidden size by @eljandoubi in #612
- Add loss_utils for rewriting lce_forward methods by @Tcc0403 in #614
- Update Star History URL by @ryankert01 in #616
- Update README.md by @shivam15s in #617
- Language model of PaliGemma 1 is Gemma 1 by @eljandoubi in #613
- Update README to reflect recent changes by @helloworld1 in #619
- Support Dynamic Tanh (DyT) by @Tcc0403 in #618 (reference sketch after this list)
- Fix incorrect module name when monkey_patch applied to instantiated model by @vaibhavjindal in #629
- [chunked loss] align teacher and student logit shape by @yundai424 in #634
- Fix incorrect condition comment in log_target calculation by @p81sunshine in #633
- Add huggingface llava by @jp1924 in #524
- fix Llava test-bwd failure by @jp1924 in #639
- Fix GRPO to conform with TRL: Fix loss, make tests accurate, correct metrics computation by @shivam15s and @mRSun15 in #628
- add xpu tuning to CE by @mgrabban in #645
- add xpu tuning to FLJSD by @mgrabban in #647
- Change tests to use rocm 6.3 version and tol changes to make liger run on amd by @shivam15s in #646
- Update pyproject.toml by @shivam15s in #648
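For readers unfamiliar with Dynamic Tanh (DyT, #618), the snippet below is a plain-PyTorch reference of the layer's math, `y = weight * tanh(alpha * x) + bias` with a learnable scalar `alpha`; it sketches the semantics only, not Liger's `LigerDyT` implementation.

```python
# Reference DyT layer: a normalization-free replacement for LayerNorm/RMSNorm
# that squashes activations element-wise with a learnable scale alpha.
import torch
import torch.nn as nn

class DyTReference(nn.Module):
    def __init__(self, hidden_size: int, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), init_alpha))
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * torch.tanh(self.alpha * x) + self.bias
```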
New Contributors
- @eljandoubi made their first contribution in #608
- @p81sunshine made their first contribution in #633
Full Changelog: v0.5.5...v0.5.6
v0.5.5: Chunk size fixes for JSD; KTO speed fixes; better metrics tests
What's Changed
- Infer correct device for AMD HIP device by @helloworld1 in #587
- add out of bounds check to cross entropy by @shivam15s in #588
- Monkeypatch for Qwen2.5-VL by @BenasdTW in #552
- KTO changes to return aux outputs by @vaibhavjindal in #589
- [KTO] Only return summed metrics by @vaibhavjindal in #591
- increase chunk size for distillation and add bias to jsd by @shivam15s in #590
- [CI] Add ROCm 6.3 CI by @tjtanaa in #506
- Fix KTO speed issue by @vaibhavjindal in #592
- Compare means of aggregated outputs in KTO tests by @vaibhavjindal in #595
- Fix means of logps and rewards by @vaibhavjindal in #597
- Add chunk_size param to chunked losses by @RichhLi in #599 (see the sketch after this list)
- Fix DPO/ORPO typo in readme by @tyler-romero in #602
- version bump by @shivam15s in #605
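A minimal sketch of the new `chunk_size` knob (#599), which bounds peak memory by processing the batch in smaller chunks; the class chosen here and the kwarg semantics are assumptions for illustration only.

```python
# Hedged sketch: assumes the chunked preference losses accept chunk_size at init (#599).
from liger_kernel.chunked_loss import LigerFusedLinearORPOLoss

orpo_loss = LigerFusedLinearORPOLoss(
    chunk_size=1,  # rows processed per chunk (assumed semantics); smaller values trade speed for lower peak memory
)
```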
Full Changelog: v0.5.4...v0.5.5
v0.5.4: Granite 3.0 & 3.1, OLMo2, GRPO, TVD loss, and minor fixes
What's Changed
- add GitHub CI for Intel GPU by @faaany in #536
- Add Intel GPU CI to README.md by @hebiao064 in #562
- test split to 16, 32 by @jp1924 in #564
- Clean up workaround introduced in PR #564 by @austin362667 in #566
- Update README.md by @momochen in #567
- Grpo loss by @kashif in #553
- Update Readme with ROCM installation instruction by @zcnrex in #570
- fix qwen2vl and mllama test to pass failing tests by @shivam15s in #571
- KTO: Minor fix and documentation update by @vaibhavjindal in #574
- Add TVD Loss Kernel by @saurabhkoshatwar in #324 (reference sketch after this list)
- Add KTO Benchmark Data into README by @hebiao064 in #575
- Support Granite 3.0 and 3.1 models by @JamesKunstle in #558
- Improve Hugging Face SFT Script by @ParagEkbote in #539
- Add unit tests for shared prefix masked attention with `torch.FlexAttention` by @austin362667 in #504
- update project readme to include Granite support by @JamesKunstle in #576
- Revert "Improve Hugging Face SFT Script (#539)" and Fix TVD Test for Intel #580 by @shivam15s in #578
- Fix Rope Test by @hebiao064 in #577
- Fix layer norm kernels by @lancerts in #582
- Add OLMO2 model support by @yundai424 in #581
- bump version to 0.5.4 by @yundai424 in #585
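For context on the TVD loss kernel (#324), here is a plain-PyTorch reference of the total variation distance objective it computes, `TVD(P, Q) = 0.5 * sum_i |p_i - q_i|`; this illustrates the math only, not Liger's kernel or API.

```python
# Reference TVD distillation loss: half the L1 distance between the student
# and teacher token distributions, averaged over positions.
import torch
import torch.nn.functional as F

def tvd_loss_reference(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    p = F.softmax(student_logits, dim=-1)
    q = F.softmax(teacher_logits, dim=-1)
    return 0.5 * (p - q).abs().sum(dim=-1).mean()
```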
New Contributors
- @jp1924 made their first contribution in #564
- @zcnrex made their first contribution in #570
- @vaibhavjindal made their first contribution in #574
- @saurabhkoshatwar made their first contribution in #324
- @JamesKunstle made their first contribution in #558
Full Changelog: v0.5.3...v0.5.4