🎉 New ground-breaking GPTQ v2 quantization option for improved model quantization accuracy, validated by GSM8K_PLATINUM benchmarks vs the original GPTQ.
✨ New Phi4-MultiModal model support.
✨ New Nvidia Nemotron Ultra model support.
✨ New Dream model support.
✨ New experimental multi-GPU quantization support.
✨ Reduced VRAM usage and faster quantization.
## What's Changed
- Multi GPU Quantization by @Qubitium in #1502
- experimental multi-gpu quantization by @Qubitium in #1503
- reduce allocation by @Qubitium in #1504
- revert add_ by @Qubitium in #1506
- Switch to non-deprecated mlx.core.clear_cache() by @smpanaro in #1510
- Dream Model Support by @Qubitium in #1512
- fix disabling batch/mask for dream by @Qubitium in #1514
- reduce tensor device movement by @Qubitium in #1516
- fix deepseek v3 module order by @Qubitium in #1517
- Nemotron Ultra Support by @Qubitium in #1518
- faster process_batch by @Qubitium in #1519
- Fix missing arg due to recent Processor api changes by @Qubitium in #1523
- Fix gpt2 columns calculation by @Qubitium in #1524
- temp damper should not overwrite damp cfg by @Qubitium in #1526
- Replace module hooking with tree-defined targeting by @Qubitium in #1527
- Fix compat with XPU by @Qubitium in #1535
- Phi4 MultiModal by @Qubitium in #1511
- disable selection of ExllamaV2 kernel for group_size=16 for now by @Qubitium in #1537
- Add Gptqv2 by @yhhhli and @Qubitium in #1533
## New Contributors
**Full Changelog**: v2.2.0...v3.0.0