This is a practical auto-tuner on Megatron targeted at post-training frameworks like verl project.
Our performance tuning target is MFU in MCore training process on both forward-only models and forward-backward-update models, which leads to high performance on training side in RLHF.
Speaking of tuning dimensions, currently include:
| Dense Layer | TP | CP | DP | PP | VPP |
| MoE Parallel Folding | ETP | EP | EDP | ||
| Pipeline layout | |||||
| Seqlen | max_token_len | ||||
| Recompute | recompute_granularity | recompute_method | recompute_num_layers | recompute_modules |
Target shapes:
- thd (mainly)
- bshd
Go to docs directory