
v0.6.1


Released by @wuxibin89 on 14 Nov 02:45 · d62da49

Highlights

Trainer

  • support fp16 training (FSDP/Megatron); see the sketch below
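
FP16 differs from BF16 mainly in that it needs dynamic loss scaling to keep small gradients from underflowing. A minimal sketch of that mechanism using plain PyTorch AMP (illustration only; verl enables fp16 through its FSDP/Megatron engine configuration rather than a hand-written loop like this):

```python
import torch
from torch import nn

# Illustrative fp16 training loop with dynamic loss scaling (not verl's engine code).
model = nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # rescales the loss so fp16 grads don't underflow

for step in range(10):
    x = torch.randn(8, 1024, device="cuda")
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales grads, skips the step on inf/nan
    scaler.update()                # adapts the scale factor
    optimizer.zero_grad()
```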

Megatron

  • support 1f1b_overlap and moe_a2a_overlap
  • support Qwen3-VL MoE and dense models
  • support Qwen2.5-VL/Qwen3-VL with context parallelism (see the sketch below)
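
Context parallelism splits the sequence dimension of each sample across GPUs so that the long multimodal sequences produced by VL models fit in memory. A toy illustration of the chunking idea (hypothetical helper, not verl's or Megatron's implementation; the real backend also reorders chunks and exchanges KV blocks across ranks during attention):

```python
import torch

def split_for_context_parallel(input_ids: torch.Tensor, cp_rank: int, cp_size: int) -> torch.Tensor:
    """Return the slice of the sequence dimension owned by one context-parallel rank."""
    seq_len = input_ids.shape[-1]
    assert seq_len % cp_size == 0, "pad the sequence to a multiple of cp_size"
    chunk = seq_len // cp_size
    return input_ids[..., cp_rank * chunk:(cp_rank + 1) * chunk]

# Example: a 16-token sequence split across 4 context-parallel ranks.
ids = torch.arange(16).unsqueeze(0)
print([split_for_context_parallel(ids, r, 4).tolist() for r in range(4)])
```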

Rollout

  • Use the vLLM and SGLang release images as CI base images; upgrade to vllm==0.11.0 and sglang==0.5.5
  • Prometheus monitoring for rollout metrics

Algorithm

  • Rollout Correction: a comprehensive overhaul of the rollout correction system with typed configuration, mathematical documentation, and performance optimizations (see the sketch below).
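
Rollout correction reweights samples produced by a (possibly stale or off-policy) rollout policy so that gradients better approximate the on-policy objective. A minimal sketch of a token-level importance-sampling weight with truncation (generic illustration only; verl's actual estimators, config keys, and normalization are defined in the PRs listed under "What's Changed"):

```python
import torch

def truncated_is_weights(logp_train: torch.Tensor,
                         logp_rollout: torch.Tensor,
                         clip_max: float = 2.0) -> torch.Tensor:
    """Token-level importance weights pi_train / pi_rollout, truncated from above.

    logp_train / logp_rollout: per-token log-probs of the sampled tokens under
    the current training policy and the policy that generated the rollout.
    Truncation bounds the variance introduced by off-policy samples.
    """
    ratio = torch.exp(logp_train - logp_rollout)
    return ratio.clamp(max=clip_max)

# Example: weights stay near 1 when the two policies agree.
w = truncated_is_weights(torch.tensor([-1.0, -2.0]), torch.tensor([-1.1, -1.5]))
print(w)  # tensor([1.1052, 0.6065])
```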

Recipe

Introduce new experimental recipes, which will be gradually merged into main in future releases.

  • Fully Async Policy Trainer: a fully asynchronous PPO training system that completely decouples the Trainer and Rollouter, supporting asynchronous sample generation and training (see the sketch after this list).
  • TransferQueue Data System: an asynchronous streaming data management system for efficient post-training.
  • FlowRL
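
The common idea behind the fully async trainer and TransferQueue is that rollout generation and optimization no longer block each other: generated samples stream through a queue and the trainer consumes them as they arrive. A toy asyncio sketch of that decoupling (conceptual only; the recipes themselves use Ray actors and their own queue implementation):

```python
import asyncio
import random

async def rollouter(queue: asyncio.Queue) -> None:
    """Keeps generating samples independently of the trainer's pace."""
    for i in range(8):
        await asyncio.sleep(random.uniform(0.01, 0.05))  # stand-in for generation
        await queue.put({"sample_id": i})

async def trainer(queue: asyncio.Queue) -> None:
    """Consumes samples as they arrive and runs optimizer steps."""
    for _ in range(8):
        batch = await queue.get()
        await asyncio.sleep(0.02)  # stand-in for a training step
        print("trained on", batch["sample_id"])

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue(maxsize=4)  # bounded queue limits staleness
    await asyncio.gather(rollouter(queue), trainer(queue))

asyncio.run(main())
```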

Important bug fixes

  • #3861: fix missing offload of parameters and optimizer state to CPU when there is no checkpoint
  • #4097: fix missing finalize_model_grads_func in the Megatron model engine

What's Changed

  • [misc] feat: bump version to 0.7.0.dev by @vermouth1992 in #3772
  • [recipe] feat: Add example for gpt-oss training using agent loop by @HJSang in #3774
  • [docker] feat: update Dockerfile.rocm7 by @vickytsang in #3781
  • [doc] fix: actor_rollout_ref.critic is not correct by @HollowMan6 in #3778
  • [misc] fix: SFT E2E CI test failure due to megatron engine by @houminz in #3786
  • [recipe] fix: fix the gpt-oss-20b training script for agent loop recipe by @HJSang in #3793
  • [doc] chore: add agent loop get started tutorial by @wuxibin89 in #3790
  • [vllm] fix: catch exception of vllm async engine by @Yangruipis in #3789
  • [trainer] fix: batch size mismatch with n>1 when gen_max for ReMax by @HollowMan6 in #3779
  • [trainer] feat: ReMax support using reward model for baseline by @HollowMan6 in #3780
  • [megatron] feat: script of qwen3vl 235b by @ISEEKYAN in #3799
  • [trainer, recipe] feat: fully async training recipe by @ArronHZG in #2981
  • [doc] feat: update fully async experiment message by @ArronHZG in #3804
  • [worker] fix: create a new event loop if none exists when building rollouts by @ChangyWen in #3803
  • [trainer] fix: address serialization issues when using async reward function and ray ppo trainer by @benprofessionaledition in #3769
  • [megatron] fix: fix logits process error when disable pack_seqs by @HaochenYuan in #3777
  • [misc] fix: Sanitize MLFlow metric names by @pratik9891 in #3736
  • [ci] fix: Install mlflow dependency by @HollowMan6 in #3817
  • [rollout, vllm] fix: make LoRA with async vLLM work properly by @listar2000 in #3821
  • Revert "[worker] fix: create a new event loop if none exists when building rollouts" by @vermouth1992 in #3820
  • [trainer] fix: Add data.seed to config by @HollowMan6 in #3815
  • [doc] fix: update install instruction and retool readme by @chenhaiq in #3824
  • [algo] fix: remove torch.quantile-based percentile metrics to resolve tensor size limit error by @szrlee in #3810
  • [data] feat: filter out malformed data together with long prompts by @HollowMan6 in #3814
  • [worker] fix: to create a new event loop if none exists when building rollouts (a safer fix) by @ChangyWen in #3828
  • [data, trainer] feat: add support for limiting samples from dataset by @HollowMan6 in #3812
  • [model, megatron] feat: Support for Qwen3VL dense models by @HollowMan6 in #3838
  • [recipe] fix: Update the grpo training script for gpt-oss models by @HJSang in #3836
  • [recipe, rollout] feat: enable gpt-oss training for tool agent add gpt-oss for retool recipe by @HJSang in #3837
  • [data] feat: TransferQueue - An asynchronous streaming data management system by @0oshowero0 in #3649
  • [trainer, worker] feat: more flexible and easy-to-use reward model by @yyDing1 in #3679
  • [doc] fix: fix async policy message by @ArronHZG in #3845
  • [worker] fix: create a new event loop if none exists by @baymax591 in #3839
  • [misc] feat: add megatron script for open math reasoning by @vermouth1992 in #3844
  • [rollout, vllm] fix: name change for compilation level by @HollowMan6 in #3848
  • [trainer] fix: missing offload parameter and optimizer to cpu when no checkpoint by @wuxibin89 in #3861
  • [sglang] fix: make sglang wake_up/sleep work in colocate mode by @yyDing1 in #3860
  • [doc] feat: add doc for reward loop by @yyDing1 in #3851
  • [doc] misc: fix doc that penalty starts when exceeds the max_response_length - overlong_buffer.len by @bzantium in #3856
  • [recipe] fix: bugfix of Qwen3 8b/14b DAPO npu script by @acat-rw in #3858
  • [BREAKING][misc] feat: Abstract optimizer by @EduardDurech in #3656
  • [ci] feat: migrate gpu_unit_tests to volcengine by @vermouth1992 in #3872
  • [rollout] fix: Fix gpt-oss training in tool agent by @HJSang in #3865
  • [fsdp] fix: fix moe model run on full-async error by @chenjiaoAngel in #3874
  • [doc] feat: update doc of reward loop by @yyDing1 in #3880
  • [perf, data] feat: DP workload balance by @conver334 in #3605
  • [ci] fix: gsm8k interaction unit test by @wuxibin89 in #3888
  • [model] chore: deprecated legacy code for GRM by @yyDing1 in #3885
  • [recipe] fix: Qwen3-vl moe model patch by @leisuzz in #3878
  • Add PokeeResearch to README resources by @BillMatrix in #3892
  • [misc] feat: read environment for WandB entity (team) name by @BaiqingL in #3889
  • [tool] fix: remove duplicate tool initialization by @Tree-Shu-Zhao in #3893
  • [rollout] fix: incorrect value assignment while trying to access call_tool_result by @BaiqingL in #3891
  • [megatron] fix: VLMs using fused kernels by @HollowMan6 in #3849
  • [megatron] fix: mbridge load optimizer dist_ckpt by @ccilery in #3850
  • [misc] feat: fix ci break by @wuxibin89 in #3898
  • [doc, recipe] feat: update doc of rewardloop and add runnable scripts of fapo by @yyDing1 in #3900
  • [doc] chore: update installation scripts to use newer versions by @HollowMan6 in #3901
  • [recipe] fix: fix bug of transfer queue runtime env by @baymax591 in #3904
  • [doc] fix: formatting issue for kl_ctrl and fused_kernel_options configs by @HollowMan6 in #3917
  • [recipe] fix: DAPO using KL in reward by @HollowMan6 in #3916
  • [recipe] fix: DAPO add trust_remote_code parameter to tokenizer and processor by @quancs in #3913
  • [recipe] fix: Update README with training and backend instructions by @vermouth1992 in #3929
  • [recipe] chore: use verl.utils.metric to import reduce_metrics by @HollowMan6 in #3927
  • [algo] refactor: Rollout Importance Sampling - Separate IS Weights from Rejection Sampling by @szrlee in #3915
  • [trainer, worker] feat: support loading LoRA adapters by @piood in #3523
  • [rollout, sglang] fix: correct input length check in sglang_rollout by @triston-lee in #3935
  • [rollout, vllm] fix: handle lora request when base_sync_done is false initially by @listar2000 in #3907
  • [rollout] fix: Add "non_block" argument compatibility to collective_rpc() by @kevssim in #3934
  • [megatron] chore: update mcore docs by @ISEEKYAN in #3940
  • [docker] feat: add Ascend dockerfile and image build pipeline by @songyy29 in #3485
  • [rollout] fix: Pass tool related extra fields in reward loop by @huaiyizhao in #3941
  • [docker] fix: Workaround mount-type=bind issue from scratch in some environments. by @vickytsang in #3944
  • [megatron] feat: support async training with megatron and VLM by @ISEEKYAN in #3846
  • [data] fix: Pass video metadata to vLLM and support change image_patch_size by @kaln27 in #3928
  • [ci] fix: disable docker-build-ascend from running on fork by @HollowMan6 in #3959
  • [recipe] chore: entropy removes WANDB_API_KEY in code by @HollowMan6 in #3956
  • [Megatron] feat: 1f1b overlap/moe_a2a_overlap by @ISEEKYAN in #3522
  • [ci] chore: Rename dockerfile and update e2e_ascend by @FightingZhen in #3966
  • [trainer, recipe] feat: Fully Async Policy add Rollout Importance Sampling by @ArronHZG in #3955
  • [ci, recipe] fix: add the missing key compute_prox_log_prob to fully_async_ppo_megatron_trainer.yaml by @ji-huazhong in #3979
  • [misc] fix: add compileall pre-commit hook checks and improve code quality by @HollowMan6 in #3946
  • [doc] feat: add a doc for vllm+megatron training by @techkang in #3974
  • [data] feat: passing tool_config to data by @huaiyizhao in #3950
  • [worker] fix: Add attn_implementation override support in FSDP workers by @arde171 in #3978
  • [trainer] fix: normalize sft loss by num_tokens in global batch by @wuxibin89 in #3994
  • [sglang,rollout] fix: sglang port race condition by @theely in #3977
  • [BREAKING][megatron] feat: support qwen2.5/3vl with context parallel by @ISEEKYAN in #3998
  • [ci] feat: Add weekly scheduled validation workflow for Ascend docker image by @songyy29 in #3997
  • [recipe] fix: message extension logic in tool_agent_loop.py by @NIL-zhuang in #3991
  • [tool] fix: Errors when merge Qwen3-VL-2B (FSDP) by @ieellee in #3971
  • [trainer] fix: Handle None sandbox_config in load_reward_manager by @CzsGit in #4008
  • [ci] fix: update config in docker_validate_ascend.yml by @FightingZhen in #4009
  • [sglang] fix: relocate sglang cache free logic to avoid GPU OOM by @dongju-2 in #4005
  • [docker] feat: support Ascend A3 docker image build pipeline, update related documents by @FightingZhen in #3970
  • [megatron] fix: pass video data to megatron backend by @ccilery in #4016
  • [env] feat: update docker file building schema, from VLLM base images by @ISEEKYAN in #3937
  • [rollout, vllm] fix: Fixed the issue of rollout causing OOM in ep > 1 by @echo-rain in #4007
  • [recipe] feat: add FlowRL recipe by @Xuekai-Zhu in #3924
  • [BREAKING][algo] feat: Rollout Correction for General Off-Policy Problems by @szrlee in #3984
  • [doc] feat: render LaTeX in md docs by @tongyx361 in #4061
  • [ci] fix: Remove extra pip config in Ascend dockerfile by @FightingZhen in #4059
  • [ci, docker] chore: Update Ascend dockerfile and docs by @FightingZhen in #4064
  • [algo] feat: return loss and metrics from policy_loss_fn by @tongyx361 in #4062
  • [trainer] fix: prevent ReactAgentLoop infinite recursion by @CzsGit in #4051
  • [rollout] fix: Agentloop agent.image_data bug #4050 by @DBMing in #4052
  • [rollout] fix: resolve agent loop config path in multi-node Ray training by @CzsGit in #4029
  • Clear up the deepeyes README by @willem-bd in #4076
  • [rollout,vllm] fix: custom model config pickle error when trust_remote_code=True by @wuxibin89 in #4079
  • [trainer, recipe] feat: Fully Async Policy add Rollout Importance Sampling with Megatron by @lalala-2 in #4023
  • [megatron] fix: engine alignment by @ISEEKYAN in #4097
  • [trainer, megatron, tool] fix: megatron not support memory profiler by @maijia-cwh in #4031
  • [ci,sglang] feat: update docker file building schema, from sglang base images by @ISEEKYAN in #4037
  • [recipe] fix: dynamic recursion_limit and error handling in ReactAgentLoop by @le-czs in #4102
  • [doc,algo] feat: Rollout Correction - Fix Metrics, Add Documentation, and Add Batch Normalization by @szrlee in #4070
  • [doc] chore: update instructions for enabling AMD MI3xx sleep mode by @HollowMan6 in #4108
  • [fsdp] feat: add NPU fusion kernels for Qwen2 and Qwen2.5 dense model. by @ZLiao097 in #3923
  • [doc] fix: improve docs clarity, fix IS gradient flow, and optimize memory by @szrlee in #4105
  • [megatron] fix: expose moe_aux_loss_coeff and moe_z_loss_coeff to improve MoE load balancing by @Kairosxy in #4103
  • [megatron] feat: fp16 training (dense model only) by @ISEEKYAN in #4086
  • [ci] feat: Enable python cache in ascend docker build workflow by @FightingZhen in #4113
  • [ci] chore: update pip before using it in ascend dockerfile by @FightingZhen in #4114
  • [sglang,vllm] feat: use prometheus and grafana to show rollout message by @ArronHZG in #4088
  • [worker, trainer, recipe] feat: add FP16 training and inference support by @Xuekai-Zhu in #4036
  • [megatron] fix: compatible with older megatron version for NPU CI by @ISEEKYAN in #4116
  • [ci, training_utils] fix: get_nccl_backend default to nccl by @HollowMan6 in #4117
  • [doc] fix: Propose fix a couple of typos by @jeis4wpi in #4118
  • [BREAKING][megatron] chore: set ETP default to null by @HollowMan6 in #4119
  • [misc] chore: bump version to 0.6.1 by @wuxibin89 in #4122

New Contributors

Full Changelog: v0.6.0...v0.6.1