[trainer, recipe] feat: add support for external generative reward models #2121

yyDing1 · 2025-06-20T10:17:22Z

Checklist Before Starting

Searched for similar PR(s).
Checked PR Title format
- In format of: [modules] type: Title
- modules are in fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
- type is in feat, fix, refactor, chore, test
- can involve multiple modules, seperated by , or space, like [megatron, fsdp, doc] feat: xxx

What does this PR do?

Support External Generative Reward Model.

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc.

High-Level Design

Demonstrate the high-level design if this PR is complex.

Specific Changes

List the specific changes.

API

Demonstrate how the API changes if any.

Usage Example

Provide usage example(s) for easier usage.

# Add code snippet or script demonstrating how to use this

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks.
Add [BREAKING] to the PR title description if it breaks any API.
Update the documentation about your changes in the docs.
New CI unit test(s) are added to cover the code path.
Rely on existing unit tests on CI that covers the code path.

vermouth1992 · 2025-06-20T10:24:54Z

recipe/api-genrm/reward_function.py

@@ -0,0 +1,70 @@
+from openai import AsyncOpenAI


Add license

vermouth1992 · 2025-06-20T10:25:45Z

recipe/api-genrm/run_func_rm.sh

@@ -0,0 +1,36 @@
+set -x


Could you show the training results and paste it to here https://github.com/volcengine/verl/blob/main/docs/algo/baseline.md

vermouth1992 · 2025-06-20T11:04:55Z

recipe/api-genrm/README.md

+Deploy the pretrained GenRM model using vLLM. Skip this step if you want to use an external api service.
+
+```
+vllm serve dyyyyyyyy/Qwen2.5-1.5B-GenRM-QueryOnly --served-model-name genrm-demo


Could you show using sglang as well? Thanks.

ccclyu · 2025-06-21T09:01:36Z

recipe/api-genrm/run_genrm_api.sh

+# MAX_RETRIES=60
+# RETRY_INTERVAL=5
+# for ((i=0; i<MAX_RETRIES; i++)); do
+#     if curl -s http://localhost:8000/v1/chat/completions > /dev/null 2>&1; then
+#         echo "vllm server 已启动"
+#         break
+#     fi
+#     if ! ps -p $VLLM_PID > /dev/null 2>&1; then
+#         echo "vllm server 启动失败"
+#         exit 1
+#     fi
+#     sleep $RETRY_INTERVAL
+# done
+
+# if [ $i -eq $MAX_RETRIES ]; then
+#     echo "等待 vllm server 启动超时"
+#     kill $VLLM_PID
+#     exit 1


can we only have english statements in the code?

ccclyu · 2025-06-21T09:02:17Z

requirements_sglang.txt

 pybind11
 pylatexenc
-ray[default]>=2.10
+# ray[default]>=2.10


no need to comment out

ccclyu · 2025-06-21T09:02:29Z

requirements_sglang.txt

 torchvision
 transformers
-wandb
+# wandb


the same as line 14.

ccclyu · 2025-06-21T09:03:37Z

recipe/api-genrm/sglang_server.sh

@@ -0,0 +1,3 @@
+


can we move one-line commonly-used command to readme?

eric-haibin-lin

Could u add a CI test in .github/wrolflows ?

vermouth1992 · 2025-06-28T06:15:47Z

recipe/genrm_remote/run_func_rm.sh

+
+python3 -m verl.trainer.main_ppo \
+    algorithm.adv_estimator=grpo \
+    data.train_files=/mnt/hdfs/resources/datasets/GSM8K-Processed/train.parquet \


Do not use hdfs path

…dels (volcengine#2121) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Support External Generative Reward Model. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.

yyDing1 and others added 11 commits June 19, 2025 12:07

add gen reward func

9e22ac7

update

0d0bb86

update

49a5f19

Merge branch 'volcengine:main' into external-genrm

2c3a728

update

5d97b20

update

cf68ada

update

8fbca4c

update

9a2e57f

Merge branch 'volcengine:main' into external-genrm

c053a49

update

a1fb0e4

update

4c722bf

vermouth1992 reviewed Jun 20, 2025

View reviewed changes

recipe/api-genrm/reward_function.py Outdated

@@ -0,0 +1,70 @@

from openai import AsyncOpenAI

Copy link

Collaborator

vermouth1992 Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add license

vermouth1992 reviewed Jun 20, 2025

View reviewed changes

yyDing1 added 4 commits June 20, 2025 20:36

add max retry

560d964

add max retry

6a1b387

fix bug

e7c626d

add sglang server

4986e2f

ccclyu reviewed Jun 21, 2025

View reviewed changes

yyDing1 added 3 commits June 22, 2025 15:01

update

d3367da

update

fc79d21

update

08e0915

eric-haibin-lin reviewed Jun 22, 2025

View reviewed changes

yyDing1 and others added 7 commits June 23, 2025 03:07

modify async to multiprocess

781510d

replace with gsm8k

e9fb990

add ci test

7b88c2a

Merge branch 'volcengine:main' into external-genrm

0484ccd

Update README.md

46b33e3

Update README.md

e823404

Update run_genrm_api.sh

1b1e439

yyDing1 added 2 commits June 23, 2025 17:24

reformat

fd04a8d

rename

8ebdc12

yyDing1 requested review from eric-haibin-lin and vermouth1992 June 24, 2025 03:30

yyDing1 and others added 13 commits June 25, 2025 14:40

fix

31b8cac

fix bug in sglang serve

fb3c93f

format

ef7cda9

Merge branch 'volcengine:main' into external-genrm

83e7a4f

fix ci

9c1461a

fix bug

2d4c1e3

Update run_genrm_remote.sh

99491bf

update ci

ebd98ba

update ci

0e0a729

update ci

4321ba9

debug remote ci qwq

c6c8216

debug remote ci qwq

1ed4a2b

completed!

3994053

vermouth1992 reviewed Jun 28, 2025

View reviewed changes

rm func_rm script

161f57f

yyDing1 requested a review from vermouth1992 June 28, 2025 08:49

vermouth1992 approved these changes Jun 29, 2025

View reviewed changes

vermouth1992 merged commit 072725c into volcengine:main Jun 29, 2025
8 checks passed

yyDing1 mentioned this pull request Jun 29, 2025

Rollout reward evaluation is serial — how to parallelize LLM-based reward computation? #2236

Open

Wangmerlyn mentioned this pull request Jun 30, 2025

CI failures: e2e_genrm_remote reward model chat template breaks with only user as input #2283

Closed

yyDing1 deleted the external-genrm branch July 1, 2025 07:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[trainer, recipe] feat: add support for external generative reward models #2121

[trainer, recipe] feat: add support for external generative reward models #2121

Uh oh!

yyDing1 commented Jun 20, 2025 •

edited

Loading

Uh oh!

vermouth1992 Jun 20, 2025

Uh oh!

vermouth1992 Jun 20, 2025

Uh oh!

vermouth1992 Jun 20, 2025

Uh oh!

ccclyu Jun 21, 2025

Uh oh!

ccclyu Jun 21, 2025

Uh oh!

ccclyu Jun 21, 2025

Uh oh!

ccclyu Jun 21, 2025

Uh oh!

eric-haibin-lin left a comment

Uh oh!

vermouth1992 Jun 28, 2025

Uh oh!

yyDing1 Jun 28, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[trainer, recipe] feat: add support for external generative reward models #2121

[trainer, recipe] feat: add support for external generative reward models #2121

Uh oh!

Conversation

yyDing1 commented Jun 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist Before Starting

What does this PR do?

Test

High-Level Design

Specific Changes

API

Usage Example

Checklist Before Submitting

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-haibin-lin left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

yyDing1 commented Jun 20, 2025 •

edited

Loading