Skip to content

Conversation

@yyDing1
Copy link
Collaborator

@yyDing1 yyDing1 commented Jun 20, 2025

Checklist Before Starting

  • Searched for similar PR(s).
  • Checked PR Title format
    • In format of: [modules] type: Title
    • modules are in fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • type is in feat, fix, refactor, chore, test
    • can involve multiple modules, seperated by , or space, like [megatron, fsdp, doc] feat: xxx

What does this PR do?

Support External Generative Reward Model.

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc.

High-Level Design

Demonstrate the high-level design if this PR is complex.

Specific Changes

List the specific changes.

API

Demonstrate how the API changes if any.

Usage Example

Provide usage example(s) for easier usage.

# Add code snippet or script demonstrating how to use this 

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks.
  • Add [BREAKING] to the PR title description if it breaks any API.
  • Update the documentation about your changes in the docs.
  • New CI unit test(s) are added to cover the code path.
  • Rely on existing unit tests on CI that covers the code path.

@@ -0,0 +1,70 @@
from openai import AsyncOpenAI
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add license

@@ -0,0 +1,36 @@
set -x
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you show the training results and paste it to here https://github.com/volcengine/verl/blob/main/docs/algo/baseline.md

Deploy the pretrained GenRM model using vLLM. Skip this step if you want to use an external api service.

```
vllm serve dyyyyyyyy/Qwen2.5-1.5B-GenRM-QueryOnly --served-model-name genrm-demo
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you show using sglang as well? Thanks.

Comment on lines 9 to 26
# MAX_RETRIES=60
# RETRY_INTERVAL=5
# for ((i=0; i<MAX_RETRIES; i++)); do
# if curl -s http://localhost:8000/v1/chat/completions > /dev/null 2>&1; then
# echo "vllm server 已启动"
# break
# fi
# if ! ps -p $VLLM_PID > /dev/null 2>&1; then
# echo "vllm server 启动失败"
# exit 1
# fi
# sleep $RETRY_INTERVAL
# done

# if [ $i -eq $MAX_RETRIES ]; then
# echo "等待 vllm server 启动超时"
# kill $VLLM_PID
# exit 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we only have english statements in the code?

pybind11
pylatexenc
ray[default]>=2.10
# ray[default]>=2.10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to comment out

torchvision
transformers
wandb
# wandb
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same as line 14.

@@ -0,0 +1,3 @@

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move one-line commonly-used command to readme?

Copy link
Collaborator

@eric-haibin-lin eric-haibin-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could u add a CI test in .github/wrolflows ?


python3 -m verl.trainer.main_ppo \
algorithm.adv_estimator=grpo \
data.train_files=/mnt/hdfs/resources/datasets/GSM8K-Processed/train.parquet \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not use hdfs path

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@yyDing1 yyDing1 requested a review from vermouth1992 June 28, 2025 08:49
@vermouth1992 vermouth1992 merged commit 072725c into volcengine:main Jun 29, 2025
8 checks passed
@yyDing1 yyDing1 deleted the external-genrm branch July 1, 2025 07:49
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jul 28, 2025
…dels (volcengine#2121)

### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, seperated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Support External Generative Reward Model.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluatuion results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
Juniper1021 pushed a commit to Juniper1021/verl that referenced this pull request Aug 7, 2025
…dels (volcengine#2121)

### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, seperated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Support External Generative Reward Model.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluatuion results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
…dels (volcengine#2121)

### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, seperated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Support External Generative Reward Model.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluatuion results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
…dels (volcengine#2121)

### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, seperated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Support External Generative Reward Model.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluatuion results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants