feat(resource-strategy-fit): add per-Pod scoring strategy #4641

kingeasternsun · 2025-09-25T08:25:52Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

Currently, the resource-strategy-fit plugin only supports a global scoring-type configuration, which is insufficient for heterogeneous workloads.

This PR adds support for per-Pod level scoring configuration, so that different types of workloads can be scored according to their primary resource requirements:

GPU Pods → scored only by GPU resources.
CPU Pods → scored only by CPU resources.
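As a rough illustration of what scoring a Pod by a single resource could look like, here is a minimal sketch. The strategy names `MostAllocated` and `LeastAllocated`, the helper `scoreResource`, and the 0-100 scale are assumptions for this example, modeled on common most/least-allocated scoring rather than taken from the plugin's code.

```go
package main

// ScoringType is a hypothetical name for the per-resource strategy.
type ScoringType string

const (
	MostAllocated  ScoringType = "MostAllocated"  // prefer fuller nodes (bin-packing)
	LeastAllocated ScoringType = "LeastAllocated" // prefer emptier nodes (spreading)
)

// scoreResource returns a 0-100 score for one resource on one node.
// used is the amount already allocated, requested is the Pod's request,
// and allocatable is the node's capacity for that resource.
func scoreResource(t ScoringType, used, requested, allocatable int64) int64 {
	if allocatable <= 0 {
		return 0 // the node does not offer this resource
	}
	after := used + requested
	if after > allocatable {
		return 0 // the Pod would not fit
	}
	switch t {
	case MostAllocated:
		return after * 100 / allocatable
	case LeastAllocated:
		return (allocatable - after) * 100 / allocatable
	default:
		return 0 // unknown strategy: contribute nothing
	}
}
```

Under these assumptions, a GPU Pod requesting 2 GPUs on a node with 4 of 8 GPUs in use would score 75 with `MostAllocated` and 25 with `LeastAllocated`, regardless of the node's CPU usage.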

This enhancement improves scheduling accuracy and resource utilization, making the plugin more suitable for real-world mixed workloads.

Which issue(s) this PR fixes:

Fixes #4640

Special notes for your reviewer:

The design introduces per-Pod scoring by allowing Pods to specify their scoring resource type.
Backward compatibility is preserved: if no per-Pod scoring config is specified, the global scoring-type is still applied.
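The fallback described above can be sketched as follows. The annotation key is the one named in this PR's summary; the helper `resolveScoringType` and the treatment of an empty value as "not set" are assumptions for illustration, not the plugin's actual code.

```go
package main

// Annotation key as named in the PR summary.
const scoringTypeAnnotation = "volcano.sh/resource-strategy-scoring-type"

// resolveScoringType is a hypothetical helper: it returns the Pod-level
// scoring type when the annotation is present and non-empty, and falls
// back to the global scoring type otherwise.
func resolveScoringType(podAnnotations map[string]string, globalType string) string {
	if v, ok := podAnnotations[scoringTypeAnnotation]; ok && v != "" {
		return v
	}
	return globalType
}
```

Reading from a nil annotation map is safe in Go, so Pods without any annotations simply get the global value.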

Does this PR introduce a user-facing change?

The `resource-strategy-fit` plugin now supports per-Pod level scoring configuration.
- GPU Pods are scored by GPU resources.
- CPU Pods are scored by CPU resources.
- Global scoring-type config remains as the default if no Pod-level config is provided.

gemini-code-assist · 2025-09-25T08:26:09Z

Summary of Changes

Hello @kingeasternsun, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the resource-strategy-fit scheduler plugin by introducing the capability for per-Pod level scoring configurations. This allows for more intelligent and tailored scheduling decisions for diverse workloads, such as those heavily reliant on GPUs or CPUs, by enabling pods to specify their preferred resource scoring strategies through annotations. The change aims to boost scheduling accuracy and overall resource utilization in heterogeneous environments, all while ensuring seamless integration with existing global configurations.

Highlights

Per-Pod Scoring Configuration: The resource-strategy-fit plugin now supports defining scoring configurations at the individual Pod level, moving beyond the previous global-only setting.
Heterogeneous Workload Optimization: This enhancement allows specific workloads, such as GPU-intensive or CPU-intensive Pods, to be scored based on their primary resource requirements, improving scheduling accuracy and resource utilization.
Backward Compatibility: The plugin maintains backward compatibility; if no per-Pod scoring configuration is specified, the existing global scoring-type will still be applied.
Annotation-Driven Configuration: Per-Pod scoring types and resource weights are configured via new Pod annotations: volcano.sh/resource-strategy-scoring-type and volcano.sh/resource-strategy-weight.
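A minimal sketch of reading the weight annotation, with an error that names the offending annotation so misconfigurations are easier to spot. The review discussion indicates the value is JSON; the exact schema used here (a flat object mapping resource names to integer weights) and the helper `parseWeights` are assumptions for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation key as named in the PR summary.
const weightAnnotation = "volcano.sh/resource-strategy-weight"

// parseWeights is a hypothetical helper. It assumes the annotation value is
// a JSON object such as {"nvidia.com/gpu": 2}. It returns (nil, nil) when the
// annotation is absent so the caller can fall back to the global config.
func parseWeights(annotations map[string]string) (map[string]int, error) {
	raw, ok := annotations[weightAnnotation]
	if !ok || raw == "" {
		return nil, nil
	}
	weights := map[string]int{}
	if err := json.Unmarshal([]byte(raw), &weights); err != nil {
		return nil, fmt.Errorf("invalid %s annotation %q: %w", weightAnnotation, raw, err)
	}
	for name, w := range weights {
		if w < 0 {
			return nil, fmt.Errorf("negative weight %d for resource %q", w, name)
		}
	}
	return weights, nil
}
```

Whether a malformed annotation should fail scoring or silently fall back to the global configuration is a design choice this sketch leaves to the caller.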

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a valuable feature by enabling per-Pod level scoring configurations in the resource-strategy-fit plugin. This allows for more granular control over scheduling, catering to heterogeneous workloads. The implementation is sound, maintaining backward compatibility with the global configuration. The test suite has been significantly improved with the addition of helper functions and comprehensive test cases for the new functionality, which is a great enhancement for maintainability. I have one suggestion to improve the error handling for misconfigurations to make it more user-friendly.

pkg/scheduler/plugins/resource-strategy-fit/resource_strategy_fit.go

docs/user-guide/how_to_use_resource_strategy_fit_plugin.md

hwdef · 2025-09-26T04:47:42Z

LGTM

I have a small suggestion (maybe for another PR): could we add some fields to Volcano Job, e.g. job.spec.tasks[].resourceStrategyScoringType and job.spec.tasks[].resourceStrategyWeight?
That way you wouldn't need to use pod.annotations.

Benefits:

Better performance (no per-pod JSON unmarshal).
Improved readability and VolcanoJob usability.

Signed-off-by: wangdongyang1 <[email protected]>

Monokaix · 2025-09-26T09:05:28Z

/approve

volcano-sh-bot · 2025-09-26T09:05:36Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Monokaix

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Monokaix]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

JesseStutler · 2025-09-28T02:51:21Z

docs/user-guide/how_to_use_resource_strategy_fit_plugin.md

I think the user guide for the resource-strategy-fit plugin should also cover the sra and proportional strategies: #4454

@XbaoWu Could you help add the sra and proportional user guide to this doc? We can merge this PR first. Or, if the design doc already contains a user guide, we can add a reference link.

JesseStutler

/lgtm
Thanks!

kingeasternsun · 2025-09-28T03:15:12Z

> LGTM
>
> I have a small suggestion(maybe in another PR): could we add some fields for Volcano Job, e.g. job.spec.tasks[].resourceStrategyScoringType and job.spec.tasks[].resourceStrategyWeight? This way you don't need to use pod.annotations.
>
> Benefits:
> 1. Better performance (no per-pod JSON unmarshal).
> 2. Improved readability and VolcanoJob usability.

But that would not support workloads such as Deployments. We could add these fields to Volcano Job and have the Volcano controller add the corresponding annotations to Pods based on those job fields.
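The propagation idea discussed here could be sketched roughly as below. Everything in it is hypothetical: the `TaskSpec` type stands in for a Volcano Job task carrying the proposed fields (which do not exist in this PR), and letting an explicit Pod annotation take precedence over the task field is an assumed design choice.

```go
package main

// Annotation keys as named in the PR summary.
const (
	scoringTypeAnnotation = "volcano.sh/resource-strategy-scoring-type"
	weightAnnotation      = "volcano.sh/resource-strategy-weight"
)

// TaskSpec is a hypothetical stand-in for a Volcano Job task with the
// proposed fields; the weight is kept as an already-serialized string.
type TaskSpec struct {
	ResourceStrategyScoringType string
	ResourceStrategyWeight      string
}

// setIfAbsent writes value under key unless value is empty or the key is
// already set, so explicit Pod annotations win (an assumption).
func setIfAbsent(m map[string]string, key, value string) {
	if value == "" {
		return
	}
	if _, exists := m[key]; !exists {
		m[key] = value
	}
}

// applyTaskStrategy returns a copy of the Pod annotations with the task-level
// strategy fields translated into the annotations the plugin reads.
func applyTaskStrategy(task TaskSpec, podAnnotations map[string]string) map[string]string {
	out := make(map[string]string, len(podAnnotations)+2)
	for k, v := range podAnnotations {
		out[k] = v
	}
	setIfAbsent(out, scoringTypeAnnotation, task.ResourceStrategyScoringType)
	setIfAbsent(out, weightAnnotation, task.ResourceStrategyWeight)
	return out
}
```

With this shape, the scheduler plugin would keep reading only annotations, so Deployments and other non-Volcano workloads could continue to set them directly.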

volcano-sh-bot requested review from hudson741 and william-wang September 25, 2025 08:25

volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 25, 2025

kingeasternsun changed the title ~~pod level resource fit~~ feat(resource-strategy-fit): add per-Pod scoring strategy Sep 25, 2025

gemini-code-assist bot reviewed Sep 25, 2025

pkg/scheduler/plugins/resource-strategy-fit/resource_strategy_fit.go (resolved)

volcano-sh-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 25, 2025

kingeasternsun force-pushed the improve/pod-level-resource-strategy-policy branch from 852d46f to d69cf67 Compare September 25, 2025 08:43

volcano-sh-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 25, 2025

Monokaix reviewed Sep 25, 2025

docs/user-guide/how_to_use_resource_strategy_fit_plugin.md (resolved)

pod level resource fit

ecd2bee

Signed-off-by: wangdongyang1 <[email protected]>

kingeasternsun force-pushed the improve/pod-level-resource-strategy-policy branch from d69cf67 to ecd2bee Compare September 26, 2025 08:57

volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 26, 2025

JesseStutler reviewed Sep 28, 2025

volcano-sh-bot assigned JesseStutler Sep 28, 2025

volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Sep 28, 2025

volcano-sh-bot merged commit 57d871a into volcano-sh:master Sep 28, 2025
20 checks passed


5 participants
