@kingeasternsun
Contributor

@kingeasternsun kingeasternsun commented Sep 25, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Currently, the resource-strategy-fit plugin only supports a global scoring-type configuration, which is insufficient for heterogeneous workloads.

This PR adds support for per-Pod level scoring configuration, so that different types of workloads can be scored according to their primary resource requirements:

  • GPU Pods → scored only by GPU resources.
  • CPU Pods → scored only by CPU resources.

This enhancement improves scheduling accuracy and resource utilization, making the plugin more suitable for real-world mixed workloads.
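As a rough illustration of what "scored only by GPU resources" can mean, here is a minimal sketch that scores a node on one named resource and ignores all others. The `NodeUsage` type, the `scoreByResource` function, and the 0 to 100 score range are assumptions made for this example; they are not taken from the plugin's code.

```go
package main

// NodeUsage holds per-resource totals for one node.
// Hypothetical type, used only for this sketch.
type NodeUsage struct {
	Used        map[string]float64
	Allocatable map[string]float64
}

// scoreByResource scores a node in the range 0..100 using a single resource.
// With mostAllocated=true, fuller nodes score higher (bin-packing);
// with mostAllocated=false, emptier nodes score higher (spreading).
func scoreByResource(node NodeUsage, resource string, request float64, mostAllocated bool) float64 {
	capacity := node.Allocatable[resource]
	if capacity <= 0 {
		return 0 // the node does not offer this resource
	}
	after := node.Used[resource] + request
	if after > capacity {
		return 0 // the request does not fit on this resource
	}
	ratio := after / capacity
	if mostAllocated {
		return ratio * 100
	}
	return (1 - ratio) * 100
}
```

With this shape, a GPU Pod would be scored by passing `"nvidia.com/gpu"` as the resource, so a node whose CPU is nearly full but whose GPUs are mostly free is judged on GPU usage alone.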

Which issue(s) this PR fixes:

Fixes #4640

Special notes for your reviewer:

  • The design introduces per-Pod scoring by allowing Pods to specify their scoring resource type.
  • Backward compatibility is preserved: if no per-Pod scoring config is specified, the global scoring-type is still applied.
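The fallback rule above can be sketched as follows. This assumes the Pod-level setting is carried in the `volcano.sh/resource-strategy-scoring-type` annotation named in the bot summary on this PR; the function name `resolveScoringType` and the example values `MostAllocated` and `LeastAllocated` are illustrative assumptions, not the plugin's actual code.

```go
package main

// Annotation key as named in this PR's summary.
const scoringTypeAnnotation = "volcano.sh/resource-strategy-scoring-type"

// resolveScoringType returns the Pod-level scoring type when the annotation
// is present and non-empty, and the global scoring type otherwise.
func resolveScoringType(podAnnotations map[string]string, globalType string) string {
	if v, ok := podAnnotations[scoringTypeAnnotation]; ok && v != "" {
		return v
	}
	return globalType
}
```

A Pod without the annotation (or with a nil annotation map) gets the global value back, which is the backward-compatible path.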

Does this PR introduce a user-facing change?

The `resource-strategy-fit` plugin now supports per-Pod level scoring configuration.
- GPU Pods are scored by GPU resources.
- CPU Pods are scored by CPU resources.
- Global scoring-type config remains as the default if no Pod-level config is provided.

@gemini-code-assist

Summary of Changes

Hello @kingeasternsun, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the resource-strategy-fit scheduler plugin by introducing the capability for per-Pod level scoring configurations. This allows for more intelligent and tailored scheduling decisions for diverse workloads, such as those heavily reliant on GPUs or CPUs, by enabling pods to specify their preferred resource scoring strategies through annotations. The change aims to boost scheduling accuracy and overall resource utilization in heterogeneous environments, all while ensuring seamless integration with existing global configurations.

Highlights

  • Per-Pod Scoring Configuration: The resource-strategy-fit plugin now supports defining scoring configurations at the individual Pod level, moving beyond the previous global-only setting.
  • Heterogeneous Workload Optimization: This enhancement allows specific workloads, such as GPU-intensive or CPU-intensive Pods, to be scored based on their primary resource requirements, improving scheduling accuracy and resource utilization.
  • Backward Compatibility: The plugin maintains backward compatibility; if no per-Pod scoring configuration is specified, the existing global scoring-type will still be applied.
  • Annotation-Driven Configuration: Per-Pod scoring types and resource weights are configured via new Pod annotations: volcano.sh/resource-strategy-scoring-type and volcano.sh/resource-strategy-weight.
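
For readers unfamiliar with annotation-driven configuration, the sketch below shows one way the weight annotation could be read and validated. The annotation key comes from the summary above; the value format (a JSON object mapping resource names to integer weights) and the function name `parseWeights` are assumptions for illustration, since the exact schema is not shown in this conversation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation key as named in the summary above.
const weightAnnotation = "volcano.sh/resource-strategy-weight"

// parseWeights decodes the Pod-level weight annotation.
// Assumed value format: {"nvidia.com/gpu": 1, "cpu": 0}
// It returns (nil, nil) when the annotation is absent, so the caller can
// fall back to the global configuration.
func parseWeights(annotations map[string]string) (map[string]int, error) {
	raw, ok := annotations[weightAnnotation]
	if !ok || raw == "" {
		return nil, nil
	}
	weights := map[string]int{}
	if err := json.Unmarshal([]byte(raw), &weights); err != nil {
		return nil, fmt.Errorf("invalid %s annotation %q: %w", weightAnnotation, raw, err)
	}
	for name, w := range weights {
		if w < 0 {
			return nil, fmt.Errorf("negative weight %d for resource %q", w, name)
		}
	}
	return weights, nil
}
```

Returning a descriptive error for malformed values, rather than silently ignoring them, makes a misconfigured Pod easier to diagnose.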

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 25, 2025
@kingeasternsun kingeasternsun changed the title from "pod level resource fit" to "feat(resource-strategy-fit): add per-Pod scoring strategy" Sep 25, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable feature by enabling per-Pod level scoring configurations in the resource-strategy-fit plugin. This allows for more granular control over scheduling, catering to heterogeneous workloads. The implementation is sound, maintaining backward compatibility with the global configuration. The test suite has been significantly improved with the addition of helper functions and comprehensive test cases for the new functionality, which is a great enhancement for maintainability. I have one suggestion to improve the error handling for misconfigurations to make it more user-friendly.

@volcano-sh-bot volcano-sh-bot added the size/XL label (Denotes a PR that changes 500-999 lines, ignoring generated files.) and removed the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) Sep 25, 2025
@kingeasternsun kingeasternsun force-pushed the improve/pod-level-resource-strategy-policy branch from 852d46f to d69cf67 Compare September 25, 2025 08:43
@volcano-sh-bot volcano-sh-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 25, 2025
@hwdef
Member

hwdef commented Sep 26, 2025

LGTM

I have a small suggestion (maybe for another PR): could we add some fields to the Volcano Job,
e.g. job.spec.tasks[].resourceStrategyScoringType and job.spec.tasks[].resourceStrategyWeight?
This way you don't need to use pod.annotations.

Benefits:

  1. Better performance (no per-pod JSON unmarshal).
  2. Improved readability and VolcanoJob usability.

Signed-off-by: wangdongyang1 <[email protected]>
@kingeasternsun kingeasternsun force-pushed the improve/pod-level-resource-strategy-policy branch from d69cf67 to ecd2bee Compare September 26, 2025 08:57
@Monokaix
Member

/approve

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Monokaix

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 26, 2025
Member

I think the user guide for the resource-strategy-fit plugin should also cover the sra and proportional strategies: #4454

Member

@XbaoWu Could you help add the sra and proportional user guide to this doc? We can merge this PR first. Or, if the design doc already has a user guide, we can add a reference link.

Member

@JesseStutler JesseStutler left a comment

/lgtm
Thanks!

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Sep 28, 2025
@volcano-sh-bot volcano-sh-bot merged commit 57d871a into volcano-sh:master Sep 28, 2025
20 checks passed
@kingeasternsun
Contributor Author

kingeasternsun commented Sep 28, 2025

> LGTM
>
> I have a small suggestion (maybe in another PR): could we add some fields for Volcano Job, e.g. job.spec.tasks[].resourceStrategyScoringType and job.spec.tasks[].resourceStrategyWeight? This way you don't need to use pod.annotations.
>
> Benefits:
>
> 1. Better performance (no per-pod JSON unmarshal).
> 2. Improved readability and VolcanoJob usability.

That approach alone would not cover workloads such as Deployments, though. We could add these fields to the Volcano Job and have the Volcano controller set the corresponding annotations on the Pods based on those fields.
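
A minimal sketch of that idea, assuming a hypothetical `TaskSpec` type that carries the two proposed fields and a controller-side helper that turns them into the Pod annotations this PR introduces. None of these Go names exist in Volcano today, and the JSON encoding of the weights is an assumption.

```go
package main

import "encoding/json"

// Annotation keys introduced by this PR.
const (
	scoringTypeAnnotation = "volcano.sh/resource-strategy-scoring-type"
	weightAnnotation      = "volcano.sh/resource-strategy-weight"
)

// TaskSpec is a hypothetical stand-in for a Volcano Job task that carries
// the proposed resourceStrategyScoringType and resourceStrategyWeight fields.
type TaskSpec struct {
	ResourceStrategyScoringType string
	ResourceStrategyWeight      map[string]int
}

// podAnnotationsForTask builds the annotations a controller could stamp on
// each Pod it creates for the task. Unset fields produce no annotation, so
// the scheduler would fall back to the global configuration.
func podAnnotationsForTask(task TaskSpec) (map[string]string, error) {
	out := map[string]string{}
	if task.ResourceStrategyScoringType != "" {
		out[scoringTypeAnnotation] = task.ResourceStrategyScoringType
	}
	if len(task.ResourceStrategyWeight) > 0 {
		raw, err := json.Marshal(task.ResourceStrategyWeight)
		if err != nil {
			return nil, err
		}
		out[weightAnnotation] = string(raw)
	}
	return out, nil
}
```

Because the scheduler would still read annotations, Pods created by Deployments or other controllers could keep setting them directly, while Volcano Jobs get the typed fields.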

Development

Successfully merging this pull request may close these issues.

✨ resource-strategy-fit plugin should support per-Pod scoring configuration