@XbaoWu
Member

@XbaoWu XbaoWu commented Jul 13, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Prevent starvation of tasks that request critical resources and improve the utilization of those resources, resulting in a more effective task scheduling strategy.
To keep the predicates plugin clean, the processing logic of predicates.proportional is moved to the resource-strategy-fit plugin.

Which issue(s) this PR fixes:

Fixes #4244

Special notes for your reviewer:

No additional notes.

Does this PR introduce a user-facing change?

1. Allow tasks that do not require scarce resources to avoid nodes that hold scarce resources.
2. Move the processing logic of `predicates.proportional` to the `resource-strategy-fit` plugin and adjust the previous configuration accordingly.

@volcano-sh-bot volcano-sh-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 13, 2025
@volcano-sh-bot volcano-sh-bot requested review from Thor-wl and k82cn July 13, 2025 16:03
@volcano-sh-bot volcano-sh-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 13, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @XbaoWu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a Scarce Resource Avoidance (SRA) policy within the ResourceStrategyFit plugin to enhance task scheduling strategies. It aims to prevent starvation for tasks requiring critical resources and improve overall resource utilization by allowing more intelligent placement of workloads, including the migration of existing proportional resource logic and the addition of a new retention policy.

Highlights

  • Scarce Resource Avoidance (SRA): Implemented a new Scarce Resource Avoidance (SRA) policy within the ResourceStrategyFit plugin to optimize resource utilization and prevent task starvation, particularly for critical resources like GPUs.
  • Proportional Policy Migration: The proportional predicate logic has been migrated from the predicates plugin to the resource-strategy-fit plugin, streamlining the codebase and centralizing resource-related scheduling policies.
  • New Retention Policy: Introduced a retention SRA policy, enabling users to define weights for scarce resources. This policy ensures that tasks not requiring these resources are preferentially scheduled away from nodes possessing them, thereby retaining critical resources for specialized workloads.
  • Configuration Updates: Updated the scheduler configuration to support the new SRA policies, introducing sra.policy to select between proportional and retention strategies, along with specific arguments like sra.proportional.* and sra.retention.*.
  • Documentation Enhancements: Added comprehensive design documentation for the new SRA feature, including detailed explanations of both retention and proportional policies, configuration examples, and best practices for AI-specific scheduling scenarios.
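
The retention idea in the highlights above can be illustrated with a minimal Go sketch. It is not the code from this PR: the function name `retentionScore`, the 0 to 100 score range, and the all-or-nothing penalty per resource are assumptions made for illustration only. The sketch lowers a node's score for every weighted scarce resource that the node holds but the task does not request.

```go
package main

import "fmt"

// retentionScore is a hypothetical helper, not the plugin's actual function.
// It returns a score in [0, 100]. A node loses score for each weighted scarce
// resource it holds that the task does not request, so tasks without scarce
// resource requests are steered toward nodes without those resources.
func retentionScore(taskReq, nodeAlloc map[string]float64, weights map[string]int) float64 {
	total, penalty := 0, 0
	for name, w := range weights {
		total += w
		// Penalize only when the node has the resource and the task does not ask for it.
		if nodeAlloc[name] > 0 && taskReq[name] == 0 {
			penalty += w
		}
	}
	if total == 0 {
		// No scarce resources configured: every node is equally acceptable.
		return 100
	}
	return 100 * float64(total-penalty) / float64(total)
}

func main() {
	weights := map[string]int{"nvidia.com/gpu": 1} // assumed weight configuration
	cpuTask := map[string]float64{"cpu": 2}
	gpuNode := map[string]float64{"cpu": 32, "nvidia.com/gpu": 8}
	cpuNode := map[string]float64{"cpu": 32}

	fmt.Println(retentionScore(cpuTask, gpuNode, weights)) // prints 0
	fmt.Println(retentionScore(cpuTask, cpuNode, weights)) // prints 100
}
```

With these inputs the CPU-only task scores the GPU node lowest and the CPU-only node highest, which is the behavior the retention policy describes. A real implementation would likely combine this with other node-order scores rather than use it alone.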

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a Scarce Resource Avoidance (SRA) policy for the ResourceStrategyFit plugin, improving resource utilization and refactoring proportional logic from the predicates plugin. The review focuses on documentation clarity, a correctness issue in scoring, and code duplication.

@XbaoWu XbaoWu force-pushed the master-4248 branch 2 times, most recently from 7df8c1e to 666446b Compare July 13, 2025 17:01
@XbaoWu
Member Author

XbaoWu commented Jul 14, 2025

@JesseStutler @Monokaix @LY-today The SRA-related content has been adjusted. Please take a look when you have time, thank you :)

@volcano-sh-bot volcano-sh-bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 15, 2025
@volcano-sh-bot volcano-sh-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 15, 2025
@XbaoWu XbaoWu force-pushed the master-4248 branch 3 times, most recently from c5a2a9c to 47d2c9d Compare July 15, 2025 10:50
@XbaoWu XbaoWu requested a review from JesseStutler July 15, 2025 11:07
@JesseStutler
Member

/cc @Monokaix

@XbaoWu
Member Author

XbaoWu commented Jul 26, 2025

@JesseStutler @LY-today The log content has been slightly adjusted, and the SRA configuration items have been changed to a struct.
If it's convenient, please check whether anything still needs improvement, or share any other ideas and suggestions. I will update the PR as soon as possible.

@XbaoWu XbaoWu closed this Jul 26, 2025
@XbaoWu XbaoWu reopened this Jul 26, 2025
@XbaoWu XbaoWu requested a review from Monokaix August 21, 2025 01:11
@XbaoWu
Member Author

XbaoWu commented Aug 25, 2025

@Monokaix The configuration structure has been updated: the intermediate semantic layer has been removed, and only sra and proportion are retained.
The difference between these two policies has been added to the doc.

@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 16, 2025
@volcano-sh-bot volcano-sh-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 18, 2025
@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 21, 2025
@volcano-sh-bot volcano-sh-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 22, 2025
@Monokaix Monokaix requested a review from Copilot September 23, 2025 01:34
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds Scarce Resource Avoidance (SRA) policy to the resource-strategy-fit plugin to prevent task starvation for critical resources and enhance resource utilization. The proportional policy processing logic is also moved from the predicates plugin to the resource-strategy-fit plugin to keep the predicates plugin clean.

  • Implements SRA policy with configurable weights for scarce resources
  • Moves proportional policy from predicates plugin to resource-strategy-fit plugin
  • Adds comprehensive tests for the new SRA functionality
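
As a rough illustration of what a proportional check might look like, here is a minimal Go sketch. It assumes, and the thread does not confirm, that the proportional policy keeps a fixed amount of CPU and memory free for each idle unit of a scarce resource such as a GPU. The names `rate` and `fitsProportional` are hypothetical and do not come from the PR.

```go
package main

import "fmt"

// rate is a hypothetical type: the CPU cores and memory (GiB) that should stay
// free on a node for each idle unit of a scarce resource.
type rate struct {
	cpu    float64
	memGiB float64
}

// fitsProportional reports whether a task that does not request the scarce
// resource can be placed while still leaving the reserved CPU and memory.
// The caller is assumed to skip this check for tasks that do request it.
func fitsProportional(idleScarce, idleCPU, idleMemGiB, reqCPU, reqMemGiB float64, r rate) bool {
	cpuLeft := idleCPU - reqCPU
	memLeft := idleMemGiB - reqMemGiB
	return cpuLeft >= idleScarce*r.cpu && memLeft >= idleScarce*r.memGiB
}

func main() {
	r := rate{cpu: 4, memGiB: 8} // assumed ratio: 4 cores and 8 GiB per idle GPU

	// Node with 2 idle GPUs, 10 idle cores, 20 GiB idle memory.
	fmt.Println(fitsProportional(2, 10, 20, 2, 4, r)) // prints true
	fmt.Println(fitsProportional(2, 10, 20, 4, 4, r)) // prints false
}
```

In the first call the node keeps 8 cores and 16 GiB free, exactly the amount reserved for 2 idle GPUs, so the task fits. In the second call only 6 cores would remain, so the node is filtered out.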

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/scheduler/plugins/util/util.go | Adds ShouldAbort utility function for status checking |
| pkg/scheduler/plugins/resource-strategy-fit/sra.go | Implements core SRA scheduling logic and scoring algorithm |
| pkg/scheduler/plugins/resource-strategy-fit/sra_test.go | Comprehensive test coverage for SRA functionality |
| pkg/scheduler/plugins/resource-strategy-fit/resource_strategy_fit.go | Integrates SRA and proportional policies into main plugin |
| pkg/scheduler/plugins/resource-strategy-fit/resource_strategy_fit_test.go | Updates existing tests to reflect structural changes |
| pkg/scheduler/plugins/resource-strategy-fit/proportional.go | Moves proportional logic from predicates plugin |
| pkg/scheduler/plugins/resource-strategy-fit/proportional_test.go | Updates package declaration for moved code |
| pkg/scheduler/plugins/predicates/predicates.go | Removes proportional logic and uses util.ShouldAbort |
| docs/design/resource-strategy-fit-scheduling.md | Documents SRA and proportional policies |
| docs/design/proportional.md | Updates configuration examples |

@XbaoWu XbaoWu force-pushed the master-4248 branch 2 times, most recently from 7e8b695 to d9a61d3 Compare September 23, 2025 04:37
@Monokaix
Member

OK, please also resolve the merge conflict.

@Monokaix
Member

/approve
Thanks a lot!

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Monokaix

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 24, 2025
Member

@JesseStutler JesseStutler left a comment

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Sep 24, 2025
@volcano-sh-bot volcano-sh-bot merged commit d2ad73f into volcano-sh:master Sep 24, 2025
20 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Support Scarce Resource Avoidance Plugin scheduling

6 participants