feat: add kep md #845
Conversation
Signed-off-by: LY-today <[email protected]>
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected. Please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: LY-today
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing the approval command in a comment.
Hi @LY-today. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with the appropriate command. Once the patch is verified, the new status will be reflected by the corresponding label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
✅ Deploy Preview for kubernetes-sigs-scheduler-plugins canceled.
@googs1025 KEP
Signed-off-by: LY-today <[email protected]>
@googs1025 Could you help move this PR forward?
Thanks for the invite, I'll handle this over the weekend :)
Thank you for your time.
@@ -0,0 +1,114 @@
# Node Resource Fit plus Scheduling
This seems very similar to an existing plugin. Could you explain the difference, or integrate it into that plugin?
FYI: https://github.com/kubernetes-sigs/scheduler-plugins/tree/master/kep/48-node-resources-allocatable-scoring
@googs1025 The older plugin can only apply a single policy to all resources. That is not suitable for complex resource scenarios, such as AI clusters.
@googs1025 In an AI cluster, we want GPU tasks to be packed onto GPU machines as much as possible and CPU tasks to be spread across CPU machines. However, the old plugin does not support applying different policies to these two resource types.
As I mentioned, this seems very similar to the existing nodeResourcesAllocatable plugin, and I don't think it warrants a new plugin. If possible, can it be integrated into the original plugin? 🤔
@googs1025 So you agree with the design of the NodeResourcesFitPlus strategy, but would prefer to implement it by modifying the original nodeResourcesAllocatable plugin?
> So you agree with the design of the NodeResourcesFitPlus strategy, but would prefer to implement it by modifying the original nodeResourcesAllocatable plugin?

@googs1025 Do I understand correctly?
+1 to exploring extension of existing plugins before introducing a "plus" variant.
In addition, I think the plugin name should convey its purpose in a bit more explicit way, so let's try to find a better name rather than appending the Plus :)
It is not recommended to use screenshots of tables and pictures.
ok
I'm going to make adjustments
## Summary

The native Kubernetes NodeResourcesFit plugin can only apply a single scoring strategy, such as MostRequestedPriority or LeastRequestedPriority, to all resources. In industrial practice this design does not fit some scenarios. For example, in AI clusters, workloads that request GPUs prefer to fill whole GPU machines first to avoid GPU fragmentation, while workloads that request only CPU and memory should be spread across non-GPU machines first, so that CPU and memory on GPU machines are not exhausted and GPU workloads are not left Pending for lack of non-GPU resources.
In the scenario of GPU nodes, which hold scarce resources, shouldn't we filter such nodes out directly rather than only reducing their score? In addition, IIUC, GPU nodes (or nodes with other devices) are labeled (via gpu-operator or NFD) and are generally filtered that way.
Affinity rules or nodeSelector require labeling nodes in advance, which is costly for cluster maintainers. The advantage of this strategy is that it avoids that maintenance work.
I think quite the opposite, we should have provided feature tags for device-specific nodes (eg: nvidia.com/gpu.xxx). 🤔
Understood, labels can indeed be applied to distinguish machine types, and the same can be done with affinity rules. But my point is that this process has real costs in industrial practice: 100 heterogeneous resource types mean maintaining 100 sets of labels.
@googs1025 If you think that maintenance cost is not something Kubernetes needs to consider, then indeed the second extension strategy does not need to be adopted.
@googs1025 Is there a clear conclusion on the ScarceResourceAvoidance strategy? Will it be accepted or not?
This is not for me to decide and can be left to other maintainers to suggest.
@googs1025 Thank you for your feedback. Could you help me get other reviewers to take a look?
@googs1025 Hello, is there a clear plan for these two plugins?
@swatisehgal @zwpaper Please take a look.
Could someone take a look at this PR?
initial review
@@ -0,0 +1,114 @@
# Node Resource Fit plus Scheduling
## Summary

The native Kubernetes NodeResourcesFit plugin can only apply a single scoring strategy, such as MostRequestedPriority or LeastRequestedPriority, to all resources. In industrial practice this design does not fit some scenarios. For example, in AI clusters, workloads that request GPUs prefer to fill whole GPU machines first to avoid GPU fragmentation, while workloads that request only CPU and memory should be spread across non-GPU machines first, so that CPU and memory on GPU machines are not exhausted and GPU workloads are not left Pending for lack of non-GPU resources. Therefore, two plugins are extended to solve this common problem.
It's AFAICT uncommon for a single KEP to introduce two different concepts. If the concepts are closely coupled, can they be handled by the same plugin?
If the concepts are loosely coupled and independent from each other, we should have 2 KEPs and 2 plugin implementations in parallel, independent from each other.
## Motivation

Cases:
- GPU tasks should preferentially fill up whole GPU machines
- CPU & MEM tasks should preferentially be spread across CPU-only machines
are these use cases covered somehow by the DRA feature (https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/ ) ?
- The solution is versatile: it is not limited to AI or CPU clusters, and not limited to native CPU resources or extended GPU resources.
- Different resource policies can be configured for different cluster types and prioritized by weight.
- Easy to extend
these look like pros of your approach rather than the rationale for it, which is the topic of this section, where we usually explain the design decisions and the motivations
- Different resource types can be configured with different strategies and prioritized by weight.
- Prevent pods that do not request scarce resources from being scheduled onto nodes that hold scarce resources.
which is the use case beyond GPUs? Above you mention CPU/MEM (commodity) and GPU (scarce resource?).
Are there any other noteworthy resources? This also ties to the conversation about the number of labels raised earlier in the review
node score:
```
finalScoreNode = [(weight1 * resource1) + (weight2 * resource2) + … + (weightN * resourceN)] / (weight1 + weight2 + … + weightN)
```
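To make the weighted aggregation concrete, here is a minimal, self-contained Go sketch of how per-resource scores could be combined; the strategy names, resource names, weights, and helper functions are illustrative assumptions, not the plugin's actual API.

```go
package main

import "fmt"

const maxNodeScore = 100 // corresponds to framework.MaxNodeScore in kube-scheduler

// ResourcePolicy is a hypothetical per-resource configuration entry.
type ResourcePolicy struct {
	Strategy string // "MostAllocated" (pack) or "LeastAllocated" (spread)
	Weight   int64
}

// scoreResource scores a single resource on a node in [0, maxNodeScore].
func scoreResource(used, allocatable int64, strategy string) int64 {
	if allocatable == 0 {
		return 0
	}
	if strategy == "MostAllocated" {
		return used * maxNodeScore / allocatable // fuller nodes score higher
	}
	return (allocatable - used) * maxNodeScore / allocatable // emptier nodes score higher
}

// finalScore implements: sum(weight_i * resource_i) / sum(weight_i).
func finalScore(scores map[string]int64, policies map[string]ResourcePolicy) int64 {
	var weighted, weightSum int64
	for name, s := range scores {
		weighted += policies[name].Weight * s
		weightSum += policies[name].Weight
	}
	if weightSum == 0 {
		return 0
	}
	return weighted / weightSum
}

func main() {
	policies := map[string]ResourcePolicy{
		"nvidia.com/gpu": {Strategy: "MostAllocated", Weight: 2},  // pack GPU machines
		"cpu":            {Strategy: "LeastAllocated", Weight: 1}, // spread CPU load
	}
	// Per-resource scores for one candidate node.
	scores := map[string]int64{
		"nvidia.com/gpu": scoreResource(6, 8, policies["nvidia.com/gpu"].Strategy), // 75
		"cpu":            scoreResource(20, 64, policies["cpu"].Strategy),          // 68
	}
	fmt.Println("finalScoreNode =", finalScore(scores, policies)) // (2*75 + 1*68) / 3 = 72
}
```

In this sketch a heavily used GPU node scores high under the GPU-weighted policy while lightly used CPU capacity still contributes, matching the packing-versus-spreading intent described above.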
can we have a few user stories and/or examples to see how this would translate in practice in various usage scenarios?
```
finalScoreNode = (allocatablesResourcesNum - requestsResourcesNum) * framework.MaxNodeScore / allocatablesResourcesNum
```
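Read literally, this scores a node by the fraction of its allocatable resource types that the pod does not request. A minimal Go sketch of the arithmetic follows; note that exactly which resource types each count covers (all types, or only those configured as scarce) is an assumption here, since it is not spelled out in this excerpt.

```go
package main

import "fmt"

const maxNodeScore = 100 // corresponds to framework.MaxNodeScore in kube-scheduler

// avoidanceScore applies the formula above literally, assuming
// allocatablesResourcesNum is the number of resource types allocatable on the
// node and requestsResourcesNum is the number of those types the pod requests.
func avoidanceScore(allocatablesResourcesNum, requestsResourcesNum int64) int64 {
	if allocatablesResourcesNum == 0 {
		return 0
	}
	return (allocatablesResourcesNum - requestsResourcesNum) * maxNodeScore / allocatablesResourcesNum
}

func main() {
	// Example numbers only: a node exposing 5 resource types, scored for a pod
	// that requests 2 of them, yields (5 - 2) * 100 / 5 = 60.
	fmt.Println(avoidanceScore(5, 2)) // 60
}
```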
ditto
@Huang-Wei Regarding plugin-2, how should I modify the KEP? Is there any reference I can follow? Or is anything still unclear?
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
What would you like to be added?
What is your proposal:
The native Kubernetes NodeResourcesFit plugin can only apply a single scoring strategy, such as MostRequestedPriority or LeastRequestedPriority, to all resources. In industrial practice this design does not fit some scenarios. For example, in AI clusters, workloads that request GPUs prefer to fill whole GPU machines first to avoid GPU fragmentation, while workloads that request only CPU and memory should be spread across non-GPU machines first, so that CPU and memory on GPU machines are not exhausted and GPU workloads are not left Pending for lack of non-GPU resources. It is therefore hoped that both strategies can be supported to address this business need.
Why is this needed:
See the description above.
Is there a suggested solution, if so, please add it:
plugin-one
config:
config description:

node score:
plugin-two
config:
config description:

node score:
Why is this needed?
It is explained above.