Skip to content

Conversation

@houyuting
Copy link

@houyuting houyuting commented Jun 9, 2025

Fixes #4350

Result:

UT- when I use old code, I got:
image

UT- when I use new code, I got:
image

@volcano-sh-bot volcano-sh-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 9, 2025
@volcano-sh-bot volcano-sh-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 9, 2025
@volcano-sh-bot volcano-sh-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 9, 2025
@houyuting houyuting marked this pull request as ready for review June 9, 2025 09:23
@volcano-sh-bot volcano-sh-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 9, 2025
// init root queue realCapability/capability/deserved as cp.totalResource
rootQueueAttr := cp.queueOpts[api.QueueID(cp.rootQueue)]
rootQueueAttr.capability = cp.totalResource
if rootQueueAttr.capability == nil || rootQueueAttr.capability.IsEmpty() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think rootQueueAttr.capability.IsEmpty() is enough because the capability of root queue will be initialized in newQueueAttr, and it won't be nil

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your suggestion, done.

Copy link
Member

@JesseStutler JesseStutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sqush to one commit, thanks :)

@houyuting houyuting requested a review from JesseStutler June 10, 2025 02:45
@JesseStutler
Copy link
Member

image
@houyuting Please squash to one commit :)

@JesseStutler
Copy link
Member

/ok-to-test

@volcano-sh-bot volcano-sh-bot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jun 10, 2025
@JesseStutler
Copy link
Member

/lgtm
Please also take a look @hwdef @lowang-bh @Monokaix I think it's reasonable when users want to pre-set the capability for the root queue

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Jun 10, 2025
Comment on lines 604 to 605
rootQueueAttr.realCapability = cp.totalResource
rootQueueAttr.deserved = cp.totalResource
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we handle realCapability and deseverd too? if realCapability and deseverd not empty and less than cp.totalResource may occurs error ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think realCapability and deseverd == cp.totalResource is suitable,because we need at least a field that identifies how many resources are actually available in the cluster and I haven't thought of any scenarios where these two fields need to be customized. Or do you have some ideas? We can discuss it together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think realCapability and deseverd == cp.totalResource is suitable,because we need at least a field that identifies how many resources are actually available in the cluster and I haven't thought of any scenarios where these two fields need to be customized. Or do you have some ideas? We can discuss it together.

yes, i think u r right, the realcapacity and deseverd will be caculated in checkHierarchicalQueue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems that if users previously set deserved fields of each sub queue, and then deserved field check may also fail when cluster resources reduced.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right, done, please review again, thanks.

@Monokaix
Copy link
Member

That's a good catch, and @Xu-Wentao also has the same requirement, and I think that we can add root queue at helm chart, then user can set the spec of root queue when install volcano, and we can remove root queue creation operation in volcano scheduler.

@houyuting
Copy link
Author

@Monokaix Good idea and we can open another feature issue and track it. If you have time,please give me approve, thanks.

@JesseStutler
Copy link
Member

@Monokaix Good idea and we can open another feature issue and track it. If you have time,please give me approve, thanks.

OK @houyuting Would you like to raise this issue? Otherwise we may miss it :)

@volcano-sh-bot volcano-sh-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 30, 2025
@houyuting houyuting requested a review from Monokaix July 1, 2025 02:44
@Monokaix
Copy link
Member

Monokaix commented Jul 1, 2025

Please sign off your commit with git commit -s.

@Monokaix
Copy link
Member

Monokaix commented Jul 1, 2025

The root queue will also be updated when close session, so it's not enough to just modify plugin.

@houyuting
Copy link
Author

The root queue will also be updated when close session, so it's not enough to just modify plugin.

got it

@JesseStutler
Copy link
Member

image The UT failed, please fix it thanks

@JesseStutler
Copy link
Member

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly implements the intended change to only set the root queue's capability and deserved resources if they are not already defined by the user. The changes are logical and the new test cases are a good addition.

I've pointed out a couple of areas for improvement. One is a minor comment update for maintainability. The other is a more significant concern regarding the removal of logic that caps a queue's deserved resources by its realCapability in a hierarchical setup. This could lead to inconsistencies and scheduling issues, so it's worth a second look. Overall, great work on addressing the issue.

attr.name, attr.allocated.String(), attr.request.String(), attr.inqueue.String(), attr.elastic.String())
}

// init root queue realCapability/capability/deserved as cp.totalResource

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment on this line is now outdated due to the changes below. It states that capability and deserved are initialized as cp.totalResource, but this is now done conditionally. Please update the comment to reflect the new logic for better code maintainability.

Suggested change
// init root queue realCapability/capability/deserved as cp.totalResource
// init root queue: realCapability is set to total resource, and capability/deserved are also set if empty.

pg1 := util.BuildPodGroup("pg1", "ns1", "q11", 1, nil, schedulingv1beta1.PodGroupInqueue)
// queue
root := buildQueueWithParents("root", "", nil, nil)
root1 := buildQueueWithParents("root", "", nil, api.BuildResourceList("16", "16Gi"))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It seems like root1 is overriding root queue, but root1 is not used anywhere in the test. Is this intentional?

// pod
p13 := util.BuildPod("ns1", "p13", "", corev1.PodPending, api.BuildResourceList("2", "2Gi", []api.ScalarResource{{Name: "nvidia.com/gpu", Value: "1"}}...), "pg13", make(map[string]string), make(map[string]string))
// queue
queue10 := buildQueueWithParents("q10", "root", nil, api.BuildResourceList("10", "4Gi", []api.ScalarResource{}...))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It seems like queue10 is not used anywhere in the test. Is this intentional?

Copy link
Member

@JesseStutler JesseStutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check my latest review comments and gemini's comments, thanks

@volcano-sh-bot volcano-sh-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 21, 2025
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think go.mod and go.sum should not change in your pr

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@volcano-sh-bot volcano-sh-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 22, 2025
if attr.name == cp.rootQueue {
attr.guarantee = totalGuarantee
cp.totalGuarantee = totalGuarantee
if attr.guarantee.IsEmpty() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root queue is created by yourself or auto created?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto created

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then its guarantee and capability fields will be updated once created, so seems this can not take effect.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

root guarantee will be computed when one of sub queues guarantee update, and root capability will be updated when user reset it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that when the first time the root queue is created, its deserved and guarantee fields will be empty, so it will be directly be updated according to the cluster resources, and it will still check failed in the first time, until you have set these fields on root queue manually.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know your means we can add root queue at helm chart, I created a new issue #4496 track it, and I will complete it. it's ok?

@JesseStutler
Copy link
Member

/lgtm

@Monokaix
Copy link
Member

/approve
Thanks!

@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Monokaix

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 29, 2025
@volcano-sh-bot volcano-sh-bot merged commit 33c9277 into volcano-sh:master Jul 29, 2025
18 checks passed
hajnalmt added a commit to hajnalmt/volcano that referenced this pull request Aug 25, 2025
When the hierarchy plugin is enabled, the capability check can
fail:
volcano-sh#4350
It will pass now, if the capability is set properly:
volcano-sh#4354
We should be able to set this value for the root queue through
helm too.

Signed-off-by: Hajnal Máté <[email protected]>
hajnalmt added a commit to hajnalmt/volcano that referenced this pull request Aug 25, 2025
When the hierarchy plugin is enabled, the capability check can
fail:
volcano-sh#4350
It will pass now, if the capability is set properly:
volcano-sh#4354
We should be able to set this value for the root queue through
helm too.

Signed-off-by: Hajnal Máté <[email protected]>
hajnalmt added a commit to hajnalmt/volcano that referenced this pull request Aug 25, 2025
When the hierarchy plugin is enabled, the capability check can
fail:
volcano-sh#4350
It will pass now, if the capability is set properly:
volcano-sh#4354
We should be able to set this value for the root queue through
helm too.

Signed-off-by: Hajnal Máté <[email protected]>
hajnalmt added a commit to hajnalmt/volcano that referenced this pull request Aug 25, 2025
When the hierarchy plugin is enabled, the capability check can
fail:
volcano-sh#4350
It will pass now, if the capability is set properly:
volcano-sh#4354
We should be able to set this value for the root queue through
helm too.

Signed-off-by: Hajnal Máté <[email protected]>
hajnalmt added a commit to hajnalmt/volcano that referenced this pull request Aug 26, 2025
When the hierarchy plugin is enabled, the capability check can
fail:
volcano-sh#4350
It will pass now, if the capability is set properly:
volcano-sh#4354
We should be able to set this value for the root queue through
helm too.

Signed-off-by: Hajnal Máté <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hierarychical queues validation can not pass although root queue capability greater than sub queue capability

5 participants