Set root capability only when user not set it #4354
Conversation
```go
// init root queue realCapability/capability/deserved as cp.totalResource
rootQueueAttr := cp.queueOpts[api.QueueID(cp.rootQueue)]
rootQueueAttr.capability = cp.totalResource
if rootQueueAttr.capability == nil || rootQueueAttr.capability.IsEmpty() {
```
I think `rootQueueAttr.capability.IsEmpty()` is enough, because the capability of the root queue will be initialized in `newQueueAttr`, so it won't be nil.
Thank you for your suggestion, done.
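For illustration, the check agreed on above can be sketched in isolation. Note this is a simplified stand-in: the `Resource` type, its fields, and `initRootCapability` below are illustrative assumptions, not Volcano's actual `api.Resource` or plugin code.

```go
package main

import "fmt"

// Resource is a simplified stand-in for Volcano's api.Resource.
type Resource struct {
	MilliCPU float64
	Memory   float64
}

// IsEmpty reports whether the resource carries no quantities. The nil
// check is kept for safety, though the discussion notes newQueueAttr
// always initializes capability.
func (r *Resource) IsEmpty() bool {
	return r == nil || (r.MilliCPU == 0 && r.Memory == 0)
}

// initRootCapability falls back to the cluster total only when the
// user has not set a capability on the root queue.
func initRootCapability(capability, total *Resource) *Resource {
	if capability.IsEmpty() {
		return total
	}
	return capability
}

func main() {
	total := &Resource{MilliCPU: 16000, Memory: 32}
	userSet := &Resource{MilliCPU: 8000, Memory: 16}

	fmt.Println(initRootCapability(&Resource{}, total) == total) // empty: falls back to total
	fmt.Println(initRootCapability(userSet, total) == userSet)   // user value is kept
}
```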
JesseStutler left a comment:
Please squash to one commit, thanks :)
/ok-to-test

/lgtm
```go
rootQueueAttr.realCapability = cp.totalResource
rootQueueAttr.deserved = cp.totalResource
```
Should we handle realCapability and deserved too? If realCapability and deserved are not empty and are less than cp.totalResource, could that cause errors?
I think realCapability and deserved == cp.totalResource is suitable, because we need at least one field that identifies how many resources are actually available in the cluster, and I haven't thought of any scenarios where these two fields need to be customized. Or do you have some ideas? We can discuss it together.
> I think realCapability and deserved == cp.totalResource is suitable, because we need at least one field that identifies how many resources are actually available in the cluster, and I haven't thought of any scenarios where these two fields need to be customized. Or do you have some ideas? We can discuss it together.
Yes, I think you're right; realCapability and deserved will be calculated in checkHierarchicalQueue.
It seems that if users previously set the deserved fields of each sub queue, the deserved field check may also fail when cluster resources are reduced.
Yes, you are right, done, please review again, thanks.
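The failure mode raised above can be illustrated with a small sketch: if a user-set deserved exceeds a now-smaller cluster total, capping it per dimension keeps the value consistent with what the cluster can actually provide. The `Resource` type and `capDeserved` function are illustrative assumptions, not Volcano's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// Resource is a simplified stand-in for Volcano's api.Resource.
type Resource struct {
	MilliCPU float64
	Memory   float64
}

// capDeserved limits a user-set deserved value by realCapability in
// each dimension, so a shrunken cluster does not leave deserved
// larger than what is actually available.
func capDeserved(deserved, realCapability Resource) Resource {
	return Resource{
		MilliCPU: math.Min(deserved.MilliCPU, realCapability.MilliCPU),
		Memory:   math.Min(deserved.Memory, realCapability.Memory),
	}
}

func main() {
	// The cluster shrank to 8 CPUs, but the queue still asks for 16.
	deserved := Resource{MilliCPU: 16000, Memory: 32}
	realCap := Resource{MilliCPU: 8000, Memory: 64}

	capped := capDeserved(deserved, realCap)
	fmt.Println(capped.MilliCPU, capped.Memory) // 8000 32
}
```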
That's a good catch. @Xu-Wentao also has the same requirement, and I think we can add the root queue to the helm chart so users can set the spec of the root queue when installing Volcano, and then remove the root queue creation from the Volcano scheduler.

@Monokaix Good idea, we can open another feature issue to track it. If you have time, please approve, thanks.

OK @houyuting Would you like to raise this issue? Otherwise we may miss it :)
Please sign off your commit with

The root queue will also be updated when the session is closed, so it's not enough to just modify the plugin.

got it

/gemini review
Code Review
This pull request correctly implements the intended change to only set the root queue's capability and deserved resources if they are not already defined by the user. The changes are logical and the new test cases are a good addition.
I've pointed out a couple of areas for improvement. One is a minor comment update for maintainability. The other is a more significant concern regarding the removal of logic that caps a queue's deserved resources by its realCapability in a hierarchical setup. This could lead to inconsistencies and scheduling issues, so it's worth a second look. Overall, great work on addressing the issue.
```go
	attr.name, attr.allocated.String(), attr.request.String(), attr.inqueue.String(), attr.elastic.String())
}
```
```go
// init root queue realCapability/capability/deserved as cp.totalResource
```
The comment on this line is now outdated due to the changes below. It states that capability and deserved are initialized as cp.totalResource, but this is now done conditionally. Please update the comment to reflect the new logic for better code maintainability.
```diff
- // init root queue realCapability/capability/deserved as cp.totalResource
+ // init root queue: realCapability is set to total resource, and capability/deserved are also set if empty.
```
```go
pg1 := util.BuildPodGroup("pg1", "ns1", "q11", 1, nil, schedulingv1beta1.PodGroupInqueue)
// queue
root := buildQueueWithParents("root", "", nil, nil)
root1 := buildQueueWithParents("root", "", nil, api.BuildResourceList("16", "16Gi"))
```
```go
// pod
p13 := util.BuildPod("ns1", "p13", "", corev1.PodPending, api.BuildResourceList("2", "2Gi", []api.ScalarResource{{Name: "nvidia.com/gpu", Value: "1"}}...), "pg13", make(map[string]string), make(map[string]string))
// queue
queue10 := buildQueueWithParents("q10", "root", nil, api.BuildResourceList("10", "4Gi", []api.ScalarResource{}...))
```
JesseStutler left a comment:
Please check my latest review comments and gemini's comments, thanks
I think go.mod and go.sum should not be changed in your PR.
done
Signed-off-by: leona.hou <[email protected]>
```go
if attr.name == cp.rootQueue {
	attr.guarantee = totalGuarantee
	cp.totalGuarantee = totalGuarantee
	if attr.guarantee.IsEmpty() {
```
Is the root queue created by yourself or auto-created?
auto created
Then its guarantee and capability fields will be updated once created, so it seems this cannot take effect.
The root guarantee will be computed when one of the sub queues' guarantees is updated, and the root capability will be updated when the user resets it.
I mean that the first time the root queue is created, its deserved and guarantee fields will be empty, so they will be directly updated according to the cluster resources, and the check will still fail that first time, until you have set these fields on the root queue manually.
I see, you mean we can add the root queue to the helm chart. I created a new issue #4496 to track it, and I will complete it. Is that OK?
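The guarantee handling discussed in this thread can be sketched as follows: the root's guarantee is aggregated from the sub queues, and the aggregate is only assigned when the user has left the root queue's guarantee empty. The types and the `rootGuarantee` helper are illustrative assumptions, not Volcano's actual implementation.

```go
package main

import "fmt"

// Resource is a simplified stand-in for Volcano's api.Resource.
type Resource struct {
	MilliCPU float64
	Memory   float64
}

func (r Resource) IsEmpty() bool {
	return r.MilliCPU == 0 && r.Memory == 0
}

func (r Resource) Add(o Resource) Resource {
	return Resource{MilliCPU: r.MilliCPU + o.MilliCPU, Memory: r.Memory + o.Memory}
}

// rootGuarantee sums the sub queues' guarantees and only assigns the
// result to the root queue when the user left its guarantee empty.
func rootGuarantee(userSet Resource, subGuarantees []Resource) Resource {
	var total Resource
	for _, g := range subGuarantees {
		total = total.Add(g)
	}
	if userSet.IsEmpty() {
		return total
	}
	return userSet
}

func main() {
	subs := []Resource{{MilliCPU: 2000, Memory: 4}, {MilliCPU: 4000, Memory: 8}}
	fmt.Println(rootGuarantee(Resource{}, subs).MilliCPU)               // 6000
	fmt.Println(rootGuarantee(Resource{MilliCPU: 1000}, subs).MilliCPU) // 1000
}
```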
/lgtm

/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: Monokaix. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
When the hierarchy plugin is enabled, the capability check can fail: volcano-sh#4350 It will pass now, if the capability is set properly: volcano-sh#4354 We should be able to set this value for the root queue through helm too. Signed-off-by: Hajnal Máté <[email protected]>


Fixes #4350
Result:
UT result when using the old code: (screenshot)
UT result when using the new code: (screenshot)