fix: Node resource topology awareness, stop scheduling and notReady #4373
Conversation
|
@hwdef please check |
|
@googs1025 Do you have any other questions? My changes seem to affect tests across many parts of the codebase, and fixing those tests looks like the hard part. |
|
Can you squash the commits into one? |
Maybe wait until CI is fixed before merging? |
|
@hwdef @googs1025 @k82cn This change touches many unit tests, and I cannot accurately judge the business logic behind each one. If you approve of the changes in this PR, could you help me repair the unit tests? |
Just a guess: you now validate the node status (ready or not), and some unit tests may not set the node status. You can look at this part |
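The guess above can be illustrated with a small sketch. The `Node` and `NodeCondition` types here are trimmed stand-ins for the Kubernetes core/v1 types, and `buildNode` and `isReady` are hypothetical names, not the PR's actual code. It shows why a fixture built without a Ready condition would be dropped once a readiness check exists, and how a builder that sets the condition by default avoids that.

```go
package main

import "fmt"

// NodeCondition and Node are stand-ins for the Kubernetes core/v1 types,
// trimmed to the fields this sketch needs (assumption, not the real API).
type NodeCondition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False", or "Unknown"
}

type Node struct {
	Name       string
	Conditions []NodeCondition
}

// buildNode is a hypothetical test helper. It marks the node Ready by default
// so that fixtures written before the readiness check still pass it.
func buildNode(name string) *Node {
	return &Node{
		Name:       name,
		Conditions: []NodeCondition{{Type: "Ready", Status: "True"}},
	}
}

// isReady reports whether the node carries a Ready condition with status True.
func isReady(n *Node) bool {
	for _, c := range n.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false // no Ready condition recorded: treat as not ready
}

func main() {
	withDefault := buildNode("n1")
	bare := &Node{Name: "n2"} // a fixture that never set any status

	fmt.Println(isReady(withDefault), isReady(bare)) // prints: true false
}
```

A test that constructs nodes like `bare` would see them silently skipped by the scheduler, which may be the kind of failure described above.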
|
I still think this method is too complicated. What we need is logic to filter out NotReady or unschedulable nodes, which I think should be very simple. In fact, we don’t care about these fields in the status list at all. |
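As a rough sketch of the simple filter suggested here: the `Node` struct below is a flattened, hypothetical stand-in (the real core/v1 type keeps `Unschedulable` under the spec and the Ready status in a conditions list), and `schedulable` and `filterSchedulable` are illustrative names only.

```go
package main

import "fmt"

// Node is a flattened stand-in for a Kubernetes node (assumption for this sketch).
type Node struct {
	Name          string
	Unschedulable bool   // mirrors the node's unschedulable (cordoned) flag
	ReadyStatus   string // status of the Ready condition: "True", "False", "Unknown", or "" if absent
}

// schedulable keeps a node only if it is not cordoned and its Ready condition is True.
func schedulable(n Node) bool {
	return !n.Unschedulable && n.ReadyStatus == "True"
}

// filterSchedulable returns the nodes that pass the check, preserving order.
func filterSchedulable(nodes []Node) []Node {
	var kept []Node
	for _, n := range nodes {
		if schedulable(n) {
			kept = append(kept, n)
		}
	}
	return kept
}

func main() {
	nodes := []Node{
		{Name: "n1", ReadyStatus: "True"},
		{Name: "n2", ReadyStatus: "True", Unschedulable: true},
		{Name: "n3", ReadyStatus: "False"},
		{Name: "n4", ReadyStatus: "Unknown"},
	}
	for _, n := range filterSchedulable(nodes) {
		fmt.Println(n.Name) // prints: n1
	}
}
```

This version deliberately ignores every other condition in the status list, which is the trade-off debated in the following comment.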
If I looked at this PR purely from the perspective of implementation difficulty, I could certainly start with a simpler approach so that the tests would not be very complicated. But I am worried that boundary issues would still appear when running it in production. Following k8s describe node is the best practice I know so far. When resource calculation goes wrong, logs that match the describe node output users rely on most will be easier for them to understand. |
|
/cc @Monokaix @JesseStutler @hwdef Please check. This part of the logic has a great impact on quota calculation. We have encountered many related problems in production. Please help follow up. |
|
/cc |
Pull Request Overview
This PR adds a helper to determine node readiness and integrates it into the scheduler’s session logic, along with corresponding tests and adjustments to test utilities.
- Add a default `NodeReady` condition in the test node builder
- Introduce `nodeIsNotReady` in `framework/util.go` and its test suite
- Skip unready nodes during session setup in `session.go`
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/scheduler/util/test_utils.go | Set a NodeReady condition by default in BuildNode |
| pkg/scheduler/framework/util_test.go | New tests for the nodeIsNotReady function |
| pkg/scheduler/framework/util.go | Add nodeIsNotReady utility and import slices |
| pkg/scheduler/framework/session.go | Skip nodes flagged as not ready in openSession |
Comments suppressed due to low confidence (2)
pkg/scheduler/framework/util.go:268
- [nitpick] The function name `nodeIsNotReady` suggests it returns true for an unready node, but it actually returns true for ready nodes. Consider renaming it to `nodeIsReady` or inverting its return logic to match its name.
func nodeIsNotReady(obj *v1.Node) bool {
pkg/scheduler/framework/util_test.go:11
- [nitpick] Test name `TestGetNodeStatus` doesn’t match the tested function `nodeIsNotReady`. Rename the test to something like `TestNodeIsReady` or `TestNodeIsNotReady` for clarity.
func TestGetNodeStatus(t *testing.T) {
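To make the two naming nitpicks concrete, here is a minimal sketch of a helper whose return value matches its name, exercised with table-style cases. The types are stand-ins for core/v1, and this body is an assumption for illustration; it is not the function from the PR.

```go
package main

import "fmt"

// NodeCondition and Node are trimmed stand-ins for the core/v1 types (assumption).
type NodeCondition struct {
	Type   string
	Status string
}

type Node struct {
	Conditions []NodeCondition
}

// nodeIsNotReady returns true exactly when the node should be treated as not
// ready: it is nil, has no Ready condition, or its Ready status is not "True".
func nodeIsNotReady(n *Node) bool {
	if n == nil {
		return true
	}
	for _, c := range n.Conditions {
		if c.Type == "Ready" {
			return c.Status != "True"
		}
	}
	return true
}

func main() {
	// Table-style cases, in the spirit of a test named TestNodeIsNotReady.
	cases := []struct {
		name string
		node *Node
		want bool
	}{
		{"ready", &Node{Conditions: []NodeCondition{{"Ready", "True"}}}, false},
		{"not ready", &Node{Conditions: []NodeCondition{{"Ready", "False"}}}, true},
		{"unknown", &Node{Conditions: []NodeCondition{{"Ready", "Unknown"}}}, true},
		{"no conditions", &Node{}, true},
		{"nil node", nil, true},
	}
	for _, tc := range cases {
		got := nodeIsNotReady(tc.node)
		fmt.Printf("%-14s notReady=%v matches=%v\n", tc.name, got, got == tc.want)
	}
}
```

Keeping the predicate's polarity aligned with its name means a call site such as `if nodeIsNotReady(node) { continue }` reads the way it behaves.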
|
@JesseStutler @Monokaix please check |
JesseStutler
left a comment
/lgtm
Much better, thanks
|
@LY-today Please squash your commits into one, thanks |
Done. The commits are squashed into one; upstream master had changed, so a merge was also performed |
b69c29f to 3641512
|
@JesseStutler please check |
|
Please merge to one commit. |
Signed-off-by: LY-today <[email protected]>
@Monokaix done |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Monokaix The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
JesseStutler
left a comment
/lgtm
#4370 (comment)