

@iomarsayed
Contributor

@iomarsayed iomarsayed commented Oct 13, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

During flavor assignment, the messages presented to the user were misleading: only the combined requested quota of all PodSets considered so far was compared to the capacity.
With this change, the messages distinguish the request of the current PodSet from the requests of the previously considered PodSets.

Which issue(s) this PR fixes:

Fixes #4134

Does this PR introduce a user-facing change?

Observability: Improve the messages presented to the user in scheduling events, by clarifying the reason for "insufficient quota" in case of workloads with multiple PodSets. 
Before: "insufficient quota for resource-type in flavor example-flavor, request > maximum capacity (24 > 16)"
After: "insufficient quota for resource-type in flavor example-flavor, previously considered podsets requests (16) + current podset request (8) > maximum capacity (16)"
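The Before/After pair can be reproduced with a minimal sketch, assuming that the check compares the sum of the two values against the flavor's capacity and that only the wording changes. `oldMessage` and `newMessage` are hypothetical helpers, not Kueue functions, and plain int64 values stand in for Kueue's resource quantities.

```go
package main

import "fmt"

// oldMessage mimics the previous wording (hypothetical helper): only the
// combined total is reported, so the user cannot tell how much of it comes
// from the current PodSet.
func oldMessage(resource, flavor string, assumed, requested, capacity int64) string {
	return fmt.Sprintf("insufficient quota for %s in flavor %s, request > maximum capacity (%d > %d)",
		resource, flavor, assumed+requested, capacity)
}

// newMessage mimics the wording from the release note (hypothetical helper):
// previously considered PodSets and the current PodSet are reported separately.
func newMessage(resource, flavor string, assumed, requested, capacity int64) string {
	return fmt.Sprintf("insufficient quota for %s in flavor %s, previously considered podsets requests (%d) + current podset request (%d) > maximum capacity (%d)",
		resource, flavor, assumed, requested, capacity)
}

func main() {
	// Values from the release note: 16 already assumed, 8 requested now, capacity 16.
	var assumed, requested, capacity int64 = 16, 8, 16
	if assumed+requested > capacity {
		fmt.Println(oldMessage("resource-type", "example-flavor", assumed, requested, capacity))
		fmt.Println(newMessage("resource-type", "example-flavor", assumed, requested, capacity))
	}
}
```

Both messages are derived from the same inputs; the new one simply keeps the two addends visible instead of collapsing them into 24.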

@k8s-ci-robot k8s-ci-robot added kind/cleanup, do-not-merge/release-note-label-needed labels Oct 13, 2025
@linux-foundation-easycla

linux-foundation-easycla bot commented Oct 13, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no label Oct 13, 2025
@k8s-ci-robot
Contributor

Welcome @iomarsayed!

It looks like this is your first PR to kubernetes-sigs/kueue 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kueue has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test label Oct 13, 2025
@k8s-ci-robot
Contributor

Hi @iomarsayed. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@netlify

netlify bot commented Oct 13, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: 1d06f43
🔍 Latest deploy log: https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/68efa3169fdfff0008a23419

@k8s-ci-robot k8s-ci-robot added the size/S label Oct 13, 2025
@mimowo
Contributor

mimowo commented Oct 13, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test and removed needs-ok-to-test labels Oct 13, 2025
@mimowo
Contributor

mimowo commented Oct 13, 2025

@iomarsayed please make sure the CLA is signed

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes and removed cncf-cla: no labels Oct 13, 2025
-		status.appendf("insufficient quota for %s in flavor %s, request > maximum capacity (%s > %s)",
-			fr.Resource, fr.Flavor, resources.ResourceQuantityString(fr.Resource, val), resources.ResourceQuantityString(fr.Resource, maxCapacity))
+	if totalRequestQuota > maxCapacity {
+		status.appendf("insufficient quota for %s in flavor %s, remaining podset requests (%s) + current podset request (%s) > maximum capacity (%s)",
Contributor

This seems like a user-facing change. We could handle it as a bugfix, but please add a unit test to demonstrate the new message. I think this could be tested in scheduler_test.go by asserting on wantEvents.

Contributor Author

Okay, I will mark it as a bugfix, and I will also add a unit test for that!

@mimowo
Contributor

mimowo commented Oct 13, 2025

The release note under "Does this PR introduce a user-facing change?" needs to be inside the release-note block, as in the template you get when creating a new PR.

@k8s-ci-robot
Contributor

@iomarsayed: The label(s) /remove-label kind/cleanup cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor, ci-short, ci-extended, ci-full. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/remove-label kind/cleanup

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

@iomarsayed: The label(s) /label release-note, /remove-label do-not-merge/release-note-label-needed cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor, ci-short, ci-extended, ci-full. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/remove-label do-not-merge/release-note-label-needed
/label release-note

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added release-note and removed do-not-merge/release-note-label-needed labels Oct 13, 2025
}
}

resourceLimit := podSets[0].Template.Spec.Containers[0].Resources.Limits[rName]
Contributor

Generally Kueue doesn't make decisions based on "Limits", so I'm surprised by this code.

I think (but I might be missing something) we should just pass val and assignmentUsage[fr] as separate parameters, here:

  • val is the "current PodSet usage"
  • assignmentUsage[fr] is the usage coming from the previously considered PodSets
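A minimal sketch of this proposal, assuming a simplified check over plain int64 quantities. `fitsQuota` is a hypothetical helper, not Kueue's `fitsResourceQuota`; its parameter names `assumedUsage` and `requestedUsage` are borrowed from the naming suggested later in this review. Because the decision still depends only on the sum, splitting the value changes the message, not the outcome.

```go
package main

import "fmt"

// fitsQuota is a hypothetical, simplified stand-in for the quota check discussed
// in this review. assumedUsage plays the role of assignmentUsage[fr] (usage of
// previously considered PodSets) and requestedUsage plays the role of val (usage
// of the current PodSet). The decision depends only on their sum, as before;
// keeping the two values separate only makes the message more detailed.
func fitsQuota(resource, flavor string, assumedUsage, requestedUsage, maxCapacity int64) (bool, string) {
	if assumedUsage+requestedUsage > maxCapacity {
		return false, fmt.Sprintf(
			"insufficient quota for %s in flavor %s, previously considered podsets requests (%d) + current podset request (%d) > maximum capacity (%d)",
			resource, flavor, assumedUsage, requestedUsage, maxCapacity)
	}
	return true, ""
}

func main() {
	const maxCapacity = 16
	// Three splits of the same total (24): all are rejected, only the message differs.
	for _, split := range [][2]int64{{0, 24}, {8, 16}, {16, 8}} {
		ok, msg := fitsQuota("cpu", "example-flavor", split[0], split[1], maxCapacity)
		fmt.Println(ok, msg)
	}
	// A split whose total fits within capacity produces no message.
	ok, msg := fitsQuota("cpu", "example-flavor", 10, 6, maxCapacity)
	fmt.Println(ok, msg == "")
}
```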

/hold

Contributor Author

First, I want to confirm that this PR does not change the actual logic or decisions; it only adjusts the messages shown to the user.

From my understanding, in the multi-replica case "val" refers to the total PodSet quota, while I am concerned with the quota of a single replica, so I resorted to this approach. We want to tell the user only when a single replica's quota exceeds the current resource flavor, because that is how replicas are actually assigned (one replica at a time, not the total quota of all replicas).
If that's incorrect, let me know.

Thanks for your clarification, though; it made me realize I have to adjust the message, because it wasn't correctly showing the previous PodSet usage.
The new message layout is based on @gabesaba's suggestion, though:
#4134 (comment)

Contributor

> First, I want to confirm that this PR does not change the actual logic or decisions; it only adjusts the messages shown to the user.

Yes, but I'm sceptical we should base the message on the "limits". The name podSetQuota is also rather confusing. We declare quota at the CQ level. PodSets are more associated with "usage", calculated based on "requests".

> From my understanding, in the multi-replica case "val" refers to the total PodSet quota, while I am concerned with the quota of a single replica, so I resorted to this approach. We want to tell the user only when a single replica's quota exceeds the current resource flavor, because that is how replicas are actually assigned (one replica at a time, not the total quota of all replicas).
> If that's incorrect, let me know.

I think there is certainly room for improvement, but the source of the problem is that we currently use val+assignmentUsage[fr], which does not allow for a more detailed message. So my proposal is to keep the same logic and just pass val and assignmentUsage[fr] as separate params.

> The new message layout is based on @gabesaba's suggestion, though: #4134 (comment)

Yeah, this proposal reads ok. I would just tweak it slightly, since "previous requests" may not be clear: requests for previously considered PodSets (%s) + request for current PodSet (%s) > maximum capacity (%s).

In any case, the usage for comparison comes from "Requests", not "limits".
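A minimal sketch of deriving a PodSet's usage from requests rather than limits, assuming simplified, hypothetical `Container` and `PodSet` types (not the Kubernetes or Kueue API) and plain int64 quantities. It also touches the per-replica question raised above: the per-pod request is multiplied by the replica count, consistent with the observation in this thread that the total across all replicas is what gets compared. Init containers, pod overhead and other details a real implementation must handle are ignored here.

```go
package main

import "fmt"

// Container and PodSet are hypothetical, simplified stand-ins for the
// Kubernetes and Kueue types; quantities are plain int64 values.
type Container struct {
	Requests map[string]int64
	Limits   map[string]int64
}

type PodSet struct {
	Name       string
	Count      int64 // number of replicas (pods) in the PodSet
	Containers []Container
}

// podSetUsage sums the requests of all containers for one resource and
// multiplies by the replica count. Limits are deliberately ignored, matching
// the review comment that usage is calculated from requests.
func podSetUsage(ps PodSet, resource string) int64 {
	var perPod int64
	for _, c := range ps.Containers {
		perPod += c.Requests[resource] // missing key (or nil map) yields 0
	}
	return perPod * ps.Count
}

func main() {
	ps := PodSet{
		Name:  "workers",
		Count: 4,
		Containers: []Container{
			{Requests: map[string]int64{"cpu": 2}, Limits: map[string]int64{"cpu": 4}},
		},
	}
	// 4 replicas x 2 CPU requested = 8; the per-container limit of 4 plays no role.
	fmt.Println(podSetUsage(ps, "cpu"))
}
```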

Contributor Author

Okay so, I have made the following changes:

  • Adjusted the message to be clear as you suggested.
  • You are correct! I have changed the comparison to use requests, to match the actual logic (although it is concerning that the total PodSet quota across all replicas, rather than the per-replica quota, is compared to a resource flavor; that should be a separate discussion). I have separated the variables and renamed the function parameters, which were very vague (specifically "val"), while keeping the changes as small as possible.
  • Adjusted the test cases to reflect the changes in the message logs.

Contributor

lgtm overall, just added a small follow-up comment on naming

Contributor

/unhold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold, size/M and removed size/S labels Oct 13, 2025
@iomarsayed
Contributor Author

/test pull-kueue-test-unit-main

@iomarsayed iomarsayed force-pushed the 4134-differentiate-podsets-quota-in-messages-presented-to-user branch from 7b23bd9 to 992a55a October 15, 2025 13:15
// If the flavor doesn't satisfy limits immediately (when waiting or preemption
// could help), it returns a Status with reasons.
-func (a *FlavorAssigner) fitsResourceQuota(log logr.Logger, fr resources.FlavorResource, val int64, rQuota schdcache.ResourceQuota) (preemptionMode, int, *Status) {
+func (a *FlavorAssigner) fitsResourceQuota(log logr.Logger, fr resources.FlavorResource, usageQuota int64, requestQuota int64, rQuota schdcache.ResourceQuota) (preemptionMode, int, *Status) {
Contributor

Suggested change
-func (a *FlavorAssigner) fitsResourceQuota(log logr.Logger, fr resources.FlavorResource, usageQuota int64, requestQuota int64, rQuota schdcache.ResourceQuota) (preemptionMode, int, *Status) {
+func (a *FlavorAssigner) fitsResourceQuota(log logr.Logger, fr resources.FlavorResource, assumedUsage int64, requestedUsage int64, rQuota schdcache.ResourceQuota) (preemptionMode, int, *Status) {

Contributor Author

@iomarsayed iomarsayed Oct 15, 2025

Done just now

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold label Oct 15, 2025
@mimowo
Contributor

mimowo commented Oct 16, 2025

/release-note-edit

Improve the messages presented to the user in scheduling events, by clarifying the reason for "insufficient quota"
in case of workloads with multiple PodSets. 

Example:
- before: "insufficient quota for resource-type in flavor example-flavor, request > maximum capacity (24 > 16)"
- after: "insufficient quota for resource-type in flavor example-flavor, previously considered podsets requests (16) + current podset request (8) > maximum capacity (16)"

@mimowo
Contributor

mimowo commented Oct 16, 2025

Thanks 👍
/lgtm
/approve
/cherrypick release-0.14
/cherrypick release-0.13

@k8s-infra-cherrypick-robot
Contributor

@mimowo: once the present PR merges, I will cherry-pick it on top of release-0.13, release-0.14 in new PRs and assign them to you.

In response to this:

Thanks 👍
/lgtm
/approve
/cherrypick release-0.14
/cherrypick release-0.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the lgtm label Oct 16, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 7ebca5b4b76a8dc5e9faeaa338143e8637002451

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: iomarsayed, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Oct 16, 2025
@k8s-ci-robot k8s-ci-robot merged commit aac26f5 into kubernetes-sigs:main Oct 16, 2025
23 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.15 milestone Oct 16, 2025
@k8s-infra-cherrypick-robot
Contributor

@mimowo: new pull request created: #7293

In response to this:

Thanks 👍
/lgtm
/approve
/cherrypick release-0.14
/cherrypick release-0.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-infra-cherrypick-robot
Contributor

@mimowo: new pull request created: #7294

In response to this:

Thanks 👍
/lgtm
/approve
/cherrypick release-0.14
/cherrypick release-0.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mimowo
Contributor

mimowo commented Nov 28, 2025

/release-note-edit

Observability: Improve the messages presented to the user in scheduling events, by clarifying the reason for "insufficient quota" in case of workloads with multiple PodSets. 
Before: "insufficient quota for resource-type in flavor example-flavor, request > maximum capacity (24 > 16)"
After: "insufficient quota for resource-type in flavor example-flavor, previously considered podsets requests (16) + current podset request (8) > maximum capacity (16)"

@mimowo
Contributor

mimowo commented Nov 28, 2025

/remove-kind cleanup
/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature and removed kind/cleanup labels Nov 28, 2025

Labels

approved, cncf-cla: yes, kind/feature, lgtm, ok-to-test, release-note, size/M


Development

Successfully merging this pull request may close these issues.

Differentiate usage by previous podsets and current podsets in messages presented to user

4 participants