
Conversation

@j-skiba
Contributor

@j-skiba j-skiba commented Oct 21, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This change updates the kueue_cluster_queue_weighted_share and kueue_cohort_weighted_share metrics to report precise fair-sharing weights rather than rounded values, and adds a cohort label to kueue_cluster_queue_weighted_share for better context.

Which issue(s) this PR fixes:

Fixes #7244

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Observability: Adjust the `cluster_queue_weighted_share` and `cohort_weighted_share` metrics to report the precise value for the Weighted share, rather than the value rounded to an integer. Also, expand the `cluster_queue_weighted_share` metric with the "cohort" label.
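
For illustration, here is a minimal sketch of what such a float-valued gauge with a `cohort` label could look like, assuming the Prometheus client_golang library; the package layout, variable names, and helper function are illustrative, not the PR's actual code:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// clusterQueueWeightedShare is an illustrative GaugeVec: float-valued and
// keyed by both the ClusterQueue name and its cohort.
var clusterQueueWeightedShare = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Subsystem: "kueue",
		Name:      "cluster_queue_weighted_share",
		Help:      "Precise (non-rounded) fair-sharing weighted share of the ClusterQueue.",
	},
	[]string{"cluster_queue", "cohort"},
)

func init() {
	prometheus.MustRegister(clusterQueueWeightedShare)
}

// ReportClusterQueueWeightedShare records the precise value for one ClusterQueue.
func ReportClusterQueueWeightedShare(cq, cohort string, value float64) {
	clusterQueueWeightedShare.WithLabelValues(cq, cohort).Set(value)
}
```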

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels Oct 21, 2025
@netlify

netlify bot commented Oct 21, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

| Name | Link |
|------|------|
| 🔨 Latest commit | a4916a2 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/68ff38044a2b5a0008beb5d6 |

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 21, 2025
@k8s-ci-robot
Contributor

Hi @j-skiba. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 21, 2025
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 21, 2025
@k8s-triage-robot

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-contributor-experience at kubernetes/community.

/check-cla
/easycla


@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 21, 2025
@mbobrovskyi
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 22, 2025
@mbobrovskyi
Contributor

@j-skiba please rebase

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 22, 2025
@j-skiba j-skiba marked this pull request as ready for review October 22, 2025 06:00
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 22, 2025
Comment on lines 694 to 696
if s.PreciseWeightedShare == math.Inf(1) {
return math.MaxInt64
}
Contributor

@mimowo mimowo Oct 22, 2025

This is tricky; we need a test for this scenario. I'm worried this will be cumbersome to visualize in a tool like Grafana. For example, consider visualizing on one graph the DRS from two CQs: one with weight=0 and one with weight=1. The one with weight=0 will have DRS=MaxInt64, flattening the entire plot for the other CQs (IIUC). Maybe Grafana could deal with that somehow, but then it needs to be investigated.

Instead of using MaxInt64 I would rather report MaxRange + 1, MaxRange currently being 1000. wdyt @amy @gabesaba ?

Contributor

cc @pajakd

Contributor Author

I added this just to cover the case; the comment here says that functional branches should never reach this point.

Contributor Author

But nevertheless, it might be worth handling this as you said.

Contributor

Well, I think this comment is a bit tricky: https://github.com/j-skiba/kueue/blob/5cf835a6bf0dd22feee6b9bf650738f71a99cd3e/pkg/cache/scheduler/fair_sharing.go#L72 - it assumed the state of the world as it was before.

Now, with this new feature, this is a new functional branch, so that comment would no longer be accurate; I would like to adjust it.

But nevertheless, it might be worth handling this as you said.

I think so. The only scenario I'm not totally sure about is when someone is reducing the CQ quota: the CQ might then be temporarily running "overcommitted", and thus above 1000. You may want to check experimentally whether this scenario is real. If it is, then assuming 1001 might indeed be tricky.

It would be good to consider what range is actually possible. Using MaxInt64 for the metric is weird.

Contributor

flattening the entire plot for the other CQs (IIUC). Maybe Grafana could deal with that somehow, but then it needs to be investigated.

Yeah... this does not sound great. If this is the case, can you look into whether Grafana can cap the Y axis for the viewing window?

Contributor Author

@j-skiba j-skiba Oct 23, 2025

What about setting the metric value to NaN if the weight equals 0.0, and noting that in the metric's description?

The only scenario I'm not totally sure about is when someone is reducing the CQ quota: the CQ might then be temporarily running "overcommitted", and thus above 1000.

Considering the metric value can theoretically be anything from 0 to over 1000 (especially in the "overcommitted" scenario you mentioned), the NaN approach might be fine. Grafana has a value-mapping feature that can handle special values: https://grafana.com/docs/grafana/latest/panels-visualizations/configure-value-mappings/#special. By default, Grafana skips metrics with NaN values.

That said, I'm not sure whether using NaN like this is a good pattern; just throwing out an idea.
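
As a toy illustration of the NaN idea (a standalone example, not Kueue code; the metric name and port are made up), client_golang gauges accept NaN, and the /metrics text exposition renders it as NaN, which scrapers ingest and Grafana skips by default:

```go
package main

import (
	"math"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Toy gauge for illustration only.
	g := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "demo_weighted_share",
		Help: "Demo gauge that is set to NaN when the fair-sharing weight is zero.",
	})
	prometheus.MustRegister(g)

	weight := 0.0
	if weight == 0 {
		// Rendered as "demo_weighted_share NaN" on /metrics.
		g.Set(math.NaN())
	}

	http.Handle("/metrics", promhttp.Handler())
	_ = http.ListenAndServe(":2112", nil)
}
```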

Contributor

Using NaN feels better than MaxInt64 or 1001. I'm OK with that approach if there are no other voices or better ideas. I would also log NaN for consistency.

Contributor Author

Just to clarify my understanding and confirm the logic:

  1. The return value from this WeightedShare() method is only used to set the status.fairSharing.WeightedShare field. This method handles Inf by returning math.MaxInt64, which is fine for the status API.

  2. The Prometheus metric, on the other hand, gets its value from the raw PreciseWeightedShare(). This value can be Inf, which is what could cause the issue with flattened graphs.

Therefore, the change I pushed (in clusterqueue_controller.go) to convert Inf to NaN specifically for the metric seems fine. It solves the graphing problem without impacting the status field's logic.
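
A hedged sketch of that conversion (illustrative names only, not the exact clusterqueue_controller.go code): the status field keeps its MaxInt64 clamp, while the metric path maps +Inf to NaN before the gauge is set.

```go
package clusterqueue

import "math"

// metricValueForWeightedShare maps the precise weighted share to the value
// reported on the metric: +Inf (weight of zero) becomes NaN so dashboards can
// skip the series instead of plotting MaxInt64 and flattening every other CQ.
func metricValueForWeightedShare(preciseWeightedShare float64) float64 {
	if math.IsInf(preciseWeightedShare, 1) {
		return math.NaN()
	}
	return preciseWeightedShare
}
```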

@amy
Contributor

amy commented Oct 22, 2025

Can you also add the scheduling cycle number? We need something to collate the values within a tournament. I'm not sure about the metric cardinality for that, though.

Perhaps the tournament correlation needs to be done via logs. But yeah, the main context that matters is the DRS values grouped within a tournament.

@mimowo
Contributor

mimowo commented Oct 23, 2025

Can you also add the scheduling cycle number? We need something to collate the values within a tournament. I'm not sure about the metric cardinality for that, though.

Yes, I don't think we should be exposing the schedulingCycle counter. This is more of a technical detail.

Also, what exactly would you correlate the schedulingCycle with even if it were exposed? This is very transient state (the state of the tournament) which is only partially recorded in logs. If we are going to correlate it precisely with the logs anyway, we could just log the DRS at a higher logging level while the tournament is happening.

To make this easier, I'm thinking we could even have a small script in the Kueue repo, a kind of "fair sharing log analyzer", which would rely on logs at, say, V4+.

@amy
Contributor

amy commented Oct 23, 2025

Also, what exactly would you correlate the schedulingCycle with even if it were exposed? This is very transient state (the state of the tournament) which is only partially recorded in logs. If we are going to correlate it precisely with the logs anyway, we could just log the DRS at a higher logging level while the tournament is happening.

The context for why we want higher-precision DRS instrumentation (regardless of whether it is via metrics or logging) is so that operators can validate the scheduling logic. When we originally found the rounding errors for fair-share tournaments, we looked at both the workload/CQ with the wrong DRS value and the competitors in the tournament.

A different question would be why expose DRS with higher precision at all, given it's pretty transient and doesn't really make sense outside the context of a tournament.

To make this easier, I'm thinking we could even have a small script in the Kueue repo, a kind of "fair sharing log analyzer", which would rely on logs at, say, V4+.

Sounds like an interesting idea!

@j-skiba
Contributor Author

j-skiba commented Oct 24, 2025

@mimowo should I also change how cohort_weighted_share is reported? It's a similar case to cluster_queue_weighted_share with respect to the max-int value:

If the Cohort has a weight of zero and is borrowing, this will return 9223372036854775807,

@mimowo
Contributor

mimowo commented Oct 24, 2025

The context for why we want higher-precision DRS instrumentation (regardless of whether it is via metrics or logging) is so that operators can validate the scheduling logic. When we originally found the rounding errors for fair-share tournaments, we looked at both the workload/CQ with the wrong DRS value and the competitors in the tournament.

Indeed, the value of the metric and the API field for the CQ is bumped in the cluster_queue controller, which is by design decoupled from the scheduler's value held in the cache; see here.

However, metrics are only scraped by tools like Prometheus at intervals, by default 15s, so this will also not give us a super precise tool for debugging.

A different question would be why expose DRS with higher precision at all, given it's pretty transient and doesn't really make sense outside the context of a tournament.

That is a valid question to ask. As mentioned above, neither the API nor the metric will give us the exact value as used by the scheduler (at least I have no idea how to do that). We can only get as close as possible with an approximation, hence the proposal to increase the precision of the metric.

To make this easier, I'm thinking we could even have a small script in the Kueue repo, a kind of "fair sharing log analyzer", which would rely on logs at, say, V4+.
Sounds like an interesting idea!

Well, this is currently the only idea I have for exposing the precise point-in-time DRS values as used by the scheduler.

@mimowo
Contributor

mimowo commented Oct 24, 2025

cc @PBundyra @mwielgus who are also looking into debuggability of DRS

@mimowo
Contributor

mimowo commented Oct 24, 2025

@mimowo should I also change how cohort_weighted_share is reported? It's a similar case to cluster_queue_weighted_share with respect to the max-int value:

Oh yes, I think if we change it for ClusterQueue, then we should change it for Cohort in sync, so please update the PR.

However, be aware the discussion may continue, as the question of whether this is needed at all was raised in #7338 (comment).

@amy
Contributor

amy commented Oct 24, 2025

However, metrics are only scraped by tools like Prometheus at intervals, by default 15s, so this will also not give us a super precise tool for debugging.

However, be aware the discussion may continue, as the question of whether this is needed at all was raised in #7338 (comment).

Ah, alrighty. This metric without schedulingCycle could still be useful! We can retroactively correlate it with other metrics roughly by time. (For example, at the most basic level, when we expect a CQ to be bursting/using its guarantees, or what the potential values could be when we have high CQ weights. Then we use those clues to dig further into the logs.)

@mimowo
Contributor

mimowo commented Oct 27, 2025

/release-note-edit

Adjust the `cluster_queue_weighted_share` and `cohort_weighted_share` metrics to report the precise value for the Weighted share, rather than the value rounded to an integer. Also, expand the `cluster_queue_weighted_share` metric with the "cohort" label.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Oct 27, 2025
@mimowo
Contributor

mimowo commented Oct 27, 2025

Thanks 👍
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 27, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: j-skiba, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 709e86591d5654964567426a60051bb5ec7018be

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 27, 2025
@k8s-ci-robot k8s-ci-robot merged commit 4d88320 into kubernetes-sigs:main Oct 27, 2025
23 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.15 milestone Oct 27, 2025
mbobrovskyi pushed a commit to epam/kubernetes-kueue that referenced this pull request Oct 27, 2025
…ics (kubernetes-sigs#7338)

* use float instead of int in cluster_queue_weighted_share metric and add cohort label

* don't use two fields for weighted share

* adjust metric test util to the changes

* make ExpectClusterQueueWeightedShareMetric accept float64 as value

* adjust integration test

* report NaN instead of max_int when weight is 0

* remove unused imports in e2e tests

* use float instead of int in cohort_weighted_share metric

* fix format and naming cleanup
Singularity23x0 pushed a commit to Singularity23x0/kueue that referenced this pull request Nov 3, 2025
…ics (kubernetes-sigs#7338)

* use float instead of int in cluster_queue_weighted_share metric and add cohort label

* don't use two fields for weighted share

* adjust metric test util to the changes

* make ExpectClusterQueueWeightedShareMetric accept float64 as value

* adjust integration test

* report NaN instead of max_int when weight is 0

* remove unused imports in e2e tests

* use float instead of int in cohort_weighted_share metric

* fix format and naming cleanup
@mimowo
Contributor

mimowo commented Nov 28, 2025

/release-note-edit

Observability: Adjust the `cluster_queue_weighted_share` and `cohort_weighted_share` metrics to report the precise value for the Weighted share, rather than the value rounded to an integer. Also, expand the `cluster_queue_weighted_share` metric with the "cohort" label.


Successfully merging this pull request may close these issues.

Expose contextualized FairSharing Weights for ClusterQueues as metrics
