Expose contextualized fair sharing weights for cluster queues as metrics #7338
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
Hi @j-skiba. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla
/ok-to-test
@j-skiba please rebase
pkg/cache/scheduler/cache.go (outdated)
```go
if s.PreciseWeightedShare == math.Inf(1) {
	return math.MaxInt64
}
```
This is tricky, we need a test for this scenario. I'm worried this will be cumbersome to visualize in a tool like Grafana. For example, consider visualizing on one graph the DRS from two CQs: one with weight=0 and one with weight=1. The one with weight=0 will have DRS=MaxInt64, flattening the entire plot for the other CQs (IIUC). Maybe Grafana could deal with that somehow, but then it needs to be investigated.
Instead of using MaxInt64 I would rather report MaxRange + 1, MaxRange currently being 1000. wdyt @amy @gabesaba ?
cc @pajakd
I decided to add this just to cover this case; the comment here says that functional branches should never reach this point.
But nevertheless, it might be worth handling this as you said.
Well, I think this comment is a bit tricky: https://github.com/j-skiba/kueue/blob/5cf835a6bf0dd22feee6b9bf650738f71a99cd3e/pkg/cache/scheduler/fair_sharing.go#L72 - it assumed the state of the world before this change.
Now, with this new feature, this is a new functional branch, so that comment would no longer be accurate. I would like to adjust it.

> But nevertheless, it might be worth handling this as you said.

I think so. The only scenario I'm not totally sure about is when the CQ quota is being reduced; the CQ might then be temporarily running "overcommitted", and thus above 1000. You may check experimentally whether this scenario is real. If it is, then assuming 1001 might indeed be tricky.
It would be good to consider what range is actually possible. Using MaxInt64 for the metric is weird.
> The one with weight=0 will have DRS=MaxInt64, flattening the entire plot for the other CQs (IIUC). Maybe Grafana could deal with that somehow, but then it needs to be investigated.

Yeah... this sounds not great. If this is the case, can you look into whether Grafana can cap the Y axis for the viewing window?
What about setting the metric value to NaN if the weight equals 0.0, and noting that in the metric's description?

> The only scenario I'm not totally sure about is when the CQ quota is being reduced; the CQ might then be temporarily running "overcommitted", and thus above 1000.
Considering the metric value can theoretically be anything from 0 to over 1000 (especially in the "overcommitted" scenario you mentioned), perhaps the NaN approach would be fine. Grafana has a mapping feature that can handle special values: https://grafana.com/docs/grafana/latest/panels-visualizations/configure-value-mappings/#special. By default, metrics with NaN would be skipped by Grafana.
Although, I'm not sure if using NaN like this is a good pattern. Just throwing out an idea.
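For concreteness, a minimal sketch of the idea, assuming a hypothetical helper between the cached value and the metric (the function name is illustrative, not Kueue's actual code):

```go
package metrics

import "math"

// metricWeightedShare is a hypothetical helper illustrating the proposal:
// when the precise weighted share is +Inf (fair sharing weight == 0),
// report NaN so that scrapers and dashboards such as Grafana skip the
// sample instead of flattening the Y axis with a huge value.
func metricWeightedShare(preciseWeightedShare float64) float64 {
	if math.IsInf(preciseWeightedShare, 1) {
		return math.NaN()
	}
	return preciseWeightedShare
}
```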
Using NaN feels better than MaxInt64 and 1001. I'm ok with that approach if there are no other voices or better ideas. I would also log NaN, for consistency.
Just to clarify my understanding and confirm the logic:
- The return value from this WeightedShare() method is only used to set the status.fairSharing.WeightedShare field. This method handles Inf by returning math.MaxInt64, which is fine for the status API.
- The Prometheus metric, on the other hand, gets its value from the raw PreciseWeightedShare(). This value can be Inf, which is what could cause the issue with flattened graphs.
Therefore, the change I pushed (in clusterqueue_controller.go) to convert Inf to NaN specifically for the metric seems fine. It solves the graphing problem without impacting the status field's logic.
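For illustration, a sketch of how such a gauge could be registered and updated with client_golang; the metric and label names follow the discussion above, but the actual Kueue implementation may differ:

```go
package metrics

import (
	"math"

	"github.com/prometheus/client_golang/prometheus"
)

// Illustrative gauge only; the real metric lives in Kueue's metrics package.
var clusterQueueWeightedShare = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "kueue_cluster_queue_weighted_share",
		Help: "Precise weighted share of a ClusterQueue; NaN when the fair sharing weight is 0.",
	},
	[]string{"cluster_queue", "cohort"},
)

// ReportClusterQueueWeightedShare sets the gauge from the raw
// PreciseWeightedShare value, converting +Inf to NaN as discussed.
func ReportClusterQueueWeightedShare(cq, cohort string, preciseWeightedShare float64) {
	if math.IsInf(preciseWeightedShare, 1) {
		preciseWeightedShare = math.NaN()
	}
	clusterQueueWeightedShare.WithLabelValues(cq, cohort).Set(preciseWeightedShare)
}
```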
Can you also add the scheduling cycle number? We need something to collate the values within a tournament. I'm not sure about metric cardinality for that, though. Perhaps the tournament correlation needs to be done via logs. But yeah, the main context that matters is the DRS values grouped within a tournament.
Yes, I don't think we should be exposing the schedulingCycle counter. This is more of a technical detail. Also, what exactly would you correlate the schedulingCycle with even if it were exposed? It is very transient state (the state of the tournament), which is only partially recorded in logs. If we are going to correlate it precisely with the logs anyway, we could just log the DRS at a higher logging level while the tournament is happening. To make this easier, I'm thinking we could even have a small script in the Kueue repo, a "fair sharing log analyzer", which would rely on logs at, say, V4+.
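A rough sketch of what such V(4) logging could look like with logr (names and call site are placeholders, not Kueue's actual code):

```go
package scheduler

import "github.com/go-logr/logr"

// logFairSharingDRS is a hypothetical helper sketching the idea: log the
// precise DRS values at verbosity 4 during a tournament so an offline
// "fair sharing log analyzer" could collate them per scheduling cycle.
func logFairSharingDRS(log logr.Logger, schedulingCycle int64, clusterQueue, cohort string, preciseWeightedShare float64) {
	log.V(4).Info("fair sharing DRS",
		"schedulingCycle", schedulingCycle,
		"clusterQueue", clusterQueue,
		"cohort", cohort,
		"preciseWeightedShare", preciseWeightedShare,
	)
}
```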
So the context for why we want higher-precision DRS instrumentation (regardless of metrics or logging) is so that operators can validate scheduling logic. When we originally found the rounding errors for fair-share tournaments, we looked at both the workload/CQ with the wrong DRS value and the competitors in the tournament. A different question would be: why expose DRS with higher precision at all, given it's pretty transient and doesn't really make sense outside the context of a tournament?
Sounds like an interesting idea!
Indeed, the value of the metric and the API for a CQ is updated in the cluster_queue controller, which is by design decoupled from the scheduler's value in the cache, see here. However, metrics are only scraped by tools like Prometheus at intervals, 15s by default, so this will also not give us a super precise tool for debugging.
That is a valid question to ask. As mentioned above, neither the API nor the metric will give us the precise value as used by the scheduler (at least I have no clue how to do that). We can only get as close as possible with an approximation, thus the proposal to increase the precision of the metric.
Well, this is the only idea I currently have to expose the "precise point in time" values for DRS as used by the scheduler.
Oh yes, I think if we change this for ClusterQueue, then we should change it for Cohort in sync, so please update the PR. However, be aware the discussion may continue, as it was raised whether this is needed: #7338 (comment)
Ah, alrighty. This metric without schedulingCycle could still be useful! We can retroactively correlate it with other metrics roughly by time. (For example, at the most basic level, when we expect a CQ to be bursting/using guarantees, or what the potential values could be when we have high CQ weights. Then we use those clues to dig further in logs.)
/release-note-edit |
Thanks 👍 |
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: j-skiba, mimowo

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
LGTM label has been added. Git tree hash: 709e86591d5654964567426a60051bb5ec7018be
…ics (kubernetes-sigs#7338)
* use float instead of int in cluster_queue_weighted_share metric and add cohort label
* don't use two fields for weighted share
* adjust metric test util to the changes
* make ExpectClusterQueueWeightedShareMetric accept float64 as value
* adjust integration test
* report NaN instead of max_int when weight is 0
* remove unused imports in e2e tests
* use float instead of int in cohort_weighted_share metric
* fix format and naming cleanup
/release-note-edit |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This change updates the kueue_cluster_queue_weighted_share and kueue_cohort_weighted_share metrics to report precise fair sharing weights, rather than rounded values, and adds a cohort label for better context.
Which issue(s) this PR fixes:
Fixes #7244
Special notes for your reviewer:
Does this PR introduce a user-facing change?