
@stutibiyani (Contributor) commented Oct 14, 2025

Description

This PR adds observability metrics to the QueryThrottler to track throttling behavior with granular visibility into request patterns and throttling decisions.

Key Changes:

Metrics Addition: Introduced two new multi-label counters to track:

  • QueryThrottlerRequests: Total number of requests evaluated by the query throttler
  • QueryThrottlerThrottled: Number of requests that were throttled

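Both metrics are labeled by:

  • Strategy: The throttling strategy being used (e.g., "MockStrategy", "Unknown")
  • Workload: The workload name from ExecuteOptions (defaults to "default" or "unknown")
  • Priority: The query priority value (0-100, defaults to 100)

QueryThrottlerThrottled additionally carries MetricName, MetricValue, and DryRun labels: the name and value of the metric behind the throttling decision, and whether dry-run mode was enabled.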
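To make the label scheme concrete, here is a minimal, stdlib-only Go sketch of a multi-label counter. It is not Vitess's stats implementation: the type multiLabelCounter, its methods, and the "."-joined key format are assumptions made for illustration. Each distinct tuple of label values is tracked as its own series.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// multiLabelCounter is a hypothetical, stdlib-only stand-in for a
// multi-label counter: each distinct tuple of label values is its own series.
type multiLabelCounter struct {
	name   string
	labels []string
	mu     sync.Mutex
	counts map[string]int64
}

func newMultiLabelCounter(name string, labels []string) *multiLabelCounter {
	return &multiLabelCounter{name: name, labels: labels, counts: make(map[string]int64)}
}

// joinLabels builds the series key. "." inside a value is replaced with "_"
// so that a value such as "0.75" cannot be confused with a label separator.
func joinLabels(values []string) string {
	safe := make([]string, len(values))
	for i, v := range values {
		safe[i] = strings.ReplaceAll(v, ".", "_")
	}
	return strings.Join(safe, ".")
}

// Add increments the series identified by values (one value per label).
func (c *multiLabelCounter) Add(values []string, delta int64) {
	if len(values) != len(c.labels) {
		panic(fmt.Sprintf("%s: got %d label values, want %d", c.name, len(values), len(c.labels)))
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[joinLabels(values)] += delta
}

// Counts returns a snapshot copy of the per-series counts.
func (c *multiLabelCounter) Counts() map[string]int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make(map[string]int64, len(c.counts))
	for k, v := range c.counts {
		out[k] = v
	}
	return out
}

func main() {
	requests := newMultiLabelCounter("QueryThrottlerRequests", []string{"Strategy", "Workload", "Priority"})
	requests.Add([]string{"MockStrategy", "default", "100"}, 1)
	requests.Add([]string{"MockStrategy", "default", "100"}, 1)
	requests.Add([]string{"MockStrategy", "batch", "50"}, 1)
	fmt.Println(requests.Counts()["MockStrategy.default.100"]) // 2
	fmt.Println(len(requests.Counts()))                         // 2
}
```

The same type, constructed with the longer label list, could back the throttled counter; every extra label multiplies the number of possible series.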

Test Plan

Metrics Tracking Tests:

  1. Updated unit test to verify that:
    • requestsTotal counter increments for all requests (both throttled and allowed)
    • requestsThrottled counter increments only when throttling occurs
    • Metrics are properly labeled with strategy, workload, and priority
    • Metrics behavior is correct in both normal and dry-run modes
  2. Tested in Uber's environment:
    • Total requests (screenshot, Oct 14, 2025, 2:49 PM)
    • Throttled requests (screenshot, Oct 14, 2025, 2:55 PM)
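The counting rules in the test plan can be modeled with a small Go sketch. The throttler type, decision struct, and errThrottled value are hypothetical simplifications, not the PR's code, and plain maps stand in for the real counters. The ordering (count every request, count throttled requests before the dry-run check, and reject only outside dry-run mode) appears to follow the requestsThrottled snippet discussed later in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// decision is a hypothetical strategy result; only the throttle flag is modeled.
type decision struct {
	Throttle bool
}

// config is a hypothetical throttler config.
type config struct {
	DryRun bool
}

// throttler is a hypothetical, simplified model of the flow described in the
// PR, using plain maps instead of real metric counters.
type throttler struct {
	cfg       config
	requests  map[string]int // series key: strategy|workload|priority
	throttled map[string]int // series key: strategy|workload|priority|dryRun
}

var errThrottled = errors.New("query throttled")

func (t *throttler) throttle(strategy, workload string, priority int, d decision) error {
	cfg := t.cfg // read the config once and use this copy below
	key := strategy + "|" + workload + "|" + strconv.Itoa(priority)
	t.requests[key]++ // every evaluated request is counted
	if !d.Throttle {
		return nil
	}
	// Throttled requests are counted in both modes, labeled with DryRun.
	t.throttled[key+"|"+strconv.FormatBool(cfg.DryRun)]++
	if cfg.DryRun {
		return nil // dry run: the query is allowed through
	}
	return errThrottled
}

func main() {
	t := &throttler{requests: map[string]int{}, throttled: map[string]int{}}
	fmt.Println(t.throttle("MockStrategy", "default", 100, decision{Throttle: false})) // <nil>
	fmt.Println(t.throttle("MockStrategy", "default", 100, decision{Throttle: true}))  // query throttled
	t.cfg.DryRun = true
	fmt.Println(t.throttle("MockStrategy", "default", 100, decision{Throttle: true})) // <nil>
	fmt.Println(t.requests["MockStrategy|default|100"])                                // 3
}
```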

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

AI Disclosure

vitess-bot (bot) commented Oct 14, 2025

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes); new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test; enhancements and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator.
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Oct 14, 2025
@github-actions github-actions bot added this to the v24.0.0 milestone Oct 14, 2025
@stutibiyani stutibiyani marked this pull request as ready for review October 14, 2025 09:36
codecov (bot) commented Oct 14, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@3dd1516). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #18740   +/-   ##
=======================================
  Coverage        ?   69.83%           
=======================================
  Files           ?     1610           
  Lines           ?   215370           
  Branches        ?        0           
=======================================
  Hits            ?   150404           
  Misses          ?    64966           
  Partials        ?        0           


@timvaillancourt timvaillancourt added Component: Observability Pull requests that touch tracing/metrics/monitoring Component: VTTablet Type: Enhancement Logical improvement (somewhere between a bug and feature) and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Oct 22, 2025
Comment on lines 122 to 136
attrs := registry.QueryAttributes{
	WorkloadName: extractWorkloadName(options),
	Priority:     extractPriority(options),
}
strategyName := tStrategy.GetStrategyName()
workload := attrs.WorkloadName
priorityStr := strconv.Itoa(attrs.Priority)
Contributor

@stutibiyani do we need the attrs struct?

I wonder if we can just do this:

Suggested change
-attrs := registry.QueryAttributes{
-	WorkloadName: extractWorkloadName(options),
-	Priority:     extractPriority(options),
-}
-strategyName := tStrategy.GetStrategyName()
-workload := attrs.WorkloadName
-priorityStr := strconv.Itoa(attrs.Priority)
+strategyName := tStrategy.GetStrategyName()
+workload := extractWorkloadName(options)
+priorityStr := strconv.Itoa(extractPriority(options))

Contributor Author

Hi @timvaillancourt, we added attrs because it is also required by specific strategy implementations, so we want to avoid recomputing it.
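A sketch of the point made in this reply: derive the attributes once and hand the same value to both the metric labels and the strategy. The executeOptions and queryAttributes types, the priorityCutoff strategy, and the defaulting rules inside extractWorkloadName and extractPriority are assumptions for illustration; only those two function names, the 0-100 range, and the default priority of 100 come from the PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// executeOptions is a hypothetical stand-in for the per-query options;
// only the two fields this sketch needs are modeled.
type executeOptions struct {
	WorkloadName string
	Priority     string
}

// queryAttributes mirrors the idea of registry.QueryAttributes: values
// derived from the options once and then shared.
type queryAttributes struct {
	WorkloadName string
	Priority     int
}

const defaultPriority = 100

// extractWorkloadName (hypothetical rules) falls back to "default".
func extractWorkloadName(o *executeOptions) string {
	if o == nil || o.WorkloadName == "" {
		return "default"
	}
	return o.WorkloadName
}

// extractPriority (hypothetical rules) parses 0-100, else returns the default.
func extractPriority(o *executeOptions) int {
	if o == nil || o.Priority == "" {
		return defaultPriority
	}
	p, err := strconv.Atoi(o.Priority)
	if err != nil || p < 0 || p > 100 {
		return defaultPriority
	}
	return p
}

// strategy is hypothetical; the point is that it receives the same attrs.
type strategy interface {
	Name() string
	ShouldThrottle(attrs queryAttributes) bool
}

// priorityCutoff is a toy strategy that throttles above a priority threshold.
type priorityCutoff struct{ max int }

func (s priorityCutoff) Name() string                          { return "PriorityCutoff" }
func (s priorityCutoff) ShouldThrottle(a queryAttributes) bool { return a.Priority > s.max }

// evaluate computes attrs once, then uses them for both labels and the decision.
func evaluate(s strategy, o *executeOptions) (labels []string, throttle bool) {
	attrs := queryAttributes{WorkloadName: extractWorkloadName(o), Priority: extractPriority(o)}
	labels = []string{s.Name(), attrs.WorkloadName, strconv.Itoa(attrs.Priority)}
	return labels, s.ShouldThrottle(attrs)
}

func main() {
	labels, throttle := evaluate(priorityCutoff{max: 50}, &executeOptions{WorkloadName: "batch", Priority: "80"})
	fmt.Println(labels, throttle) // [PriorityCutoff batch 80] true
}
```

The trade-off in the review is visible here: inlining the extract calls at the label site is shorter, but the strategy would then either parse the options again or receive values computed separately.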

@stutibiyani stutibiyani marked this pull request as draft November 4, 2025 05:49
@stutibiyani stutibiyani force-pushed the throttler-metrics branch 2 times, most recently from 40e30a1 to bbd6c4d Compare November 5, 2025 06:00
@mattlord mattlord self-requested a review November 13, 2025 19:27
@mattlord mattlord self-assigned this Nov 13, 2025
@stutibiyani stutibiyani marked this pull request as ready for review December 1, 2025 10:53
promptless (bot) commented Dec 1, 2025

📝 Documentation updates detected!

New suggestion: Document QueryThrottler metrics

@mattlord (Member) commented Dec 1, 2025

@stutibiyani Thank you! Here's a docs PR that you can review and modify: vitessio/website#2031

@mattlord mattlord removed the request for review from systay December 1, 2025 13:29
@mattlord (Member) left a comment

LGTM! Just a few minor notes. Please let me know what you think.

Also, please note the docs PR that I linked to. I can help get you set up in the website repo as needed.

Thank you! ❤️

@stutibiyani (Contributor Author) commented

Hi @mattlord! I have addressed your comments on the PR. I will also update the docs PR in a day or two.

@mattlord (Member) left a comment

LGTM! Thank you, @stutibiyani! ❤️

@mattlord (Member) commented Dec 3, 2025

@stutibiyani looks like the new unit tests are now failing. We'll have to get that fixed up before merging. Please let me know if I can be of any help.

qt.stats.requestsThrottled.Add([]string{strategyName, workload, priorityStr, decision.MetricName, strconv.FormatFloat(decision.MetricValue, 'f', -1, 64), strconv.FormatBool(tCfg.DryRun)}, 1)

// If dry-run mode is enabled, log the decision but don't throttle
if qt.cfg.DryRun {
Collaborator

If we have tCfg := qt.cfg above, we should use tCfg. Otherwise, we can just reference qt.cfg everywhere (IMO).

Contributor Author

Missed updating this, thank you!
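The reviewer's point is that the config should be read once per call, so the DryRun label and the dry-run branch cannot disagree. A minimal Go sketch, assuming (this is not stated in the thread) that the config may be swapped at runtime and is held in a sync/atomic Pointer (Go 1.19+); queryThrottler, throttlerConfig, and onThrottle are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"sync/atomic"
)

// throttlerConfig is a hypothetical config holding only the field needed here.
type throttlerConfig struct {
	DryRun bool
}

// queryThrottler is hypothetical: it assumes the config can be swapped at
// runtime, so it is stored behind an atomic pointer (must be stored before use).
type queryThrottler struct {
	cfg atomic.Pointer[throttlerConfig]
}

// onThrottle loads the config exactly once (tCfg) and uses that snapshot for
// both the DryRun metric label and the dry-run branch, so the two cannot
// disagree even if the config is replaced concurrently.
func (qt *queryThrottler) onThrottle() (dryRunLabel string, reject bool) {
	tCfg := qt.cfg.Load()
	dryRunLabel = strconv.FormatBool(tCfg.DryRun)
	if tCfg.DryRun {
		return dryRunLabel, false // dry run: report the label, let the query through
	}
	return dryRunLabel, true
}

func main() {
	qt := &queryThrottler{}
	qt.cfg.Store(&throttlerConfig{DryRun: true})
	fmt.Println(qt.onThrottle()) // true false
	qt.cfg.Store(&throttlerConfig{DryRun: false})
	fmt.Println(qt.onThrottle()) // false true
}
```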

env: env,
stats: Stats{
requestsTotal: env.Exporter().NewCountersWithMultiLabels(_queryThrottlerAppName+"Requests", "query throttler requests", []string{"Strategy", "Workload", "Priority"}),
requestsThrottled: env.Exporter().NewCountersWithMultiLabels(_queryThrottlerAppName+"Throttled", "query throttler requests throttled", []string{"Strategy", "Workload", "Priority", "MetricName", "MetricValue", "DryRun"}),
Collaborator

The cardinality of MetricValue could potentially be very high, right? If so, it may make sense to use the existing metrics to determine what the value is, instead of emitting it as a tag here.

Contributor Author

@nickvanw this metric is only emitted when a request is throttled. We have also been using this throttler internally, and the cardinality has not been that high. But if you still feel strongly, I can remove MetricValue from the labels.
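Each distinct combination of label values becomes its own counter series, so the cost of the MetricValue label grows with the number of distinct values seen while throttling. This illustrative Go sketch counts series for a hypothetical set of readings (the values are made up, not taken from the PR) with and without MetricValue, using the label order from the requestsThrottled constructor above and the same float formatting as the Add call.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// seriesCount returns how many distinct label tuples (time series) a set of
// throttled events would create, optionally including the raw MetricValue.
// Strategy, workload, priority, metric name, and DryRun are held fixed.
func seriesCount(values []float64, includeValue bool) int {
	seen := make(map[string]struct{})
	for _, v := range values {
		labels := []string{"MockStrategy", "default", "100", "lag"}
		if includeValue {
			labels = append(labels, strconv.FormatFloat(v, 'f', -1, 64))
		}
		labels = append(labels, "false") // DryRun
		seen[strings.Join(labels, "|")] = struct{}{}
	}
	return len(seen)
}

func main() {
	// Hypothetical metric readings observed when requests were throttled.
	readings := []float64{5.1, 5.1, 6.25, 7.0, 12.5, 5.1}
	fmt.Println(seriesCount(readings, true), seriesCount(readings, false)) // 4 1
}
```

With full-precision formatting, continuous readings may rarely repeat exactly, so the series count can approach the number of throttled events; dropping the label, as offered in the reply, keeps it at one series per remaining label tuple.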

Signed-off-by: Stuti Biyani <[email protected]>

Labels

Component: Observability Pull requests that touch tracing/metrics/monitoring Component: VTTablet Type: Enhancement Logical improvement (somewhere between a bug and feature)

4 participants