
Conversation

@mittalrishabh
Member

Created RFC for region level isolation

Signed-off-by: rishabh_mittal <[email protected]>
@mittalrishabh mittalrishabh changed the title region level isolation [RFC] region level isolation Dec 11, 2025
rishabh_mittal added 3 commits December 11, 2025 09:41
Signed-off-by: rishabh_mittal <[email protected]>
Signed-off-by: rishabh_mittal <[email protected]>
Signed-off-by: rishabh_mittal <[email protected]>
These functionalities are missing in the current implementation.

1. **Region-level fairness**: Hot regions (with hot keys or large scans) should be deprioritized to prevent resource monopolization within a tenant
3. **Traffic Moderation**: In a multi-tenant SOA environment, setting correct rate limits is challenging - limits that are too tight reject valid traffic, while limits that are too loose allow overload. Instead of hard rate limits, implement adaptive traffic moderation that responds to sudden spikes on hot regions by gracefully deprioritizing rather than outright rejecting requests
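
The moderation goal above can be sketched as follows. This is an illustrative sketch only (the enum, function, and thresholds are not from the RFC): rather than rejecting requests past a hard rate limit, rising region load maps to a lower scheduling priority.

```rust
// Illustrative sketch: instead of a hard rate limit that rejects requests
// above a threshold, adaptive moderation keeps accepting work but degrades
// its scheduling priority as the region's load rises.
#[derive(Debug, PartialEq)]
enum Priority {
    High,
    Medium,
    Low,
}

// `load` is the region's observed utilization relative to its fair share
// (1.0 = exactly at its share). The thresholds here are hypothetical.
fn moderate(load: f64) -> Priority {
    if load < 1.0 {
        Priority::High
    } else if load < 2.0 {
        Priority::Medium
    } else {
        Priority::Low // deprioritized, never outright rejected
    }
}
```

The key property is that `moderate` never returns a "reject" outcome: a spike on a hot region degrades gracefully instead of bouncing valid traffic.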
Contributor

nit: 3rd when it is 2nd

This point sounds more like a non-goal. Move it to the alternative approaches section?

Separately, do you want to add a goal around not interfering with region splits?

Member Author

This is a goal. If we don't have it, the system will become overloaded after a split.


### Traffic moderation and split/scatter

Currently, split/scatter is non-deterministic when a node is overloaded: it depends on how many requests on the region succeed. With this design, hot regions accumulate high VT and get deprioritized, which slows down split decisions that are based on served QPS.
Contributor

If the split depends on succeeded requests, then deprioritization of those requests will delay the split. Should it use scheduled/dropped QPS instead of succeeded?

Member Author

Yes, we can use scheduled/dropped QPS. I will modify the design.
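
A sketch of that change, with hypothetical names: the split heuristic would consume offered load (succeeded plus dropped) rather than only served QPS, so deprioritizing a hot region does not hide its load from the split checker.

```rust
// Hypothetical sketch: track both served and dropped requests per region so
// the split heuristic sees the full offered load even when traffic
// moderation is shedding requests from a hot region.
#[derive(Default)]
struct RegionQps {
    succeeded: u64,
    dropped: u64,
}

impl RegionQps {
    fn record(&mut self, was_dropped: bool) {
        if was_dropped {
            self.dropped += 1;
        } else {
            self.succeeded += 1;
        }
    }

    // Signal fed to the split checker: served + dropped, not just served.
    fn offered(&self) -> u64 {
        self.succeeded + self.dropped
    }
}
```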

Comment on lines +97 to +104
### Background Task Demotion

Background tasks (GC, compaction, statistics) use LOW `group_priority` regardless of their resource group's configured priority:
```
group_priority = LOW // instead of resource group's configured priority
```

This ensures foreground traffic is always prioritized over background.
Contributor

Is the proposal here to create a virtual RG for background tasks with low priority, to schedule them relative to other traffic as well?

Member Author

Tagged you below in the implementation.


When queue is full:
1. Calculate priority of incoming task
2. Compare with lowest priority task in queue
Contributor

That is probably hard to implement efficiently. Will it require one more priority queue with inverse priority? Or does the current SkipMap implementation allow popping from both ends efficiently?

Member Author

Yes, the current SkipMap allows popping from both ends. It is O(log N).
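
The eviction path under discussion can be sketched with std's `BTreeMap` as a stand-in for the SkipMap (both support O(log n) removal at either end). All names here are illustrative, not from the implementation:

```rust
use std::collections::BTreeMap;

// Illustrative bounded priority queue: keys are priorities (higher = more
// urgent), values are task ids. When full, an incoming task replaces the
// current lowest-priority entry only if it has strictly higher priority.
// A real implementation would key by (priority, seq) so equal priorities
// don't collide; this sketch assumes distinct priorities and capacity >= 1.
struct BoundedQueue {
    tasks: BTreeMap<u64, &'static str>,
    capacity: usize,
}

impl BoundedQueue {
    fn new(capacity: usize) -> Self {
        Self { tasks: BTreeMap::new(), capacity }
    }

    // Returns the id of the task that was rejected or evicted, if any.
    fn push(&mut self, priority: u64, id: &'static str) -> Option<&'static str> {
        if self.tasks.len() < self.capacity {
            self.tasks.insert(priority, id);
            return None;
        }
        // pop_first() removes the lowest key in O(log n); a SkipMap offers
        // the same operation at both ends.
        let (lowest, lowest_id) = self.tasks.pop_first().expect("non-empty");
        if priority > lowest {
            self.tasks.insert(priority, id);
            Some(lowest_id) // evicted the lowest-priority task
        } else {
            self.tasks.insert(lowest, lowest_id);
            Some(id) // incoming task is the lowest; reject it
        }
    }
}
```

So no second queue with inverse priority is needed; one ordered map serves both the dispatcher (pop highest) and the admission check (inspect or pop lowest).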

Comment on lines +376 to +378
2. **Shared region fairness issues**: When multiple resource groups access the same region, two fairness problems arise:
- **Innocent tenant penalized**: Tenant A's heavy usage increases the region's VT, penalizing Tenant B's requests to that region even though Tenant B didn't cause the hotness
- **Hot region stays hot**: If Tenant A and B alternate requests to a shared region, each tenant's group_vt stays low (they're taking turns), so the region never gets properly deprioritized despite being continuously hot
Contributor

Did you consider having a region tracker per group? What are the trade-offs of that approach?

Member Author

What is a region tracker per group?

Contributor

The current design tracks CPU per region across all tenants, which results in the highlighted issues. I'm asking if you considered tracking it per tenant.

Member Author

In the existing design, if a region is split into r1 and r2, they will share the same VT when CPU utilization is more than 80%. Having a region tracker per group would complicate this design.
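
That inheritance rule can be sketched as follows, assuming a hypothetical per-store region-to-VT map (the 80% threshold is from the comment above; everything else is illustrative):

```rust
use std::collections::HashMap;

// Illustrative sketch: on split, child regions inherit the parent's virtual
// time (VT) when the parent was hot (CPU above the 80% threshold), so a hot
// region cannot shed its deprioritization simply by splitting.
const HOT_CPU_THRESHOLD: f64 = 0.80;

fn split_vt(
    vt: &mut HashMap<u64, f64>, // region id -> virtual time
    parent: u64,
    children: [u64; 2],
    parent_cpu: f64, // fraction of a core, 0.0..=1.0
) {
    let parent_vt = vt.remove(&parent).unwrap_or(0.0);
    let inherited = if parent_cpu > HOT_CPU_THRESHOLD { parent_vt } else { 0.0 };
    for child in children {
        vt.insert(child, inherited);
    }
}
```

Tracking VT per (group, region) pair instead would mean this inheritance has to be applied per group on every split, which is the complication the comment refers to.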


```
// 3. Use LOW priority for background tasks
let group_priority = if metadata.is_background() {
    LOW
} else {
    // else branch reconstructed from the surrounding text:
    // fall back to the resource group's configured priority
    metadata.group_priority()
};
```
Member Author

@Tema this is how priority is decided for background tasks.
