MultiKueue: Support sequential attempts to try worker clusters #3757

@mimowo


What would you like to be added:

We would like to try the worker clusters sequentially rather than all of them at the same time. The attempts could be time-based.

At a minimum, this will require an API for controlling the time between attempts. An open question is whether the timeout should be global, per manager, or per worker. This needs to be designed.
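To make the time-based idea concrete, here is a minimal sketch of how an ordered list of workers and a timeout could determine which cluster is being tried at a given moment. It is only an illustration: `WorkerCluster`, `attempt_timeout_s`, `default_timeout_s`, and `active_cluster` are hypothetical names, not part of the Kueue API, and the combination of a global default with an optional per-worker override is just one of the scoping options raised above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class WorkerCluster:
    # Hypothetical model of a worker cluster; not a Kueue API type.
    name: str
    # Optional per-worker timeout in seconds; None means "use the default".
    attempt_timeout_s: Optional[float] = None


def active_cluster(
    workers: Sequence[WorkerCluster],
    elapsed_s: float,
    default_timeout_s: float,
) -> Optional[WorkerCluster]:
    """Return the worker being tried elapsed_s seconds after dispatch began.

    Workers are tried in list order, each for its own timeout window.
    Returns None once every window has expired.
    """
    window_start = 0.0
    for worker in workers:
        timeout = (
            worker.attempt_timeout_s
            if worker.attempt_timeout_s is not None
            else default_timeout_s
        )
        if elapsed_s < window_start + timeout:
            return worker
        window_start += timeout
    return None


workers = [
    WorkerCluster("reserved", attempt_timeout_s=60),  # per-worker override
    WorkerCluster("autoscaled"),                      # falls back to the default
]
```

With a default of 300 seconds, this sketch would try `reserved` for the first 60 seconds and `autoscaled` for the following 300. What should happen after the last window expires (retry from the start, or give up) is left open here, as in the proposal.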

Why is this needed:

  • To avoid the risk of admitting the same workload on two clusters at the same time, and thus possibly doing preemptions on both clusters.
  • To prioritize the use of some clusters over others. For example, a user may have one cluster with reservations and one that is auto-scaled. The user prefers to try the reservation cluster first, and to try autoscaling only as a fallback.
  • To avoid autoscaling on multiple worker clusters at the same time.
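The motivations above can be illustrated with a small sketch of ordered dispatch with fallback. It assumes a hypothetical `try_admit` callback that stands in for "create the workload on that worker and wait up to the timeout for admission"; neither the callback nor `dispatch_sequentially` exists in Kueue.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def dispatch_sequentially(
    clusters: Sequence[str],
    try_admit: Callable[[str], bool],
) -> Tuple[Optional[str], List[str]]:
    """Try clusters one at a time, in preference order.

    Returns the name of the cluster that admitted the workload (or None)
    and the list of clusters that were actually attempted.
    """
    attempted: List[str] = []
    for name in clusters:
        attempted.append(name)
        # Only one cluster is contacted at a time, so at most one of them
        # can admit, preempt for, or scale up for this workload.
        if try_admit(name):
            return name, attempted
    return None, attempted


# Hypothetical scenario: the reservation cluster is full, so only the
# auto-scaled cluster can admit the workload.
winner, attempted = dispatch_sequentially(
    ["reserved", "autoscaled"],
    lambda name: name == "autoscaled",
)
```

Here `winner` is `"autoscaled"` and `attempted` is `["reserved", "autoscaled"]`. Had the reservation cluster admitted the workload, the auto-scaled cluster would never have been contacted, which is the behavior the second and third bullets ask for.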

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

Labels

kind/feature: Categorizes issue or PR as related to a new feature.
