Skip to content

Conversation

@tg123
Copy link
Contributor

@tg123 tg123 commented Oct 7, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Background:
I am integrating Kueue with a custom job type that has its own flag to control pause and resume behavior. This pause/resume mechanism is independent of GenericJob.Suspend(), as Suspend() has a different semantic meaning and does not directly map to workload.active.

Problem:
When the custom job is paused, it should release all resources and enter a hibernation state — behavior equivalent to setting workload.active = false.
Currently, jobs can be paused by implementing JobWithCustomWorkloadConditions and emitting a WorkloadDeactivationTarget, but there is no way to re-activate the workload once it’s deactivated.

Solution:
This PR introduces the JobWithCustomWorkloadActivation interface, allowing workloads linked with custom jobs to explicitly decide when to sleep or wake, enabling full control over activation and deactivation lifecycle.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

JobFramework: Introduce an optional interface for custom Jobs, called JobWithCustomWorkloadActivation, which can be used to deactivate or active a custom CRD workload.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 7, 2025
@netlify
Copy link

netlify bot commented Oct 7, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 8f2e991
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/68ee04dd87c63c00082472c0

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 7, 2025
@k8s-ci-robot
Copy link
Contributor

Welcome @tg123!

It looks like this is your first PR to kubernetes-sigs/kueue 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kueue has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 7, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @tg123. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 7, 2025
@tg123 tg123 marked this pull request as ready for review October 8, 2025 06:48
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 8, 2025
@k8s-ci-robot k8s-ci-robot requested a review from PBundyra October 8, 2025 06:48
@mimowo
Copy link
Contributor

mimowo commented Oct 8, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 8, 2025
@mimowo
Copy link
Contributor

mimowo commented Oct 10, 2025

Nice feature @tg123 . So, IIUC you have a custom Job which has a field on the CRD indicating "hybernated / active". When hybernated you would like to set workload.status.active=false, if "active" then give the workload status.active=true.

I'm happy with that interface, just out of curiosity. Did you consider as alternative to simulate hybernation by setting Jobs' suspend=true + removing "queue-name"? Then, "activation" would be adding back "queue-name" and setting "suspend=false". I have not tested that flow so there might be gotchas, it is also more involving on your side. Also, maybe some metrics for Jobs would be skewed in this process, so there are definitely drawbacks.

@mimowo
Copy link
Contributor

mimowo commented Oct 10, 2025

/release-note-edit

Introduce an optional interface for custom Jobs, called JobWithCustomWorkloadActivation, which can be used to deactivate or active a custom CRD workload.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 10, 2025
@mimowo
Copy link
Contributor

mimowo commented Oct 10, 2025

Actually, we could also use this enhancement for all Jobs and implement JobWithCustomActivation interface based on a dedicated Kueue label, say kueue.x-k8s.io/active. Currently, users need to interact directly with a workload to activate / deactivate their workloads. The northstar is that users only interact directly with Jobs. So this feature could be used to let users activate / deactivate just by interacting with the Job iteself. However, we can keep this as a separate feature to Kueue.

cc @tenzen-y

@mimowo
Copy link
Contributor

mimowo commented Oct 10, 2025

Could you please consider adding a unit test. I think this would be possible using our mocks, see here:

// Code generated by MockGen. DO NOT EDIT.
// Source: sigs.k8s.io/kueue/pkg/controller/jobframework (interfaces: GenericJob,JobWithCustomValidation,JobWithManagedBy)
//
// Generated by this command:
//
// mockgen -destination=internal/mocks/controller/jobframework/interface.go -package mocks sigs.k8s.io/kueue/pkg/controller/jobframework GenericJob,JobWithCustomValidation,JobWithManagedBy
//

@mimowo
Copy link
Contributor

mimowo commented Oct 10, 2025

Also, @tg123 would you like to cherrypick this to 0.13 or 0.14 branches? Since this is just a new interface I think we could do that

@tg123
Copy link
Contributor Author

tg123 commented Oct 11, 2025

@mimowo ah good idea, by removing queue label or moving them to a sleeping queue

this sounds like work to me, but need to introduce a label/annotation to remember previous queue name, our crd allows users to choose which queue to use

i will add ut and questoin for cherry-pick: what is the policy of cherry-pick, prefer squash into single commit cherry-pick right?

@mimowo
Copy link
Contributor

mimowo commented Oct 13, 2025

@mimowo ah good idea, by removing queue label or moving them to a sleeping queue

Well, I think the proposal is actually a better idea, that was just a workaround proposal :) but also it is inferior because we block removal of queue-name for already running Jobs.

i will add ut and questoin for cherry-pick: what is the policy of cherry-pick, prefer squash into single commit cherry-pick right?

Yes, please squash it is slightly easier to manage cherrypicks then.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 14, 2025
@tg123
Copy link
Contributor Author

tg123 commented Oct 14, 2025

update:
1 impl kueue.x-k8s.io/active
2 add ut

…n logic to respect workload active state"

This reverts commit 4ad258f.
@mimowo
Copy link
Contributor

mimowo commented Oct 15, 2025

/lgtm
/approve
Thanks 👍
/cherrypick release-0.14
/cherrypick release-0.13

@k8s-infra-cherrypick-robot
Copy link
Contributor

@mimowo: once the present PR merges, I will cherry-pick it on top of release-0.13, release-0.14 in new PRs and assign them to you.

In response to this:

/lgtm
/approve
Thanks 👍
/cherrypick release-0.14
/cherrypick release-0.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 15, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, tg123

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 4d71e00756b773f50d5cc402f7a9ce58b3006168

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 15, 2025
@k8s-ci-robot k8s-ci-robot merged commit d0a72a6 into kubernetes-sigs:main Oct 15, 2025
23 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.15 milestone Oct 15, 2025
@k8s-infra-cherrypick-robot
Copy link
Contributor

@mimowo: new pull request created: #7270

In response to this:

/lgtm
/approve
Thanks 👍
/cherrypick release-0.14
/cherrypick release-0.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-infra-cherrypick-robot
Copy link
Contributor

@mimowo: #7199 failed to apply on top of branch "release-0.13":

Applying: Add JobWithCustomWorkloadActivation interface and update reconciliation logic
Applying: Add tests for workload activation in generic job reconciliation
Using index info to reconstruct a base tree...
M	pkg/controller/jobframework/reconciler_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/controller/jobframework/reconciler_test.go
CONFLICT (content): Merge conflict in pkg/controller/jobframework/reconciler_test.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0002 Add tests for workload activation in generic job reconciliation

In response to this:

/lgtm
/approve
Thanks 👍
/cherrypick release-0.14
/cherrypick release-0.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mimowo
Copy link
Contributor

mimowo commented Oct 15, 2025

@tg123 the PR prepared by the robot does not build, would you like to prepare a cherrypick PR manually using the ./hack/cherry_pick_pull.sh?

@tg123
Copy link
Contributor Author

tg123 commented Oct 15, 2025

sure will send 2 pr to pick
thanks

tg123 added a commit to tg123/kueue that referenced this pull request Oct 16, 2025
…7199)

* Add JobWithCustomWorkloadActivation interface and update reconciliation logic

* Add tests for workload activation in generic job reconciliation

* Add WorkloadActiveLabel constant and update job reconciliation logic to respect workload active state

* Revert "Add WorkloadActiveLabel constant and update job reconciliation logic to respect workload active state"

This reverts commit 4ad258f.
tg123 added a commit to tg123/kueue that referenced this pull request Oct 16, 2025
…7199)

* Add JobWithCustomWorkloadActivation interface and update reconciliation logic

* Add tests for workload activation in generic job reconciliation

* Add WorkloadActiveLabel constant and update job reconciliation logic to respect workload active state

* Revert "Add WorkloadActiveLabel constant and update job reconciliation logic to respect workload active state"

This reverts commit 4ad258f.
k8s-ci-robot pushed a commit that referenced this pull request Oct 16, 2025
…7286)

* introduce JobWithCustomWorkloadActivation interface (#7199)

* Add JobWithCustomWorkloadActivation interface and update reconciliation logic

* Add tests for workload activation in generic job reconciliation

* Add WorkloadActiveLabel constant and update job reconciliation logic to respect workload active state

* Revert "Add WorkloadActiveLabel constant and update job reconciliation logic to respect workload active state"

This reverts commit 4ad258f.

* refactor: simplify mock expectations in TestReconcileGenericJobWithCustomWorkloadActivation
k8s-ci-robot pushed a commit that referenced this pull request Oct 16, 2025
* Add JobWithCustomWorkloadActivation interface and update reconciliation logic

* Add tests for workload activation in generic job reconciliation

* Add WorkloadActiveLabel constant and update job reconciliation logic to respect workload active state

* Revert "Add WorkloadActiveLabel constant and update job reconciliation logic to respect workload active state"

This reverts commit 4ad258f.
@mimowo
Copy link
Contributor

mimowo commented Oct 22, 2025

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 22, 2025
@tenzen-y
Copy link
Member

I think manipulating the activation by label is a nice feature, but do you have a use case for it? I mentioned that as a possibility in the future, but maybe it is safer to scope the PR to just the interface.

Since it extends the API surface I think it is preferred to go via the KEP process. cc @tenzen-y

For example, it got me thinking about the scenario when the user activates by setting the label active=true, but then Kueue wants to deactivate. In that case we may have two competing mechanism, and so Kueue may change status.active=false, but then the new mechanism will flip it back to status.active=true. So, I think this may require more thought.

To solve this problem (only thought experiment) when Kueue deactivates a workload it should also clear the label / annotation. Alternatively, it could clear the label or annotation as soon as the expected state is reached by the workload. wdyt?

For these types of interactions we certainly need an integration test.

I agree with postponing the decision for active annotation support.

@tenzen-y
Copy link
Member

/release-note-edit

JobFramework: Introduce an optional interface for custom Jobs, called JobWithCustomWorkloadActivation, which can be used to deactivate or active a custom CRD workload.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants