Skip to content

Conversation

@JesseStutler
Copy link
Member

@JesseStutler JesseStutler commented Jul 24, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

In #4430, we removed these lines:

if !b.capacityCheckEnabled {
return true, nil
}
, which is important when EnableCSIStorage server option is false. Now when we want to create Dynamic provision pod, the scheduler will panic at this line:
capacities, err := b.csiStorageCapacityLister.List(labels.Everything())

Besides, current volume binding e2e testing incorrectly use default-scheduler to schedue pods requesting PVCs. These pods should use volcano to schedule, therefore I add test/e2e/util/scheduler_injector.go, it can modify the schedulerName of the pod non-invasively without copying the k8s source code. And I add a new Dynamic Provision e2e testing case

Which issue(s) this PR fixes:

Fixes #4480

Special notes for your reviewer:

Does this PR introduce a user-facing change?


@volcano-sh-bot volcano-sh-bot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 24, 2025
@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 24, 2025
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @JesseStutler, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical panic within the Volcano scheduler's volume binding logic and significantly improves the robustness and accuracy of its end-to-end (e2e) testing for volume-related functionalities. It ensures that e2e tests correctly interact with the Volcano scheduler and expands test coverage to include dynamic volume provisioning scenarios.

Highlights

  • Fixes Volume Binding Panic: Implemented a safeguard in the volume binder's hasEnoughCapacity function to prevent a panic that occurred when CSI storage capacity checking was disabled. The fix ensures the function returns early if capacityCheckEnabled is false, avoiding nil pointer dereferences.
  • Enhances E2E Test Scheduler Injection: Introduced a new utility (test/e2e/util/scheduler_injector.go) that wraps the Kubernetes clientset. This utility automatically injects the Volcano scheduler name into pod specifications during e2e test execution, ensuring that all pods created by the tests are correctly scheduled by Volcano.
  • Adds New Dynamic Provisioning E2E Test: A new end-to-end test case has been added under the 'Dynamic Provisioning' suite to validate the successful provisioning of storage using the Kind cluster's standard storage class (rancher.io/local-path) when scheduled by Volcano.
  • Configurable Ignored Provisioners: Extended the Volcano scheduler's configuration to allow specifying storage provisioners that should be ignored. This is exposed via Helm chart values and utilized in e2e test scripts (e.g., ignoring kubernetes.io/no-provisioner) to provide better control over the test environment.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@JesseStutler
Copy link
Member Author

@volcano-sh-bot
Copy link
Contributor

@JesseStutler: GitHub didn't allow me to request PR reviews from the following users: ouyangshengjia.

Note that only volcano-sh members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @Monokaix @ouyangshengjia @wangyang0616

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a panic in the volume binder and improves the e2e test suite for volume binding. The fix for the panic is direct and correct, and the introduction of the schedulerInjectingClientset is a clean, non-invasive method to ensure pods in tests are scheduled by Volcano. The new dynamic provisioning test case is a valuable addition. The suggestion improves the robustness of the new Helm chart configuration.


e2e-test-schedulingbase: images
E2E_TYPE=SCHEDULINGBASE ./hack/run-e2e-kind.sh
E2E_TYPE=SCHEDULINGBASE IGNORED_PROVISIONERS="kubernetes.io/no-provisioner" ./hack/run-e2e-kind.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ignore this one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in volume binding e2e testing, setupStorageClass uses the no-provisioner storageclass to set up:
https://github.com/JesseStutler/volcano/blob/3aa260cf38ed112a1764473a316e2f280ae4571c/test/e2e/schedulingbase/volume_binding.go#L451-L462
If I don't set ignored-provisioners here,

func (sc *SchedulerCache) addPodCSIVolumesToTask(pi *schedulingapi.TaskInfo) error {
volumes, err := sc.getPodCSIVolumes(pi.Pod)
if err != nil {
klog.Errorf("got pod csi attachment persistent volumes count error: %s", err.Error())
return err
}
for key, count := range volumes {
pi.Resreq.AddScalar(key, float64(count))
}
return nil
}
will add an attachment resource for pods requesting PVCs, then overused validation in proportion plugin will reject the pod to be scheduled because the default queue didn't set attachment resource quota.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a comment here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK done

})
})

f.Context("Dynamic Provisioning", f.WithSlow(), func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a dedicated case just for volcano?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, since we do not have a cloud vendor offering PVs, I use kind's "standard" storageclass to provide PVs

}

func (p *schedulerInjectingPodInterface) Create(ctx context.Context, pod *v1core.Pod, opts metav1.CreateOptions) (*v1core.Pod, error) {
if pod.Spec.SchedulerName == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if pods already set a schedulerName?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I think these pods should be scheduled by other schedulers, such as kube-scheduler, but in our use cases there are no cases specify schedulerName to default-scheduler, it should be fine. And currently scheduler_injector is only used in volume_binding e2e testing cases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schedulerName field is set by kube-scheduler's e2e test, right?
If so, the schedulerName will be set to default-scheduler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No in volume binding e2e testing, the pod functions related to creation (copied from the kubernetes repo) do not set the schedulerName. My injector is to inject the schedulerName before issuing the pod creation request. If the schedulerName is set to volcano, the kube-apiserver will not set the default value for this pod.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't determine the behavior of k8s e2e, so it's better to rewrite this field directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Updated

)

ginkgo.BeforeEach(func(ctx context.Context) {
f.ClientSet = e2eutil.NewSchedulerInjectingClientset(f.ClientSet, e2eutil.SchedulerName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about set a parameter --scheduler-name="volcano,defaule-scheduler" to let volcano schedule all pods directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may cause kube-scheduler and volcano to compete for scheduling pods and cause strange behavior, set the schedulerName directly will make sure everything is correct and scheduled by volcano

@JesseStutler JesseStutler changed the title Fix volume binding panic and volume binding e2e testing Fix volume binding e2e testing Jul 26, 2025
@JesseStutler JesseStutler requested a review from Monokaix July 28, 2025 06:52
1. Set pod schedulerName to volcano
2. Add ignored-provisioner for volume binding e2e testing
3. Add dynamic provision e2e testing case

Signed-off-by: JesseStutler <[email protected]>
@Monokaix
Copy link
Member

/approve

@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Monokaix

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 30, 2025
@hwdef
Copy link
Member

hwdef commented Aug 1, 2025

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 1, 2025
@volcano-sh-bot volcano-sh-bot merged commit 90a6735 into volcano-sh:master Aug 1, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Volcano scheduler panic when scheduling Pods with delayed binding PVCs

4 participants