Skip to content

Conversation

@JesseStutler
Copy link
Member

What type of PR is this?

/kind bug

What this PR does / why we need it:

Currently, if a user doesn't configure predicates plugin and then submit a simple VC job, the scheduler will panic. We need to fix it.

Which issue(s) this PR fixes:

Fixes # #4421

Special notes for your reviewer:

I also add a unit test to test if we only specify nodeorder plugin and doesn't configure predicates plugin whether will meet panic

Does this PR introduce a user-facing change?

NONE

@volcano-sh-bot volcano-sh-bot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 2, 2025
@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 2, 2025
@JesseStutler JesseStutler force-pushed the fix_metrics_init branch 2 times, most recently from c7d6e9b to ac2498e Compare July 3, 2025 01:32
// It is safe here to directly use the state to run plugins because we have already initialized the cycle state
// for each pending pod when open session and will not meet nil state
state := ssn.GetCycleState(task.UID)
metrics.Register()
Copy link
Member

@Monokaix Monokaix Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the metrics include? Is it necessary for volcano, or volcano already has these metrics? It's exposed by kube-scheduler framework, may not useful for volcano, we should check that.

Copy link
Member Author

@JesseStutler JesseStutler Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we only need Goroutines metric in kube-scheduler, so now I init this metric(override the gloabl variable ) in InitKubeSchedulerRelatedMetrics and this metrics won't be registered and exported, only to avoid panic when we need to execute kube-scheduler related plugins

@@ -0,0 +1,141 @@
/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not related to this pr.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used to judge when we disable predicates plugin, only contain the nodeorder plugin, the scheduler won't panic

// related plugins. And there is no need to export these metrics, therefore currently initialization is enough.
func InitKubeSchedulerRelatedMetrics() {
k8smetrics.Goroutines = metrics.NewGaugeVec(
&metrics.GaugeOpts{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the goroutine metrics already included in volcano metrics?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not yet, it's used to record how many goroutines are simultaneously executing in a plugin:
image
image
Do we need to register and export it?

@Monokaix
Copy link
Member

Monokaix commented Jul 7, 2025

/approve

@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Monokaix

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 7, 2025
@Monokaix
Copy link
Member

Monokaix commented Jul 7, 2025

backport to release1.12.

@hwdef
Copy link
Member

hwdef commented Jul 9, 2025

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 9, 2025
@volcano-sh-bot volcano-sh-bot merged commit e1b63e1 into volcano-sh:master Jul 9, 2025
18 checks passed
volcano-sh-bot added a commit that referenced this pull request Jul 17, 2025
…4422-origin-release-1.12

Automated cherry pick of #4422: Move kube-scheduler related metrics initilization to server.go to avoid panic
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants