
PodGroup stuck in Pending Phase  #3910

@caushie-akamai

Description

We are currently running the Kubeflow spark-operator on Linode LKE (Kubernetes) with cluster auto-scaling enabled.

I have noticed that when a job triggers a large scale-up (i.e. one that needs all possible nodes in a node pool), the Spark pods are stuck in the Pending phase and the PodGroup is also stuck in Pending.
If I halve the RAM and CPU requests, the scale-up is triggered and the PodGroup succeeds. I am not sure why, or how the calculation is done on the Volcano side.
My Volcano scheduler config is:

actions: "enqueue, allocate, preempt, backfill"
tiers:
- plugins:
  - name: priority
  - name: conformance
- plugins:
  - name: overcommit
    arguments:
      overcommit-factor: 15.0
  - name: drf
    enablePreemptable: false
  - name: predicates
  - name: capacity
  - name: nodeorder
  - name: binpack

My queue:

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: myqueue
spec:
  reclaimable: true
  weight: 1
  capability:
    cpu: "200"
    memory: "1200G"
status:
  state: Open

My K8s cluster has auto-scaling enabled on the node pool, with a minimum of 1 and a maximum of 10 nodes (16 cores, 300 GB RAM each).
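
For reference, a quick back-of-the-envelope capacity check, assuming the g7-highmem-16 pool targeted by the executors is the 16-core / 300 GB pool described above (my own arithmetic, not Volcano output):

# One executor requests 15 cores / 200G, so at most one executor fits per 16C / 300GB node.
# 8 executor instances therefore need 8 nodes from this pool (the driver targets a different instance type).
# Pool maximum: 10 nodes x 16 cores = 160 cores, 10 nodes x 300 GB = 3000 GB.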

My spark job conf:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-sleep-2
  namespace: spark
spec:
  type: Python
  mode: cluster
  image: "redacted-private-repo/spark:k8s-3.5.1"
  imagePullSecrets:
    - "regcred"
  imagePullPolicy: Always
  mainApplicationFile: "local:///opt/spark/work-dir/sleep_forever.py"
  sparkVersion: "3.5.1"
  batchScheduler: volcano
  batchSchedulerOptions:
    priorityClassName: urgent
    queue: myqueue
  restartPolicy:
    type: Never
  driver:
    cores: 2
    memory: "8G"
    labels:
      version: 3.5.1
    serviceAccount: spark
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: "node.kubernetes.io/instance-type"
                  operator: In
                  values:
                    - "g6-dedicated-32"
  executor:
    cores: 15
    instances: 8
    memory: "200G"
    labels:
      version: 3.5.1
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: "node.kubernetes.io/instance-type"
                  operator: In
                  values:
                    - "g7-highmem-16"

I have tried various volcano-scheduler.conf options, but the same error persists.
The PodGroup reports:
1/1 tasks in gang unschedulable: pod group is not ready, 1 Pending, 1 minAvailable; Pending: 1 Unschedulable
queue resource quota insufficient

apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: spark-spark-sleep-2-pg
  namespace: spark
...
status:
  phase: Pending
spec:
  minMember: 1
  minResources:
    cpu: '122'
    memory: 1608G
  priorityClassName: urgent
  queue: myqueue
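
Comparing those minResources against the queue capability above: cpu 122 <= 200, but memory 1608G > 1200G, which seems consistent with the "queue resource quota insufficient" message (and with the job fitting once the requests are halved, which brings the memory sum down to roughly 804G). A minimal sketch of a queue whose capability covers the request, assuming the memory capability is indeed the limiting factor (the 1700G value is just an example, not a recommendation):

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: myqueue
spec:
  reclaimable: true
  weight: 1
  capability:
    cpu: "200"
    memory: "1700G"    # raised above the PodGroup's 1608G minResources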

Is anyone aware of how to fix this issue? I removed the gang scheduling plugin as per #2558, but that did not work.

Describe the results you received and expected

Received: the PodGroup stuck in Pending. Expected: the PodGroup to be admitted so the node pool scales up and the Spark job runs.

What version of Volcano are you using?

1.10

Any other relevant information

k8s 1.29

Metadata

Labels: kind/bug, priority/high
Status: In review
Milestone: none