
PodGroup stuck in Pending Phase  #3910

@caushie-akamai

Description

We are currently running the Kubeflow spark-operator on Linode LKE (Kubernetes) with cluster auto-scaling enabled.

I have noticed that when a job triggers a large scale-up (i.e. one that needs all possible nodes in a node pool), the Spark pods are stuck in the Pending phase and the PodGroup is also stuck in Pending.
If I halve the RAM and CPU requests, the scale-up is triggered and the PodGroup succeeds. I am not sure why, or how the calculation is done on the Volcano side.
My Volcano scheduler config is:

actions: "enqueue, allocate, preempt, backfill"
tiers:
- plugins:
  - name: priority
  - name: conformance
- plugins:
  - name: overcommit
    arguments:
      overcommit-factor: 15.0
  - name: drf
    enablePreemptable: false
  - name: predicates
  - name: capacity
  - name: nodeorder
  - name: binpack

My queue:

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: myqueue
spec:
  reclaimable: true
  weight: 1
  capability:
    cpu: "200"
    memory: "1200G"
status:
  state: Open

My K8s cluster has auto-scaling enabled on the node pool, with a minimum of 1 and a maximum of 10 nodes (16 cores, 300 GB RAM each).
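
For reference, a quick back-of-the-envelope capacity check, assuming the g7-highmem-16 pool targeted by the executors is the 16-core / 300 GB pool described above (my own arithmetic, not Volcano output):

# One executor requests 15 cores / 200G, so at most one executor fits per 16C / 300GB node.
# 8 executor instances therefore need 8 nodes from this pool (the driver targets a different instance type).
# Pool maximum: 10 nodes x 16 cores = 160 cores, 10 nodes x 300 GB = 3000 GB.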

My spark job conf:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-sleep-2
  namespace: spark
spec:
  type: Python
  mode: cluster
  image: "redacted-private-repo/spark:k8s-3.5.1"
  imagePullSecrets:
    - "regcred"
  imagePullPolicy: Always
  mainApplicationFile: "local:///opt/spark/work-dir/sleep_forever.py"
  sparkVersion: "3.5.1"
  batchScheduler: volcano
  batchSchedulerOptions:
    priorityClassName: urgent
    queue: myqueue
  restartPolicy:
    type: Never
  driver:
    cores: 2
    memory: "8G"
    labels:
      version: 3.5.1
    serviceAccount: spark
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: "node.kubernetes.io/instance-type"
                  operator: In
                  values:
                    - "g6-dedicated-32"
  executor:
    cores: 15
    instances: 8
    memory: "200G"
    labels:
      version: 3.5.1
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: "node.kubernetes.io/instance-type"
                  operator: In
                  values:
                    - "g7-highmem-16"

I have tried various volcano-scheduler.conf options, but the same error persists.
The PodGroup reports:
1/1 tasks in gang unschedulable: pod group is not ready, 1 Pending, 1 minAvailable; Pending: 1 Unschedulable
queue resource quota insufficient

apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: spark-spark-sleep-2-pg
  namespace: spark
...
status:
  phase: Pending
spec:
  minMember: 1
  minResources:
    cpu: '122'
    memory: 1608G
  priorityClassName: urgent
  queue: myqueue
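
Comparing those minResources against the queue capability above: cpu 122 <= 200, but memory 1608G > 1200G, which seems consistent with the "queue resource quota insufficient" message (and with the job fitting once the requests are halved, which brings the memory sum down to roughly 804G). A minimal sketch of a queue whose capability covers the request, assuming the memory capability is indeed the limiting factor (the 1700G value is just an example, not a recommendation):

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: myqueue
spec:
  reclaimable: true
  weight: 1
  capability:
    cpu: "200"
    memory: "1700G"    # raised above the PodGroup's 1608G minResources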

Is anyone aware of how to fix this issue? I removed the gang scheduling plugin as per #2558, but that did not work.

Describe the results you received and expected

Received: the PodGroup stuck in Pending. Expected: the PodGroup to be admitted so the node pool scales up and the Spark job runs.

What version of Volcano are you using?

1.10

Any other relevant information

k8s 1.29

Metadata

Labels: kind/bug, priority/high
Status: In review
Milestone: none