nutanix upgrade 4.18 -> 4.19 #65740

Merged 7 commits on Jun 23, 2025

Conversation

@skordas skordas commented Jun 5, 2025

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 5, 2025

skordas commented Jun 5, 2025

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-nutanix-4.19-nightly-x86-loaded-upgrade-from-4.18-loaded-upgrade-418to419-24nodes

@openshift-ci-robot

@skordas: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

skordas commented Jun 5, 2025

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-nutanix-4.19-nightly-x86-loaded-upgrade-from-4.18-loaded-upgrade-418to419-24nodes

@openshift-ci-robot

@skordas: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

skordas commented Jun 5, 2025

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-nutanix-4.19-nightly-x86-loaded-upgrade-from-4.18-loaded-upgrade-418to419-24nodes

@openshift-ci-robot

@skordas: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@mehabhalodiya mehabhalodiya left a comment

Could you try re-running with the suggested changes, and we will check if the error persists? I think the same worked for an IBM profile, so let's see how it goes with a Nutanix profile.

memory: 200Mi
tests:
- as: loaded-upgrade-418to419-24nodes
  cluster: build01

I think we don't need this cluster field.
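
For illustration only, this is roughly what the test entry would look like if the suggestion is followed. The indentation is reconstructed, the rest of the entry is omitted, and nothing here is taken from the merged configuration.

```yaml
# Sketch under assumptions: only the `as` name comes from the snippet above.
tests:
- as: loaded-upgrade-418to419-24nodes
  # cluster: build01   <- the field the reviewer suggests dropping
```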

skordas commented Jun 6, 2025

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-nutanix-4.19-nightly-x86-loaded-upgrade-from-4.18-loaded-upgrade-418to419-24nodes

@openshift-ci-robot

@skordas: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

skordas commented Jun 9, 2025

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-nutanix-4.19-nightly-x86-loaded-upgrade-from-4.18-loaded-upgrade-418to419-24nodes

@openshift-ci-robot

@skordas: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

skordas commented Jun 11, 2025

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-nutanix-4.19-nightly-x86-loaded-upgrade-from-4.18-loaded-upgrade-418to419-24nodes

@openshift-ci-robot

@skordas: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

skordas commented Jun 12, 2025

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-nutanix-4.19-nightly-x86-loaded-upgrade-from-4.18-loaded-upgrade-418to419-24nodes

@openshift-ci-robot

@skordas: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

skordas commented Jun 12, 2025

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-nutanix-4.19-nightly-x86-loaded-upgrade-from-4.18-loaded-upgrade-418to419-24nodes

@openshift-ci-robot

@skordas: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

skordas commented Jun 13, 2025

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-nutanix-4.19-nightly-x86-loaded-upgrade-from-4.18-loaded-upgrade-418to419-24nodes

@openshift-ci-robot

@skordas: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

skordas commented Jun 13, 2025

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-nutanix-4.19-nightly-x86-loaded-upgrade-from-4.18-loaded-upgrade-418to419-24nodes

@openshift-ci-robot

@skordas: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci openshift-ci bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 13, 2025

skordas commented Jun 13, 2025

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-nutanix-4.19-nightly-x86-loaded-upgrade-from-4.18-loaded-upgrade-418to419-24nodes

@openshift-ci-robot

@skordas: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 13, 2025

skordas commented Jun 18, 2025

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-nutanix-4.19-nightly-x86-loaded-upgrade-from-4.18-loaded-upgrade-418to419-24nodes

@openshift-ci-robot

@skordas: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

skordas commented Jun 18, 2025

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-nutanix-4.19-nightly-x86-loaded-upgrade-from-4.18-loaded-upgrade-418to419-24nodes

@openshift-ci-robot

@skordas: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

env:
  COMPUTE_CPU: "8"
  COMPUTE_MEMORY: "32000"
  COMPUTE_REPLICAS: "3"

Please add GC: "false" as we want to keep the workload during the upgrade.

Also, maybe you can add these params and try

OPENSHIFT_INFRA_NODE_INSTANCE_MEMORYSIZE: 64Gi
OPENSHIFT_INFRA_NODE_INSTANCE_VCPU: "16"
SET_ENV_BY_PLATFORM: custom

Because I see the error

Events:
  Type     Reason        Age   From               Message
  ----     ------        ----  ----               -------
  Warning  FailedCreate  136m  nutanixcontroller  ci-op-gmvftkmm-53c18-vs8np-worker-1-77gnn: reconciler failed to Create machine: failed to create VM: ci-op-gmvftkmm-53c18-vs8np-worker-1-77gnn failed to create the vm: error_detail: INVALID_ARGUMENT: Invalid Argument: 6
  :No host has enough available resources for VM bd737d9f-14b3-47d2-9528-7ec3eaa8622f., progress_message: create_vm
  Warning  FailedUpdate  98s (x273 over 135m)  nutanixcontroller  ci-op-gmvftkmm-53c18-vs8np-worker-1-77gnn: reconciler failed to Update machine: The retrieved VM "ci-op-gmvftkmm-53c18-vs8np-worker-1-77gnn" has ERROR state. error: [{"message": "Invalid Argument: 6\n  :No host has enough available resources for VM bd737d9f-14b3-47d2-9528-7ec3eaa8622f.", "reason": "INVALID_ARGUMENT"}]
error: all 24 nodes didn't become READY in time, failing

@mehabhalodiya
This test is very unstable: it fails on a different step every time. Last time there were not enough resources to scale from 3 to 24 nodes, but that is a resource problem, not a problem with the step.
I added the GC variable, but the rest are for additional infra nodes, so I'm not adding them to this test (the test configuration would not pass make jobs and make update verification).
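
As a rough sketch of where this exchange lands, the env block would gain only the GC entry. The nesting and key order below are assumptions reconstructed from the snippets in this thread, not a copy of the merged file.

```yaml
# Sketch only: values come from this thread; indentation and ordering are assumed.
env:
  COMPUTE_CPU: "8"
  COMPUTE_MEMORY: "32000"
  COMPUTE_REPLICAS: "3"
  GC: "false"  # requested by the reviewer to keep the workload during the upgrade
  # Deliberately not added (per the author's reply above):
  #   OPENSHIFT_INFRA_NODE_INSTANCE_MEMORYSIZE, OPENSHIFT_INFRA_NODE_INSTANCE_VCPU, SET_ENV_BY_PLATFORM
```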

- ref: workers-scale
- chain: openshift-qe-cluster-density-v2
- chain: openshift-upgrade-qe-sanity
- ref: openshift-qe-connectivity-check

You can remove - ref: openshift-qe-connectivity-check, as it is not required; this step is typically used in an IPsec cluster.

Removed this step as suggested.
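
For reference, a sketch of the step list once the connectivity check is dropped. It assumes the other three entries stay exactly as quoted in the review snippet, and the enclosing key is omitted because it is not shown in this thread.

```yaml
# Sketch only: the three remaining entries are copied from the review snippet above.
- ref: workers-scale
- chain: openshift-qe-cluster-density-v2
- chain: openshift-upgrade-qe-sanity
# removed: - ref: openshift-qe-connectivity-check (typically used in IPsec clusters)
```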

@openshift-ci-robot

[REHEARSALNOTIFIER]
@skordas: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-nutanix-4.19-nightly-x86-loaded-upgrade-from-4.18-loaded-upgrade-418to419-24nodes | openshift-eng/ocp-qe-perfscale-ci | presubmit | Presubmit changed |

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

skordas commented Jun 20, 2025

/pj-rehearse pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-nutanix-4.19-nightly-x86-loaded-upgrade-from-4.18-loaded-upgrade-418to419-24nodes

@openshift-ci-robot

@skordas: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@mehabhalodiya mehabhalodiya left a comment

/lgtm

Thank you!

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 23, 2025

openshift-ci bot commented Jun 23, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mehabhalodiya, skordas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mehabhalodiya

/pj-rehearse ack

@openshift-ci-robot

@mehabhalodiya: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Jun 23, 2025

openshift-ci bot commented Jun 23, 2025

@skordas: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit a889caf into openshift:master Jun 23, 2025
16 checks passed
liweinan pushed a commit to liweinan/release that referenced this pull request Aug 5, 2025
* nutanix upgrade 4.18 -> 4.19

* Multizone installation

* New flow for upgrade

* Debugging  - something is wrong here

* Revert "Debugging  - something is wrong here"

This reverts commit bad7647.

* adding GC variable to test, removing step

* fix for metadata
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. rehearsals-ack Signifies that rehearsal jobs have been acknowledged