Description
@jandubois suggested I raise an issue here as well with regards to runfinch/finch#1632
It appears that if the Fedora mirrors are unavailable or inaccessible, dnf needs-restarting returns a non-zero exit code, which the cloud-init scripts treat the same as a reboot being required.
This results in no visible logging (that I can see) and the VM getting stuck in a reboot loop, chewing up significant host resources and being very difficult to debug.
From the original issue, this was my analysis:
When running behind a corporate proxy that uses its own SSL certificates, bringing up a Finch VM is problematic. Right now, on first boot, we observe that finch vm init hangs for about 15 minutes and then crashes.
Upon pulling all of this configuration and code to pieces, I found that the problem lies within the ISO that is downloaded. The script causing us problems is the following:
#!/bin/sh
# SPDX-FileCopyrightText: Copyright The Lima Authors
# SPDX-License-Identifier: Apache-2.0
set -eux
# Check if cloud-init forgot to reboot_if_required
# (only implemented for apt at the moment, not dnf)
if command -v dnf >/dev/null 2>&1; then
	# dnf-utils needs to be installed, for needs-restarting
	if dnf -h needs-restarting >/dev/null 2>&1; then
		# needs-restarting returns "false" if needed (!)
		if ! dnf needs-restarting -r >/dev/null 2>&1; then
			systemctl reboot
		fi
	fi
fi
Specifically, take note of the if ! dnf needs-restarting -r >/dev/null 2>&1; then systemctl reboot. Whilst it is true that dnf needs-restarting returns a non-zero exit code if we need to reboot, it also returns a non-zero exit code if the command itself failed to complete, and the script cannot tell the two cases apart.
It turns out that dnf needs-restarting dials out to the Fedora repository mirrors... under a corporate proxy that operates on L3/L4 (e.g. as part of a ZTNA), this won't work. You'll just get the following output (which is somewhat unhelpfully suppressed and sent to /dev/null here):
└─[127] <> docker run --rm -it fedora
[root@97829c00283b /]# dnf needs-restarting
Updating and loading repositories:
Fedora 42 - aarch64 - Updates ???% [ <=> ] | 0.0 B/s | 0.0 B | 00m01s
>>> Curl error (60): SSL peer certificate or SSH remote key was not OK for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f42&arch=aarch64 [SSL certificate problem: u
>>> Curl error (60): SSL peer certificate or SSH remote key was not OK for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f42&arch=aarch64 [SSL certificate problem: u
>>> Curl error (60): SSL peer certificate or SSH remote key was not OK for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f42&arch=aarch64 [SSL certificate problem: u
>>> Curl error (60): SSL peer certificate or SSH remote key was not OK for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f42&arch=aarch64 [SSL certificate problem: u
>>> Curl error (60): SSL peer certificate or SSH remote key was not OK for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f42&arch=aarch64 [SSL certificate problem: u
...
This then exits with a non-zero exit code.
This means that if you have no side-loaded CA certificates, finch vm init gets stuck in a loop, restarting the VM every 5 seconds or so, with no output indicating what the issue is, since everything is sent to /dev/null.
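One possible way to break the loop, sketched here under assumptions rather than proposed as the actual fix: keep the output of the check instead of discarding it, and reboot at most once by recording the attempt in a marker file. The function name reboot_if_required, the marker path, and the CHECK_CMD / REBOOT_CMD variables are all hypothetical and exist only for illustration. The sketch deliberately does not try to distinguish "reboot needed" from "dnf failed" by exit code, since I have not verified that dnf reports them differently.

```sh
#!/bin/sh
# Sketch only: the names and paths below are hypothetical, not taken from Lima.
set -eu

# Hypothetical marker file recording that a reboot was already attempted.
MARKER="${MARKER:-/var/lib/reboot-if-required.attempted}"
# Check and action are overridable so the logic can be exercised without dnf.
CHECK_CMD="${CHECK_CMD:-dnf needs-restarting -r}"
REBOOT_CMD="${REBOOT_CMD:-systemctl reboot}"

reboot_if_required() {
	status=0
	# Capture the output instead of sending it to /dev/null,
	# so a failure (e.g. unreachable mirrors) is visible in the logs.
	output=$($CHECK_CMD 2>&1) || status=$?
	if [ "$status" -eq 0 ]; then
		echo "no reboot required"
		return 0
	fi
	echo "check exited with status $status; output follows:" >&2
	echo "$output" >&2
	# A non-zero status may mean "reboot needed" or "check failed".
	# Either way, only ever reboot once to avoid a reboot loop.
	if [ -e "$MARKER" ]; then
		echo "reboot already attempted once; not rebooting again" >&2
		return 0
	fi
	: > "$MARKER"
	$REBOOT_CMD
}
```

The boot script would call reboot_if_required in place of the inner if block. The trade-off is that a persistent marker also suppresses a later, legitimate reboot request from this script, which seems acceptable for a first-boot check but would need revisiting otherwise.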