A high-availability Kubernetes cluster running on KVM/libvirt virtual machines, featuring 3 control plane nodes, 2 worker nodes, and an HAProxy load balancer - all automated with Vagrant and Ansible.
                    ┌──────────────────┐
                    │    HAProxy LB    │
                    │  192.168.100.10  │
                    └────────┬─────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
    ┌────▼────┐         ┌────▼────┐         ┌────▼────┐
    │   cp1   │         │   cp2   │         │   cp3   │
    │ .100.11 │         │ .100.12 │         │ .100.13 │
    │ Master  │         │ Master  │         │ Master  │
    └─────────┘         └─────────┘         └─────────┘
                   ┌───────────────────┐
                   │                   │
              ┌────▼────┐         ┌────▼────┐
              │   w1    │         │   w2    │
              │ .100.21 │         │ .100.22 │
              │ Worker  │         │ Worker  │
              └─────────┘         └─────────┘
| Node Type | Hostname | IP | vCPUs | RAM | Disk | Role |
|---|---|---|---|---|---|---|
| Load Balancer | lb | 192.168.100.10 | 1 | 512MB | 10GB | HAProxy (API endpoint) |
| Control Plane | cp1 | 192.168.100.11 | 2 | 3GB | 20GB | Kubernetes master |
| Control Plane | cp2 | 192.168.100.12 | 2 | 3GB | 20GB | Kubernetes master |
| Control Plane | cp3 | 192.168.100.13 | 2 | 3GB | 20GB | Kubernetes master |
| Worker | w1 | 192.168.100.21 | 2 | 4GB | 30GB | Application workloads |
| Worker | w2 | 192.168.100.22 | 2 | 4GB | 30GB | Application workloads |
Total Resources: 11 vCPUs, 17.5GB RAM, 130GB disk (all six VMs)
- Virtual Network: 192.168.100.0/24 (isolated, NAT-ed)
- Control Plane Endpoint: 192.168.100.10:6443 (HAProxy VIP)
- Pod Network CIDR: 192.168.0.0/16 (Calico VXLAN)
- Service CIDR: 10.96.0.0/12 (default)
- Hypervisor: KVM/libvirt
- Provisioning: Vagrant (vagrant-libvirt plugin)
- Base OS: Rocky Linux 9.3
- Automation: Ansible
- Kubernetes Version: 1.34.3 (configurable in ansible/inventory/group_vars/all.yml)
- Container Runtime: containerd 2.2.0
- CNI: Calico v3.29.1 (VXLAN mode, no BGP)
- Load Balancer: HAProxy 2.4+
- 3-node etcd cluster (quorum-based, can lose 1 node)
- 3 API servers (load balanced via HAProxy)
- Stacked control plane (etcd runs on control plane nodes)
- Automatic certificate distribution (via kubeadm --upload-certs)
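Once the cluster is up (see the setup steps below), a couple of read-only checks make these pieces visible:
# One etcd and one kube-apiserver static pod per control plane node
kubectl get pods -n kube-system -o wide | grep -E 'etcd|kube-apiserver'
# The kubeconfig points at the HAProxy VIP rather than any single API server
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'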
- OS: Linux (tested on Fedora/Ultramarine)
- CPU: x86_64 with virtualization support (Intel VT-x or AMD-V)
- RAM: 20GB+ available (32GB recommended)
- Disk: 120GB+ free space
- KVM: Loaded and functional
Install on your host machine:
# Fedora/RHEL/Rocky/Ultramarine
sudo dnf install -y \
libvirt \
libvirt-devel \
qemu-kvm \
ruby-devel \
gcc \
make \
ansible
# Enable and start libvirt
sudo systemctl enable --now libvirtd
# Install Vagrant
# Download from https://developer.hashicorp.com/vagrant/downloads
# Or use package manager
# Install vagrant-libvirt plugin
vagrant plugin install vagrant-libvirt
Verify installation:
virsh list --all # Should work without errors
kvm-ok || grep -E 'vmx|svm' /proc/cpuinfo # Verify virtualization
vagrant --version # Should show version
ansible --version # Should show version
cd /path/to/this/repo
vagrant up
This creates all 6 VMs (takes 5-10 minutes). Vagrant will:
- Download Rocky Linux 9 base image (first time only)
- Create VMs with specified resources
- Configure static IPs on isolated network
- Set up SSH keys for password-less access
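Optional sanity checks after provisioning:
vagrant status      # All six VMs should show "running (libvirt)"
virsh list --all    # Same view from libvirt's side (may require sudo, depending on your libvirt setup)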
Generate SSH config for Ansible:
vagrant ssh-config > .vagrant/ssh-config
Run the common setup playbook (installs containerd, Kubernetes packages, configures firewall):
ansible-playbook ansible/playbooks/common.yml
Configure the load balancer:
ansible-playbook ansible/playbooks/loadbalancer.yml
Initialize the first control plane node:
ansible-playbook ansible/playbooks/kubeadm-init.yml
Join the remaining control planes and workers:
ansible-playbook ansible/playbooks/join-nodes.yml
This automatically:
- Joins cp2 and cp3 as additional control planes
- Joins w1 and w2 as worker nodes
- Fetches the kubeconfig to .kube/config locally
ansible-playbook ansible/playbooks/cni-calico.yml
This installs Calico in VXLAN mode (no BGP complexity).
ansible-playbook ansible/playbooks/verify-cluster.yml
Or manually:
export KUBECONFIG=$(pwd)/.kube/config
kubectl get nodes
kubectl get pods -A
Expected output of kubectl get nodes:
NAME   STATUS   ROLES           AGE   VERSION
cp1    Ready    control-plane   1h    v1.34.3
cp2    Ready    control-plane   1h    v1.34.3
cp3    Ready    control-plane   1h    v1.34.3
w1     Ready    <none>          1h    v1.34.3
w2     Ready    <none>          1h    v1.34.3
.
├── Vagrantfile                      # VM definitions
├── ansible.cfg                      # Ansible configuration
├── ansible/
│   ├── inventory/
│   │   ├── hosts.ini                # Inventory (groups: control_planes, workers, loadbalancer)
│   │   └── group_vars/
│   │       └── all.yml              # Global variables (versions, CIDRs)
│   └── playbooks/
│       ├── common.yml               # Node setup (containerd, k8s packages, firewall)
│       ├── loadbalancer.yml         # HAProxy configuration
│       ├── kubeadm-init.yml         # Initialize first control plane
│       ├── join-nodes.yml           # Join additional nodes
│       ├── remove-calico.yml        # Remove Calico CNI
│       ├── cni-calico.yml           # Install Calico (VXLAN mode)
│       └── verify-cluster.yml       # Comprehensive health checks
├── templates/
│   ├── containerd-config.toml.j2    # Containerd config (systemd cgroup)
│   ├── haproxy.cfg.j2               # HAProxy config (roundrobin to API servers)
│   ├── hosts-block.j2               # /etc/hosts entries for all nodes
│   └── kubeadm-config.yaml.j2       # Cluster configuration
└── README.md
Vagrant creates 6 VMs on an isolated libvirt network with NAT for internet access. Each VM:
- Gets a static IP assignment
- Has SSH keys pre-configured
- Runs Rocky Linux 9 base OS
- Uses thin-provisioned qcow2 disks
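To poke at the underlying libvirt objects (names are generated by vagrant-libvirt, so check the list for the exact ones on your host):
virsh net-list --all                      # Find the vagrant-libvirt network backing 192.168.100.0/24
vagrant ssh cp1 -c "ip -4 -br addr show"  # Confirm the static IP inside a VM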
common.yml prepares all Kubernetes nodes:
- Disables swap (required by kubelet)
- Loads kernel modules (overlay, br_netfilter, vxlan)
- Configures sysctl for networking (IP forwarding, bridge netfilter)
- Installs containerd with systemd cgroup driver
- Installs kubelet, kubeadm, kubectl
- Opens firewall ports:
  - All nodes: 10250 (kubelet), 4789 (VXLAN)
  - Control planes: 6443 (API), 2379-2381 (etcd), 179 (BGP)
  - Workers: 30000-32767 (NodePort)
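For reference, the node preparation amounts to roughly the following manual steps (a sketch; the playbook is authoritative and the sysctl file name here is illustrative):
# Disable swap
sudo swapoff -a
# Load required kernel modules
sudo modprobe overlay
sudo modprobe br_netfilter
sudo modprobe vxlan
# Networking sysctls expected by kubeadm and the CNI
cat <<'EOF' | sudo tee /etc/sysctl.d/99-kubernetes.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system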
loadbalancer.yml configures HAProxy:
- Listens on 192.168.100.10:6443
- Load balances to all 3 API servers (roundrobin)
- Health checks: TCP connection to port 6443
- SELinux configuration for port binding
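To spot-check the rendered configuration on the lb VM (the expected-output comment below is a sketch; exact backend and server names come from templates/haproxy.cfg.j2):
vagrant ssh lb -c "sudo grep -A8 '^backend' /etc/haproxy/haproxy.cfg"
# Expect something along these lines:
#   backend kube-apiservers
#       mode tcp
#       balance roundrobin
#       server cp1 192.168.100.11:6443 check
#       server cp2 192.168.100.12:6443 check
#       server cp3 192.168.100.13:6443 check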
kubeadm-init.yml on cp1:
- Generates kubeadm config with:
  - Control plane endpoint: 192.168.100.10:6443
  - Pod subnet: 192.168.0.0/16 (Calico)
  - Kubernetes version: 1.34.3
- Initializes the cluster with --upload-certs (distributes certs for joining)
- Creates join commands for control planes and workers
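For orientation, the init step is roughly equivalent to running the following on cp1 (a sketch; the playbook actually drives kubeadm through templates/kubeadm-config.yaml.j2 rather than flags):
sudo kubeadm init \
  --control-plane-endpoint "192.168.100.10:6443" \
  --pod-network-cidr "192.168.0.0/16" \
  --kubernetes-version "1.34.3" \
  --upload-certs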
join-nodes.yml:
- Retrieves join tokens from cp1
- Joins cp2, cp3 as control planes (with --control-plane flag)
- Joins w1, w2 as workers
- Fetches kubeconfig to local machine
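The generated join commands have this general shape (token, CA hash, and certificate key are placeholders created at init time):
# Additional control planes (cp2, cp3)
sudo kubeadm join 192.168.100.10:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <cert-key>
# Workers (w1, w2)
sudo kubeadm join 192.168.100.10:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>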
cni-calico.yml:
- Downloads Calico v3.29.1 manifest
- Configures VXLAN mode (no BGP):
  - CALICO_IPV4POOL_IPIP: Never
  - CALICO_IPV4POOL_VXLAN: Always
- Patches readiness probe (felix-only, no BIRD check)
- Waits for Calico DaemonSet rollout
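A quick read-only check that the applied DaemonSet carries the intended encapsulation settings:
kubectl get daemonset calico-node -n kube-system \
  -o jsonpath='{range .spec.template.spec.containers[0].env[*]}{.name}={.value}{"\n"}{end}' \
  | grep CALICO_IPV4POOL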
etcd Quorum:
- 3 members = can lose 1 and maintain quorum
- etcd-cp1: 192.168.100.11:2380
- etcd-cp2: 192.168.100.12:2380
- etcd-cp3: 192.168.100.13:2380
API Server Load Balancing:
- HAProxy distributes requests across 3 API servers
- If one control plane fails, API remains accessible
- Clients connect to VIP (192.168.100.10:6443)
Scheduler & Controller Manager:
- Run on all 3 control planes
- Leader election ensures only one is active
- Automatic failover if leader fails
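You can see which instance currently holds each lock via the leader-election Leases (the HOLDER column names the active control plane):
kubectl get lease -n kube-system kube-scheduler kube-controller-manager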
To change the Kubernetes version, edit ansible/inventory/group_vars/all.yml:
kubernetes_version: "1.31.4"  # Change this
Then re-run:
ansible-playbook ansible/playbooks/common.yml
To change the pod network CIDR, edit ansible/inventory/group_vars/all.yml:
pod_network_cidr: "10.244.0.0/16"  # For Flannel
Note: This must be done BEFORE cluster initialization; changing it afterwards requires re-creating the cluster.
To change VM resources, edit Vagrantfile:
NODES = {
  "cp1" => { ip: "192.168.100.11", memory: 4096, cpus: 4, disk: "30G" },
  # ...
}
Destroy and recreate the VMs:
vagrant destroy -f
vagrant up
If the CNI has issues:
ansible-playbook ansible/playbooks/remove-calico.yml
ansible-playbook ansible/playbooks/cni-calico.yml
# All nodes should be Ready
kubectl get nodes
# All system pods should be Running
kubectl get pods -n kube-system
# Check etcd cluster
kubectl exec -n kube-system etcd-cp1 -- etcdctl \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key \
member list -w table
# Check HAProxy stats
vagrant ssh lb -c "sudo systemctl status haproxy"
To add a new worker node:
- Edit Vagrantfile to add the new worker definition
- Run vagrant up <new-worker-name>
- Add the node to ansible/inventory/hosts.ini under [workers]
- Run ansible-playbook ansible/playbooks/common.yml --limit=<new-worker>
- Generate a join command on cp1: vagrant ssh cp1 -c "sudo kubeadm token create --print-join-command"
- SSH to the new worker and run the join command
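After the join completes, watch for the new node to register and reach Ready:
kubectl get nodes -w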
Check Calico pods:
kubectl get pods -n kube-system -l k8s-app=calico-node
If they are not Ready, check the logs:
kubectl logs -n kube-system <calico-node-pod> -c calico-node
Common fix: remove and reinstall Calico:
ansible-playbook ansible/playbooks/remove-calico.yml
ansible-playbook ansible/playbooks/cni-calico.yml
Check HAProxy:
vagrant ssh lb -c "sudo systemctl status haproxy"
vagrant ssh lb -c "sudo tail -f /var/log/haproxy.log"Check control plane API servers:
for i in 1 2 3; do
vagrant ssh cp$i -c "sudo systemctl status kube-apiserver"
doneTest connection:
curl -k https://192.168.100.10:6443/healthzCheck Calico:
kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
Verify VXLAN mode:
kubectl get cm -n kube-system calico-config -o yaml | grep calico_backend
# Should show: vxlan (not bird)
Check the firewall on the nodes:
vagrant ssh cp1 -c "sudo firewall-cmd --list-all"
# Should include: 4789/udp (VXLAN)
Check etcd members:
kubectl exec -n kube-system etcd-cp1 -- etcdctl \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key \
endpoint status --cluster -w table
This should show 3 healthy members.
Re-run common playbook to ensure all ports are open:
ansible-playbook ansible/playbooks/common.yml
To tear down the cluster:
vagrant destroy -f
This removes all VMs but keeps:
- Downloaded base box (for faster re-creation)
- Vagrantfile and Ansible playbooks
- Any saved kubeconfig in .kube/
For a complete cleanup (including local state and the base box):
vagrant destroy -f
rm -rf .vagrant .kube
vagrant box remove generic/rocky9
# Deploy sample app
kubectl create deployment nginx --image=nginx:alpine --replicas=3
kubectl expose deployment nginx --port=80 --type=NodePort
# Get NodePort
kubectl get svc nginx
# Access from host
NODE_PORT=$(kubectl get svc nginx -o jsonpath='{.spec.ports[0].nodePort}')
curl http://192.168.100.21:$NODE_PORT  # w1's IP
# Watch nodes
watch kubectl get nodes
# In another terminal, power off a control plane
vagrant halt cp2
# Cluster should remain operational with cp1 and cp3
# Bring it back
vagrant up cp2
Since this is a full multi-master cluster, you can test:
- Leader election: Deploy apps with leader-election
- etcd operations: Backup, restore, member management
- HA configurations: Disruption budgets, rolling updates across AZs
- Network policies: Calico supports advanced network policies
- Storage: Add persistent volumes via NFS or local storage
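One concrete example for the etcd item: a snapshot can be taken from inside one of the etcd pods (same certificate paths as the checks above; the destination path is just an example, and the file lands on cp1's disk because /var/lib/etcd is a hostPath mount):
kubectl exec -n kube-system etcd-cp1 -- etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  snapshot save /var/lib/etcd/snapshot.db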
Increase VM CPU priority:
# Edit Vagrantfile, add to provider block:
lv.cpu_mode = 'host-passthrough'
lv.cpus = 4  # Increase from 2
Reduce logging verbosity:
# Edit /var/lib/kubelet/config.yaml on nodes
logging:
  verbosity: 2  # Default is 4
This is a LAB environment. For production:
- TLS certificates: Default certs are auto-generated. Use proper PKI
- RBAC: Configure proper role-based access control
- Network policies: Implement strict pod network policies
- Secrets encryption: Enable encryption at rest for secrets
- Audit logging: Enable and configure audit logs
- kubeconfig: Protect the admin kubeconfig (stored in .kube/config)
- Host firewall: The VMs' firewall is configured, but test it thoroughly
- Updates: Keep Kubernetes and OS packages updated
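As a concrete starting point for the network-policy item, a default-deny ingress policy (the namespace name is just an example) can be applied like this:
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
    - Ingress
EOF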
- Kubernetes Documentation
- kubeadm HA Setup
- Calico Documentation
- HAProxy Configuration
- Vagrant libvirt Provider
This is a learning/lab environment. Use at your own risk.
This is a personal lab setup. Feel free to fork and modify for your needs.
Questions? Check the troubleshooting section or review the playbooks in ansible/playbooks/ for implementation details.