A high-availability Kubernetes cluster running on KVM/libvirt virtual machines, featuring 3 control plane nodes, 2 worker nodes, and an HAProxy load balancer - all automated with Vagrant and Ansible.
                    ┌──────────────────┐
                    │    HAProxy LB    │
                    │  192.168.100.10  │
                    └────────┬─────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
    ┌────▼────┐         ┌────▼────┐         ┌────▼────┐
    │   cp1   │         │   cp2   │         │   cp3   │
    │ .100.11 │         │ .100.12 │         │ .100.13 │
    │ Master  │         │ Master  │         │ Master  │
    └─────────┘         └─────────┘         └─────────┘
                   ┌───────────────────┐
                   │                   │
              ┌────▼────┐         ┌────▼────┐
              │   w1    │         │   w2    │
              │ .100.21 │         │ .100.22 │
              │ Worker  │         │ Worker  │
              └─────────┘         └─────────┘
| Node Type | Hostname | IP | vCPUs | RAM | Disk | Role |
|---|---|---|---|---|---|---|
| Load Balancer | lb | 192.168.100.10 | 1 | 512MB | 10GB | HAProxy (API endpoint) |
| Control Plane | cp1 | 192.168.100.11 | 2 | 3GB | 20GB | Kubernetes master |
| Control Plane | cp2 | 192.168.100.12 | 2 | 3GB | 20GB | Kubernetes master |
| Control Plane | cp3 | 192.168.100.13 | 2 | 3GB | 20GB | Kubernetes master |
| Worker | w1 | 192.168.100.21 | 2 | 4GB | 30GB | Application workloads |
| Worker | w2 | 192.168.100.22 | 2 | 4GB | 30GB | Application workloads |
Total Resources: 11 vCPUs, 17.5GB RAM, 130GB disk (all six VMs)
- Virtual Network: 192.168.100.0/24 (isolated, NAT-ed)
- Control Plane Endpoint: 192.168.100.10:6443 (HAProxy VIP)
- Pod Network CIDR: 192.168.0.0/16 (Calico VXLAN)
- Service CIDR: 10.96.0.0/12 (default)
- Hypervisor: KVM/libvirt
- Provisioning: Vagrant (vagrant-libvirt plugin)
- Base OS: Rocky Linux 9.3
- Automation: Ansible
- Kubernetes Version: 1.34.3 (configurable in ansible/inventory/group_vars/all.yml)
- Container Runtime: containerd 2.2.0
- CNI: Calico v3.29.1 (VXLAN mode, no BGP)
- Load Balancer: HAProxy 2.4+
- 3-node etcd cluster (quorum-based, can lose 1 node)
- 3 API servers (load balanced via HAProxy)
- Stacked control plane (etcd runs on control plane nodes)
- Automatic certificate distribution (via kubeadm --upload-certs)
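Once the cluster is up (see the setup steps below), a couple of read-only checks make these pieces visible:
# One etcd and one kube-apiserver static pod per control plane node
kubectl get pods -n kube-system -o wide | grep -E 'etcd|kube-apiserver'
# The kubeconfig points at the HAProxy VIP rather than any single API server
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'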
- OS: Linux (tested on Fedora/Ultramarine)
- CPU: x86_64 with virtualization support (Intel VT-x or AMD-V)
- RAM: 20GB+ available (32GB recommended)
- Disk: 120GB+ free space
- KVM: Loaded and functional
Install on your host machine:
# Fedora/RHEL/Rocky/Ultramarine
sudo dnf install -y \
libvirt \
libvirt-devel \
qemu-kvm \
ruby-devel \
gcc \
make \
ansible
# Enable and start libvirt
sudo systemctl enable --now libvirtd
# Install Vagrant
# Download from https://developer.hashicorp.com/vagrant/downloads
# Or use package manager
# Install vagrant-libvirt plugin
vagrant plugin install vagrant-libvirt
Verify installation:
virsh list --all # Should work without errors
kvm-ok || grep -E 'vmx|svm' /proc/cpuinfo # Verify virtualization
vagrant --version # Should show version
ansible --version # Should show version
cd /path/to/this/repo
vagrant up
This creates all 6 VMs (takes 5-10 minutes). Vagrant will:
- Download Rocky Linux 9 base image (first time only)
- Create VMs with specified resources
- Configure static IPs on isolated network
- Set up SSH keys for password-less access
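Optional sanity checks after provisioning:
vagrant status      # All six VMs should show "running (libvirt)"
virsh list --all    # Same view from libvirt's side (may require sudo, depending on your libvirt setup)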
Generate SSH config for Ansible:
vagrant ssh-config > .vagrant/ssh-config
Run the common setup playbook (installs containerd, Kubernetes packages, configures firewall):
ansible-playbook ansible/playbooks/common.yml
Configure the load balancer:
ansible-playbook ansible/playbooks/loadbalancer.yml
Initialize the first control plane node:
ansible-playbook ansible/playbooks/kubeadm-init.yml
Join the remaining control planes and workers:
ansible-playbook ansible/playbooks/join-nodes.yml
This automatically:
- Joins cp2 and cp3 as additional control planes
- Joins w1 and w2 as worker nodes
- Fetches the kubeconfig to .kube/config locally
ansible-playbook ansible/playbooks/cni-calico.yml
This installs Calico in VXLAN mode (no BGP complexity).
ansible-playbook ansible/playbooks/verify-cluster.yml
Or manually:
export KUBECONFIG=$(pwd)/.kube/config
kubectl get nodes
kubectl get pods -A
Expected output of kubectl get nodes:
NAME   STATUS   ROLES           AGE   VERSION
cp1    Ready    control-plane   1h    v1.34.3
cp2    Ready    control-plane   1h    v1.34.3
cp3    Ready    control-plane   1h    v1.34.3
w1     Ready    <none>          1h    v1.34.3
w2     Ready    <none>          1h    v1.34.3
.
├── Vagrantfile                      # VM definitions
├── ansible.cfg                      # Ansible configuration
├── ansible/
│   ├── inventory/
│   │   ├── hosts.ini                # Inventory (groups: control_planes, workers, loadbalancer)
│   │   └── group_vars/
│   │       └── all.yml              # Global variables (versions, CIDRs)
│   └── playbooks/
│       ├── common.yml               # Node setup (containerd, k8s packages, firewall)
│       ├── loadbalancer.yml         # HAProxy configuration
│       ├── kubeadm-init.yml         # Initialize first control plane
│       ├── join-nodes.yml           # Join additional nodes
│       ├── remove-calico.yml        # Remove Calico CNI
│       ├── cni-calico.yml           # Install Calico (VXLAN mode)
│       └── verify-cluster.yml       # Comprehensive health checks
├── templates/
│   ├── containerd-config.toml.j2    # Containerd config (systemd cgroup)
│   ├── haproxy.cfg.j2               # HAProxy config (roundrobin to API servers)
│   ├── hosts-block.j2               # /etc/hosts entries for all nodes
│   └── kubeadm-config.yaml.j2       # Cluster configuration
└── README.md
Vagrant creates 6 VMs on an isolated libvirt network with NAT for internet access. Each VM:
- Gets a static IP assignment
- Has SSH keys pre-configured
- Runs Rocky Linux 9 base OS
- Uses thin-provisioned qcow2 disks
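To poke at the underlying libvirt objects (names are generated by vagrant-libvirt, so check the list for the exact ones on your host):
virsh net-list --all                      # Find the vagrant-libvirt network backing 192.168.100.0/24
vagrant ssh cp1 -c "ip -4 -br addr show"  # Confirm the static IP inside a VM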
common.yml prepares all Kubernetes nodes:
- Disables swap (required by kubelet)
- Loads kernel modules (overlay, br_netfilter, vxlan)
- Configures sysctl for networking (IP forwarding, bridge netfilter)
- Installs containerd with systemd cgroup driver
- Installs kubelet, kubeadm, kubectl
- Opens firewall ports:
  - All nodes: 10250 (kubelet), 4789 (VXLAN)
  - Control planes: 6443 (API), 2379-2381 (etcd), 179 (BGP)
  - Workers: 30000-32767 (NodePort)
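For reference, the node preparation amounts to roughly the following manual steps (a sketch; the playbook is authoritative and the sysctl file name here is illustrative):
# Disable swap
sudo swapoff -a
# Load required kernel modules
sudo modprobe overlay
sudo modprobe br_netfilter
sudo modprobe vxlan
# Networking sysctls expected by kubeadm and the CNI
cat <<'EOF' | sudo tee /etc/sysctl.d/99-kubernetes.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system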
loadbalancer.yml configures HAProxy:
- Listens on 192.168.100.10:6443
- Load balances to all 3 API servers (roundrobin)
- Health checks: TCP connection to port 6443
- SELinux configuration for port binding
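To spot-check the rendered configuration on the lb VM (the expected-output comment below is a sketch; exact backend and server names come from templates/haproxy.cfg.j2):
vagrant ssh lb -c "sudo grep -A8 '^backend' /etc/haproxy/haproxy.cfg"
# Expect something along these lines:
#   backend kube-apiservers
#       mode tcp
#       balance roundrobin
#       server cp1 192.168.100.11:6443 check
#       server cp2 192.168.100.12:6443 check
#       server cp3 192.168.100.13:6443 check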
kubeadm-init.yml on cp1:
- Generates kubeadm config with:
  - Control plane endpoint: 192.168.100.10:6443
  - Pod subnet: 192.168.0.0/16 (Calico)
  - Kubernetes version: 1.34.3
- Initializes the cluster with --upload-certs (distributes certs for joining)
- Creates join commands for control planes and workers
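For orientation, the init step is roughly equivalent to running the following on cp1 (a sketch; the playbook actually drives kubeadm through templates/kubeadm-config.yaml.j2 rather than flags):
sudo kubeadm init \
  --control-plane-endpoint "192.168.100.10:6443" \
  --pod-network-cidr "192.168.0.0/16" \
  --kubernetes-version "1.34.3" \
  --upload-certs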
join-nodes.yml:
- Retrieves join tokens from cp1
- Joins cp2, cp3 as control planes (with --control-plane flag)
- Joins w1, w2 as workers
- Fetches kubeconfig to local machine
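The generated join commands have this general shape (token, CA hash, and certificate key are placeholders created at init time):
# Additional control planes (cp2, cp3)
sudo kubeadm join 192.168.100.10:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <cert-key>
# Workers (w1, w2)
sudo kubeadm join 192.168.100.10:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>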
cni-calico.yml:
- Downloads Calico v3.29.1 manifest
- Configures VXLAN mode (no BGP):
  - CALICO_IPV4POOL_IPIP: Never
  - CALICO_IPV4POOL_VXLAN: Always
- Patches readiness probe (felix-only, no BIRD check)
- Waits for Calico DaemonSet rollout
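A quick read-only check that the applied DaemonSet carries the intended encapsulation settings:
kubectl get daemonset calico-node -n kube-system \
  -o jsonpath='{range .spec.template.spec.containers[0].env[*]}{.name}={.value}{"\n"}{end}' \
  | grep CALICO_IPV4POOL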
etcd Quorum:
- 3 members = can lose 1 and maintain quorum
- etcd-cp1: 192.168.100.11:2380
- etcd-cp2: 192.168.100.12:2380
- etcd-cp3: 192.168.100.13:2380
API Server Load Balancing:
- HAProxy distributes requests across 3 API servers
- If one control plane fails, API remains accessible
- Clients connect to VIP (192.168.100.10:6443)
Scheduler & Controller Manager:
- Run on all 3 control planes
- Leader election ensures only one is active
- Automatic failover if leader fails
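You can see which instance currently holds each lock via the leader-election Leases (the HOLDER column names the active control plane):
kubectl get lease -n kube-system kube-scheduler kube-controller-manager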
To change the Kubernetes version, edit ansible/inventory/group_vars/all.yml:
kubernetes_version: "1.31.4"  # Change this
Then re-run:
ansible-playbook ansible/playbooks/common.yml
To change the pod network CIDR, edit ansible/inventory/group_vars/all.yml:
pod_network_cidr: "10.244.0.0/16"  # For Flannel
Note: This must be done BEFORE cluster initialization; changing it afterwards requires re-creating the cluster.
To change VM resources, edit Vagrantfile:
NODES = {
  "cp1" => { ip: "192.168.100.11", memory: 4096, cpus: 4, disk: "30G" },
  # ...
}
Destroy and recreate the VMs:
vagrant destroy -f
vagrant up
If the CNI has issues:
ansible-playbook ansible/playbooks/remove-calico.yml
ansible-playbook ansible/playbooks/cni-calico.yml
# All nodes should be Ready
kubectl get nodes
# All system pods should be Running
kubectl get pods -n kube-system
# Check etcd cluster
kubectl exec -n kube-system etcd-cp1 -- etcdctl \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key \
member list -w table
# Check HAProxy stats
vagrant ssh lb -c "sudo systemctl status haproxy"
To add a new worker node:
- Edit Vagrantfile to add the new worker definition
- Run vagrant up <new-worker-name>
- Add the node to ansible/inventory/hosts.ini under [workers]
- Run ansible-playbook ansible/playbooks/common.yml --limit=<new-worker>
- Generate a join command on cp1: vagrant ssh cp1 -c "sudo kubeadm token create --print-join-command"
- SSH to the new worker and run the join command
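After the join completes, watch for the new node to register and reach Ready:
kubectl get nodes -w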
Check Calico pods:
kubectl get pods -n kube-system -l k8s-app=calico-node
If they are not Ready, check the logs:
kubectl logs -n kube-system <calico-node-pod> -c calico-node
Common fix: remove and reinstall Calico:
ansible-playbook ansible/playbooks/remove-calico.yml
ansible-playbook ansible/playbooks/cni-calico.yml
Check HAProxy:
vagrant ssh lb -c "sudo systemctl status haproxy"
vagrant ssh lb -c "sudo tail -f /var/log/haproxy.log"Check control plane API servers:
for i in 1 2 3; do
vagrant ssh cp$i -c "sudo systemctl status kube-apiserver"
doneTest connection:
curl -k https://192.168.100.10:6443/healthzCheck Calico:
kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
Verify VXLAN mode:
kubectl get cm -n kube-system calico-config -o yaml | grep calico_backend
# Should show: vxlan (not bird)
Check the firewall on the nodes:
vagrant ssh cp1 -c "sudo firewall-cmd --list-all"
# Should include: 4789/udp (VXLAN)
Check etcd members:
kubectl exec -n kube-system etcd-cp1 -- etcdctl \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key \
endpoint status --cluster -w table
This should show 3 healthy members.
Re-run common playbook to ensure all ports are open:
ansible-playbook ansible/playbooks/common.yml
To tear down the cluster:
vagrant destroy -f
This removes all VMs but keeps:
- Downloaded base box (for faster re-creation)
- Vagrantfile and Ansible playbooks
- Any saved kubeconfig in .kube/
For a complete cleanup (including local state and the base box):
vagrant destroy -f
rm -rf .vagrant .kube
vagrant box remove generic/rocky9
# Deploy sample app
kubectl create deployment nginx --image=nginx:alpine --replicas=3
kubectl expose deployment nginx --port=80 --type=NodePort
# Get NodePort
kubectl get svc nginx
# Access from host
NODE_PORT=$(kubectl get svc nginx -o jsonpath='{.spec.ports[0].nodePort}')
curl http://192.168.100.21:$NODE_PORT  # w1's IP
# Watch nodes
watch kubectl get nodes
# In another terminal, power off a control plane
vagrant halt cp2
# Cluster should remain operational with cp1 and cp3
# Bring it back
vagrant up cp2
Since this is a full multi-master cluster, you can test:
- Leader election: Deploy apps with leader-election
- etcd operations: Backup, restore, member management
- HA configurations: Disruption budgets, rolling updates across AZs
- Network policies: Calico supports advanced network policies
- Storage: Add persistent volumes via NFS or local storage
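One concrete example for the etcd item: a snapshot can be taken from inside one of the etcd pods (same certificate paths as the checks above; the destination path is just an example, and the file lands on cp1's disk because /var/lib/etcd is a hostPath mount):
kubectl exec -n kube-system etcd-cp1 -- etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  snapshot save /var/lib/etcd/snapshot.db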
Increase VM CPU priority:
# Edit Vagrantfile, add to provider block:
lv.cpu_mode = 'host-passthrough'
lv.cpus = 4  # Increase from 2
Reduce logging verbosity:
# Edit /var/lib/kubelet/config.yaml on nodes
logging:
  verbosity: 2  # Default is 4
This is a LAB environment. For production:
- TLS certificates: Default certs are auto-generated. Use proper PKI
- RBAC: Configure proper role-based access control
- Network policies: Implement strict pod network policies
- Secrets encryption: Enable encryption at rest for secrets
- Audit logging: Enable and configure audit logs
- kubeconfig: Protect the admin kubeconfig (stored in .kube/config)
- Host firewall: The VMs' firewall is configured, but test it thoroughly
- Updates: Keep Kubernetes and OS packages updated
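As a concrete starting point for the network-policy item, a default-deny ingress policy (the namespace name is just an example) can be applied like this:
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
    - Ingress
EOF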
- Kubernetes Documentation
- kubeadm HA Setup
- Calico Documentation
- HAProxy Configuration
- Vagrant libvirt Provider
This is a learning/lab environment. Use at your own risk.
This is a personal lab setup. Feel free to fork and modify for your needs.
Questions? Check the troubleshooting section or review the playbooks in ansible/playbooks/ for implementation details.