This project will guide students through the complete lifecycle of setting up a Kubernetes cluster on Azure using Terraform, configuring advanced Kubernetes features, and deploying a complex microservices application. The goal is to provide a comprehensive understanding of Kubernetes, including its architecture, networking, storage, security, and monitoring.
-
Infrastructure Setup with Terraform
-
Create Terraform Configuration
Create a main.tf file to provision 3 Azure VMs:
-
Create a new directory for your project and navigate into it:
mkdir azure-kubernetes && cd azure-kubernetes
-
Create a file named providers.tf with the following content:
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.112.0"
    }
  }
}

provider "azurerm" {
  features {}
}
-
Create a file named main.tf and add the Terraform configuration for the Azure resources. This includes:
- Resource Group
- Virtual Network
- Subnet
- Network Security Group
- 3 Virtual Machines
- Public IPs
- Network Interfaces
# Create a resource group
resource "azurerm_resource_group" "rg" {
  name     = "kubernetes-resource-group"
  location = "East US"
}

# Create a virtual network
resource "azurerm_virtual_network" "vnet" {
  name                = "kubernetes-vnet"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
}

# Create a subnet
resource "azurerm_subnet" "subnet" {
  name                 = "kubernetes-subnet"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.0.1.0/24"]
}

# Create a Network Security Group
resource "azurerm_network_security_group" "nsg" {
  name                = "my-nsg"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  security_rule {
    name                       = "SSH"
    priority                   = 1001
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "22"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }
}

# Create 3 VMs
resource "azurerm_linux_virtual_machine" "vm" {
  count               = 3
  name                = "kubernetes-node-${count.index + 1}"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  size                = "Standard_B2s"
  admin_username      = "adminuser"
  network_interface_ids = [
    azurerm_network_interface.nic[count.index].id,
  ]

  admin_ssh_key {
    username   = "adminuser"
    public_key = file("~/.ssh/id_rsa.pub") # Path to your local public key
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "UbuntuServer"
    sku       = "18.04-LTS"
    version   = "latest"
  }
}

# Create public IPs
resource "azurerm_public_ip" "public_ip" {
  count               = 3
  name                = "public-ip-${count.index + 1}"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  allocation_method   = "Dynamic"
}

# Create network interfaces for VMs
resource "azurerm_network_interface" "nic" {
  count               = 3
  name                = "my-nic-${count.index + 1}"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = azurerm_subnet.subnet.id
    private_ip_address_allocation = "Dynamic"
    public_ip_address_id          = azurerm_public_ip.public_ip[count.index].id
  }
}

# Associate NSG with network interfaces
resource "azurerm_network_interface_security_group_association" "nsg_association" {
  count                     = 3
  network_interface_id      = azurerm_network_interface.nic[count.index].id
  network_security_group_id = azurerm_network_security_group.nsg.id
}

# Output public IP addresses
output "vm_public_ips" {
  value = azurerm_public_ip.public_ip[*].ip_address
}
-
Initialize Terraform:
terraform init
-
Review the planned changes:
terraform plan
-
Apply the Terraform configuration:
terraform apply
-
Note down the public IP addresses of the VMs from the Terraform output. Because the public IPs use dynamic allocation, the output may be empty right after the first apply; run terraform refresh (or check the Azure portal) to retrieve them.
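If you prefer the command line over the portal, the addresses can also be pulled once the VMs are running; a small sketch (assuming the Azure CLI is installed and logged in, and the resource group name from main.tf):
# Refresh the state so the dynamically allocated IPs appear in the output
terraform refresh && terraform output vm_public_ips
# Or query Azure directly for the resource group created by Terraform
az vm list-ip-addresses --resource-group kubernetes-resource-group --output table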
-
-
Kubernetes Cluster Setup
-
Install Container Runtime
SSH into each VM and install containerd:
sudo apt-get update
sudo apt-get install -y containerd
sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd
-
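Depending on the base image, kubeadm's preflight checks may also require the br_netfilter module, IP forwarding, and the systemd cgroup driver in containerd. A hedged sketch of those extra steps (run on each VM):
# Load kernel modules and sysctls commonly required by kubeadm and the CNI
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system

# Switch containerd to the systemd cgroup driver (matches the kubelet default
# on current Kubernetes releases), then restart it
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd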
Install Kubernetes Components
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
sudo mkdir /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
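Before initializing the cluster, it is worth confirming the installation and making sure swap is off (kubeadm refuses to run with swap enabled; Azure Ubuntu images usually ship without it, so this may be a no-op):
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab   # keep swap off across reboots, if an entry exists

kubeadm version
kubectl version --client
# kubelet will restart in a loop until 'kubeadm init' or 'kubeadm join' runs; that is expected
systemctl status kubelet --no-pager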
-
Initialize Kubernetes Master Node
-
On the first VM (master node):
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-cert-extra-sans=<control-plane-public-ip>
-
Set up kubectl for the non-root user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
-
Install a CNI plugin (e.g., Canal, which uses Calico for network policy and Flannel for networking):
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/canal.yaml
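A quick way to confirm the CNI is healthy before moving on (the node should flip from NotReady to Ready once the canal pods are Running):
kubectl get pods -n kube-system -w
kubectl get nodes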
-
-
Join Worker Nodes
On the master node, run:
kubeadm token create --print-join-command
Copy the output and run it on each worker node with sudo.
-
Accessing the Cluster from your local machine
Creating the kubeconfig file
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Copy this kubeconfig file to your local machine in the location ~/.kube/config and replace the server address with the public IP of the master node
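A sketch of that copy step from your local machine, assuming SSH access as adminuser and that kubeadm init was run with --apiserver-cert-extra-sans set to the master's public IP. Note that the NSG in main.tf only opens port 22, so port 6443 must also be allowed for remote kubectl access:
scp adminuser@<master-public-ip>:~/.kube/config ~/.kube/config

# Point kubectl at the public address instead of the private one
kubectl config set-cluster kubernetes --server=https://<master-public-ip>:6443
kubectl get nodes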
-
Test if everything works fine
kubectl create -f https://raw.githubusercontent.com/NillsF/blog/master/kubeadm/azure-vote.yml
kubectl port-forward service/azure-vote-front 8080:80
Give the pods a few seconds to start, then open localhost:8080 in your browser; you should see the azure-vote application running on your Kubernetes cluster:
![demo application home page](<assets/microservecs-demo app.png>)
-
-
Installing Kubernetes Dashboard
- Add the Helm chart repo
- Install the Kubernetes Dashboard via its Helm chart
# Add the kubernetes-dashboard repository
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
# Deploy a Helm release named "kubernetes-dashboard" using the kubernetes-dashboard chart
helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --create-namespace --namespace kubernetes-dashboard
kubectl -n kubernetes-dashboard port-forward svc/kubernetes-dashboard-kong-proxy 8443:443
A service account and a bearer token are needed to log in to the dashboard.
# Create a bearer token
kubectl -n NAMESPACE create token SERVICE_ACCOUNT
-
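For a lab setup, a throwaway admin service account is enough; the names here are illustrative, and cluster-admin is far too broad for anything beyond a demo:
kubectl -n kubernetes-dashboard create serviceaccount dashboard-admin
kubectl create clusterrolebinding dashboard-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=kubernetes-dashboard:dashboard-admin

# Token to paste into the dashboard login screen
kubectl -n kubernetes-dashboard create token dashboard-admin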
Advanced Kubernetes Configuration
-
Control Plane Isolation
Control plane isolation in Kubernetes is crucial for several reasons:
- Security: The control plane nodes host critical components of the Kubernetes cluster, such as the API server, scheduler, and controller manager. Isolating these components from regular workloads helps prevent potential security breaches or unauthorized access.
- Stability: By keeping application workloads separate from control plane components, you reduce the risk of resource contention that could impact the stability and performance of core cluster functions.
- Resource management: Control plane nodes often have specific resource requirements. Isolating them ensures that these resources are dedicated to running the cluster's core components without interference from other workloads.
- Scalability: Separating the control plane allows for independent scaling of control plane components and worker nodes, which is especially important in large clusters.
- Upgrades and maintenance: Isolation makes it easier to perform upgrades or maintenance on control plane components without affecting application workloads.
- Compliance: Some regulatory standards or organizational policies may require separation of control and data planes for better governance and risk management.
To demonstrate the importance of control plane isolation, we will allow scheduling of pods on the control plane node and then try to fetch cluster-specific data from a pod running on it.
-
By default, your cluster will not schedule Pods on the control plane nodes for security reasons. If you want to be able to schedule Pods on the control plane nodes, for example for a single machine Kubernetes cluster, run:
kubectl taint nodes --all node-role.kubernetes.io/control-plane:NoSchedule-
The output will look something like:
node "test-01" untainted
...
This will remove the node-role.kubernetes.io/control-plane:NoSchedule taint from any nodes that have it, including the control plane nodes, meaning that the scheduler will then be able to schedule Pods everywhere.
Additionally, you can execute the following command to remove the node.kubernetes.io/exclude-from-external-load-balancers label from the control plane node, which excludes it from the list of backend servers:
kubectl label nodes --all node.kubernetes.io/exclude-from-external-load-balancers-
-
Let's now create a pod that attempts to access control plane resources (the API server):
apiVersion: v1
kind: Pod
metadata:
  name: malicious-pod
spec:
  containers:
  - name: alpine
    image: alpine
    command: ["/bin/sh", "-c"]
    args:
      - |
        apk add --no-cache curl && \
        while true; do \
          curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
            -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
            https://kubernetes.default.svc.cluster.local:443/api; \
          sleep 5; \
        done
  restartPolicy: Always
kubectl apply -f malicious-pod.yaml
-
Observe the Impact
Monitor the logs of the malicious pod to see if it can access the API server.
kubectl logs malicious-pod
kshitijdhara@192 Cloud admin project % kubectl apply -f control-plane-isolation.yaml
pod/malicious-pod created
kshitijdhara@192 Cloud admin project % kubectl logs malicious-pod
fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/community/x86_64/APKINDEX.tar.gz
...
(10/10) Installing curl (8.9.0-r0)
Executing busybox-1.36.1-r29.trigger
Executing ca-certificates-20240705-r0.trigger
OK: 13 MiB in 24 packages
{
  "kind": "APIVersions",
  "versions": [
    "v1"
  ],
  "serverAddressByClientCIDRs": [
    {
      "clientCIDR": "0.0.0.0/0",
      "serverAddress": "10.0.1.6:6443"
    }
  ]
}
(the same response repeats every 5 seconds)
-
Reapply Control Plane Isolation (Best Practice)
Reapply the taint to the control plane node to prevent scheduling of pods.
kubectl taint nodes kubernetes-node-1 node-role.kubernetes.io/control-plane=:NoSchedule
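To confirm the isolation is back in place, check the taint and that no workload pods remain on the control plane node:
kubectl describe node kubernetes-node-1 | grep -A2 Taints
kubectl get pods --all-namespaces -o wide | grep kubernetes-node-1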
What did we learn:
- Security Risks: By allowing pods to run on control plane nodes, you expose critical components to potential security breaches, as demonstrated by the malicious pod accessing the API server.
- Resource Contention: Running workloads on control plane nodes can lead to resource contention, affecting the stability and performance of the control plane components.
- Best Practices: Reapplying the taint ensures that control plane nodes are dedicated to managing the cluster, maintaining security and stability
-
Create HA Cluster
Topology
-
Create 5 Azure VMs by editing the Terraform code and changing the VM count from 3 to 5.
-
The setup will have 2 control plane nodes and 3 worker nodes. (Note that production clusters normally use an odd number of control plane nodes, typically 3, so that etcd can maintain quorum.)
-
Configure Load Balancer
- Create an Azure Load Balancer.
- Configure health probes and load balancing rules for ports 6443 and 443.
- Ensure the load balancer can communicate with all control plane nodes on port 6443 (an Azure CLI sketch follows below).
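One possible way to script this with the Azure CLI (resource names here are assumptions; the same setup can also be expressed in Terraform):
az network public-ip create -g kubernetes-resource-group -n kube-api-lb-ip --sku Standard

az network lb create -g kubernetes-resource-group -n kube-api-lb --sku Standard \
  --public-ip-address kube-api-lb-ip \
  --frontend-ip-name kube-api-frontend \
  --backend-pool-name control-plane-pool

az network lb probe create -g kubernetes-resource-group --lb-name kube-api-lb \
  -n apiserver-probe --protocol Tcp --port 6443

az network lb rule create -g kubernetes-resource-group --lb-name kube-api-lb \
  -n apiserver-rule --protocol Tcp --frontend-port 6443 --backend-port 6443 \
  --frontend-ip-name kube-api-frontend --backend-pool-name control-plane-pool \
  --probe-name apiserver-probe

# Add the first control plane node's NIC (names from main.tf) to the backend pool
az network nic ip-config address-pool add -g kubernetes-resource-group \
  --nic-name my-nic-1 --ip-config-name internal \
  --lb-name kube-api-lb --address-pool control-plane-pool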
-
Add the first control plane node to the load balancer, and test the connection:
nc -v <LOAD_BALANCER_IP> <PORT>
A connection refused error is expected because the API server is not yet running. A timeout, however, means the load balancer cannot communicate with the control plane node. If a timeout occurs, reconfigure the load balancer to communicate with the control plane node.
-
Add the remaining control plane nodes to the load balancer target group.
-
Initialize the first control plane node: SSH into the first control plane node and initialize the control plane:
sudo kubeadm init --control-plane-endpoint "<LOAD_BALANCER_IP>:6443" --upload-certs --pod-network-cidr=10.244.0.0/16
The output looks similar to:
...
You can now join any number of control-plane nodes by running the following command on each as root:

  kubeadm join 192.168.0.200:6443 --token 9vr73a.a8uxyaju799qwdjv \
    --discovery-token-ca-cert-hash sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866 \
    --control-plane --certificate-key f8902e114ef118304e561c3ecd4d0b543adc226b7a07f675f56564185ffe0c07

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use kubeadm init phase upload-certs to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

  kubeadm join 192.168.0.200:6443 --token 9vr73a.a8uxyaju799qwdjv \
    --discovery-token-ca-cert-hash sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866
-
Copy this output to a text file. You will need it later to join control plane and worker nodes to the cluster.
-
When --upload-certs is used with kubeadm init, the certificates of the primary control plane are encrypted and uploaded in the kubeadm-certs Secret.
-
To re-upload the certificates and generate a new decryption key, use the following command on a control plane node that is already joined to the cluster:
sudo kubeadm init phase upload-certs --upload-certs
-
-
Steps for the rest of the control plane nodes
For each additional control plane node you should:
-
Execute the join command that was previously given to you by the kubeadm init output on the first node. It should look something like this:
sudo kubeadm join 192.168.0.200:6443 --token 9vr73a.a8uxyaju799qwdjv \
  --discovery-token-ca-cert-hash sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866 \
  --control-plane --certificate-key f8902e114ef118304e561c3ecd4d0b543adc226b7a07f675f56564185ffe0c07
- The --control-plane flag tells kubeadm join to create a new control plane.
- The --certificate-key ... flag will cause the control plane certificates to be downloaded from the kubeadm-certs Secret in the cluster and be decrypted using the given key.
You can join multiple control-plane nodes in parallel.
-
-
Continue by joining the worker nodes to the cluster.
-
-
Implement Ingress Controllers
-
Install NGINX Ingress Controller: This installs the NGINX Ingress Controller with 2 replicas for high availability.
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.replicaCount=2 \
  --set controller.nodeSelector."kubernetes\.io/os"=linux \
  --set defaultBackend.nodeSelector."kubernetes\.io/os"=linux
-
Verify the installation:
kubectl get pods -n ingress-nginx
-
Get the External IP of the Ingress Controller: Look for the ingress-nginx-controller service and note its EXTERNAL-IP.
kubectl get services -n ingress-nginx
-
Create an Ingress Resource: Create a file named example-ingress.yaml. Replace your-domain.com and your-service with your actual domain and service name.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: your-domain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: your-service
            port:
              number: 80
-
Apply the Ingress Resource:
kubectl apply -f example-ingress.yaml
-
Test the Ingress: Access your application using the domain you configured.
-
-
Security Best Practices
-
Implement Role-Based Access Control (RBAC)
-
Create a Service Account:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
  namespace: default
-
Create a Role or ClusterRole:
For namespace-specific permissions (Role):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
For cluster-wide permissions (ClusterRole):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
-
Create a RoleBinding or ClusterRoleBinding:
For Role:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: ServiceAccount
  name: my-service-account
  namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
For ClusterRole:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-pods
subjects:
- kind: ServiceAccount
  name: my-service-account
  namespace: default
roleRef:
  kind: ClusterRole
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
-
Apply the configurations:
kubectl apply -f service-account.yaml
kubectl apply -f role.yaml
kubectl apply -f role-binding.yaml
-
Use the Service Account in a Pod:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  serviceAccountName: my-service-account
  containers:
  - name: my-container
    image: busybox
    command: ['sh', '-c', 'echo Hello Kubernetes! && sleep 3600']
-
Verify RBAC:
kubectl auth can-i get pods --as=system:serviceaccount:default:my-service-account
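The negative cases are just as informative; assuming the namespaced Role/RoleBinding variant, both of these should return "no":
kubectl auth can-i delete pods --as=system:serviceaccount:default:my-service-account
kubectl auth can-i get pods -n kube-system --as=system:serviceaccount:default:my-service-account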
-
-
Network Policies
-
Create a Namespace
Create a namespace for your application:
kubectl create namespace my-namespace
-
Create a Basic Network Policy
Create a file named deny-all-ingress.yaml to deny all ingress traffic by default:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: my-namespace
spec:
  podSelector: {}
  policyTypes:
  - Ingress
Apply the Network Policy:
kubectl apply -f deny-all-ingress.yaml
-
Allow Specific Ingress Traffic
Create a file named allow-nginx-ingress.yaml to allow ingress traffic to a specific pod:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-nginx-ingress
  namespace: my-namespace
spec:
  podSelector:
    matchLabels:
      app: nginx
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: allowed-app
    ports:
    - protocol: TCP
      port: 80
Apply the Network Policy:
kubectl apply -f allow-nginx-ingress.yaml
-
Allow Egress Traffic
Create a file named allow-egress.yaml to allow egress traffic from a specific pod:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress
  namespace: my-namespace
spec:
  podSelector:
    matchLabels:
      app: nginx
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: external-service
    ports:
    - protocol: TCP
      port: 443
Apply the Network Policy:
kubectl apply -f allow-egress.yaml
-
Verify Network Policies
Deploy pods and verify that the network policies are enforced. For example, deploy an nginx pod:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: my-namespace
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
Apply the pod configuration:
kubectl apply -f nginx-pod.yaml
Deploy another pod to test the ingress policy:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: my-namespace
  labels:
    app: allowed-app
spec:
  containers:
  - name: busybox
    image: busybox
    command: ['sh', '-c', 'sleep 3600']
Apply the pod configuration:
kubectl apply -f test-pod.yaml
-
Test Network Policies
Exec into the test pod and try to access the nginx pod:
kubectl exec -it test-pod -n my-namespace -- wget -qO- http://<nginx-pod-ip>
If the network policies are correctly applied, the request from test-pod (which carries the app: allowed-app label) should succeed.
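To see the deny-all policy in action, repeat the test from a pod that does not carry the app: allowed-app label. Assuming the CNI enforces NetworkPolicy (canal does), the request should time out:
kubectl run blocked-pod -n my-namespace --image=busybox --restart=Never -- sleep 3600
kubectl exec -it blocked-pod -n my-namespace -- wget -qO- -T 5 http://<nginx-pod-ip>   # expected: timeout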
-
-
-
Disaster Recovery and Backup
- Set Up etcd Backup
-
Ensure etcd Access: Make sure you have access to the etcd nodes. In our setup, etcd runs on the control plane nodes.
-
Install etcdctl: On each control plane node, install etcdctl if not already present:
ETCD_VER=v3.5.1
wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf etcd-${ETCD_VER}-linux-amd64.tar.gz
sudo mv etcd-${ETCD_VER}-linux-amd64/etcdctl /usr/local/bin
-
Set up Azure Blob Storage: Create an Azure Storage Account and a container for storing backups.
-
Install Azure CLI: Install the Azure CLI on the control plane nodes:
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
-
Create a Backup Script: Create a script named etcd-backup.sh (can be a task):
#!/bin/bash

# Set variables
DATE=$(date +"%Y%m%d_%H%M%S")
ETCDCTL_API=3
ETCD_ENDPOINTS="https://127.0.0.1:2379"
ETCDCTL_CERT="/etc/kubernetes/pki/etcd/server.crt"
ETCDCTL_KEY="/etc/kubernetes/pki/etcd/server.key"
ETCDCTL_CACERT="/etc/kubernetes/pki/etcd/ca.crt"
BACKUP_DIR="/tmp/etcd_backup"
STORAGE_ACCOUNT_NAME="your_storage_account_name"
CONTAINER_NAME="your_container_name"

# Create backup
mkdir -p $BACKUP_DIR
etcdctl --endpoints=$ETCD_ENDPOINTS \
  --cert=$ETCDCTL_CERT \
  --key=$ETCDCTL_KEY \
  --cacert=$ETCDCTL_CACERT \
  snapshot save $BACKUP_DIR/etcd-snapshot-$DATE.db

# Compress backup
tar -czvf $BACKUP_DIR/etcd-snapshot-$DATE.tar.gz -C $BACKUP_DIR etcd-snapshot-$DATE.db

# Upload to Azure Blob Storage
az storage blob upload --account-name $STORAGE_ACCOUNT_NAME \
  --container-name $CONTAINER_NAME \
  --name etcd-snapshot-$DATE.tar.gz \
  --file $BACKUP_DIR/etcd-snapshot-$DATE.tar.gz

# Clean up local files
rm -rf $BACKUP_DIR
-
Make the Script Executable:
chmod +x etcd-backup.sh
-
Set up Azure Credentials: Authenticate with Azure:
az login
-
Schedule the Backup: Use cron to schedule regular backups (can be a task). Edit the crontab:
crontab -e
Add a line to run the backup daily at 2 AM:
0 2 * * * /path/to/etcd-backup.sh
-
Test the Backup: Run the script manually to ensure it works:
./etcd-backup.sh
-
Implement Backup Rotation: Add logic to the script to delete old backups from Azure Blob Storage (can be a task); a sketch follows below.
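One possible shape for that rotation logic, appended to etcd-backup.sh (illustrative only; it keeps the seven most recent snapshots, relying on the timestamp embedded in the blob names for ordering):
# Delete all but the newest 7 snapshots from the container
az storage blob list \
  --account-name $STORAGE_ACCOUNT_NAME \
  --container-name $CONTAINER_NAME \
  --query "[].name" -o tsv | sort | head -n -7 | \
while read -r old_blob; do
  az storage blob delete \
    --account-name $STORAGE_ACCOUNT_NAME \
    --container-name $CONTAINER_NAME \
    --name "$old_blob"
done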
-
-
Performance Tuning and Optimization
- Node Affinity and Anti-Affinity
-
Ensure you have a 5-node Kubernetes cluster running on Azure VMs. Then label the nodes to demonstrate affinity concepts:
kubectl label nodes kubernetes-node-1 disktype=ssd zone=us-east-1a
kubectl label nodes kubernetes-node-2 disktype=hdd zone=us-east-1b
kubectl label nodes kubernetes-node-3 disktype=ssd zone=us-east-1b
kubectl label nodes kubernetes-node-4 disktype=hdd zone=us-east-1a
kubectl label nodes kubernetes-node-5 disktype=ssd zone=us-east-1c
-
Explain Node Affinity: Node affinity allows you to constrain which nodes your pod can be scheduled on based on node labels.
-
Demonstrate Required Node Affinity: Create a pod that must run on nodes with SSD. Apply it and check where it's scheduled:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-ssd
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: nginx
    image: nginx
kubectl apply -f nginx-ssd.yaml
kubectl get pod nginx-ssd -o wide
-
Demonstrate Preferred Node Affinity: Create a pod that prefers nodes in us-east-1a but can run elsewhere:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-preferred-zone
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - us-east-1a
  containers:
  - name: nginx
    image: nginx
-
Explain Node Anti-Affinity: Node anti-affinity ensures that pods are not scheduled on nodes with certain labels.
-
Demonstrate Node Anti-Affinity: Create a pod that avoids nodes with HDD:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-anti-hdd
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: NotIn
            values:
            - hdd
  containers:
  - name: nginx
    image: nginx
-
Combine Affinity and Anti-Affinity: Create a pod that must run on SSD nodes but prefers not to be in us-east-1b:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-complex-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: zone
            operator: NotIn
            values:
            - us-east-1b
  containers:
  - name: nginx
    image: nginx
-
Demonstrate Pod Affinity: Pod affinity allows you to influence pod scheduling based on the labels of other pods running on the node.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - web-store
        topologyKey: "kubernetes.io/hostname"
  containers:
  - name: nginx
    image: nginx
-
Demonstrate Pod Anti-Affinity: Pod anti-affinity helps in spreading pods across nodes.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-with-pod-anti-affinity
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - web-store
        topologyKey: "kubernetes.io/hostname"
  containers:
  - name: nginx
    image: nginx
-
Practical Exercise: Have students create a deployment that:
- Runs on SSD nodes
- Prefers us-east-1a zone
- Ensures no more than one pod per node
- Has pod anti-affinity with pods labeled "app=database"
-
Performance Impact: Discuss how affinity rules can impact cluster performance and resource utilization. Reference: https://overcast.blog/mastering-node-affinity-and-anti-affinity-in-kubernetes-db769af90f5c?gi=2015811deb17
-
Best Practices that can be discussed:
- Use node affinity for hardware requirements
- Use pod affinity for related services
- Use pod anti-affinity for high availability
- Be cautious with required rules as they can prevent scheduling
-
- Optimize Resource Requests and Limits ( better done in Cloud Native)
-
Implement Auto-Scaling
-
Enable Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
-
Create a deployment with a Horizontal Pod Autoscaler (a sketch follows below).
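A minimal sketch of such a deployment and autoscaler (names and thresholds are illustrative; the HPA needs CPU requests to compute utilization):
kubectl create deployment hpa-demo --image=nginx --replicas=1
kubectl set resources deployment hpa-demo --requests=cpu=100m --limits=cpu=200m
kubectl autoscale deployment hpa-demo --cpu-percent=50 --min=1 --max=5

# Watch the autoscaler once the metrics server is reporting usage
kubectl get hpa hpa-demo -w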
-
-
Configure Persistent Storage (not that interesting, only exec commands)
- Install Azure CSI Driver
- Create a storage class
- Create a Persistent Volume
- Create a Persistent Volume Claim
- Attach the Persistent Volume Claim to a Pod (a sketch follows below)
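A rough sketch of those steps, assuming the Azure Disk CSI driver has already been installed and configured with cloud credentials (on a self-managed cluster this needs an azure.json cloud config, unlike AKS); the names are illustrative:
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi
provisioner: disk.csi.azure.com
parameters:
  skuName: StandardSSD_LRS
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: managed-csi
  resources:
    requests:
      storage: 5Gi
EOF

# Mount the claim in a pod; the disk is provisioned when the pod is scheduled
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pvc-test
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - mountPath: /data
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-pvc
EOF

kubectl get pvc data-pvc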
-
Set Up Monitoring with Prometheus and Grafana
-
Add the Prometheus community Helm repository:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
-
Create a monitoring namespace:
kubectl create namespace monitoring
-
Create a values-monitoring.yaml file with the following content:
prometheus:
  prometheusSpec:
    scrapeInterval: 10s
    evaluationInterval: 30s
grafana:
  persistence:
    enabled: true
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
      - name: 'default'
        orgId: 1
        folder: ''
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/default
  dashboards:
    default:
      nginx-ingress:
        gnetId: 9614
        revision: 1
        datasource: Prometheus
      k8s-cluster:
        gnetId: 12575
        revision: 1
        datasource: Prometheus
-
Install the Prometheus stack using Helm:
helm install promstack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --version 52.1.0 \
  -f values-monitoring.yaml
-
Verify the installation:
kubectl get pods -n monitoring
-
Set up port-forwarding to access Grafana:
kubectl port-forward -n monitoring svc/promstack-grafana 3000:80
-
Access Grafana: Open a web browser and go to http://localhost:3000. The default credentials are:
- Username: admin
- Password: prom-operator (you should change this)
-
Configure Ingress for Grafana (optional): Create an Ingress resource to access Grafana externally. Create a file named grafana-ingress.yaml:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: grafana.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: promstack-grafana
            port:
              number: 80
Apply the Ingress:
kubectl apply -f grafana-ingress.yaml
-
Configure Prometheus to scrape NGINX Ingress Controller metrics: Create a file named nginx-ingress-servicemonitor.yaml:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-ingress-controller
  namespace: monitoring
  labels:
    release: promstack
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  namespaceSelector:
    matchNames:
    - ingress-nginx
  endpoints:
  - port: metrics
    interval: 15s
Apply the ServiceMonitor:
kubectl apply -f nginx-ingress-servicemonitor.yaml
-
-
-
Deploy a Complex Microservices Application
💡 Deploy a sample microservices application (e.g., Sock Shop):# Copyright 2018 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ---------------------------------------------------------- # WARNING: This file is autogenerated. Do not manually edit. # ---------------------------------------------------------- # [START gke_release_kubernetes_manifests_microservices_demo] --- apiVersion: apps/v1 kind: Deployment metadata: name: currencyservice labels: app: currencyservice spec: selector: matchLabels: app: currencyservice template: metadata: labels: app: currencyservice spec: serviceAccountName: currencyservice terminationGracePeriodSeconds: 5 securityContext: fsGroup: 1000 runAsGroup: 1000 runAsNonRoot: true runAsUser: 1000 containers: - name: server securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true image: gcr.io/google-samples/microservices-demo/currencyservice:v0.10.0 ports: - name: grpc containerPort: 7000 env: - name: PORT value: "7000" - name: DISABLE_PROFILER value: "1" readinessProbe: grpc: port: 7000 livenessProbe: grpc: port: 7000 resources: requests: cpu: 100m memory: 64Mi limits: cpu: 200m memory: 128Mi --- apiVersion: v1 kind: Service metadata: name: currencyservice labels: app: currencyservice spec: type: ClusterIP selector: app: currencyservice ports: - name: grpc port: 7000 targetPort: 7000 --- apiVersion: v1 kind: ServiceAccount metadata: name: currencyservice --- apiVersion: apps/v1 kind: Deployment metadata: name: loadgenerator labels: app: loadgenerator spec: selector: matchLabels: app: loadgenerator replicas: 1 template: metadata: labels: app: loadgenerator annotations: sidecar.istio.io/rewriteAppHTTPProbers: "true" spec: serviceAccountName: loadgenerator terminationGracePeriodSeconds: 5 restartPolicy: Always securityContext: fsGroup: 1000 runAsGroup: 1000 runAsNonRoot: true runAsUser: 1000 initContainers: - command: - /bin/sh - -exc - | MAX_RETRIES=12 RETRY_INTERVAL=10 for i in $(seq 1 $MAX_RETRIES); do echo "Attempt $i: Pinging frontend: ${FRONTEND_ADDR}..." STATUSCODE=$(wget --server-response http://${FRONTEND_ADDR} 2>&1 | awk '/^ HTTP/{print $2}') if [ $STATUSCODE -eq 200 ]; then echo "Frontend is reachable." exit 0 fi echo "Error: Could not reach frontend - Status code: ${STATUSCODE}" sleep $RETRY_INTERVAL done echo "Failed to reach frontend after $MAX_RETRIES attempts." 
exit 1 name: frontend-check securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true image: busybox:latest env: - name: FRONTEND_ADDR value: "frontend:80" containers: - name: main securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true image: gcr.io/google-samples/microservices-demo/loadgenerator:v0.10.0 env: - name: FRONTEND_ADDR value: "frontend:80" - name: USERS value: "10" resources: requests: cpu: 300m memory: 256Mi limits: cpu: 500m memory: 512Mi --- apiVersion: v1 kind: ServiceAccount metadata: name: loadgenerator --- apiVersion: apps/v1 kind: Deployment metadata: name: productcatalogservice labels: app: productcatalogservice spec: selector: matchLabels: app: productcatalogservice template: metadata: labels: app: productcatalogservice spec: serviceAccountName: productcatalogservice terminationGracePeriodSeconds: 5 securityContext: fsGroup: 1000 runAsGroup: 1000 runAsNonRoot: true runAsUser: 1000 containers: - name: server securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true image: gcr.io/google-samples/microservices-demo/productcatalogservice:v0.10.0 ports: - containerPort: 3550 env: - name: PORT value: "3550" - name: DISABLE_PROFILER value: "1" readinessProbe: grpc: port: 3550 livenessProbe: grpc: port: 3550 resources: requests: cpu: 100m memory: 64Mi limits: cpu: 200m memory: 128Mi --- apiVersion: v1 kind: Service metadata: name: productcatalogservice labels: app: productcatalogservice spec: type: ClusterIP selector: app: productcatalogservice ports: - name: grpc port: 3550 targetPort: 3550 --- apiVersion: v1 kind: ServiceAccount metadata: name: productcatalogservice --- apiVersion: apps/v1 kind: Deployment metadata: name: checkoutservice labels: app: checkoutservice spec: selector: matchLabels: app: checkoutservice template: metadata: labels: app: checkoutservice spec: serviceAccountName: checkoutservice securityContext: fsGroup: 1000 runAsGroup: 1000 runAsNonRoot: true runAsUser: 1000 containers: - name: server securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true image: gcr.io/google-samples/microservices-demo/checkoutservice:v0.10.0 ports: - containerPort: 5050 readinessProbe: grpc: port: 5050 livenessProbe: grpc: port: 5050 env: - name: PORT value: "5050" - name: PRODUCT_CATALOG_SERVICE_ADDR value: "productcatalogservice:3550" - name: SHIPPING_SERVICE_ADDR value: "shippingservice:50051" - name: PAYMENT_SERVICE_ADDR value: "paymentservice:50051" - name: EMAIL_SERVICE_ADDR value: "emailservice:5000" - name: CURRENCY_SERVICE_ADDR value: "currencyservice:7000" - name: CART_SERVICE_ADDR value: "cartservice:7070" resources: requests: cpu: 100m memory: 64Mi limits: cpu: 200m memory: 128Mi --- apiVersion: v1 kind: Service metadata: name: checkoutservice labels: app: checkoutservice spec: type: ClusterIP selector: app: checkoutservice ports: - name: grpc port: 5050 targetPort: 5050 --- apiVersion: v1 kind: ServiceAccount metadata: name: checkoutservice --- apiVersion: apps/v1 kind: Deployment metadata: name: shippingservice labels: app: shippingservice spec: selector: matchLabels: app: shippingservice template: metadata: labels: app: shippingservice spec: serviceAccountName: shippingservice securityContext: fsGroup: 1000 runAsGroup: 1000 runAsNonRoot: true runAsUser: 1000 containers: - name: server securityContext: 
allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true image: gcr.io/google-samples/microservices-demo/shippingservice:v0.10.0 ports: - containerPort: 50051 env: - name: PORT value: "50051" - name: DISABLE_PROFILER value: "1" readinessProbe: periodSeconds: 5 grpc: port: 50051 livenessProbe: grpc: port: 50051 resources: requests: cpu: 100m memory: 64Mi limits: cpu: 200m memory: 128Mi --- apiVersion: v1 kind: Service metadata: name: shippingservice labels: app: shippingservice spec: type: ClusterIP selector: app: shippingservice ports: - name: grpc port: 50051 targetPort: 50051 --- apiVersion: v1 kind: ServiceAccount metadata: name: shippingservice --- apiVersion: apps/v1 kind: Deployment metadata: name: cartservice labels: app: cartservice spec: selector: matchLabels: app: cartservice template: metadata: labels: app: cartservice spec: serviceAccountName: cartservice terminationGracePeriodSeconds: 5 securityContext: fsGroup: 1000 runAsGroup: 1000 runAsNonRoot: true runAsUser: 1000 containers: - name: server securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true image: gcr.io/google-samples/microservices-demo/cartservice:v0.10.0 ports: - containerPort: 7070 env: - name: REDIS_ADDR value: "redis-cart:6379" resources: requests: cpu: 200m memory: 64Mi limits: cpu: 300m memory: 128Mi readinessProbe: initialDelaySeconds: 15 grpc: port: 7070 livenessProbe: initialDelaySeconds: 15 periodSeconds: 10 grpc: port: 7070 --- apiVersion: v1 kind: Service metadata: name: cartservice labels: app: cartservice spec: type: ClusterIP selector: app: cartservice ports: - name: grpc port: 7070 targetPort: 7070 --- apiVersion: v1 kind: ServiceAccount metadata: name: cartservice --- apiVersion: apps/v1 kind: Deployment metadata: name: redis-cart labels: app: redis-cart spec: selector: matchLabels: app: redis-cart template: metadata: labels: app: redis-cart spec: securityContext: fsGroup: 1000 runAsGroup: 1000 runAsNonRoot: true runAsUser: 1000 containers: - name: redis securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true image: redis:alpine ports: - containerPort: 6379 readinessProbe: periodSeconds: 5 tcpSocket: port: 6379 livenessProbe: periodSeconds: 5 tcpSocket: port: 6379 volumeMounts: - mountPath: /data name: redis-data resources: limits: memory: 256Mi cpu: 125m requests: cpu: 70m memory: 200Mi volumes: - name: redis-data emptyDir: {} --- apiVersion: v1 kind: Service metadata: name: redis-cart labels: app: redis-cart spec: type: ClusterIP selector: app: redis-cart ports: - name: tcp-redis port: 6379 targetPort: 6379 --- apiVersion: apps/v1 kind: Deployment metadata: name: emailservice labels: app: emailservice spec: selector: matchLabels: app: emailservice template: metadata: labels: app: emailservice spec: serviceAccountName: emailservice terminationGracePeriodSeconds: 5 securityContext: fsGroup: 1000 runAsGroup: 1000 runAsNonRoot: true runAsUser: 1000 containers: - name: server securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true image: gcr.io/google-samples/microservices-demo/emailservice:v0.10.0 ports: - containerPort: 8080 env: - name: PORT value: "8080" - name: DISABLE_PROFILER value: "1" readinessProbe: periodSeconds: 5 grpc: port: 8080 livenessProbe: periodSeconds: 5 grpc: port: 8080 resources: requests: cpu: 100m memory: 64Mi limits: cpu: 200m memory: 
128Mi --- apiVersion: v1 kind: Service metadata: name: emailservice labels: app: emailservice spec: type: ClusterIP selector: app: emailservice ports: - name: grpc port: 5000 targetPort: 8080 --- apiVersion: v1 kind: ServiceAccount metadata: name: emailservice --- apiVersion: apps/v1 kind: Deployment metadata: name: paymentservice labels: app: paymentservice spec: selector: matchLabels: app: paymentservice template: metadata: labels: app: paymentservice spec: serviceAccountName: paymentservice terminationGracePeriodSeconds: 5 securityContext: fsGroup: 1000 runAsGroup: 1000 runAsNonRoot: true runAsUser: 1000 containers: - name: server securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true image: gcr.io/google-samples/microservices-demo/paymentservice:v0.10.0 ports: - containerPort: 50051 env: - name: PORT value: "50051" - name: DISABLE_PROFILER value: "1" readinessProbe: grpc: port: 50051 livenessProbe: grpc: port: 50051 resources: requests: cpu: 100m memory: 64Mi limits: cpu: 200m memory: 128Mi --- apiVersion: v1 kind: Service metadata: name: paymentservice labels: app: paymentservice spec: type: ClusterIP selector: app: paymentservice ports: - name: grpc port: 50051 targetPort: 50051 --- apiVersion: v1 kind: ServiceAccount metadata: name: paymentservice --- apiVersion: apps/v1 kind: Deployment metadata: name: frontend labels: app: frontend spec: selector: matchLabels: app: frontend template: metadata: labels: app: frontend annotations: sidecar.istio.io/rewriteAppHTTPProbers: "true" spec: serviceAccountName: frontend securityContext: fsGroup: 1000 runAsGroup: 1000 runAsNonRoot: true runAsUser: 1000 containers: - name: server securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true image: gcr.io/google-samples/microservices-demo/frontend:v0.10.0 ports: - containerPort: 8080 readinessProbe: initialDelaySeconds: 10 httpGet: path: "/_healthz" port: 8080 httpHeaders: - name: "Cookie" value: "shop_session-id=x-readiness-probe" livenessProbe: initialDelaySeconds: 10 httpGet: path: "/_healthz" port: 8080 httpHeaders: - name: "Cookie" value: "shop_session-id=x-liveness-probe" env: - name: PORT value: "8080" - name: PRODUCT_CATALOG_SERVICE_ADDR value: "productcatalogservice:3550" - name: CURRENCY_SERVICE_ADDR value: "currencyservice:7000" - name: CART_SERVICE_ADDR value: "cartservice:7070" - name: RECOMMENDATION_SERVICE_ADDR value: "recommendationservice:8080" - name: SHIPPING_SERVICE_ADDR value: "shippingservice:50051" - name: CHECKOUT_SERVICE_ADDR value: "checkoutservice:5050" - name: AD_SERVICE_ADDR value: "adservice:9555" - name: SHOPPING_ASSISTANT_SERVICE_ADDR value: "shoppingassistantservice:80" # # ENV_PLATFORM: One of: local, gcp, aws, azure, onprem, alibaba # # When not set, defaults to "local" unless running in GKE, otherwies auto-sets to gcp # - name: ENV_PLATFORM # value: "aws" - name: ENABLE_PROFILER value: "0" # - name: CYMBAL_BRANDING # value: "true" # - name: ENABLE_ASSISTANT # value: "true" # - name: FRONTEND_MESSAGE # value: "Replace this with a message you want to display on all pages." # As part of an optional Google Cloud demo, you can run an optional microservice called the "packaging service". 
# - name: PACKAGING_SERVICE_URL # value: "" # This value would look like "http://123.123.123" resources: requests: cpu: 100m memory: 64Mi limits: cpu: 200m memory: 128Mi --- apiVersion: v1 kind: Service metadata: name: frontend labels: app: frontend spec: type: ClusterIP selector: app: frontend ports: - name: http port: 80 targetPort: 8080 # --- # apiVersion: v1 # kind: Service # metadata: # name: frontend-external # labels: # app: frontend # spec: # type: LoadBalancer # selector: # app: frontend # ports: # - name: http # port: 80 # targetPort: 8080 --- apiVersion: v1 kind: ServiceAccount metadata: name: frontend --- apiVersion: apps/v1 kind: Deployment metadata: name: recommendationservice labels: app: recommendationservice spec: selector: matchLabels: app: recommendationservice template: metadata: labels: app: recommendationservice spec: serviceAccountName: recommendationservice terminationGracePeriodSeconds: 5 securityContext: fsGroup: 1000 runAsGroup: 1000 runAsNonRoot: true runAsUser: 1000 containers: - name: server securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true image: gcr.io/google-samples/microservices-demo/recommendationservice:v0.10.0 ports: - containerPort: 8080 readinessProbe: periodSeconds: 5 grpc: port: 8080 livenessProbe: periodSeconds: 5 grpc: port: 8080 env: - name: PORT value: "8080" - name: PRODUCT_CATALOG_SERVICE_ADDR value: "productcatalogservice:3550" - name: DISABLE_PROFILER value: "1" resources: requests: cpu: 100m memory: 220Mi limits: cpu: 200m memory: 450Mi --- apiVersion: v1 kind: Service metadata: name: recommendationservice labels: app: recommendationservice spec: type: ClusterIP selector: app: recommendationservice ports: - name: grpc port: 8080 targetPort: 8080 --- apiVersion: v1 kind: ServiceAccount metadata: name: recommendationservice --- apiVersion: apps/v1 kind: Deployment metadata: name: adservice labels: app: adservice spec: selector: matchLabels: app: adservice template: metadata: labels: app: adservice spec: serviceAccountName: adservice terminationGracePeriodSeconds: 5 securityContext: fsGroup: 1000 runAsGroup: 1000 runAsNonRoot: true runAsUser: 1000 containers: - name: server securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true image: gcr.io/google-samples/microservices-demo/adservice:v0.10.0 ports: - containerPort: 9555 env: - name: PORT value: "9555" resources: requests: cpu: 200m memory: 180Mi limits: cpu: 300m memory: 300Mi readinessProbe: initialDelaySeconds: 20 periodSeconds: 15 grpc: port: 9555 livenessProbe: initialDelaySeconds: 20 periodSeconds: 15 grpc: port: 9555 --- apiVersion: v1 kind: Service metadata: name: adservice labels: app: adservice spec: type: ClusterIP selector: app: adservice ports: - name: grpc port: 9555 targetPort: 9555 --- apiVersion: v1 kind: ServiceAccount metadata: name: adservice # [END gke_release_kubernetes_manifests_microservices_demo]
-
CI/CD Pipeline Integration ( in Cloud Native)
- Set Up a CI/CD Pipeline with Azure DevOps
-
Troubleshooting and Debugging
- Common Issues and Solutions
- Network issues
- Pod crashes
- Resource exhaustion
- Debugging Tools
- kubectl logs
- kubectl exec
- kubectl describe
-
Infrastructure Setup with Terraform
-
Checkpoint 1.1: Terraform Initialization
- Command: terraform init
- Expected Output: Initialization output confirming that Terraform has been successfully initialized, including downloading providers and setting up the backend.
Initializing the backend...
Initializing provider plugins...
- Reusing previous version of hashicorp/azurerm from the dependency lock file
- Using previously-installed hashicorp/azurerm v3.112.0

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see any changes that are required for your infrastructure. All Terraform commands should now work.

If you ever set or change modules or backend configuration for Terraform, rerun this command to reinitialize your working directory. If you forget, other commands will detect it and remind you to do so if necessary.
-
Checkpoint 1.2: Terraform Plan ( grader check)
- Command: terraform plan
- Expected Output: A detailed plan showing the resources to be created, with no errors. Ensure that resources like VMs, VNet, Subnets, NSG, and Public IPs are listed.
- Grader check: compare the values with a pre-existing terraform plan -out file.
-
Checkpoint 1.3: Terraform Apply
- Command: terraform apply
- Expected Output: Confirmation that the resources have been successfully created, with output similar to "Apply complete! Resources: X added, 0 changed, 0 destroyed."
Apply complete! Resources: 22 added, 0 changed, 0 destroyed.

Outputs:

vm_private_ips = [
  "10.0.1.6",
  "10.0.1.5",
  "10.0.1.4",
]
vm_public_ips = [
  "",
  "",
  "",
]
-
Checkpoint 1.4: Retrieving Public IPs
- Command: terraform refresh && terraform output vm_public_ips
- Expected Output: List of public IP addresses for the VMs, confirming they are correctly provisioned.
Outputs:

vm_private_ips = [
  "10.0.1.6",
  "10.0.1.5",
  "10.0.1.4",
]
vm_public_ips = [
  "20.185.226.159",
  "20.185.226.247",
  "20.185.226.165",
]

kshitijdhara@192 terraform % terraform output vm_public_ips
[
  "20.185.226.159",
  "20.185.226.247",
  "20.185.226.165",
]
-
-
Kubernetes Cluster Setup
-
Checkpoint 2.1: Container Runtime Installation
- Command: sudo systemctl status containerd
- Expected Output: Active: active (running), indicating that containerd is correctly installed and running.
adminuser@kubernetes-node-1:~$ sudo systemctl status containerd ● containerd.service - containerd container runtime Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled) Active: active (running) since Tue 2024-08-13 07:24:51 UTC; 4min 8s ago Docs: https://containerd.io Main PID: 2276 (containerd) Tasks: 8 Memory: 19.6M CGroup: /system.slice/containerd.service └─2276 /usr/bin/containerd Aug 13 07:24:51 kubernetes-node-1 containerd[2276]: time="2024-08-13T07:24:51.180921147Z" level=info msg=serving... address=> Aug 13 07:24:51 kubernetes-node-1 containerd[2276]: time="2024-08-13T07:24:51.181022050Z" level=info msg="Start subscribing > Aug 13 07:24:51 kubernetes-node-1 containerd[2276]: time="2024-08-13T07:24:51.181069452Z" level=info msg="Start recovering s> Aug 13 07:24:51 kubernetes-node-1 containerd[2276]: time="2024-08-13T07:24:51.181136154Z" level=info msg="Start event monito> Aug 13 07:24:51 kubernetes-node-1 containerd[2276]: time="2024-08-13T07:24:51.181165954Z" level=info msg="Start snapshots sy> Aug 13 07:24:51 kubernetes-node-1 containerd[2276]: time="2024-08-13T07:24:51.181179855Z" level=info msg="Start cni network > Aug 13 07:24:51 kubernetes-node-1 containerd[2276]: time="2024-08-13T07:24:51.181191755Z" level=info msg="Start streaming se> Aug 13 07:24:51 kubernetes-node-1 systemd[1]: Started containerd container runtime. Aug 13 07:24:51 kubernetes-node-1 containerd[2276]: time="2024-08-13T07:24:51.182936906Z" level=info msg="containerd success"> adminuser@kubernetes-node-2:~$ sudo systemctl status containerd ● containerd.service - containerd container runtime Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled) Active: active (running) since Tue 2024-08-13 07:25:20 UTC; 4min 32s ago Docs: https://containerd.io Main PID: 2289 (containerd) Tasks: 8 Memory: 17.0M CGroup: /system.slice/containerd.service └─2289 /usr/bin/containerd Aug 13 07:25:20 kubernetes-node-2 containerd[2289]: time="2024-08-13T07:25:20.415039690Z" level=info msg="Start recovering state" Aug 13 07:25:20 kubernetes-node-2 containerd[2289]: time="2024-08-13T07:25:20.415381796Z" level=info msg="Start event monitor" Aug 13 07:25:20 kubernetes-node-2 containerd[2289]: time="2024-08-13T07:25:20.415525399Z" level=info msg="Start snapshots syncer" Aug 13 07:25:20 kubernetes-node-2 containerd[2289]: time="2024-08-13T07:25:20.415656701Z" level=info msg="Start cni network conf sync> Aug 13 07:25:20 kubernetes-node-2 containerd[2289]: time="2024-08-13T07:25:20.415777203Z" level=info msg="Start streaming server" Aug 13 07:25:20 kubernetes-node-2 containerd[2289]: time="2024-08-13T07:25:20.414907188Z" level=info msg=serving... address=/run/cont> Aug 13 07:25:20 kubernetes-node-2 containerd[2289]: time="2024-08-13T07:25:20.416018608Z" level=info msg=serving... address=/run/cont> Aug 13 07:25:20 kubernetes-node-2 systemd[1]: Started containerd container runtime. 
Aug 13 07:25:20 kubernetes-node-2 containerd[2289]: time="2024-08-13T07:25:20.418502952Z" level=info msg="containerd successfully boot"> adminuser@kubernetes-node-3:~$ sudo systemctl status containerd ● containerd.service - containerd container runtime Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled) Active: active (running) since Tue 2024-08-13 07:25:20 UTC; 4min 57s ago Docs: https://containerd.io Main PID: 2311 (containerd) Tasks: 8 Memory: 20.5M CGroup: /system.slice/containerd.service └─2311 /usr/bin/containerd Aug 13 07:25:20 kubernetes-node-3 containerd[2311]: time="2024-08-13T07:25:20.489667404Z" level=info msg="Start recovering state" Aug 13 07:25:20 kubernetes-node-3 containerd[2311]: time="2024-08-13T07:25:20.489793305Z" level=info msg=serving... address=/run/cont> Aug 13 07:25:20 kubernetes-node-3 containerd[2311]: time="2024-08-13T07:25:20.489912306Z" level=info msg=serving... address=/run/cont> Aug 13 07:25:20 kubernetes-node-3 containerd[2311]: time="2024-08-13T07:25:20.489919306Z" level=info msg="Start event monitor" Aug 13 07:25:20 kubernetes-node-3 containerd[2311]: time="2024-08-13T07:25:20.490127308Z" level=info msg="Start snapshots syncer" Aug 13 07:25:20 kubernetes-node-3 containerd[2311]: time="2024-08-13T07:25:20.490213509Z" level=info msg="Start cni network conf sync> Aug 13 07:25:20 kubernetes-node-3 containerd[2311]: time="2024-08-13T07:25:20.490291809Z" level=info msg="Start streaming server" Aug 13 07:25:20 kubernetes-node-3 systemd[1]: Started containerd container runtime. Aug 13 07:25:20 kubernetes-node-3 containerd[2311]: time="2024-08-13T07:25:20.492462228Z" level=info msg="containerd successfully boot">
-
Checkpoint 2.2: Kubernetes Components Installation
- Command: kubectl version --client && kubeadm version
- Expected Output: Versions of kubectl, kubeadm, and kubelet, confirming they are installed correctly.
kubectl version --client && kubeadm version
Client Version: v1.30.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
kubeadm version: &version.Info{Major:"1", Minor:"30", GitVersion:"v1.30.3", GitCommit:"6fc0a69044f1ac4c13841ec4391224a2df241460", GitTreeState:"clean", BuildDate:"2024-07-16T23:53:15Z", GoVersion:"go1.22.5", Compiler:"gc", Platform:"linux/amd64"}
-
Checkpoint 2.3: Kubernetes Master Node Initialization
- Command: sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-cert-extra-sans=<control-plane-public-ip>
- Expected Output: Output showing successful initialization, including commands to join worker nodes and set up kubectl for the non-root user.
adminuser@kubernetes-node-1:~$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-cert-extra-sans=20.185.226.159 [init] Using Kubernetes version: v1.30.3 [preflight] Running pre-flight checks [preflight] Pulling images required for setting up a Kubernetes cluster [preflight] This might take a minute or two, depending on the speed of your internet connection [preflight] You can also perform this action in beforehand using 'kubeadm config images pull' W0813 07:32:39.072627 3920 checks.go:844] detected that the sandbox image "registry.k8s.io/pause:3.8" of the container runtime is inconsistent with that used by kubeadm.It is recommended to use "registry.k8s.io/pause:3.9" as the CRI sandbox image. [certs] Using certificateDir folder "/etc/kubernetes/pki" [certs] Generating "ca" certificate and key [certs] Generating "apiserver" certificate and key [certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes-node-1 kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.1.6 20.185.226.159] [certs] Generating "apiserver-kubelet-client" certificate and key [certs] Generating "front-proxy-ca" certificate and key [certs] Generating "front-proxy-client" certificate and key [certs] Generating "etcd/ca" certificate and key [certs] Generating "etcd/server" certificate and key [certs] etcd/server serving cert is signed for DNS names [kubernetes-node-1 localhost] and IPs [10.0.1.6 127.0.0.1 ::1] [certs] Generating "etcd/peer" certificate and key [certs] etcd/peer serving cert is signed for DNS names [kubernetes-node-1 localhost] and IPs [10.0.1.6 127.0.0.1 ::1] [certs] Generating "etcd/healthcheck-client" certificate and key [certs] Generating "apiserver-etcd-client" certificate and key [certs] Generating "sa" key and public key [kubeconfig] Using kubeconfig folder "/etc/kubernetes" [kubeconfig] Writing "admin.conf" kubeconfig file [kubeconfig] Writing "super-admin.conf" kubeconfig file [kubeconfig] Writing "kubelet.conf" kubeconfig file [kubeconfig] Writing "controller-manager.conf" kubeconfig file [kubeconfig] Writing "scheduler.conf" kubeconfig file [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests" [control-plane] Using manifest folder "/etc/kubernetes/manifests" [control-plane] Creating static Pod manifest for "kube-apiserver" [control-plane] Creating static Pod manifest for "kube-controller-manager" [control-plane] Creating static Pod manifest for "kube-scheduler" [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env" [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml" [kubelet-start] Starting the kubelet [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests" [kubelet-check] Waiting for a healthy kubelet. This can take up to 4m0s [kubelet-check] The kubelet is healthy after 1.000903519s [api-check] Waiting for a healthy API server. This can take up to 4m0s [api-check] The API server is healthy after 9.001058741s [upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace [kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster [upload-certs] Skipping phase. 
Please see --upload-certs [mark-control-plane] Marking the node kubernetes-node-1 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers] [mark-control-plane] Marking the node kubernetes-node-1 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule] [bootstrap-token] Using token: zoba9m.qsukilco910pmc78 [bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles [bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes [bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials [bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token [bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster [bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace [kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key [addons] Applied essential addon: CoreDNS [addons] Applied essential addon: kube-proxy Your Kubernetes control-plane has initialized successfully! To start using your cluster, you need to run the following as a regular user: mkdir -p $HOME/.kube sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config sudo chown $(id -u):$(id -g) $HOME/.kube/config Alternatively, if you are the root user, you can run: export KUBECONFIG=/etc/kubernetes/admin.conf You should now deploy a pod network to the cluster. Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at: https://kubernetes.io/docs/concepts/cluster-administration/addons/ Then you can join any number of worker nodes by running the following on each as root: kubeadm join 10.0.1.6:6443 --token zoba9m.qsukilco910pmc78 \ --discovery-token-ca-cert-hash sha256:527f5023c79be823e2bfd6b601fa2ba986e7241846b8f41a0a10f39c5499916a
-
Checkpoint 2.4: Setting up kubectl for Non-Root User
- Command:
kubectl get nodes - Expected Output: List of nodes with the control-plane node in a
NotReady state (it stays NotReady until a CNI plugin is installed in Checkpoint 2.5). Run the kubeconfig setup printed by kubeadm init first; a short recap is sketched after the output below.
```
kubectl get nodes
NAME                STATUS     ROLES           AGE   VERSION
kubernetes-node-1   NotReady   control-plane   4m    v1.30.3
```
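If kubectl cannot reach the cluster yet, first run the kubeconfig setup that kubeadm init printed above. This is only a recap of those exact commands, to be run as the non-root adminuser on the control-plane node:

```bash
# Copy the admin kubeconfig into the non-root user's home (as printed by kubeadm init)
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Verify access
kubectl get nodes
```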
-
Checkpoint 2.5: CNI Plugin Installation
- Command:
kubectl get pods -n kube-system - Expected Output: List of pods in the
kube-system namespace, including the CNI plugin pods (e.g., canal) in a Running state. Watching the nodes (kubectl get nodes -w) shows the control-plane node switch to Ready once the CNI pods are up, as in the output below; a hedged install sketch follows the output.
```
kubectl get nodes -w
NAME                STATUS     ROLES           AGE     VERSION
kubernetes-node-1   NotReady   control-plane   5m15s   v1.30.3
kubernetes-node-1   Ready      control-plane   5m28s   v1.30.3
kubernetes-node-1   Ready      control-plane   5m29s   v1.30.3
```
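The install command for the CNI plugin itself is not shown in this checkpoint, so here is a minimal, hedged sketch. It assumes Flannel, whose default pod network matches the 10.244.0.0/16 CIDR passed to kubeadm init; if you prefer Canal (mentioned above), substitute the canal manifest from the Calico project instead. The manifest URL is the one published by the Flannel project and may change between releases:

```bash
# Install the Flannel CNI plugin (assumption: Flannel, matching --pod-network-cidr=10.244.0.0/16)
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

# Watch the CNI pods come up and the node turn Ready
# (namespace may be kube-flannel or kube-system depending on the release)
kubectl get pods -n kube-flannel -w
kubectl get nodes -w
```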
-
Checkpoint 2.6: Worker Nodes Joining (grader check)
- Command:
kubectl get nodes - Expected Output: List of nodes, including all worker nodes in a
Ready state (a freshly joined node may show NotReady for a few seconds while its CNI pods start, as in the output below).
```
sudo kubeadm join 10.0.1.6:6443 --token zoba9m.qsukilco910pmc78 --discovery-token-ca-cert-hash sha256:527f5023c79be823e2bfd6b601fa2ba986e7241846b8f41a0a10f39c5499916a
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 1.001805469s
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
```

```
kubectl get nodes
NAME                STATUS     ROLES           AGE     VERSION
kubernetes-node-1   Ready      control-plane   7m28s   v1.30.3
kubernetes-node-2   Ready      <none>          15s     v1.30.3
kubernetes-node-3   NotReady   <none>          9s      v1.30.3
```
-
Checkpoint 2.7: Cluster Access from Local Machine (grader check)
- Command:
kubectl get nodes (from the local machine) - Expected Output: Same list of nodes as above, confirming successful remote access; a hedged kubeconfig-copy sketch follows the output below.
```
kshitijdhara@192 ~ % kubectl get nodes
NAME                STATUS   ROLES           AGE     VERSION
kubernetes-node-1   Ready    control-plane   11m     v1.30.3
kubernetes-node-2   Ready    <none>          4m13s   v1.30.3
kubernetes-node-3   Ready    <none>          4m7s    v1.30.3
```
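The document does not show how the kubeconfig reaches the local machine. A minimal sketch, assuming you copy the cluster's admin.conf over SSH and point it at the public IP that was added to the API server certificate via --apiserver-cert-extra-sans (the NSG must also allow inbound TCP 6443):

```bash
# Run on the local machine: copy the admin kubeconfig from the control-plane node
scp adminuser@20.185.226.159:/home/adminuser/.kube/config ~/.kube/config-azure

# Point kubectl at the copied file (or merge it into ~/.kube/config)
export KUBECONFIG=~/.kube/config-azure

# The server address in admin.conf points at the private IP (10.0.1.6);
# switch it to the public IP covered by the API server certificate.
kubectl config set-cluster kubernetes --server=https://20.185.226.159:6443

kubectl get nodes
```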
-
Checkpoint 2.8: Testing Deployment (grader check: deployment running and accessible)
-
Command:
kubectl create -f https://raw.githubusercontent.com/Azure-Samples/azure-voting-app-redis/master/azure-vote-all-in-one-redis.yaml kubectl port-forward service/azure-vote-front 8080:80
-
Expected Output: Access to the Azure Vote application via
http://localhost:8080 in the browser.
```
kubectl get deployments
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
azure-vote-back    1/1     1            1           6m56s
azure-vote-front   1/1     1            1           6m52s

kubectl get services
NAME               TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
azure-vote-back    ClusterIP      10.109.221.107   <none>        6379/TCP       7m21s
azure-vote-front   LoadBalancer   10.103.69.76     <pending>     80:31571/TCP   7m21s
kubernetes         ClusterIP      10.96.0.1        <none>        443/TCP        20m

kubectl port-forward service/azure-vote-front 8080:80
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
```
It is expected that the azure-vote-front service's EXTERNAL-IP stays in the pending state: we have not configured a cloud load-balancer integration for this self-managed cluster, so Kubernetes cannot provision an external load balancer for LoadBalancer-type services.
-
-
-
Kubernetes Dashboard Installation
- Checkpoint 3.1: Helm Repo Addition
- Command:
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/ && helm repo update - Expected Output:
Repository 'kubernetes-dashboard' added, followed by ... has been successfully updated.
- Checkpoint 3.2: Dashboard Deployment
- Command:
helm install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard - Expected Output:
Release "kubernetes-dashboard" has been installed. Happy Helming!
-
Advanced Kubernetes Configuration
-
Checkpoint 4.1: Control Plane Isolation Simulation (grader check)
-
Check if the taints have been removed
```
kshitijdhara@192 terraform % kubectl taint nodes --all node-role.kubernetes.io/control-plane:NoSchedule-
node/kubernetes-node-1 untainted
taint "node-role.kubernetes.io/control-plane:NoSchedule" not found
taint "node-role.kubernetes.io/control-plane:NoSchedule" not found
kshitijdhara@192 terraform % kubectl label nodes --all node.kubernetes.io/exclude-from-external-load-balancers-
node/kubernetes-node-1 unlabeled
label "node.kubernetes.io/exclude-from-external-load-balancers" not found.
node/kubernetes-node-2 not labeled
label "node.kubernetes.io/exclude-from-external-load-balancers" not found.
node/kubernetes-node-3 not labeled
kshitijdhara@192 terraform % kubectl get nodes -o jsonpath='{.items[*].spec.taints}'
kshitijdhara@192 terraform %
```
-
Apply the malicious pod and check its logs (a hedged reconstruction of the manifest is sketched at the end of this checkpoint)
fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/main/x86_64/APKINDEX.tar.gz fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/community/x86_64/APKINDEX.tar.gz (1/10) Installing ca-certificates (20240705-r0) (2/10) Installing brotli-libs (1.1.0-r2) (3/10) Installing c-ares (1.28.1-r0) (4/10) Installing libunistring (1.2-r0) (5/10) Installing libidn2 (2.3.7-r0) (6/10) Installing nghttp2-libs (1.62.1-r0) (7/10) Installing libpsl (0.21.5-r1) (8/10) Installing zstd-libs (1.5.6-r0) (9/10) Installing libcurl (8.9.0-r0) (10/10) Installing curl (8.9.0-r0) Executing busybox-1.36.1-r29.trigger Executing ca-certificates-20240705-r0.trigger OK: 13 MiB in 24 packages % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 180 100 180 0 0 9373 0 --:--:-- --:--:-- --:--:-- 10000 { "kind": "APIVersions", "versions": [ "v1" ], "serverAddressByClientCIDRs": [ { "clientCIDR": "0.0.0.0/0", "serverAddress": "10.0.1.6:6443" } ] % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed }{ "kind": "APIVersions", "versions": [ "v1" ], "serverAddressByClientCIDRs": [ { "clientCIDR": "0.0.0.0/0", "serverAddress": "10.0.1.6:6443" } ] 100 180 100 180 0 0 6810 0 --:--:-- --:--:-- --:--:-- 7200
-
Check if the taints have been reapplied
```
kshitijdhara@192 Cloud admin project % kubectl get nodes -o jsonpath='{.items[*].metadata.name} {.items[*].spec.taints}{"\n"}'
kubernetes-node-1 kubernetes-node-2 kubernetes-node-3 [{"effect":"NoSchedule","key":"node-role.kubernetes.io/control-plane"}]
```
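The manifest for the "malicious" pod is not included in the document. Below is a hedged reconstruction based on the log output above: an Alpine pod pinned to the (temporarily untainted) control-plane node that installs curl and probes the API server on 10.0.1.6:6443. The pod name, image, and probe URL are assumptions. Re-applying the taint afterwards restores control-plane isolation:

```bash
# Hypothetical reconstruction of the pod used in this checkpoint
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: malicious-pod
spec:
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""   # schedule onto the now-untainted control plane
  containers:
  - name: prober
    image: alpine:3.20
    command: ["sh", "-c"]
    args:
      - apk add --no-cache curl && curl -sk https://10.0.1.6:6443/api && sleep 3600
EOF

kubectl logs malicious-pod

# Restore control-plane isolation afterwards
kubectl taint nodes kubernetes-node-1 node-role.kubernetes.io/control-plane:NoSchedule
kubectl delete pod malicious-pod
```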
-
-
Checkpoint 4.2: HA Cluster Load Balancer
-
Command:
nc -v <LOAD_BALANCER_IP> 6443
-
Expected Output:
Connection refused, which confirms that the load balancer forwards traffic to the control-plane node even though the API server is not yet listening there; a timeout instead would mean the load balancer cannot reach the node at all.
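The load balancer itself is not specified in this checkpoint. A minimal sketch, assuming a dedicated Ubuntu VM running HAProxy in TCP mode in front of the control plane; the backend entry reuses the control-plane private IP from earlier, and you would add one server line per additional control-plane node:

```bash
# On the load balancer VM (assumption: Ubuntu + HAProxy)
sudo apt-get update && sudo apt-get install -y haproxy

sudo tee -a /etc/haproxy/haproxy.cfg > /dev/null <<'EOF'
frontend kubernetes-api
    bind *:6443
    mode tcp
    option tcplog
    default_backend kubernetes-control-plane

backend kubernetes-control-plane
    mode tcp
    balance roundrobin
    option tcp-check
    server kubernetes-node-1 10.0.1.6:6443 check
EOF

sudo systemctl restart haproxy

# From a client, verify connectivity through the load balancer
nc -v <LOAD_BALANCER_IP> 6443
```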
-
-
-
Implement Ingress Controllers
- Checkpoint 5.1: Ingress Controller Installation
- Command:
helm install ingress-nginx ingress-nginx/ingress-nginx --namespace ingress-nginx --create-namespace - Expected Output: Successful installation with
Release "ingress-nginx" installed.
- Checkpoint 5.2: Ingress Resource Creation
- Command:
kubectl get ingress - Expected Output: List of Ingress resources with their respective hosts and rules; a hedged example manifest is sketched below.
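The document does not include an example Ingress. Here is a hedged sketch that routes traffic through the ingress-nginx controller installed above to the azure-vote-front service deployed in Checkpoint 2.8; the hostname is a placeholder:

```bash
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: azure-vote-ingress
spec:
  ingressClassName: nginx          # matches the ingress-nginx controller installed above
  rules:
  - host: vote.example.com         # placeholder hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: azure-vote-front
            port:
              number: 80
EOF

kubectl get ingress
```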
-
Security Best Practices
-
Checkpoint 6.1: RBAC Configuration
-
Command:
kubectl auth can-i get pods --as=system:serviceaccount:default:my-service-account
-
Expected Output:
yes, confirming that the RBAC configuration is correct. -
With an incorrect RBAC configuration, the output is:
```
kshitijdhara@192 Cloud admin project % kubectl auth can-i get pods --as=system:serviceaccount:default:my-service-account
no
```
-
With a correct RBAC configuration, the output is:
```
kshitijdhara@192 Cloud admin project % kubectl auth can-i get pods --as=system:serviceaccount:default:my-service-account
yes
```
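The ServiceAccount and RBAC objects behind this check are not shown in the document. A minimal sketch that makes the can-i query return yes (the namespace and ServiceAccount name match the command above; the Role's rules are an assumption):

```bash
kubectl create serviceaccount my-service-account -n default

kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: ServiceAccount
  name: my-service-account
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
EOF

kubectl auth can-i get pods --as=system:serviceaccount:default:my-service-account
```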
-
-
Checkpoint 6.2: Network Policy Enforcement
-
Command:
kubectl exec -it test-pod -n my-namespace -- wget -qO- http://nginx-pod-ip -
Expected Output: The command should succeed if network policies are correctly applied.
```
kshitijdhara@192 Cloud admin project % kubectl exec -it test-pod -n my-namespace -- wget -qO- http://10.244.2.11
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
```
If the network policies are misconfigured, the request will time out instead.
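The network policies themselves are not reproduced in the document. A hedged sketch of a policy that permits the test above; the pod labels are assumptions and should be adjusted to the labels actually used in my-namespace:

```bash
kubectl apply -n my-namespace -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-test-to-nginx
spec:
  podSelector:
    matchLabels:
      app: nginx              # assumed label on the nginx pod
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: test-pod       # assumed label on test-pod
    ports:
    - protocol: TCP
      port: 80
EOF
```

With a default-deny ingress policy in place, adding this allow rule lets the wget from test-pod succeed; removing it makes the request time out, matching the behaviour described above.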
-
-
-
Disaster Recovery and Backup
-
Checkpoint 7.1: etcdctl Installation
-
Command:
etcdctl version (etcdctl v3 uses the version subcommand rather than --version) -
Expected Output: The version information for
etcdctl, confirming it is installed.

```
etcdctl version
etcdctl version: 3.5.1
API version: 3.5
```
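The installation step itself is not shown. A hedged sketch that downloads the etcd release matching the version in the output above and copies etcdctl onto the control-plane node (the URL follows the etcd project's published release naming scheme):

```bash
# On the control-plane node: install etcdctl v3.5.1 from the official release tarball
ETCD_VER=v3.5.1
curl -L "https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz" \
  -o /tmp/etcd.tar.gz
tar -xzf /tmp/etcd.tar.gz -C /tmp
sudo mv /tmp/etcd-${ETCD_VER}-linux-amd64/etcdctl /usr/local/bin/

etcdctl version
```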
-
-
Checkpoint 7.2: Backup Script Execution
- Command:
./etcd-backup.sh - Expected Output: Successful creation and upload of an etcd snapshot, confirmed by the script output. (Note: in the sample run below the snapshot step fails with a permission-denied error on /etc/kubernetes/pki/etcd/server.key; the script needs read access to the etcd certificates, for example by running it with sudo, for the snapshot to succeed.)
```
adminuser@kubernetes-node-1:~$ ./etcd-backup.sh
Error: open /etc/kubernetes/pki/etcd/server.key: permission denied
tar: etcd-snapshot-20240813_105358.db: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
There are no credentials provided in your command and environment, we will query for account key for your storage account.
It is recommended to provide --connection-string, --account-key or --sas-token in your command as credentials.
You also can add `--auth-mode login` in your command to use Azure Active Directory (Azure AD) for authorization if your login account is assigned required RBAC roles. For more information about RBAC roles in storage, visit https://docs.microsoft.com/azure/storage/common/storage-auth-aad-rbac-cli.
In addition, setting the corresponding environment variables can avoid inputting credentials in your command. Please use --help to get more information about environment variable usage.
Finished[#############################################################] 100.0000%
{
  "client_request_id": "581775ac-5962-11ef-bc2a-c9add2c8b0e5",
  "content_md5": "8sxJhPxQ2Ezl7PFOuIaLXQ==",
  "date": "2024-08-13T10:53:59+00:00",
  "encryption_key_sha256": null,
  "encryption_scope": null,
  "etag": "\"0x8DCBB863CCA04F3\"",
  "lastModified": "2024-08-13T10:53:59+00:00",
  "request_id": "459e5cb0-e01e-0045-076f-edc4ed000000",
  "request_server_encrypted": true,
  "version": "2022-11-02",
  "version_id": null
}
```
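The contents of etcd-backup.sh are not included in the document. Below is a hedged sketch consistent with the output above (snapshot, tar, upload to Azure Blob Storage); the storage account and container names are placeholders, and the snapshot step requires read access to the etcd certificates, which is why the sample run above fails when invoked without sudo:

```bash
#!/bin/bash
# Hypothetical sketch of etcd-backup.sh: snapshot etcd, archive it, upload to Azure Blob Storage.
set -euo pipefail

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
SNAPSHOT="etcd-snapshot-${TIMESTAMP}.db"

# Take the snapshot using the cluster's etcd certificates (requires root/read access to the key)
ETCDCTL_API=3 etcdctl snapshot save "${SNAPSHOT}" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Compress the snapshot
tar -czf "${SNAPSHOT}.tar.gz" "${SNAPSHOT}"

# Upload to Azure Blob Storage (placeholders: <storage-account>, etcd-backups container)
az storage blob upload \
  --account-name <storage-account> \
  --container-name etcd-backups \
  --name "${SNAPSHOT}.tar.gz" \
  --file "${SNAPSHOT}.tar.gz" \
  --auth-mode login
```

Run it with sudo (or from root's crontab) so etcdctl can read /etc/kubernetes/pki/etcd/server.key.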
-
Checkpoint 7.3: Cron Schedule
- Command:
crontab -l - Expected Output: A line showing the backup script scheduled at 2 AM daily.
```
# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').
#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h  dom mon dow   command
0 2 * * * /path/to/etcd-backup.sh
```
-
-
Performance Tuning and Optimization
-
Checkpoint 8.1: Node Affinity and Anti-Affinity
-
Command:
kubectl get pods -o wide --field-selector spec.nodeName=<node-name>
-
Expected Output: Pods should be scheduled according to affinity and anti-affinity rules.
```
nginx-anti-hdd                 1/1   Running   0   20s   10.244.2.13   kubernetes-node-3   <none>   <none>
nginx-complex-affinity         1/1   Running   0   20s   10.244.2.14   kubernetes-node-3   <none>   <none>
nginx-preferred-zone           1/1   Running   0   20s   10.244.1.12   kubernetes-node-2   <none>   <none>
nginx-ssd                      1/1   Running   0   52s   10.244.2.12   kubernetes-node-3   <none>   <none>
nginx-with-pod-affinity        0/1   Pending   0   19s   <none>        <none>              <none>   <none>
nginx-with-pod-anti-affinity   1/1   Running   0   18s   10.244.1.13   kubernetes-node-2   <none>   <none>
```
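The manifests behind these pods are not reproduced in the document. A hedged sketch of one of them (nginx-ssd) using required node affinity on a disktype=ssd label; the label key/value and the node it is applied to are assumptions chosen to match the output above, where nginx-ssd lands on kubernetes-node-3:

```bash
# Label a node (assumption: kubernetes-node-3 carries disktype=ssd)
kubectl label node kubernetes-node-3 disktype=ssd

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nginx-ssd
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
  containers:
  - name: nginx
    image: nginx
EOF

kubectl get pods -o wide --field-selector spec.nodeName=kubernetes-node-3
```

Note that nginx-with-pod-affinity is Pending in the output above, which typically means no node satisfied its (anti-)affinity rules.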
-
-
Checkpoint 8.2: Autoscaling Verification
-
Command:
kubectl get hpa
-
Expected Output: List of Horizontal Pod Autoscalers showing current and desired pod counts.
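No HPA definition is shown in the document. A hedged sketch that autoscales the azure-vote-front deployment from Checkpoint 2.8 on CPU; note that the HorizontalPodAutoscaler needs the metrics-server add-on to report CPU usage, and without it the HPA shows <unknown> targets:

```bash
# Requires metrics-server in the cluster for CPU metrics
kubectl autoscale deployment azure-vote-front --cpu-percent=50 --min=1 --max=5

kubectl get hpa
```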
-
-
