Pod Disruptions and Budgets

This lab/demo covers Kubernetes PodDisruptionBudgets, which allow an application deployer to control how the application handles voluntary disruptions. Voluntary disruptions include scheduled maintenance-type events, such as node rehydration. Kubernetes also has involuntary disruptions, e.g. a node failing due to a hardware error.

It is not possible to budget for involuntary disruptions, but they do count against the budget when voluntary disruptions are evaluated. For example, if your application needs 5 replicas running and can tolerate only 1 replica being down, and a node holding one of those replicas fails, you will not be able to voluntarily disrupt any additional replicas until the pod from the failed node is running properly on another node.

See the Kubernetes documentation for disruptions and disruption budgets for more information.

Prerequisites

  • An application deployed on a Critical Stack cluster with at least 2 replicas (Deployment or StatefulSet)
  • kubectl with administrative access to the cluster
  • A copy of the pod disruption budget example (example-pdb.yaml), saved locally to your working folder
  • A copy of the deployment example (pdb-example-deploy.yaml), saved locally to your working folder

Getting Started

PodDisruptionBudgets use the same ‘.spec.selector’ logic as Deployments to determine what set of pods they apply to. This will typically be a matchLabels selector.

The other required field in the ‘.spec’ is one of minAvailable or maxUnavailable. These fields take either an integer value or a percentage value (a percentage of the total number of pods matched by the selector). You cannot specify both in a single PodDisruptionBudget.
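
As an illustration of the percentage form, the following budget keeps at least half of the matched pods available. This is a hypothetical alternative to the budget used in this lab, not one of the lab files:

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: example-pdb-percent     # hypothetical name, not used elsewhere in this lab
spec:
  minAvailable: "50%"           # at least half of the matched pods must stay available
  selector:
    matchLabels:
      app: pdb-example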

Specifying the PodDisruptionBudget

This example YAML selects the pdb-example application and specifies that the application can tolerate only one pod being out of service at a time.

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: pdb-example

Apply this policy with a kubectl command:

$ kubectl -n development create -f example-pdb.yaml
poddisruptionbudget.policy/example-pdb created
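
You can confirm the budget exists, and see how many voluntary disruptions it currently allows, with kubectl get pdb (output omitted here; the columns shown vary slightly by kubectl version):

$ kubectl -n development get pdb example-pdb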

Demonstration

To demonstrate the effect of the above PodDisruptionBudget, the easiest approach is to create a Deployment with an anti-affinity policy that prevents two pods from the same application from running on the same worker node, scale the Deployment so there is one pod per node, and then attempt to ‘drain’ two worker nodes. The first drain should succeed, since the application can tolerate one pod being down per its disruption budget, but the second drain will stall until the first evicted pod can be re-scheduled elsewhere.
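
The full pdb-example-deploy.yaml comes with the lab materials; if you don’t have it handy, the sketch below shows roughly what it needs to contain. The app: pdb-example label and the required anti-affinity on kubernetes.io/hostname are what the demonstration relies on; the nginx image and container details are placeholder assumptions.

# Sketch of pdb-example-deploy.yaml (assumed content; the image
# and container details are placeholders).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pdb-example
  template:
    metadata:
      labels:
        app: pdb-example
    spec:
      # Require that no two pdb-example pods share a worker node.
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: pdb-example
            topologyKey: kubernetes.io/hostname
      containers:
      - name: pdb-example          # placeholder container name
        image: nginx:stable        # placeholder image
        ports:
        - containerPort: 80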

Deploy a 3-pod application on a 3-node cluster

$ kubectl -n development create -f pdb-example-deploy.yaml
deployment.apps/example-deployment created

Confirm each pod is on a separate node because of AntiAffinity

$ kubectl -n development get pods -o wide --selector app=pdb-example
NAME                                  READY   STATUS    RESTARTS   AGE   IP             NODE                             NOMINATED NODE   READINESS GATES
example-deployment-6bdf844f44-9r4lp   1/1     Running   0          11s   10.253.4.93    ip-10-194-186-46.ec2.internal    <none>           <none>
example-deployment-6bdf844f44-c4chg   1/1     Running   0          11s   10.253.5.84    ip-10-194-184-102.ec2.internal   <none>           <none>
example-deployment-6bdf844f44-ffhqb   1/1     Running   0          11s   10.253.3.164   ip-10-194-185-167.ec2.internal   <none>           <none>

Drain one of the nodes to evict a pod

$ kubectl drain ip-10-194-184-102.ec2.internal --ignore-daemonsets
node/ip-10-194-184-102.ec2.internal cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/cilium-xl5fn, kube-system/kube-proxy-c6vwh
evicting pod "example-deployment-6bdf844f44-c4chg"
pod/example-deployment-6bdf844f44-c4chg evicted
node/ip-10-194-184-102.ec2.internal evicted

Check where pods are running now

Note that the new -hd4dl pod is in Pending status because of the AntiAffinity policy

$ kubectl -n development get pods -o wide --selector app=pdb-example
NAME                                  READY   STATUS    RESTARTS   AGE    IP             NODE                             NOMINATED NODE   READINESS GATES
example-deployment-6bdf844f44-9r4lp   1/1     Running   0          103s   10.253.4.93    ip-10-194-186-46.ec2.internal    <none>           <none>
example-deployment-6bdf844f44-ffhqb   1/1     Running   0          103s   10.253.3.164   ip-10-194-185-167.ec2.internal   <none>           <none>
example-deployment-6bdf844f44-hd4dl   0/1     Pending   0          52s    <none>         <none>                           <none>           <none>
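
To confirm the new pod is Pending because of the AntiAffinity rule and not for some other reason, describe it and check the Events section for a FailedScheduling message (the exact wording varies by scheduler version):

$ kubectl -n development describe pod example-deployment-6bdf844f44-hd4dl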

Attempt to drain another node

$ kubectl drain ip-10-194-186-46.ec2.internal --ignore-daemonsets
node/ip-10-194-186-46.ec2.internal cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/cilium-pb5xd, kube-system/kube-proxy-wsn74
evicting pod "example-deployment-6bdf844f44-9r4lp"
evicting pod "nginx-ingress-controller-7b4888464f-fszn6"
evicting pod "coredns-86c58d9df4-mjmkh"
error when evicting pod "example-deployment-6bdf844f44-9r4lp" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/coredns-86c58d9df4-mjmkh evicted
evicting pod "example-deployment-6bdf844f44-9r4lp"
error when evicting pod "example-deployment-6bdf844f44-9r4lp" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "example-deployment-6bdf844f44-9r4lp"
error when evicting pod "example-deployment-6bdf844f44-9r4lp" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "example-deployment-6bdf844f44-9r4lp"
error when evicting pod "example-deployment-6bdf844f44-9r4lp" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
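
While the drain is retrying, the budget itself reports that no further voluntary disruptions are allowed. From another terminal you can watch the Allowed Disruptions count, which should stay at 0 until the Pending pod is scheduled and Ready:

$ kubectl -n development get pdb example-pdb --watch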

In a separate terminal, uncordon the first node so the Pending pod can schedule

$ kubectl uncordon ip-10-194-184-102.ec2.internal
node/ip-10-194-184-102.ec2.internal uncordoned

Back on the first terminal, observe that the drain finally completes

Once the -hd4dl pod is rescheduled on the uncordoned node, the PodDisruptionBudget for the application again allows the eviction of another pod.

evicting pod "example-deployment-6bdf844f44-9r4lp"
error when evicting pod "example-deployment-6bdf844f44-9r4lp" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "example-deployment-6bdf844f44-9r4lp"
error when evicting pod "example-deployment-6bdf844f44-9r4lp" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "example-deployment-6bdf844f44-9r4lp"
pod/example-deployment-6bdf844f44-9r4lp evicted
node/ip-10-194-186-46.ec2.internal evicted

Confirm which nodes now have pods running

Note that we still have one Pending pod because of AntiAffinity, but it’s a different pod than the one above

$ kubectl -n development get pods -o wide --selector app=pdb-example
NAME                                  READY   STATUS    RESTARTS   AGE     IP             NODE                             NOMINATED NODE   READINESS GATES
example-deployment-6bdf844f44-ffhqb   1/1     Running   0          9m10s   10.253.3.164   ip-10-194-185-167.ec2.internal   <none>           <none>
example-deployment-6bdf844f44-glpr7   0/1     Pending   0          2m59s   <none>         <none>                           <none>           <none>
example-deployment-6bdf844f44-hd4dl   1/1     Running   0          8m19s   10.253.5.137   ip-10-194-184-102.ec2.internal   <none>           <none>

Finally, uncordon our most-recently drained node and observe that we’re back to three Running pods

$ kubectl uncordon ip-10-194-186-46.ec2.internal
node/ip-10-194-186-46.ec2.internal uncordoned
$ kubectl -n development get pods -o wide --selector app=pdb-example
NAME                                  READY   STATUS    RESTARTS   AGE     IP             NODE                             NOMINATED NODE   READINESS GATES
example-deployment-6bdf844f44-ffhqb   1/1     Running   0          10m     10.253.3.164   ip-10-194-185-167.ec2.internal   <none>           <none>
example-deployment-6bdf844f44-glpr7   1/1     Running   0          4m7s    10.253.4.222   ip-10-194-186-46.ec2.internal    <none>           <none>
example-deployment-6bdf844f44-hd4dl   1/1     Running   0          9m27s   10.253.5.137   ip-10-194-184-102.ec2.internal   <none>           <none>

Summary

Great work! In this lab you learned how to configure a Kubernetes pod disruption budget to control how applications handle voluntary disruptions.