Preemption is the process of terminating lower-priority Pods so that higher-priority Pods can be scheduled on Nodes.
When Pods are created, they go to a queue and wait to be scheduled. The scheduler picks a Pod from the queue and tries to schedule it on a Node. If no Node is found that satisfies all the specified requirements of the Pod, preemption logic is triggered for the pending Pod.
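The scheduling loop described above can be sketched as a toy Python model. This is not kube-scheduler code. `Pod`, `Node`, `SchedulingQueue` and `schedule_one` are hypothetical names, and each Pod requests a single made-up `cpu` quantity. The queue pops the highest-priority Pod first (FIFO among equals), which we assume roughly mirrors the default priority-based queue ordering. The real scheduler also runs filter and scoring plugins instead of taking the first Node that fits.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Pod:
    name: str
    priority: int
    cpu: int  # hypothetical single resource request, in millicores


@dataclass
class Node:
    name: str
    capacity: int
    pods: List[Pod] = field(default_factory=list)

    def free(self) -> int:
        return self.capacity - sum(p.cpu for p in self.pods)


class SchedulingQueue:
    """Toy queue: highest priority first, FIFO among Pods of equal priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Pod]] = []
        self._seq = 0  # insertion counter used as a tie-breaker

    def add(self, pod: Pod) -> None:
        heapq.heappush(self._heap, (-pod.priority, self._seq, pod))
        self._seq += 1

    def pop(self) -> Optional[Pod]:
        return heapq.heappop(self._heap)[2] if self._heap else None


def schedule_one(queue: SchedulingQueue, nodes: List[Node]) -> Tuple[Optional[Pod], Optional[str], bool]:
    """One scheduling attempt: returns (pod, bound node name, preemption_triggered)."""
    pod = queue.pop()
    if pod is None:
        return None, None, False
    for node in nodes:  # first fit; the real scheduler filters and scores Nodes
        if node.free() >= pod.cpu:
            node.pods.append(pod)  # "bind" the Pod to this Node
            return pod, node.name, False
    # No Node satisfies the Pod's requirements: preemption logic would run here.
    return pod, None, True


if __name__ == "__main__":
    queue = SchedulingQueue()
    queue.add(Pod("small", priority=0, cpu=100))
    queue.add(Pod("web", priority=100, cpu=500))
    nodes = [Node("node-a", capacity=1000, pods=[Pod("batch", priority=0, cpu=800)])]
    for _ in range(2):
        pod, node_name, triggered = schedule_one(queue, nodes)
        print(pod.name, node_name, triggered)
    # web None True
    # small node-a False
```

Here `web` is popped before `small` because it has the higher priority. It finds no Node with enough free capacity, so it is the one that would trigger preemption. `small` is then placed normally.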
Stuff you wanna know:
- When Pods are preempted, the victims get their graceful termination period. They have that much time to finish their work and exit. If they don’t, they are killed.
- While the preemptor Pod is waiting for the victims to go away, a higher-priority Pod may be created that fits on the same Node. In this case, the scheduler will schedule the higher-priority Pod instead of the preemptor.
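- A PodDisruptionBudget (PDB) allows application owners to limit the number of Pods of a replicated application that are down simultaneously from voluntary disruptions. When preempting, the scheduler tries to pick victims whose removal does not violate a PDB, but this is best effort: if no such victims are found, preemption still happens and Pods may be removed even though their PDB is violated.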
- A Node is considered for preemption only when the answer to this question is yes: “If all the Pods with lower priority than the pending Pod are removed from the Node, can the pending Pod be scheduled on the Node?”
- Preemption does not necessarily remove all lower-priority Pods. If the pending Pod can be scheduled by removing fewer than all lower-priority Pods, then only a portion of the lower-priority Pods are removed.
- Preemption removes existing Pods from a cluster under resource pressure to make room for higher-priority pending Pods. If you give high priorities to certain Pods by mistake, these unintentionally high-priority Pods can trigger preemption of other Pods in your cluster.
- When a Pod is preempted, events are recorded for the preempted Pod.
- Preemption should happen only when a cluster does not have enough resources for a Pod. Even then, preemption happens only when the priority of the pending Pod (the preemptor) is higher than that of the victim Pods.
- Preemption must not happen when there is no pending Pod, or when the pending Pods have equal or lower priority than the victims.
- When multiple Nodes are available for preemption, the scheduler tries to choose the Node whose victim Pods have the lowest priority.
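The candidate check, the partial victim set, the best-effort PDB handling and the Node choice from the list above can be tied together in a toy Python model. All names are hypothetical: `Pod`, `Node`, `select_victims`, `preempt`, the single `cpu` resource, and the `violates_pdb` flag, which stands in for a real PodDisruptionBudget check. The victim search first removes every strictly lower-priority Pod. It then adds Pods back one at a time while the preemptor still fits, trying PDB-protected Pods first and then the highest priority first. The Node ranking is a simplified assumption loosely modeled on kube-scheduler, which applies more tie-breakers. It prefers the fewest PDB violations, then the lowest highest-victim priority, then the fewest victims.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Pod:
    name: str
    priority: int
    cpu: int  # hypothetical single resource request, in millicores
    violates_pdb: bool = False  # hypothetical: evicting this Pod would break its PDB


@dataclass
class Node:
    name: str
    capacity: int
    pods: List[Pod] = field(default_factory=list)

    def free(self) -> int:
        return self.capacity - sum(p.cpu for p in self.pods)


def select_victims(node: Node, preemptor: Pod) -> Optional[List[Pod]]:
    """Victims to remove from `node`, or None if the Node is not a candidate."""
    # Only strictly lower-priority Pods may be preempted.
    lower = [p for p in node.pods if p.priority < preemptor.priority]
    # Candidate check: would the preemptor fit if ALL lower-priority Pods were removed?
    free = node.free() + sum(p.cpu for p in lower)
    if free < preemptor.cpu:
        return None
    # Add Pods back one at a time (PDB-protected first, then highest priority
    # first) as long as the preemptor still fits; whatever cannot be kept is a victim.
    victims = []
    for p in sorted(lower, key=lambda p: (not p.violates_pdb, -p.priority)):
        if free - p.cpu >= preemptor.cpu:
            free -= p.cpu  # this Pod is spared
        else:
            victims.append(p)
    return victims


def preempt(nodes: List[Node], preemptor: Pod) -> Optional[Tuple[str, List[str]]]:
    """Pick (node name, victim names), or None when preemption must not happen."""
    # Preemption is only needed when the Pod fits on no Node as-is.
    if any(n.free() >= preemptor.cpu for n in nodes):
        return None
    candidates = []
    for n in nodes:
        victims = select_victims(n, preemptor)
        if victims:  # None means "not a candidate"; [] cannot occur after the fit check
            candidates.append((n, victims))
    if not candidates:
        return None  # no Node can be freed up by removing lower-priority Pods

    def cost(candidate):
        _, victims = candidate
        # Simplified ranking: fewest PDB violations, then lowest highest-victim
        # priority, then fewest victims.
        return (sum(v.violates_pdb for v in victims),
                max(v.priority for v in victims),
                len(victims))

    node, victims = min(candidates, key=cost)
    return node.name, [v.name for v in victims]


if __name__ == "__main__":
    web = Pod("web", priority=100, cpu=500)
    nodes = [
        Node("node-a", 1000, [Pod("batch-low", 0, 300), Pod("batch-mid", 50, 400)]),
        Node("node-b", 1000, [Pod("logs", 10, 600, violates_pdb=True)]),
        Node("node-c", 1000, [Pod("peer", 100, 900)]),  # equal priority: never a victim
    ]
    print(preempt(nodes, web))  # ('node-a', ['batch-low'])
```

Adding Pods back greedily does not guarantee the smallest possible victim set, but it keeps the search cheap. It still removes fewer than all lower-priority Pods whenever that is enough. PDB-protected Pods are tried first so they are the most likely to survive. If they are the only way to make room, they are still evicted, which matches the best-effort PDB behavior.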
More stuff:
- Preemption — https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#preemption
- How to use Pod Priority and Preemption — https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#how-to-use-priority-and-preemption
- Non-preempting PriorityClass — https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#non-preempting-priority-class
- Pod priority — https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#pod-priority
- Preemption limitations — https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#limitations-of-preemption
- Preemption troubleshooting — https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#troubleshooting
- Scheduling, Preemption, and Eviction — https://kubernetes.io/docs/concepts/scheduling-eviction/
- Pod Priority and Preemption in Kubernetes — https://kubernetes.io/blog/2019/04/16/pod-priority-and-preemption-in-kubernetes/
- Including pod priority in pod scheduling decisions (Red Hat OpenShift) — https://docs.openshift.com/container-platform/4.7/nodes/pods/nodes-pods-priority.html
- Pod Priority and Preemption (VMware Tanzu) — https://tanzu.vmware.com/developer/guides/workload-tenancy-priority-preemption/
- Setting Pod Priority (IBM Cloud) — https://cloud.ibm.com/docs/containers?topic=containers-pod_priority
- Get the most out of Google Kubernetes Engine with Priority and Preemption — https://cloud.google.com/blog/products/gcp/get-the-most-out-of-google-kubernetes-engine-with-priority-and-preemption
- Device Priority and Preemption (Palo Alto Networks) — https://docs.paloaltonetworks.com/cn-series/10-2/cn-series-deployment/secure-kubernetes-workloads-with-cn-series/deploy-the-cn-series-firewalls/high-availability-for-cn-series-firewall-on-aws/device-priority-and-preemption