
Kubernetes and Checkpoint Restore

2021-10-14

Authors: Adrian Reber


Summary

The presentation discusses the implementation of forensic container checkpointing in Kubernetes and other container engines, which makes it possible to analyze a container without stopping it. A checkpoint of the running container is taken, and that checkpoint is then analyzed in a sandbox environment.
  • Forensic container checkpointing allows for the analysis of containers without stopping them
  • The implementation involves taking a checkpoint of a running container and analyzing it in a sandbox environment
  • The checkpoint archive is readable only by root, because it contains the container's memory contents and therefore potentially sensitive data
  • Use cases for container checkpointing include saving state before a reboot, fast startup from a pre-initialized state, and analyzing containers for potential issues
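The following is a minimal sketch, not taken from the talk, of the two operational points above: how a checkpoint is requested and how the "root only" property of the resulting archive can be checked. It assumes the kubelet checkpoint endpoint (`POST /checkpoint/{namespace}/{pod}/{container}` on the kubelet's default port 10250) that Kubernetes later shipped behind the `ContainerCheckpoint` feature gate. The function names and the example node, pod, and container names are hypothetical.

```python
import os
import stat

KUBELET_PORT = 10250  # default kubelet HTTPS port (assumption: not reconfigured)


def checkpoint_url(node: str, namespace: str, pod: str, container: str,
                   port: int = KUBELET_PORT) -> str:
    """Build the kubelet checkpoint URL; a POST to it triggers the checkpoint."""
    return f"https://{node}:{port}/checkpoint/{namespace}/{pod}/{container}"


def is_root_only_mode(mode: int, uid: int) -> bool:
    """True if the owner is root and group/other have no permissions at all."""
    no_group_other = (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
    return uid == 0 and no_group_other


def is_root_only(path: str) -> bool:
    """Check a checkpoint archive on disk (hypothetical helper)."""
    st = os.stat(path)
    return is_root_only_mode(st.st_mode, st.st_uid)


if __name__ == "__main__":
    # Hypothetical node/pod/container names, for illustration only.
    print(checkpoint_url("node1", "default", "matlab", "app"))
    print(is_root_only_mode(0o100600, 0))  # owner-only file owned by root
```

A real request must also authenticate to the kubelet, for example with a client certificate. That is deliberately left out here, so the sketch only builds the URL and does not send it.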
The presenter shared an anecdote about a company that uses privileged containers in Kubernetes to start pre-initialized MATLAB containers for customers, because MATLAB takes a long time to start. With container checkpointing, the company could checkpoint a pre-initialized container once and restore it quickly for each customer, without needing privileged containers.

Abstract

Over 6 years ago a ticket (#3949) was opened asking for Pod migration in Kubernetes, and Kubernetes still has no support for migrating containers. Container migration is based on checkpointing and restoring containers, and checkpointing and restoring containers is one of the main reasons Checkpoint/Restore in Userspace (CRIU) exists. Container migration is often viewed as an outlier or corner case, because containers are supposed to be stateless. Nevertheless, CRIU keeps getting better at container migration and sees growing interest in its container migration features, especially in their integration into container runtimes. This talk presents multiple use cases for checkpointing and restoring containers. It gives technical background on how CRIU enables container runtimes to checkpoint and restore containers, and describes the plan for integrating checkpoint and restore into Kubernetes.
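The two CRIU steps that migration builds on, dumping a process tree to image files and restoring it from them, can be sketched as command lines. This is an illustrative sketch, not the runtime integration described in the talk. Container runtimes such as runc drive CRIU through its RPC interface rather than the CLI, and a real container checkpoint needs further options (mounts, namespaces, terminals). The helper names, PID, and directory below are hypothetical. The CRIU options used (`--tree`, `--images-dir`, `--leave-running`, `--restore-detached`) are standard CRIU flags.

```python
import shlex


def criu_dump_cmd(pid: int, images_dir: str, leave_running: bool = True) -> list[str]:
    """Build a `criu dump` command for the process tree rooted at `pid`."""
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if leave_running:
        # Keep the process running after the checkpoint (forensic-style use).
        cmd.append("--leave-running")
    return cmd


def criu_restore_cmd(images_dir: str) -> list[str]:
    """Build a `criu restore` command that restores from `images_dir`."""
    # --restore-detached: criu exits after restore instead of waiting for the tree.
    return ["criu", "restore", "--images-dir", images_dir, "--restore-detached"]


if __name__ == "__main__":
    # Hypothetical PID and directory; the commands are printed, not executed.
    print(shlex.join(criu_dump_cmd(1234, "/tmp/ckpt")))
    print(shlex.join(criu_restore_cmd("/tmp/ckpt")))
```

For migration, the image directory produced by the dump on one host would be transferred to another host and used by the restore there. Dropping `--leave-running` gives the classic "stop and move" behavior.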
