The presentation discusses disaster recovery for stateful applications in a multi-cluster environment, using replication-capable storage systems such as Ceph/Rook.
- Disaster recovery is important for ensuring business continuity if an entire data center is lost.
- Regional disaster recovery involves two geographically separate sites, connected by a high-latency network, each running its own Kubernetes cluster.
- Replication-capable storage systems such as Ceph/Rook can be used to provide cross-cluster disaster recovery for workloads.
- A multi-cluster control plane is required to offer a one-click disaster recovery solution for stateful workloads.
- The VolumeReplication and VolumeReplicationClass resources extend the standard CSI API with volume replication capabilities.
- With dynamic provisioning, a matching PersistentVolume (PV) must be created at the recovery site and made to reference the replicated volume.
- Multi-cluster management requires equivalent configuration on every cluster, with the same custom resources and operators deployed on each.
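The one-click failover that the multi-cluster control plane provides can be pictured as an ordered sequence of steps. The sketch below is a hypothetical, in-memory model, not the presenters' implementation: the `Cluster` class and `failover` function are invented for illustration, and only the `primary`/`secondary` state names are borrowed from the VolumeReplication resource. It assumes that a planned move demotes the old site before promoting the new one, so that only one site writes to the replicated volumes at a time.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for one managed Kubernetes cluster."""
    name: str
    reachable: bool = True
    # replicationState per PVC name, mirroring VolumeReplication.spec.replicationState
    volume_states: dict = field(default_factory=dict)
    running_apps: set = field(default_factory=set)


def failover(app: str, pvcs: list, primary: Cluster, secondary: Cluster) -> list:
    """Move `app` and its replicated PVCs from `primary` to `secondary`.

    Returns the ordered list of actions taken so the sequence can be audited.
    """
    actions = []
    if primary.reachable:
        # Planned move: stop the app and demote its volumes first, so only
        # one site writes to the replicated images at any time.
        primary.running_apps.discard(app)
        actions.append(f"stop {app} on {primary.name}")
        for pvc in pvcs:
            primary.volume_states[pvc] = "secondary"
            actions.append(f"demote {pvc} on {primary.name}")
    else:
        # Disaster: the old primary cannot be contacted, so promote without demoting.
        actions.append(f"{primary.name} unreachable, skipping demotion")
    for pvc in pvcs:
        secondary.volume_states[pvc] = "primary"
        actions.append(f"promote {pvc} on {secondary.name}")
    secondary.running_apps.add(app)
    actions.append(f"start {app} on {secondary.name}")
    return actions


if __name__ == "__main__":
    east = Cluster("east", volume_states={"db-data": "primary"}, running_apps={"shop"})
    west = Cluster("west", volume_states={"db-data": "secondary"})
    for step in failover("shop", ["db-data"], east, west):
        print(step)
```

In the disaster branch the old primary still believes it is primary; when it comes back, a real control plane would have to demote and resynchronize it before it could serve as a replication target again.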
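The replication API mentioned in the list consists of two Kubernetes custom resources. In the csi-addons volume-replication operator used with Rook, these are VolumeReplicationClass (how a CSI driver should replicate) and VolumeReplication (which PVC to replicate, and whether this site is `primary` or `secondary`). The sketch below builds both manifests as Python dicts. The group/version string, field names, and the `mirroringMode` parameter follow the Rook documentation as I recall it and should be treated as assumptions to verify against the installed CRDs; the object names and the provisioner value are illustrative.

```python
import json

# Assumed group/version of the csi-addons replication CRDs; verify against
# the CRDs actually installed in your clusters.
API_VERSION = "replication.storage.openshift.io/v1alpha1"
ALLOWED_STATES = ("primary", "secondary", "resync")


def volume_replication_class(name: str, provisioner: str, mirroring_mode: str = "snapshot") -> dict:
    """Cluster-scoped class telling the CSI driver how to replicate volumes."""
    return {
        "apiVersion": API_VERSION,
        "kind": "VolumeReplicationClass",
        "metadata": {"name": name},
        "spec": {
            "provisioner": provisioner,
            # Driver-specific parameters; for Ceph RBD this selects the mirroring mode.
            "parameters": {"mirroringMode": mirroring_mode},
        },
    }


def volume_replication(name: str, namespace: str, pvc_name: str,
                       class_name: str, state: str = "primary") -> dict:
    """Namespaced resource that enables replication for a single PVC."""
    if state not in ALLOWED_STATES:
        raise ValueError(f"unsupported replicationState: {state}")
    return {
        "apiVersion": API_VERSION,
        "kind": "VolumeReplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeReplicationClass": class_name,
            "replicationState": state,
            "dataSource": {"apiGroup": "", "kind": "PersistentVolumeClaim", "name": pvc_name},
        },
    }


if __name__ == "__main__":
    # Example names only: "rbd-vrc", "app", and "db-data" are hypothetical.
    vrc = volume_replication_class("rbd-vrc", "rook-ceph.rbd.csi.ceph.com")
    vr = volume_replication("db-data-vr", "app", "db-data", "rbd-vrc")
    print(json.dumps([vrc, vr], indent=2))
```

Failover and failback then reduce to flipping `replicationState` on each site's VolumeReplication object, which is the operation a multi-cluster control plane would automate.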
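For dynamically provisioned volumes, the recovery site has no PV object until one is created. A minimal sketch, assuming the source PV manifest has been exported from the primary cluster, is to copy it, strip server-assigned fields, keep only the name and namespace in `claimRef` so the PV pre-binds to the PVC recreated with the same name, and set the reclaim policy to `Retain`. The helper `recovery_pv` is hypothetical. It also assumes the CSI `volumeHandle` resolves to the mirrored image on the recovery site unchanged; with ceph-csi this may not hold, because the handle can embed cluster- and pool-specific IDs that need remapping.

```python
import copy


def recovery_pv(source_pv: dict) -> dict:
    """Hypothetical helper: derive a static PV for the recovery site from a
    dynamically provisioned PV exported from the primary site."""
    pv = copy.deepcopy(source_pv)  # never mutate the exported original
    meta = pv["metadata"]
    # Drop fields assigned by the source API server; the recovery cluster sets its own.
    for key in ("uid", "resourceVersion", "creationTimestamp", "managedFields"):
        meta.pop(key, None)
    pv.pop("status", None)

    spec = pv["spec"]
    claim = spec.get("claimRef")
    if claim:
        # Keep only name/namespace so the PV pre-binds to the PVC of the same
        # name when the application is redeployed on the recovery cluster.
        spec["claimRef"] = {"namespace": claim["namespace"], "name": claim["name"]}
    # Retain: deleting the PVC on the recovery site must not delete the
    # replicated image that backs it.
    spec["persistentVolumeReclaimPolicy"] = "Retain"
    # ASSUMPTION: spec.csi.volumeHandle is valid on the recovery site as-is.
    return pv
```

Using `Retain` rather than the usual `Delete` for dynamically provisioned volumes is a deliberately conservative choice: the recovery cluster did not create the backing image, so it should not be allowed to destroy it.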
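The requirement for equivalent configuration across clusters can be checked mechanically before a failover is ever needed. The sketch below assumes each cluster's inventory has already been collected into a set of string identifiers; the `kind/name` naming scheme and the function `missing_per_cluster` are hypothetical. It reports which required custom resource definitions, operators, or storage classes each cluster lacks.

```python
def missing_per_cluster(required: set[str], clusters: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return, for each cluster missing something, the sorted list of required
    identifiers it lacks. Clusters that are complete are omitted."""
    report = {}
    for name, installed in sorted(clusters.items()):
        missing = sorted(required - installed)
        if missing:
            report[name] = missing
    return report


if __name__ == "__main__":
    # Hypothetical identifiers in a "kind/name" scheme.
    required = {
        "crd/volumereplications.replication.storage.openshift.io",
        "operator/rook-ceph",
        "storageclass/rook-ceph-block",
    }
    inventory = {
        "east": set(required),
        "west": {"operator/rook-ceph"},
    }
    print(missing_per_cluster(required, inventory))
```

This only compares names; a stricter check would also compare the specs of same-named objects, since a storage class that exists on both sites with different parameters is not equivalent either.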