The presentation covers the design principles and architecture of a cloud-native Spark on Kubernetes platform, highlighting the benefits of cloud and Kubernetes and the need for auto-scaling to save cost and provide elasticity.
- Cloud and Kubernetes can solve problems of legacy infrastructure by providing on-demand, elastic, and scalable resources with strong resource isolation and cutting-edge security techniques.
- Design principles include fully embracing the public cloud and a cloud-native way of thinking, containerization for elasticity and reproducibility, and decoupling compute and storage so that each can scale independently.
- The architecture of the cloud-native Spark on Kubernetes platform involves multiple Spark Kubernetes clusters, a Spark service gateway, and a multi-tenant platform with advanced features such as physical isolation and min/max capacity settings.
- Auto-scaling is necessary for cost-saving and elasticity, and the presentation discusses the design of reactive auto-scaling and its productionization.
- The platform has been running in production for a year, supporting many business-critical workloads for Apple AML.
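The summary does not describe how the reactive auto-scaler makes its decisions, so the following is only a minimal sketch of the general idea: size a tenant's node pool from current demand, then clamp the result to that tenant's min/max capacity. The names `QueueCapacity` and `desired_nodes`, and the choice of CPU cores as the demand signal, are assumptions made for illustration and are not taken from the talk.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueCapacity:
    """Hypothetical per-tenant capacity bounds, expressed in nodes."""
    min_nodes: int
    max_nodes: int


def desired_nodes(running_cores: int, pending_cores: int,
                  cores_per_node: int, capacity: QueueCapacity) -> int:
    """Return the node count a reactive scaler might target.

    Demand is the cores already in use plus the cores requested by
    pods that are still waiting to be scheduled (an assumed signal).
    """
    if cores_per_node <= 0:
        raise ValueError("cores_per_node must be positive")
    demand = running_cores + pending_cores
    needed = math.ceil(demand / cores_per_node)
    # Never shrink below the guaranteed minimum or grow past the cap.
    return max(capacity.min_nodes, min(capacity.max_nodes, needed))


if __name__ == "__main__":
    cap = QueueCapacity(min_nodes=2, max_nodes=10)
    print(desired_nodes(running_cores=32, pending_cores=40,
                        cores_per_node=16, capacity=cap))
```

Under these assumptions an idle tenant stays at its minimum, which preserves guaranteed capacity, while the maximum bounds cost when demand spikes.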
The presenter mentions that in the cloud-native world, upgrading infrastructure no longer requires risky in-place upgrades. Instead, a completely new environment can be spun up and traffic gradually rolled over, with the ability to switch back instantaneously if there are any issues. This flexibility is a huge win for DevOps.
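The summary does not say how traffic is shifted between environments. One common approach, sketched here purely as an illustration, is percentage-based routing at the gateway: each job is hashed into a stable bucket, and jobs whose bucket falls below the rollout percentage go to the new environment. The function name `pick_environment`, the labels `"old"` and `"new"`, and the use of a job ID as the routing key are all assumptions.

```python
import hashlib


def pick_environment(job_id: str, new_env_percent: int) -> str:
    """Route a job to the "old" or "new" environment.

    new_env_percent is the share of traffic (0-100) sent to the new
    environment. Setting it back to 0 acts as an immediate rollback.
    """
    if not 0 <= new_env_percent <= 100:
        raise ValueError("new_env_percent must be between 0 and 100")
    # A cryptographic digest gives a bucket that is stable across
    # processes, unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(job_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "new" if bucket < new_env_percent else "old"


if __name__ == "__main__":
    for percent in (0, 10, 50, 100):
        routed = sum(pick_environment(f"job-{i}", percent) == "new"
                     for i in range(1000))
        print(f"{percent}% rollout -> {routed} of 1000 jobs on new")
```

Because the bucket depends only on the job ID, raising the percentage only ever moves jobs from old to new, and a job that already moved does not flip back until the percentage is lowered.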