The talk discusses scaling machine learning with Apache Spark on Kubernetes, including considerations and best practices for end users and advice for teams migrating from YARN with HDFS to Kubernetes. It also covers how to deploy recent Spark on Kubernetes enhancements effectively, such as shuffle tracking and graceful decommissioning, and when not to use Spark for machine learning.
- Introduction of speakers and their backgrounds
- Recap of Spark architecture
- Confusion around when to use Spark for machine learning
- Spark is a powerful tool for machine learning
- Considerations and best practices for end users of Spark on Kubernetes
- Advice for those migrating from YARN with HDFS to Kubernetes
- Effective deployment of new Spark on Kubernetes enhancements, such as shuffle tracking and graceful decommissioning
- When not to use Spark for machine learning
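As a rough illustration of the enhancements named in the outline, the sketch below collects the Spark properties commonly associated with shuffle tracking and graceful decommissioning and renders them as `spark-submit` flags. The property names come from the Apache Spark configuration documentation (Spark 3.1 or later is assumed); the helper functions `spark_on_k8s_scaling_conf` and `to_submit_args` are hypothetical names invented for this example, and the summary does not state which settings the speakers actually recommend.

```python
from typing import Dict, List


def spark_on_k8s_scaling_conf(enable_decommission: bool = True) -> Dict[str, str]:
    """Hypothetical helper: settings for dynamic allocation on Kubernetes.

    Assumes Spark 3.1+; check the property names against the Spark
    version in use, since these features have evolved across releases.
    """
    conf = {
        # Dynamic allocation scales the executor count with the workload.
        "spark.dynamicAllocation.enabled": "true",
        # Shuffle tracking lets dynamic allocation work without an
        # external shuffle service, which Kubernetes does not provide.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    }
    if enable_decommission:
        conf.update({
            # Graceful decommissioning: executors that are about to be
            # removed try to migrate their blocks before shutting down.
            "spark.decommission.enabled": "true",
            "spark.storage.decommission.enabled": "true",
            "spark.storage.decommission.shuffleBlocks.enabled": "true",
            "spark.storage.decommission.rddBlocks.enabled": "true",
        })
    return conf


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render a settings dict as `--conf key=value` arguments, sorted by key."""
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # Print the flags so they can be pasted into a spark-submit command.
    print(" ".join(to_submit_args(spark_on_k8s_scaling_conf())))
```

Keeping the settings in a plain dictionary makes it easy to toggle decommissioning off for clusters or Spark versions where it is not wanted, and the same dictionary could be passed to a session builder instead of the command line.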
The speakers mention that they have a confession to make: the presentation is a recording of their past selves. They encourage the audience to ask questions in the chat and promise to be available to answer them.