
Scaling Apache Spark on Kube to Apple Scale

Authors: Holden Karau, Amanda Moran


Summary

The talk discusses scaling machine learning with Apache Spark on Kubernetes. It covers considerations and best practices for end users, advice for those migrating from YARN with HDFS to Kubernetes, how to effectively deploy newer Spark on Kube enhancements such as shuffle tracking and graceful decommissioning, and when not to use them.
  • Introduction of speakers and their backgrounds
  • Recap of Spark architecture
  • Confusion around when to use Spark for machine learning
  • Spark is a powerful tool for machine learning
  • Considerations and best practices for end users of Spark on Kubernetes
  • Advice for those migrating from YARN with HDFS to Kubernetes
  • Effective deployment of new enhancements of Spark on Kube
  • When not to use Spark for machine learning
The speakers confess that the presentation is a recording of their past selves. They encourage the audience to post questions in the chat and assure them that they will be available to answer.

Abstract

Amanda and Holden will explore which customer workloads ported easily to Apache Spark on Kubernetes and which ones had more difficulty. The goal of this talk is to help the audience in their journey, whether as operators of an Apache Spark on Kubernetes platform or as end users of one. Considerations and best practices for end users of such a platform will be discussed, along with additional advice for folks migrating from YARN with HDFS to Kubernetes. The talk will also cover how to effectively deploy the new Spark on Kube enhancements, like shuffle tracking and graceful decommissioning, as well as when not to use them.
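Shuffle tracking and graceful decommissioning are both enabled through ordinary Spark configuration properties. The sketch below assumes Spark 3.1 or later, where the property names shown exist in the upstream Spark configuration reference; the dictionary names and the `to_submit_args` helper are hypothetical, and this is not presented as the configuration the speakers used or recommend.

```python
# Minimal sketch, assuming Spark 3.1+ on Kubernetes.
# DYNAMIC_ALLOCATION_CONF, DECOMMISSION_CONF and to_submit_args are hypothetical names.

DYNAMIC_ALLOCATION_CONF = {
    # Let Spark add and remove executors as load changes.
    "spark.dynamicAllocation.enabled": "true",
    # Keep executors that hold shuffle data for active jobs alive, so dynamic
    # allocation can work without an external shuffle service.
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
}

DECOMMISSION_CONF = {
    # Allow executors to shut down gracefully instead of being killed outright.
    "spark.decommission.enabled": "true",
    # Migrate blocks off a decommissioning executor before it exits.
    "spark.storage.decommission.enabled": "true",
    "spark.storage.decommission.rddBlocks.enabled": "true",
    "spark.storage.decommission.shuffleBlocks.enabled": "true",
}


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Flatten a config dict into ["--conf", "key=value", ...] for spark-submit."""
    args: list[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    merged = {**DYNAMIC_ALLOCATION_CONF, **DECOMMISSION_CONF}
    print(" ".join(to_submit_args(merged)))
```

Sorting the keys keeps the generated flags deterministic, which makes them easier to diff across job definitions.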
