Tower of Babel: Making Apache Spark, Kubeflow, and Kubernetes Play Nice

2022-05-18

Authors: Holden Karau


Summary

The presentation discusses the challenges of working with big data matrices and how Apache Spark, Apache Mahout, Kubeflow, and Kubernetes can be used together to solve these challenges.
  • Kubernetes scales elastically, but a pod can be no larger than its node, which may be too small to hold a large matrix in memory
  • Apache Spark and Mahout can distribute matrices across an unbounded number of pods/nodes
  • Kubeflow can be used to make the process easily reproducible
  • The presentation provides an anecdote about using these tools to denoise DICOM images of lungs of COVID patients
The speaker and her co-author used Apache Spark and Mahout to denoise DICOM images of the lungs of COVID patients, then published their pipeline with Kubeflow so that the process is easy to repeat. This could help doctors in resource-limited hospitals, as well as other researchers seeking to automate the detection of COVID.

Abstract

Working with big data matrices is challenging. Kubernetes allows users to scale elastically, but a pod can only be as large as a node, which may not be large enough to fit the matrix in memory. While Kubernetes supports other paradigms on top of it that allow pods to coordinate on individual jobs, setting them up and making them play nice with ML platforms is not straightforward. Using Apache Spark and Apache Mahout, we can work with matrices of any dimension and distribute them across an unbounded number of pods/nodes, and we can use Kubeflow to make our work quickly and easily reproducible. In this talk, we'll discuss how we used Apache Spark and Mahout to denoise DICOM images of the lungs of COVID patients and published our pipeline with Kubeflow to make the process easily repeatable, which could help doctors in more resource-limited hospitals, as well as other researchers seeking to automate the detection of COVID.
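
To make the memory argument concrete, here is a minimal single-process sketch in plain Python. It is not the speakers' pipeline and does not use the Spark, Mahout, or Kubeflow APIs; every function name below is hypothetical. It shows why row-partitioning helps: the Gram matrix A^T A of a tall matrix, a common building block for SVD-style denoising, can be assembled from small per-block results, so no single worker has to hold all of A.

```python
from typing import List

Matrix = List[List[float]]


def dense_matrix_bytes(rows: int, cols: int, bytes_per_value: int = 8) -> int:
    """Memory needed to hold a dense matrix (8 bytes per value assumes float64)."""
    return rows * cols * bytes_per_value


def split_rows(matrix: Matrix, num_blocks: int) -> List[Matrix]:
    """Split a non-empty matrix into at most num_blocks contiguous row blocks."""
    if not matrix or num_blocks < 1:
        raise ValueError("need a non-empty matrix and num_blocks >= 1")
    block_size = -(-len(matrix) // num_blocks)  # ceiling division
    return [matrix[i:i + block_size] for i in range(0, len(matrix), block_size)]


def partial_gram(block: Matrix) -> Matrix:
    """Compute A_i^T A_i for one row block; the result is only cols x cols."""
    n = len(block[0])
    gram = [[0.0] * n for _ in range(n)]
    for row in block:
        for j in range(n):
            for k in range(n):
                gram[j][k] += row[j] * row[k]
    return gram


def add_matrices(left: Matrix, right: Matrix) -> Matrix:
    """Element-wise sum of two matrices with the same shape."""
    return [[a + b for a, b in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]


def blockwise_gram(matrix: Matrix, num_blocks: int) -> Matrix:
    """Sum the per-block Gram matrices; each block stands in for one pod's share."""
    blocks = split_rows(matrix, num_blocks)
    total = partial_gram(blocks[0])
    for block in blocks[1:]:
        total = add_matrices(total, partial_gram(block))
    return total
```

For example, `dense_matrix_bytes(1_000_000, 100_000)` is 800 GB, more than a typical single node offers, while each row block and each per-block result can be kept small. In a real cluster the blocks would live on separate executors and the summation would be a distributed reduce; this sketch only illustrates the arithmetic.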
