The presentation discusses the challenges of working with very large matrices and how Apache Spark, Apache Mahout, Kubeflow, and Kubernetes can be combined to address them.
- Kubernetes allows for elastic scaling, but scaling alone does not help when a matrix is too large to fit in the memory of a single pod or node
- Apache Spark and Mahout can distribute a matrix across an arbitrary number of pods/nodes, so no single one has to hold all of it
- Kubeflow can be used to make the process easily reproducible
- The presentation provides an anecdote about using these tools to denoise DICOM images of lungs of COVID patients
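
To make the idea of distributing a matrix concrete, here is a minimal sketch in plain Python. It is not Spark's or Mahout's API and not the speakers' code: the function names (`split_rows`, `partial_gramian`, `distributed_gramian`) are hypothetical, and the "partitions" are ordinary lists standing in for row blocks that a cluster would place on separate pods. It shows one common pattern for row-partitioned matrices: computing the Gramian A^T A by summing small per-block results, so that no worker needs the whole matrix in memory.

```python
from functools import reduce


def split_rows(matrix, num_partitions):
    """Split a list-of-rows matrix into row blocks (stand-ins for partitions)."""
    size = -(-len(matrix) // num_partitions)  # ceiling division
    return [matrix[i:i + size] for i in range(0, len(matrix), size)]


def partial_gramian(block):
    """Compute block^T * block; the result is n x n however many rows the block has."""
    n = len(block[0])
    gram = [[0.0] * n for _ in range(n)]
    for row in block:
        for i in range(n):
            for j in range(n):
                gram[i][j] += row[i] * row[j]
    return gram


def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def distributed_gramian(matrix, num_partitions):
    """Map each row block to its partial Gramian, then reduce by summation."""
    blocks = split_rows(matrix, num_partitions)
    # Assumption: on a real cluster each block would be processed by a different worker.
    partials = map(partial_gramian, blocks)
    return reduce(add_matrices, partials)


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
G = distributed_gramian(A, num_partitions=2)
print(G)  # [[84.0, 100.0], [100.0, 120.0]]
```

Each partial result is only n x n, where n is the number of columns, so the work grows with the number of row blocks while the per-worker memory stays bounded by the block size.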
The speaker and her co-author used Apache Spark and Mahout to denoise DICOM images of the lungs of COVID patients, then published their pipeline with Kubeflow so that others can easily repeat the process. This could help doctors in resource-limited hospitals, as well as researchers seeking to automate the detection of COVID.