The presentation discusses the challenges of working with matrices too large for a single machine and how Apache Spark, Apache Mahout, Kubeflow, and Kubernetes can be used together to address them.
- Kubernetes allows for elastic scaling, but each pod is still bounded by the memory of its node, so a sufficiently large matrix will not fit in a single pod
- Apache Spark and Mahout can distribute a matrix across as many pods/nodes as the cluster can provide
- Kubeflow can be used to make the process easily reproducible
- The presentation includes an anecdote about using these tools to denoise DICOM images of the lungs of COVID-19 patients
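
To make the memory limit and the idea of distributing a matrix more concrete, here is a minimal sketch in plain Python. It is not the Spark or Mahout API: the function names (`dense_bytes`, `split_rows`, `distributed_matvec`) are hypothetical, local threads stand in for pods, and a matrix-vector product stands in for whatever operation the real job performs. It only illustrates the general pattern of splitting a matrix into row blocks, processing each block independently, and concatenating the results.

```python
from concurrent.futures import ThreadPoolExecutor


def dense_bytes(rows, cols, bytes_per_value=8):
    """Memory needed to hold a dense rows x cols matrix (8 bytes per 64-bit float)."""
    return rows * cols * bytes_per_value


def split_rows(matrix, num_parts):
    """Split a list-of-rows matrix into contiguous row blocks, one per worker."""
    size = max(1, -(-len(matrix) // num_parts))  # ceiling division, at least 1
    return [matrix[i:i + size] for i in range(0, len(matrix), size)]


def block_times_vector(block, vector):
    """Multiply one row block by the vector; needs only that block in memory."""
    return [sum(a * x for a, x in zip(row, vector)) for row in block]


def distributed_matvec(matrix, vector, num_workers):
    """Compute matrix @ vector block by block.

    Assumption: threads play the role of pods here. In a real cluster each
    block would live on a different node and never be gathered in one place.
    """
    blocks = split_rows(matrix, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(block_times_vector, blocks, [vector] * len(blocks)))
    # Row blocks are contiguous and map() preserves order, so concatenation
    # restores the original row order.
    return [y for part in partials for y in part]


if __name__ == "__main__":
    # A dense 1,000,000 x 1,000,000 matrix of 64-bit floats needs 8 * 10**12 bytes.
    print(dense_bytes(1_000_000, 1_000_000))
    print(distributed_matvec([[1, 2], [3, 4], [5, 6]], [1, 1], num_workers=2))
```

The arithmetic in `dense_bytes` shows why elastic scaling alone is not enough: a dense matrix with a million rows and a million columns needs about 8 TB, so it has to be partitioned across workers rather than loaded into one pod.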