
Unlimited Data Science Libraries, One Container Image, No Installation!

2022-05-18

Authors:   Marcel Hild, Kenneth Hoste


Summary

The presentation discusses the challenges data scientists face in a cloud-native environment and how Open Data Hub and Red Hat OpenShift Data Science can help overcome them.
  • Data scientists face challenges in a cloud-native environment because they lack control over the environment and need specialized software
  • Open Data Hub and Red Hat OpenShift Data Science provide a best-of-breed distribution of common data science tools in a cloud-native context
  • The presentation includes a demo of how to identify dog breeds using Open Data Hub and how to add additional software to the mounted volume
  • Red Hat OpenShift Data Science can be consumed as a service on cloud.redhat.com and integrates with other vendors
The presenter, who has a background in HPC system administration, highlights that supercomputers have converged on Linux and that HPC has seen an influx of new users from various scientific fields. He emphasizes the importance of performance in HPC and the need for systems that are easy enough to use that scientists can focus on their research.

Abstract

Kubernetes' agility, versatility, and resource scaling make it a platform of choice for data science, especially for shared environments. However, data scientists often need to work with lots of different libraries, languages, and applications, often with multiple versions. Conventional approaches, with a legion of tailored images or a huge 20GB golden image, do not match the reality of production. In this session, we will demonstrate how you can leverage the concept of environment modules inside Kubernetes to solve the challenges of synchronously managing multiple containers of different types, making thousands of scientific libraries, languages and packages dynamically available in a simple way. Inspired by work done and heavily used in the High Performance Computing (HPC) community, we will share a specific implementation that brings this production-proven architecture to Kubernetes and talk about how you can implement it in your own environment.
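
To make the environment-modules idea concrete, here is a minimal Python sketch of its core mechanism: software lives in per-package, per-version directories on a shared volume, and "loading" a module only prepends those directories to search-path variables. This is an illustration under stated assumptions, not the implementation shown in the talk. The function name `load_module`, the mount point `/opt/apps/software`, and the `bin`/`lib` layout are all hypothetical, and real module tools do considerably more (dependency resolution, unloading, conflict handling).

```python
import os
from pathlib import Path


def load_module(env: dict, modules_root: str, name: str, version: str) -> dict:
    """Return a copy of env with one package's directories prepended.

    Assumes a hypothetical layout <modules_root>/<name>/<version>/{bin,lib}
    on a volume mounted into the container.
    """
    prefix = Path(modules_root) / name / version
    new_env = dict(env)  # leave the caller's environment untouched
    for var, subdir in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib")):
        entry = str(prefix / subdir)
        current = new_env.get(var, "")
        # Prepend so the loaded version wins over anything already on the path.
        new_env[var] = entry + os.pathsep + current if current else entry
    return new_env


# Hypothetical usage: one small image, many versions available from the volume.
env = load_module({"PATH": "/usr/bin"}, "/opt/apps/software", "TensorFlow", "2.7.1")
print(env["PATH"])
```

Because nothing is installed into the image, the same container can switch between library versions by loading a different module, which is what lets one image stand in for many tailored ones.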

Materials: