Authors: Diana Atanasova, Julius von Kohout
2023-04-21

tldr - powered by Generative AI

The presentation discusses security issues in Kubeflow pipelines and proposes solutions to address them.
  • Rootless containers address the problem of containers running as root, although building OCI container images without root privileges is still limited
  • The controllers in Kubeflow pipelines run as cluster admin, which is a security risk
  • Namespace sharing can also be a security risk as collaborators gain access to service accounts
  • Solutions proposed include reducing the complexity of controllers and using reduced cluster roles
  • The presentation highlights the progress made in Kubeflow security, such as authentication and machine-to-machine authentication
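As a rough illustration of the "reduced cluster roles" and non-root ideas above, the following sketch builds a namespaced Role and a restrictive container security context as plain Python dictionaries. The role name, the rule set, and the UID are assumptions chosen for the example and are not taken from the talk; a real controller would need its own audited list of resources and verbs.

```python
import json


def reduced_role(namespace: str, name: str = "pipeline-controller") -> dict:
    """Build a namespaced Role instead of binding the controller to cluster-admin.

    The name and the rules below are hypothetical; they only show the shape of
    a least-privilege manifest.
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],
                "resources": ["pods", "pods/log"],
                "verbs": ["get", "list", "watch"],
            },
            {
                "apiGroups": ["argoproj.io"],
                "resources": ["workflows"],
                "verbs": ["get", "list", "watch", "create", "patch"],
            },
        ],
    }


def non_root_security_context(uid: int = 1000) -> dict:
    """Container-level securityContext that refuses to run as root."""
    return {
        "runAsNonRoot": True,
        "runAsUser": uid,  # assumed non-zero UID for illustration
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
        "seccompProfile": {"type": "RuntimeDefault"},
    }


if __name__ == "__main__":
    print(json.dumps(reduced_role("team-a"), indent=2))
    print(json.dumps(non_root_security_context(), indent=2))
```

The printed JSON can be applied with standard Kubernetes tooling, since JSON manifests are accepted wherever YAML is.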
Authors: Maciej Mazur, Andreea Munteanu
2023-04-20

tldr - powered by Generative AI

The presentation discusses the use of secure MLOps in the life science industry, with a focus on protecting patient privacy and complying with industry standards.
  • Tokenization protects patient privacy by replacing personally identifiable information with a token derived from a hardware security key.
  • The strict confinement features of the MicroK8s Kubernetes distribution are used to keep the tokenization tamper-proof.
  • Confidential computing is used to extend local Kubernetes clusters safely: a VM is created on a public cloud, and Open Enclave and other open source projects configure the confidential compute environment and the underlying hardware features.
  • The benefits of using public clouds for research use cases are discussed, including the ability to burst capacity when training a larger model.
  • The presentation emphasizes the importance of using secure MLOps to comply with industry standards and protect patient privacy.
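The tokenization idea described above can be sketched with a keyed hash. This is a minimal illustration and not the presenters' implementation: it assumes HMAC-SHA256 as the token function and uses an in-memory byte string as a stand-in for the hardware-backed key, which in a real deployment would never leave the security device.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Replace an identifier with a deterministic, keyed token.

    The same value and key always give the same token, so records can still be
    joined, but the token cannot be recomputed without the key.
    """
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    demo_key = b"stand-in-for-hardware-key"  # assumption: real key lives in hardware
    record = {"patient_name": "Jane Doe", "heart_rate": 72}
    safe_record = {**record, "patient_name": tokenize(record["patient_name"], demo_key)}
    print(safe_record)
```

Normalizing the input before hashing is a design choice made here so that "Jane Doe" and " jane doe " map to the same token.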
Authors: Jihye Choi
2022-10-28

tldr - powered by Generative AI

The conference presentation discusses two technologies, MIG (Multi-Instance GPU) and GPUDirect RDMA, for efficient use of GPU resources in AI and HPC tasks. MIG splits a single GPU into multiple isolated instances, while GPUDirect RDMA enables efficient distributed processing. The presentation includes a POC result for each technology and highlights some points to consider for Kubernetes testing.
  • MIG allows for efficient use of GPU resources by splitting a single GPU into multiple instances
  • GPUDirect RDMA enables efficient distributed processing for deep learning tasks
  • POC results show that MIG is suitable for model development and inference tasks, while GPUDirect RDMA is suitable for larger-scale tasks
  • Points to consider for Kubernetes testing are discussed in the presentation
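To make the MIG point concrete, here is a small sketch of a pod manifest that requests one MIG slice rather than a whole GPU. It assumes the NVIDIA device plugin is running with the "mixed" MIG strategy, under which slices are advertised as resources such as `nvidia.com/mig-1g.5gb`; the pod name, image, and profile are placeholders and do not come from the talk.

```python
import json


def mig_pod(name: str, image: str, profile: str = "1g.5gb", count: int = 1) -> dict:
    """Build a pod manifest that requests MIG instances of the given profile.

    Assumes the cluster exposes MIG slices as `nvidia.com/mig-<profile>`
    extended resources (device plugin "mixed" strategy).
    """
    resource = f"nvidia.com/mig-{profile}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    # Extended resources are requested through limits.
                    "resources": {"limits": {resource: count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image name used only for illustration.
    print(json.dumps(mig_pod("inference-demo", "example.com/inference:latest"), indent=2))
```

Several such pods can share one physical GPU, which is the kind of small development or inference workload the summary describes as a good fit for MIG.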
Authors: Erik Jacobs
2022-10-28

tldr - powered by Generative AI

The presentation discusses the use of Kubernetes for running HPC workloads, using OpenFOAM as an example. The speaker emphasizes the importance of tuning and optimizing the instance types and pods used for the job, and mentions potential future developments such as using NVIDIA GPUs and exploring new schedulers.
  • Kubernetes can be used for running HPC workloads, but tuning and optimization are crucial
  • OpenFOAM was used as an example of an MPI job that can be run on Kubernetes
  • Future developments include using NVIDIA GPUs and exploring new schedulers
Authors: Charles Adetiloye, Keith Mattix
2022-06-23

tldr - powered by Generative AI

Kubeflow Metal is a new way of deploying Kubeflow onto a Kubernetes cluster on bare metal servers, providing a low friction, high velocity way to deploy an ML platform in an easy, experimental on-prem environment.
  • Kubeflow Metal is a Terraform module that deploys Kubeflow on a Kubernetes cluster on bare metal servers
  • It is a cheaper alternative to cloud infrastructure with a fixed cost
  • It allows for quick bootstrapping of an ML environment or infrastructure for a team
  • Deployment is elastic and easily scalable
  • It can be used for plugging into a CI/CD process
  • It is useful for cases where data cannot be moved to the cloud, such as financial or insurance data
  • Kubeflow Metal is looking for people to help improve the project
Authors: Christian Kadner
2022-06-23

tldr - powered by Generative AI

The Kubeflow Pipelines team proposes a new component registry to address problems with authoring, publishing, and maintaining components. The registry will have a unified YAML format, versioning and tagging capabilities, and direct integration with the Kubeflow Pipelines SDK. Third-party registries can also implement the server-side of the API. The Machine Learning Exchange is an example of a registry that is implementing the new protocol. It offers various asset types, including pipelines, components, models, datasets, and notebooks. Watson Studio Pipelines is also in open beta and provides a canvas for running experiments and integrating notebooks.
  • Kubeflow Pipelines proposes a new component registry to address problems with authoring, publishing, and maintaining components
  • The registry will have a unified YAML format, versioning and tagging capabilities, and direct integration with the Kubeflow Pipelines SDK
  • Third-party registries can also implement the server-side of the API
  • The Machine Learning Exchange is an example of a registry that is implementing the new protocol and offers various asset types
  • Watson Studio Pipelines is in open beta and provides a canvas for running experiments and integrating notebooks
Authors: Dejan Golubovic, Daniel Holmberg
2022-05-19

tldr - powered by Generative AI

Machine learning can improve results in studying subatomic particles, and Kubeflow can help run machine learning workloads.
  • Using machine learning can improve results in studying subatomic particles, as demonstrated by the jet energy regression example
  • Kubeflow can help run machine learning workloads
  • Challenges in implementing the demo included finding the correct version of the Triton server image and customizing TensorBoard
  • Possible improvements include replicating profiles across multiple clusters, making pipelines namespaced, and adding LimitRange resources to profiles
Authors: Holden Karau
2022-05-18

tldr - powered by Generative AI

The presentation discusses the challenges of working with big data matrices and how Apache Spark, Apache Mahout, Kubeflow, and Kubernetes can be used together to solve these challenges.
  • Kubernetes allows for elastic scaling but has limitations when it comes to fitting large matrices in memory
  • Apache Spark and Mahout can distribute matrices across an unbounded number of pods/nodes
  • Kubeflow can be used to make the process easily reproducible
  • The presentation provides an anecdote about using these tools to denoise DICOM images of lungs of COVID patients
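The idea of spreading a matrix that does not fit in one pod's memory across many workers can be sketched in plain Python. This is only a toy model of row-partitioned multiplication, not the Spark or Mahout API: each "partition" here is a list slice processed in a loop, whereas a real cluster would hold each block on a separate executor.

```python
from typing import List

Matrix = List[List[float]]


def split_rows(matrix: Matrix, parts: int) -> List[Matrix]:
    """Split a matrix into at most `parts` contiguous row blocks."""
    if not matrix:
        return []
    size = -(-len(matrix) // parts)  # ceiling division
    return [matrix[i:i + size] for i in range(0, len(matrix), size)]


def multiply_block(block: Matrix, right: Matrix) -> Matrix:
    """Multiply one row block by a (small) right-hand matrix."""
    columns = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in block]


def partitioned_multiply(left: Matrix, right: Matrix, parts: int) -> Matrix:
    """Multiply block by block; each block is independent, so it could run on its own worker."""
    result: Matrix = []
    for block in split_rows(left, parts):
        result.extend(multiply_block(block, right))
    return result


if __name__ == "__main__":
    tall = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    small = [[2.0], [1.0]]
    print(partitioned_multiply(tall, small, parts=2))
```

Because no block depends on another, adding workers lets the tall matrix grow without any single worker having to hold all of it.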
Authors: Keshi Dai, Jonathan Jin
2021-10-15

tldr - powered by Generative AI

Improving observability and reliability in a multi-cluster environment through infrastructure as code and custom metrics
  • Investing in observability and reliability preemptively before experiencing issues
  • Using infrastructure as code, specifically Terraform and Argo CD, to manage multi-cluster deployments and ensure consistency
  • Creating custom metrics, such as Kubeflow state metrics, to track specific product needs and enable effective SLOs and alerts
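One way custom metrics feed into SLOs and alerts is through an error budget. The sketch below is an assumption about how such a calculation might look, not something described in the talk: it takes counts of good and total events (for example, successful versus attempted pipeline runs) and reports what fraction of the allowed failures is still unspent.

```python
def error_budget_remaining(good: int, total: int, slo_target: float) -> float:
    """Return the fraction of the error budget left (1.0 = untouched, < 0 = overspent).

    `slo_target` is the desired success ratio, e.g. 0.99 for "99% of runs succeed".
    """
    if total == 0:
        return 1.0  # no traffic, so nothing has been spent
    failed = total - good
    allowed_failures = (1.0 - slo_target) * total
    if allowed_failures == 0:
        return 1.0 if failed == 0 else 0.0
    return 1.0 - failed / allowed_failures


def should_alert(good: int, total: int, slo_target: float, threshold: float = 0.25) -> bool:
    """Alert once less than `threshold` of the budget remains (threshold is illustrative)."""
    return error_budget_remaining(good, total, slo_target) < threshold


if __name__ == "__main__":
    print(error_budget_remaining(good=999, total=1000, slo_target=0.99))
    print(should_alert(good=991, total=1000, slo_target=0.99))
```

Alerting on remaining budget rather than on individual failures is one common way to keep alerts tied to what users actually experience.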