Lightning Talk: A Component Registry for Kubeflow Pipelines

2022-06-23

Authors: Christian Kadner


Summary

The Kubeflow Pipelines team proposes a new component registry to address problems with authoring, publishing, and maintaining components. The registry will have a unified YAML format, versioning and tagging capabilities, and direct integration with the Kubeflow Pipelines SDK. Third-party registries can also implement the server side of the API. The Machine Learning Exchange is one example of a registry that is implementing the new protocol. It offers various asset types, including pipelines, components, models, data sets, and notebooks. Watson Studio Pipelines is in open beta and provides a canvas for running experiments and integrating notebooks.
  • The Kubeflow Pipelines team proposes a new component registry to address problems with authoring, publishing, and maintaining components
  • The registry will have a unified YAML format, versioning and tagging capabilities, and direct integration with the Kubeflow Pipelines SDK
  • Third-party registries can also implement the server side of the API
  • The Machine Learning Exchange is one example of a registry that is implementing the new protocol, and it offers various asset types
  • Watson Studio Pipelines is in open beta and provides a canvas for running experiments and integrating notebooks
The Machine Learning Exchange offers a user-friendly UI where users can search for pipelines, view their details, and launch them directly from the platform. It provides a similar experience for components, models, data sets, and notebooks, and notebooks can be run as part of pipelines.
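
The versioning and tagging behavior mentioned in the summary can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions: the class `ComponentRegistry`, its `publish` and `resolve` methods, and the component name `create-secret` are hypothetical and are not the Kubeflow Pipelines SDK or the registry protocol presented in the talk. It assumes published versions are immutable and that tags such as `latest` are movable pointers to a version.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentRegistry:
    """Hypothetical in-memory registry; NOT the Kubeflow Pipelines API."""

    # component name -> {version -> component YAML text}
    _versions: dict = field(default_factory=dict)
    # component name -> {tag -> version}
    _tags: dict = field(default_factory=dict)

    def publish(self, name, version, spec_yaml, tags=()):
        """Store a new immutable version and point the given tags at it."""
        versions = self._versions.setdefault(name, {})
        if version in versions:
            raise ValueError(f"{name}:{version} is already published")
        versions[version] = spec_yaml
        for tag in tags:
            # Tags are movable: re-using a tag re-points it to this version.
            self._tags.setdefault(name, {})[tag] = version

    def resolve(self, name, ref):
        """Return the YAML text for a tag or an exact version string."""
        # If ref is a known tag, translate it; otherwise treat it as a version.
        version = self._tags.get(name, {}).get(ref, ref)
        try:
            return self._versions[name][version]
        except KeyError:
            raise KeyError(f"unknown component reference {name}:{ref}") from None


registry = ComponentRegistry()
registry.publish("create-secret", "1.0.0", "name: create-secret\n# v1 spec", tags=["latest"])
registry.publish("create-secret", "1.1.0", "name: create-secret\n# v1.1 spec", tags=["latest"])

pinned = registry.resolve("create-secret", "1.0.0")   # exact version
latest = registry.resolve("create-secret", "latest")  # tag, now pointing at 1.1.0
```

Pinning an exact version keeps a pipeline reproducible, while resolving a tag lets a team pick up the newest published component without editing the pipeline definition.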

Abstract

Kubeflow Pipelines are widely used to orchestrate machine learning (ML) workflows on Kubernetes. Pipelines and individual pipeline stages are often developed collaboratively. To facilitate that process, Kubeflow Pipelines supports reusable components: self-contained sets of code that each perform one step in the ML workflow, such as data preprocessing, data transformation, model training, or model serving. There is a rich set of components from the community and from vendors. What has been missing from the ecosystem, however, is a registry for sharing reusable components with the public or among teams of data scientists. As a result, many of the common tasks required to run ML workflows on Kubernetes, such as creating secrets, persistent volume claims, and config maps, have to be implemented again and again. A component registry can provide a rich catalog of components that solve those common tasks and ease the burden of creating ML workflows on Kubernetes.

Materials: