Panel: Building a Resilient MLOps Strategy Through Dataset Management

Conference: Transform X 2021

Date: 2021-10-07

Authors: Chun Jiang, Alessya (Labzhinova) Visnjic, Adrian Macneil, Ville Tuulos, Elliot Branson

Summary

Importance of structured, high-quality data cataloging for machine learning in production

A structured, easily queryable location for cataloging data is important
Data quality should be known up front to avoid wasting time on data processing and feature processing
Catch regressions early by putting checks upstream in the build process
Lock device version for on-device logging
Record metadata for debugging purposes
Involve subject matter experts for debugging machine learning models

In medical devices, failing to record which device captured the data can lead to surprises during deployment, such as losing control over the resolution of the X-rays captured. With handheld ultrasound devices, how well personnel are trained can affect the quality of the model. Subject matter experts are therefore important when debugging machine learning models.
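As a rough illustration of the points above (recording device metadata and placing checks upstream), the following is a minimal Python sketch of a dataset gate that runs before any processing or training. It is not taken from the panel: the `Sample` fields, the `check_sample` and `gate_dataset` helpers, and the version and resolution thresholds are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Sample:
    # All field names are hypothetical, chosen for illustration only.
    sample_id: str
    device_model: Optional[str]             # which device captured the sample
    device_version: Optional[str]           # device software version, if recorded
    resolution: Optional[Tuple[int, int]]   # (width, height) in pixels


def check_sample(sample, allowed_versions, min_resolution):
    """Return a list of problems; an empty list means the sample passes."""
    problems = []
    if not sample.device_model:
        problems.append("missing device_model")
    if sample.device_version not in allowed_versions:
        problems.append(f"unexpected device_version: {sample.device_version!r}")
    if sample.resolution is None:
        problems.append("missing resolution")
    elif (sample.resolution[0] < min_resolution[0]
          or sample.resolution[1] < min_resolution[1]):
        problems.append(f"resolution below minimum: {sample.resolution}")
    return problems


def gate_dataset(samples, allowed_versions, min_resolution):
    """Split samples into accepted ones and a per-sample report of rejections."""
    accepted, rejected = [], {}
    for s in samples:
        problems = check_sample(s, allowed_versions, min_resolution)
        if problems:
            rejected[s.sample_id] = problems
        else:
            accepted.append(s)
    return accepted, rejected
```

Running a gate like this early in the build means a sample with missing or unexpected device metadata is flagged before time is spent on processing it, and the rejection report gives subject matter experts something concrete to review.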

Abstract

Dataset debugging, versioning, and augmentation are essential to building successful ML pipelines and models. Even when AI models are robustly trained, optimized, and deployed through well-defined CI/CD pipelines, high-quality data remains a critical part of the AI development process. Learn how different organizations collaborate to improve their datasets and debug errors in their data. See how doing so helps them unlock higher accuracy as well as other key benefits and efficiencies.
