Panel: Building a Resilient MLOps Strategy Through Dataset Management

Conference: Transform X 2021

2021-10-07

Authors: Chun Jiang, Alessya (Labzhinova) Visnjic, Adrian Macneil, Ville Tuulos, Elliot Branson


Summary

Why structured, high-quality data cataloging matters for machine learning in production:
  • Catalog data in a structured, easily queryable location.
  • Know the quality of the data up front to avoid wasting time on data processing and feature processing.
  • Catch regressions early by putting checks upstream in the build process.
  • Lock the device version for on-device logging.
  • Record metadata for debugging purposes.
  • Involve subject matter experts when debugging machine learning models.
In medical devices, failing to record which device captured the data can lead to surprises during deployment and to loss of control over the resolution of the x-rays captured. With handheld ultrasound devices, how well the operating personnel are trained can affect the quality of the model. Subject matter experts are important when debugging machine learning models.
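
The points about recording metadata, locking the device version, and placing checks upstream can be illustrated with a small sketch. This is not the panelists' tooling: the `SampleRecord` fields, the `validate_records` function, the minimum resolution, and the pinned version string are all hypothetical names and values chosen for illustration. The idea is that a check like this runs early in the build, before any training job, and reports every record that lacks the metadata needed to debug it later.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SampleRecord:
    """One catalog entry. All field names are hypothetical."""
    sample_id: str
    device_model: Optional[str]              # which device captured the sample
    device_version: Optional[str]            # software version at capture time
    resolution: Optional[Tuple[int, int]]    # (width, height) in pixels
    label: Optional[str]


def validate_records(
    records: List[SampleRecord],
    min_resolution: Tuple[int, int] = (512, 512),   # assumed threshold
    pinned_version: Optional[str] = None,           # assumed locked version
) -> List[str]:
    """Return human-readable problems; an empty list means the batch passes."""
    problems = []
    for r in records:
        # Missing device metadata makes later debugging much harder.
        if not r.device_model or not r.device_version:
            problems.append(f"{r.sample_id}: missing device metadata")
        elif pinned_version is not None and r.device_version != pinned_version:
            problems.append(
                f"{r.sample_id}: device version {r.device_version} "
                f"differs from pinned {pinned_version}"
            )
        # Resolution must be recorded and meet the minimum in both dimensions.
        if r.resolution is None:
            problems.append(f"{r.sample_id}: missing resolution")
        elif (r.resolution[0] < min_resolution[0]
              or r.resolution[1] < min_resolution[1]):
            problems.append(
                f"{r.sample_id}: resolution {r.resolution} below {min_resolution}"
            )
        if r.label is None:
            problems.append(f"{r.sample_id}: missing label")
    return problems


if __name__ == "__main__":
    batch = [
        SampleRecord("s1", "xray-a", "1.0", (1024, 1024), "normal"),
        SampleRecord("s2", None, None, (256, 256), None),
    ]
    found = validate_records(batch, pinned_version="1.0")
    for line in found:
        print(line)
    # A build step could stop here instead of training on bad data.
    if found:
        raise SystemExit(1)
```

Returning a list of problems rather than raising on the first one lets a single build run surface every bad record at once, which is usually more useful when a subject matter expert has to review the failures.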

Abstract

Dataset debugging, versioning, and augmentation are essential to building successful ML pipelines and models. Even with robust training and optimization of AI models deployed through well-defined CI/CD pipelines, high-quality data remains a critical part of the whole AI development process. Learn how different organizations collaborate to improve their datasets and debug errors in their data. See how doing so helps them unlock higher accuracies as well as other key benefits and efficiencies.
