Panel: Building a Resilient MLOps Strategy Through Dataset Management

Conference: Transform X 2021


Authors: Chun Jiang, Alessya (Labzhinova) Visnjic, Adrian Macneil, Ville Tuulos, Elliot Branson


Importance of structured and quality data cataloging for machine learning in production
  • A structured, easily queryable data catalog is important
  • Know the quality of the data up front to avoid wasting time on data processing and feature processing
  • Catch regressions early by putting checks upstream in the build process
  • Lock the device version for on-device logging
  • Record metadata for debugging purposes
  • Involve subject matter experts when debugging machine learning models
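
The first three points above can be sketched in a few lines of Python. This is only an illustration, not something shown by the panel: the `samples` table, its columns, the `build_catalog` and `check_dataset` helpers, and the thresholds are all hypothetical. It uses the standard-library `sqlite3` module as a stand-in for a queryable catalog and runs a quality check that a build step could call before any training starts.

```python
import sqlite3

def build_catalog(rows):
    """Create an in-memory catalog with one row per sample (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples ("
        "sample_id TEXT PRIMARY KEY, device_version TEXT, label TEXT, quality REAL)"
    )
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn

def check_dataset(conn, min_quality=0.8, max_bad_fraction=0.1):
    """Return a list of problems; an empty list means the dataset passes."""
    total = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
    if total == 0:
        return ["catalog is empty"]
    problems = []
    # Metadata that is needed later for debugging must be present.
    missing = conn.execute(
        "SELECT COUNT(*) FROM samples WHERE device_version IS NULL OR label IS NULL"
    ).fetchone()[0]
    if missing:
        problems.append(f"{missing} sample(s) missing device_version or label")
    # Too many low-quality samples should stop the build early.
    low = conn.execute(
        "SELECT COUNT(*) FROM samples WHERE quality < ?", (min_quality,)
    ).fetchone()[0]
    if low / total > max_bad_fraction:
        problems.append(f"{low}/{total} samples below quality {min_quality}")
    return problems

# Made-up example rows: (sample_id, device_version, label, quality)
rows = [
    ("s1", "fw-1.2", "normal", 0.95),
    ("s2", "fw-1.2", "fracture", 0.91),
    ("s3", None, "normal", 0.40),
]
problems = check_dataset(build_catalog(rows))
for problem in problems:
    print("dataset check failed:", problem)
```

A build script could exit with a non-zero status whenever `check_dataset` returns a non-empty list, so that a data regression is caught before it reaches training.
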
In medical devices, failing to record which device captured the data can lead to surprises during deployment, such as losing control over the resolution of the x-rays captured. With handheld ultrasound devices, whether the personnel operating them are properly trained can affect the quality of the model. Subject matter experts are important when debugging machine learning models in cases like these.
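
As a rough sketch of what recording the device might look like, the Python below stores a small metadata record with every capture and compares an incoming capture against what the training set contained. The `CaptureMetadata` fields and the `deployment_surprises` helper are hypothetical names chosen for illustration, and the values are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CaptureMetadata:
    """Hypothetical metadata stored alongside every captured image."""
    device_model: str
    firmware_version: str
    width_px: int
    height_px: int
    operator_trained: bool

def deployment_surprises(training, incoming):
    """List the ways an incoming capture differs from everything seen in training."""
    seen_devices = {(m.device_model, m.firmware_version) for m in training}
    seen_resolutions = {(m.width_px, m.height_px) for m in training}
    issues = []
    if (incoming.device_model, incoming.firmware_version) not in seen_devices:
        issues.append("device or firmware not seen in training")
    if (incoming.width_px, incoming.height_px) not in seen_resolutions:
        issues.append("resolution not seen in training")
    if not incoming.operator_trained:
        issues.append("operator not recorded as trained")
    return issues

# Made-up example: the training data came from a single device at one resolution.
training_set = [CaptureMetadata("xr-a", "1.0", 2048, 2048, True)]
new_capture = CaptureMetadata("xr-b", "1.0", 1024, 1024, True)
issues = deployment_surprises(training_set, new_capture)
print(issues)
```

Without the recorded metadata there would be nothing to compare against, which is the situation the panel warns about.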


Dataset debugging, versioning, and augmentation are essential to building successful ML pipelines and models. Even with robust training and optimization of AI models deployed through well-defined CI/CD pipelines, high-quality data remains a critical part of the whole AI development process. Learn how different organizations collaborate to improve their datasets and debug errors in their data, and see how doing so helps them unlock higher accuracy along with other key benefits and efficiencies.


