
How Scale Uses ML to Improve Label Quality and Labeler Productivity

Conference:  Transform X 2021

2021-10-07

Authors:   Aerin Kim


Summary

ML linters and other mechanisms enhance labeler productivity when labeling complex images and scenes, resulting in higher quality data for customers.
  • Label quality matters in ML because it affects model precision, recall, and IoU.
  • Scale AI published four papers this year, including a dataset on Fitzpatrick skin type and a Reddit comment and reply dataset.
  • Scale AI's 3D annotation platform and ML-powered linters catch incorrect annotations.
  • ML linters and other mechanisms improve labeler productivity and result in higher quality data for customers.
One example involved setting thresholds for the linter's sensitivity to avoid false positives. The team ran both quantitative and qualitative experiments to set two thresholds: the model prediction score and the jitter threshold in the XY plane. They plotted multiple curves at different thresholds, observed that the curves were fairly stable to jitter within a one-meter range, and set the jitter threshold to one meter. Another example involved using LiDAR data to catch poles that were missing from the annotations (false negatives). The linter detected the missing poles, which improved annotation quality.
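
The two thresholds can be illustrated with a minimal sketch. This is not Scale AI's implementation. The `Box` type, the `lint_missing` function, and the 0.9 score threshold are hypothetical names and values chosen for illustration; only the one-meter jitter threshold comes from the talk. The sketch assumes the jitter threshold acts as a matching tolerance between a model prediction and a human annotation in the XY plane.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Box:
    """Hypothetical object center in the XY plane (meters)."""
    x: float
    y: float
    label: str
    score: float = 1.0  # model confidence; left at 1.0 for human annotations


def lint_missing(annotations, predictions,
                 score_threshold=0.9,      # assumed value, not from the talk
                 jitter_threshold_m=1.0):  # one meter, as stated in the talk
    """Return confident predictions that no annotation matches.

    A prediction counts as matched when an annotation with the same label
    lies within jitter_threshold_m of it in the XY plane.
    """
    flagged = []
    for pred in predictions:
        # Skip low-confidence predictions to limit false positives.
        if pred.score < score_threshold:
            continue
        matched = any(
            ann.label == pred.label
            and hypot(ann.x - pred.x, ann.y - pred.y) <= jitter_threshold_m
            for ann in annotations
        )
        if not matched:
            flagged.append(pred)
    return flagged


annotations = [Box(0.0, 0.0, "pole")]
predictions = [
    Box(0.5, 0.5, "pole", score=0.95),    # within 1 m of an annotation
    Box(10.0, 10.0, "pole", score=0.97),  # confident, no annotation nearby
    Box(20.0, 20.0, "pole", score=0.40),  # below the score threshold
]
for box in lint_missing(annotations, predictions):
    print(f"possible missing {box.label} near ({box.x}, {box.y})")
```

Raising either threshold makes the linter quieter: a higher score threshold ignores more predictions, and a larger jitter threshold tolerates more offset between labeler and model before flagging.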

Abstract

Engineering leader Aerin Kim will present a brief summary of ongoing ML research at Scale AI, then take a deep dive into ML linters and other mechanisms that reliably enhance labeler productivity when labeling complex images and scenes. Aerin will showcase complex scenarios such as 3D LiDAR bounding box classification and 2D semantic segmentation. When assisted by an ML model, labelers typically generate higher-quality data for our customers than they would consistently produce on their own.
