Authors: Mostafa Rohaninejad, Ariana Eisenstein, Louis Tremblay, Jack Guo, Russell Kaplan
Machine learning leaders from robotics (Covariant), home automation (Resideo), autonomous delivery (Nuro), and warehouse automation (Pickle Robot) sit down with Russell Kaplan, Scale’s Director of Engineering, to share their approaches to dataset management. Pickle Robot CTO Ariana Eisenstein will share how she balances the proportions of synthetic data, public open datasets, and real-world data when assembling training datasets. Mostafa Rohaninejad, Founding Research Scientist at Covariant, will describe how the object “picking” problem requires synthetic data for unsafe scenarios, how he also incorporates structured and time-series data, and why supervised and unsupervised learning should go hand in hand. Jack Guo, Head of Perception at Nuro, will explain why it is essential to have tools and mechanisms that automatically highlight recorded data that deviates from the norm, especially data captured in a new location. Like Rohaninejad, he will stress the importance of simulation as a component of successful reinforcement learning. Louis Tremblay, AI/ML Engineering Leader at Resideo, will explain how security cameras in the home operate in an even more unbounded environment than warehouses do. The group will also discuss why maintaining separate datasets and training pipelines for different customers is costly and accrues technical debt over time, and why it is important to test on fault-tolerant customers before deploying to the wider fleet. Scale’s Kaplan will share how, in his experience, when metrics and anecdotes seem at odds, it makes sense to rethink the metrics and establish new ones.