The presentation discusses the development of data-centric AI and provides tips for its implementation, with a focus on unstructured data.
- Data-centric AI is becoming more widespread, and the way it is practiced is becoming more systematic
- Consistent labeling of data is crucial for learning algorithms to work effectively
- Error analysis and engineering examples are important for structured data
- Data augmentation and noise examples can be useful for unstructured data
- Focusing on subsets of data can improve performance
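One way to act on the points about error analysis and focusing on subsets is to measure accuracy separately for each subset and see where the model is weakest. The sketch below is only an illustration: the slice names (`bright_lighting`, `dim_lighting`), the records, and the helper `accuracy_by_slice` are hypothetical and do not come from the presentation.

```python
from collections import defaultdict


def accuracy_by_slice(records):
    """Return accuracy per slice.

    records: iterable of (slice_name, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for slice_name, y_true, y_pred in records:
        total[slice_name] += 1
        correct[slice_name] += int(y_true == y_pred)
    return {name: correct[name] / total[name] for name in total}


# Hypothetical evaluation results, grouped by an assumed "lighting" attribute.
records = [
    ("bright_lighting", "defect", "defect"),
    ("bright_lighting", "ok", "ok"),
    ("dim_lighting", "defect", "ok"),
    ("dim_lighting", "ok", "ok"),
    ("dim_lighting", "defect", "ok"),
    ("dim_lighting", "defect", "defect"),
]

scores = accuracy_by_slice(records)
# The slice with the lowest accuracy is a candidate for more or better data.
worst_slice = min(scores, key=scores.get)
print(scores)
print("Weakest slice:", worst_slice)
```

The slice with the lowest score is where collecting, cleaning, or augmenting data is most likely to pay off, rather than adding data uniformly.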
The speaker stresses that learning algorithms need consistently labeled data to work well, using visual defect inspection in manufacturing as the example. Even expert human inspectors can label similar images differently, and that inconsistency leads to inaccurate results. Re-labeling the images according to scratch length makes the dataset more consistent and accurate.
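The re-labeling by scratch length can be sketched as a single explicit rule applied to every image, replacing each inspector's individual judgment. This is a minimal sketch under stated assumptions: the threshold `DEFECT_MIN_LENGTH_MM`, the `Inspection` record, and the sample measurements are hypothetical, since the summary gives no actual values.

```python
from dataclasses import dataclass

# Hypothetical rule: a scratch at least this long counts as a defect.
# The presentation summary does not state a real threshold.
DEFECT_MIN_LENGTH_MM = 0.3


@dataclass
class Inspection:
    image_id: str
    scratch_length_mm: float
    inspector_label: str  # "defect" or "ok", as originally assigned


def relabel(item: Inspection) -> str:
    """Apply the same length-based rule to every image."""
    if item.scratch_length_mm >= DEFECT_MIN_LENGTH_MM:
        return "defect"
    return "ok"


# Made-up examples in which the original labels are not consistent with length.
inspections = [
    Inspection("img_001", 0.10, "ok"),
    Inspection("img_002", 0.25, "defect"),
    Inspection("img_003", 0.40, "ok"),
    Inspection("img_004", 0.60, "defect"),
]

relabeled = {item.image_id: relabel(item) for item in inspections}
# Images whose label changes are the ones the inspectors treated inconsistently.
changed = [
    item.image_id
    for item in inspections
    if relabeled[item.image_id] != item.inspector_label
]
print(relabeled)
print("Labels changed for:", changed)
```

Listing the images whose labels change is a quick way to see how much of the original dataset disagreed with the agreed rule.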