Authors: Susan Zhang, Faisal Siddiqi, Bryan Catanzaro, Erhan Bas, Elliot Branson
Join this spirited discussion on how best to train, use, and fine-tune foundation models in the enterprise. Elliot Branson, Director of Machine Learning & Engineering at Scale AI, will moderate the panel with industry experts from AWS, NVIDIA, Netflix, and Meta.

Erhan Bas, formerly Applied Scientist at Amazon Web Services and now at Scale, will share his perspective on training large language models (LLMs). Bryan Catanzaro, Vice President of Applied Deep Learning Research at NVIDIA, will share how the GPU manufacturer is targeting foundation models as a core workflow for enterprise customers. Faisal Siddiqi, Director of Machine Learning Platform at Netflix, will share how his company is using foundation models to analyze highly produced video content. Susan Zhang, Researcher at Facebook AI Research (FAIR), a division of Meta, will share insights from training and fine-tuning Meta’s OPT model.

Members of the panel will share how they scale their training across multiple nodes, attempt to avoid overfitting by mitigating data quality issues early on, and address bias in models trained on a large internet-based text corpus. The panelists will discuss the compute cost inherent in training an LLM from scratch, how to avoid costly and tedious hyperparameter optimization, the need to mitigate the risk of training failures in clusters with thousands of GPUs (including by sticking to synchronous gradient descent), and the need for extremely fast storage devices to save and load training checkpoints.