Authors: Jihye Choi
2022-10-28

tldr - powered by Generative AI

The conference presentation covers two technologies for making efficient use of GPU resources in AI and HPC workloads: MIG (Multi-Instance GPU) and GPUDirect RDMA. MIG partitions a single physical GPU into multiple isolated instances, while GPUDirect RDMA enables efficient distributed processing across nodes. The presentation includes a proof-of-concept (POC) result for each technology and highlights points to consider when testing on Kubernetes.
  • MIG enables efficient use of GPU resources by partitioning a single physical GPU into multiple isolated instances
  • GPUDirect RDMA enables efficient distributed processing for deep learning workloads
  • POC results show that MIG suits model development and inference workloads, while GPUDirect RDMA suits larger-scale workloads
  • Points to consider for Kubernetes testing are discussed in the presentation (see the sketch after this list)
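
To make the Kubernetes angle concrete, here is a minimal sketch of scheduling a pod onto a MIG slice with the official Kubernetes Python client. The resource name nvidia.com/mig-1g.5gb, the image, and the pod details are assumptions for illustration (the exact resource name depends on the NVIDIA device plugin strategy and the MIG profiles configured on the node); none of it comes from the presentation itself.

```python
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="mig-smoke-test"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda",
                image="nvidia/cuda:12.2.0-base-ubuntu22.04",
                command=["nvidia-smi", "-L"],  # should list one MIG device only
                resources=client.V1ResourceRequirements(
                    # Hypothetical MIG resource name; depends on the device
                    # plugin strategy and the MIG profile on the node.
                    limits={"nvidia.com/mig-1g.5gb": "1"}
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

Inside the container, nvidia-smi -L should report a single MIG device rather than the full GPU, which is the isolation property that makes MIG attractive for packing several development or inference workloads onto one card.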
Conference: Transform X 2022
Authors: Susan Zhang, Faisal Siddiqi, Bryan Catanzaro, Erhan Bas, Elliot Branson
2022-10-19

Join this enterprise-focused, spirited discussion on how best to train, use, and fine-tune foundation models in the enterprise. Elliot Branson, Director of Machine Learning & Engineering at Scale AI, will moderate the panel with industry experts from AWS, NVIDIA, Netflix, and Meta.

Erhan Bas, formerly an Applied Scientist at Amazon Web Services and now at Scale, shares his perspective on training large language models (LLMs). Bryan Catanzaro, Vice President of Applied Deep Learning Research at NVIDIA, shares how the GPU manufacturer is targeting foundation models as a core workflow for enterprise customers. Faisal Siddiqi, Director of Machine Learning Platform at Netflix, will share how his company is using foundation models to analyze highly produced video content. Susan Zhang, Researcher at Facebook AI Research (FAIR), a division of Meta, will share insights from training and fine-tuning Meta’s OPT model.

Members of the panel will share how they scale training across multiple nodes, attempt to avoid overfitting by mitigating data-quality issues early on, and address bias in models trained on a large internet-based text corpus. The panelists will also discuss the compute cost of training an LLM from scratch, how to avoid costly and tedious hyperparameter optimization, the need to mitigate training-failure risk in clusters with thousands of GPUs (including sticking to synchronous gradient descent), and the need for extremely fast storage devices to save and load training checkpoints.
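
The panel's points about synchronous gradient descent across many GPUs and fast checkpoint storage can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch DistributedDataParallel loop; the model, hyperparameters, and checkpoint path are placeholders, not details from the panel.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR, MASTER_PORT.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()   # stand-in for an LLM
    model = DDP(model)                           # synchronous data parallelism
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(1000):
        x = torch.randn(8, 1024, device="cuda")
        loss = model(x).pow(2).mean()            # dummy objective
        opt.zero_grad()
        loss.backward()   # gradients are all-reduced across every worker here
        opt.step()

        # Checkpointing: all ranks wait while rank 0 writes to (fast) storage.
        if step % 100 == 0:
            if dist.get_rank() == 0:
                torch.save({"model": model.module.state_dict(),
                            "opt": opt.state_dict(),
                            "step": step}, f"ckpt_{step}.pt")
            dist.barrier()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with torchrun across multiple nodes, every backward() performs a synchronous all-reduce of gradients, which is why a single slow or failed worker stalls the whole cluster, and why the periodic torch.save() makes checkpoint storage bandwidth matter.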
Authors: Jose Navarro, Prayana Galih
2022-05-18

The adoption of MLOps practices and tooling has considerably reduced the pain of productionising machine learning models. However, as the number of models a company deploys grows, along with the diversity of frameworks used to train them and the different infrastructure each model requires, new challenges arise for machine learning platform teams, e.g.: How can we deploy new models from the same or different frameworks concurrently? How can we improve throughput and optimise resource utilisation in our serving infrastructure, especially GPUs? In this session, Cookpad ML Platform Engineers will discuss how Triton Inference Server, an open-source model-serving tool from NVIDIA, can simplify model deployment and optimise resource utilisation by efficiently supporting concurrent models on single-GPU, CPU, and multi-GPU servers.
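
As an illustration of the serving side, below is a minimal client sketch against a Triton Inference Server, assuming a server at localhost:8000 and a hypothetical model named "resnet" whose tensor names match its config.pbtxt; these names are placeholders, not details from the talk.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a (hypothetical) Triton server on the default HTTP port.
client = httpclient.InferenceServerClient(url="localhost:8000")
assert client.is_model_ready("resnet")  # model loaded from the repository

# Input/output tensor names and shapes must match the model's config.pbtxt;
# the ones below are illustrative placeholders.
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("input__0", list(image.shape), "FP32")
inp.set_data_from_numpy(image)

result = client.infer(model_name="resnet", inputs=[inp])
print(result.as_numpy("output__0").shape)
```

Because Triton schedules loaded models independently, several clients can issue requests like this against different models at once and the server will run them concurrently on the same GPU, which is the resource-utilisation benefit the session describes.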