Authors: Alejandro Saucedo, Elena Neroslavskaya
2022-05-18

tldr - powered by Generative AI

The presentation covers accelerating machine learning at scale, optimizing models, deploying them to Kubernetes, and an introduction to production cloud-native tooling.
  • Running the ML server locally before deploying to production confirms that everything works and makes issues easier to debug.
  • Pointers to other resources on CI/CD for production machine learning at scale, production machine learning monitoring, machine learning security, and the machine learning ecosystem and operations.
  • Collaboration with the Hugging Face team to access a pre-trained GPT-2 model through their Transformers library.
  • Optimization of the model using the ONNX serialization format.
  • Deployment to a Kubernetes cluster once local testing confirms that the model works.
  • Anecdote about a computationally intensive dungeon crawler game that uses an AI model for personalization.
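
The local check described in the summary (run the server locally, confirm it works, then deploy) could look roughly like the sketch below. It is not taken from the talk. It assumes the local server accepts Open Inference Protocol (V2) requests over HTTP, and the address `http://localhost:8080`, the model name `gpt2-model`, and the input name `text_inputs` are all hypothetical placeholders. Only the Python standard library is used.

```python
import json
import urllib.request

# All three values are assumptions for illustration, not details from the talk.
BASE_URL = "http://localhost:8080"  # hypothetical address of the local server
MODEL_NAME = "gpt2-model"           # hypothetical name of the deployed model
INPUT_NAME = "text_inputs"          # hypothetical name of the model's text input


def build_infer_request(prompt: str) -> dict:
    """Build a V2-style inference request body carrying one text prompt."""
    return {
        "inputs": [
            {
                "name": INPUT_NAME,
                "shape": [1],         # a single string element
                "datatype": "BYTES",  # V2 datatype used for string payloads
                "data": [prompt],
            }
        ]
    }


def infer_url(base_url: str, model_name: str) -> str:
    """Return the V2 inference endpoint for the given model."""
    return f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"


def smoke_test(prompt: str, timeout: float = 30.0) -> dict:
    """POST one prompt to the local server and return the decoded JSON reply.

    Raises urllib.error.URLError if nothing is listening locally, which is
    the failure this check is meant to surface before deployment.
    """
    body = json.dumps(build_infer_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        infer_url(BASE_URL, MODEL_NAME),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running locally, calling `smoke_test("Once upon a time")` would return the server's JSON reply. The same request body could later be sent to the Kubernetes deployment by changing only `BASE_URL`, which keeps the local and cluster checks comparable.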