Deep learning is pushing the limits of what AI can do, from natural language processing to computer vision and autonomous vehicles. Scaling deep learning to multiple GPUs and multiple machines has become critical to reducing training time and solving ever bigger problems. Horovod is a popular open source framework for distributing and scaling the training of TensorFlow, PyTorch, and MXNet models. On the verge of Horovod's v1.0 release, we look back at Horovod's journey and the lessons learned putting deep learning training in production, from its open source debut in 2017 to its presence across the major deep learning ecosystems since joining the Linux Foundation. We will explain the motivations and key innovations that fueled the development of Horovod and set new records in deep learning performance benchmarks. Finally, we'll walk through practical examples to demonstrate how you can scale your models to train on hundreds of GPUs with Horovod, and explain how Horovod fits into production ML workflows running on diverse platforms such as Kubernetes, Spark, Ray, and Slurm.
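As a taste of what those practical examples look like, here is a minimal sketch of the canonical Horovod setup for a PyTorch training script. The model and learning rate below are placeholders, but the Horovod calls themselves (hvd.init, hvd.DistributedOptimizer, and the broadcast helpers) are the core of the API:

```python
# Minimal Horovod + PyTorch setup sketch; the model and hyperparameters
# are placeholders, not a recommendation.
import torch
import horovod.torch as hvd

hvd.init()                                # start Horovod: one process per GPU
torch.cuda.set_device(hvd.local_rank())   # pin this process to its local GPU

model = torch.nn.Linear(784, 10).cuda()   # placeholder model
# A common convention is to scale the learning rate by the number of workers.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer so gradients are averaged across all workers via allreduce.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters()
)

# Broadcast initial parameters and optimizer state from rank 0
# so every worker starts from an identical state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```

A script like this is typically launched with the horovodrun CLI, e.g. `horovodrun -np 8 python train.py` on a single 8-GPU machine, or `horovodrun -np 8 -H host1:4,host2:4 python train.py` to spread the same eight processes across two hosts.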