Lessons From Scheduling 20 Million Windows Containers a Month

Conference: KubeCon + CloudNativeCon North America 2022

2022-10-28

Authors: Julian Portillo

Summary

Challenges and considerations in migrating Windows workloads to Kubernetes

Migrating Windows workloads to Kubernetes requires paying back tech debt and adjusting architecture
Scaling up Windows containers can lead to long pull times and node failures
There is a lack of common open source tools for Windows containers
Performance testing and system design changes can help mitigate migration pains

The speaker's company tried to package Windows processes into containers and run them on Windows nodes, but ran into frequent networking failures and container start issues, with little visibility and few open source tools for Windows containers to help diagnose them. Scaling up also meant pulling large amounts of data, which led to long pull times and node failures. The speaker suggests testing assumptions and making system design changes to avoid these pains.

Abstract

Relativity schedules almost a million Windows containers per day to a globally distributed set of Kubernetes clusters. Two years ago we started to break apart our enterprise .NET monolith into microservices hosted on Kubernetes. At that time our developers had a multi-month release cadence. Now we have automated vulnerability patching, can do zero-downtime migrations of workloads between clusters, have automated failover for critical services in the event of regional failures, and have happy developers who can test and push to production immediately. How did we get here? By traveling a rocky road full of issues. Come learn from our mistakes so you don't have to repeat them. We will talk about application and orchestration design patterns that have been successful for our teams, custom operators for Windows node problem identification that we have built and found useful, and monitoring patterns that have helped us stay ahead of issues.
