GitLab.com migrated to Kubernetes using multiple clusters to save costs and improve network traffic control.
- GitLab.com needed to move from Chef-managed virtual machines to Kubernetes as it grew past 10 million hosted projects
- They used GKE to migrate stateless services and split regional GKE clusters into multiple zonal clusters for better network traffic control
- Multiple clusters allowed for more efficient maintenance procedures, testing cluster configurations, and mitigating incidents
- The solution may not work for everyone, and network egress problems can also occur in workloads outside of Kubernetes
To mitigate NAT port exhaustion, GitLab.com changed its CI pipelines to deploy to one cluster at a time instead of to all clusters in parallel. Being able to test a configuration on a single cluster also let the team find efficiencies, such as lowering the number of pods for a particular workload. In addition, they created a maintenance procedure that removes traffic from an entire cluster so that fixes can be deployed without causing incidents.
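The one-cluster-at-a-time rollout can be sketched roughly as follows. This is a minimal illustration, not GitLab's actual pipeline: the cluster names and the `deploy` and `healthy` callables are hypothetical stand-ins for whatever the CI job really runs. The point is only that each cluster finishes (and is checked) before the next one starts, so outbound connections from deployments do not all hit the NAT gateway at once.

```python
from typing import Callable, Iterable, List


def deploy_sequentially(
    clusters: Iterable[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Deploy to clusters one at a time, halting at the first unhealthy one.

    `deploy` and `healthy` are hypothetical hooks supplied by the caller;
    they are assumptions for this sketch, not a real GitLab or GKE API.
    """
    completed: List[str] = []
    for cluster in clusters:
        deploy(cluster)  # only one cluster is rolling out at any moment
        if not healthy(cluster):
            # Stop here so the remaining clusters keep serving the old version.
            raise RuntimeError(f"rollout halted: {cluster} failed its health check")
        completed.append(cluster)
    return completed


if __name__ == "__main__":
    # Hypothetical zonal cluster names, used only for illustration.
    zonal_clusters = ["zonal-b", "zonal-c", "zonal-d"]
    done = deploy_sequentially(
        zonal_clusters,
        deploy=lambda c: print(f"deploying to {c}"),
        healthy=lambda c: True,
    )
    print("completed:", done)
```

Halting on the first failed check also fits the maintenance idea described above: a cluster that misbehaves can be taken out of rotation while the others continue to serve traffic.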
As we grew past 10 million projects hosted on GitLab.com, it was clear that we needed to move from our fleet of Chef-managed virtual machines to Kubernetes. Using GKE, we started the migration with stateless services such as the GitLab Container Registry, background processing, and Git requests. Regional GKE clusters provide the convenience of spanning multiple availability zones for redundancy, but with over 100 terabytes of daily Git data, cross-availability-zone egress was a concern. Splitting the regional GKE cluster into multiple zonal clusters for services that use a lot of bandwidth gave us much more control over cross-availability-zone network traffic. In this talk, you will learn more about our journey and how we are shifting traffic to the new clusters.
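To see why cross-zone egress becomes a concern at this volume, here is a back-of-the-envelope sketch. It rests on a simplifying assumption that is not from the talk: every byte makes one hop to a backend picked uniformly at random across equally sized zones. Under that assumption, a fraction (N - 1) / N of traffic leaves its zone in an N-zone regional cluster. The 100 TB figure is the lower bound quoted above; the result is illustrative, not a measured GitLab number.

```python
def expected_cross_zone_fraction(zones: int) -> float:
    """Fraction of traffic crossing zones if backends are chosen uniformly.

    Assumes equally sized zones and one hop per byte (a simplification).
    """
    if zones < 1:
        raise ValueError("zones must be at least 1")
    return (zones - 1) / zones


if __name__ == "__main__":
    daily_tb = 100.0  # lower bound taken from the abstract
    for n in (1, 3):
        cross = daily_tb * expected_cross_zone_fraction(n)
        print(f"{n} zone(s): ~{cross:.1f} TB/day crosses zones")
```

With a single zone the fraction is zero, which is the intuition behind keeping bandwidth-heavy services inside zonal clusters.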