Building a Multi Cluster/Env Service Mesh at Airbnb


Authors:   Stephen Chan, Weibo He


Airbnb's experience building a multi-cluster, multi-environment service mesh on top of Istio:
  • Airbnb migrated from a monolithic architecture to SOA and moved the majority of its workloads from EC2 to Kubernetes
  • The legacy in-house service mesh no longer met their needs
  • They adopted Istio as the foundation for their next-generation service mesh
  • After establishing confidence in Istio, they started a full-speed migration
  • The multi-cluster requirement led them to adopt the external control plane and flat network models
  • Multi-environment support includes a multi-tier mesh, mesh expansion, and external services
Airbnb ran into scalability issues in their Kubernetes usage and decided to scale out horizontally by distributing IPs. They also used a new VPC feature, prefix delegation, to reduce their mapping usage. For their Istio deployments they adopted the external control plane and flat network models, which provided better security, isolation of the control plane from data plane workloads, and easier Istio upgrades. They also followed a multi-tier concept to minimize the blast radius of changes, and ran automated functional tests on their sandbox tier to verify the mesh features they depend on in production.
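The multi-tier idea can be illustrated with a minimal sketch: a change is applied to one tier at a time, and the rollout halts as soon as a tier's functional checks fail, so later tiers are never touched. This is not Airbnb's tooling. The function names, the `apply` and `verify` callbacks, and the "staging" tier are assumptions made for illustration; only the sandbox and production tiers are mentioned in the summary above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical tier order. "sandbox" and "production" come from the summary;
# "staging" is an assumed intermediate tier.
TIERS = ("sandbox", "staging", "production")


@dataclass
class RolloutResult:
    applied: List[str]        # tiers that received the change, in order
    halted_at: Optional[str]  # tier whose checks failed, or None on success


def roll_out(
    change: str,
    apply: Callable[[str, str], None],  # apply(tier, change): hypothetical deploy hook
    verify: Callable[[str], bool],      # verify(tier): hypothetical functional-test hook
    tiers: Sequence[str] = TIERS,
) -> RolloutResult:
    """Apply a change tier by tier, stopping at the first failed verification."""
    applied: List[str] = []
    for tier in tiers:
        apply(tier, change)
        applied.append(tier)
        if not verify(tier):
            # Blast radius is limited to the tiers applied so far.
            return RolloutResult(applied=applied, halted_at=tier)
    return RolloutResult(applied=applied, halted_at=None)


if __name__ == "__main__":
    # Simulate a mesh upgrade whose functional tests fail in the sandbox tier.
    result = roll_out(
        "mesh-upgrade",
        apply=lambda tier, change: print(f"applying {change} to {tier}"),
        verify=lambda tier: tier != "sandbox",
    )
    print(result)
```

In this sketch, a failure in the first tier means the change never reaches the later tiers, which is the property the multi-tier concept is meant to provide.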


Tutorials and demos are great, but how do real organizations implement service meshes at scale? In this talk, we will discuss some of the problems Airbnb is solving with their service mesh based on Istio. Make sure you attend if you’re interested in building out a service mesh at your own company and in ways to adapt it to your own requirements. We will walk through:
  • Partitioning workloads across multiple clusters and how to manage the mesh.
  • Testing mesh upgrades reliably with multiple environments.
  • Expanding the mesh to legacy, non-container workloads.
  • Routing traffic between regions, not just clusters, securely.


