The presentation discusses the reliability of running Cluster Autoscaler in production and provides insights on monitoring and debugging tools.
- Cluster Autoscaler's primary job is to ensure that all pods can be scheduled
- Pending-pod metrics are useful for monitoring how well Cluster Autoscaler is doing that job
- Cluster Autoscaler should be run on dedicated nodes or on the control plane VMs to prevent issues with scaling down
- Testing configurations before using them in production is recommended
- Certain flags can have significant side effects
- Autoscaling behavior can change significantly at large scale, so it should be tested at scale
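A minimal sketch of a pending-pod check follows, to make the monitoring idea concrete. It is not Cluster Autoscaler's own metric. It flags pods that are still Pending with a PodScheduled condition of status "False" and reason "Unschedulable" for longer than a threshold. The input is assumed to be a list of pod objects shaped like the Kubernetes Pod API JSON (for example, the "items" from kubectl get pods -o json). The helper names and the 5-minute threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_unschedulable(pod: dict) -> bool:
    """True if the pod is Pending and the scheduler marked it Unschedulable."""
    status = pod.get("status", {})
    if status.get("phase") != "Pending":
        return False
    for cond in status.get("conditions", []):
        if (cond.get("type") == "PodScheduled"
                and cond.get("status") == "False"
                and cond.get("reason") == "Unschedulable"):
            return True
    return False


def stuck_pending_pods(pods: list, now: datetime, threshold: timedelta) -> list:
    """Names of unschedulable pods whose age exceeds `threshold` (hypothetical helper)."""
    stuck = []
    for pod in pods:
        if not is_unschedulable(pod):
            continue
        # Kubernetes timestamps are RFC 3339 in UTC with a trailing "Z".
        created = datetime.fromisoformat(
            pod["metadata"]["creationTimestamp"].replace("Z", "+00:00"))
        if now - created > threshold:
            stuck.append(pod["metadata"]["name"])
    return stuck


unschedulable = {"phase": "Pending", "conditions": [
    {"type": "PodScheduled", "status": "False", "reason": "Unschedulable"}]}
pods = [
    {"metadata": {"name": "web-1", "creationTimestamp": "2024-01-01T00:00:00Z"},
     "status": unschedulable},
    {"metadata": {"name": "web-2", "creationTimestamp": "2024-01-01T00:09:00Z"},
     "status": unschedulable},
    {"metadata": {"name": "db-0", "creationTimestamp": "2024-01-01T00:00:00Z"},
     "status": {"phase": "Running",
                "conditions": [{"type": "PodScheduled", "status": "True"}]}},
]
now = datetime(2024, 1, 1, 0, 10, tzinfo=timezone.utc)
print(stuck_pending_pods(pods, now, timedelta(minutes=5)))  # ['web-1']
```

A growing list of long-pending pods suggests that the autoscaler is not keeping up. A pod pending for only a minute, like web-2, may simply be waiting for a scale-up that is already in progress.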
The speaker, who has been part of the GKE team running thousands of instances of Cluster Autoscaler, recommends running it on dedicated nodes or on the control plane VMs to avoid issues with scaling down. He also warns that certain flags, such as the --ignore-taint flag, can have significant side effects. He further recommends testing configurations before using them in production.
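The placement recommendation can be sketched as a pod-spec fragment. The following is a minimal illustration, not the speaker's configuration. It returns the nodeSelector and tolerations for the Cluster Autoscaler pod. By default it targets control-plane nodes using the kubeadm-style node-role.kubernetes.io/control-plane label and NoSchedule taint. Alternatively, it targets a dedicated node pool identified by a hypothetical label/taint pair. The sketch assumes that the dedicated pool is not one of the node groups Cluster Autoscaler itself scales.

```python
import json

# Label and taint key used for control-plane nodes in kubeadm-style clusters.
CONTROL_PLANE_KEY = "node-role.kubernetes.io/control-plane"


def autoscaler_placement(dedicated_label=None) -> dict:
    """Pod-spec fragment pinning the autoscaler to control-plane or dedicated nodes.

    `dedicated_label` is a hypothetical (key, value) pair that is assumed to be
    applied both as a node label and as a NoSchedule taint on the dedicated pool.
    """
    if dedicated_label is None:
        return {
            "nodeSelector": {CONTROL_PLANE_KEY: ""},
            "tolerations": [{"key": CONTROL_PLANE_KEY, "operator": "Exists",
                             "effect": "NoSchedule"}],
        }
    key, value = dedicated_label
    return {
        "nodeSelector": {key: value},
        "tolerations": [{"key": key, "operator": "Equal", "value": value,
                         "effect": "NoSchedule"}],
    }


# Hypothetical dedicated-pool label; merge the result into the Deployment's pod spec.
print(json.dumps(autoscaler_placement(("dedicated", "cluster-autoscaler")), indent=2))
```

Keeping the autoscaler off the nodes it manages means that its own pod should not end up on a node it is trying to remove. This is one plausible reading of the scale-down issues the speaker mentions.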