Supporting long-lived pods using a simple Kubernetes webhook
- Some applications, such as distributed caches and batch workers, need pods that stay up for a long time
- To support long-lived pods, Slack uses an admission webhook that injects tolerations into pods and a custom service that taints nodes with their uptime
- Together these form a two-sided system: the webhook handles the pod side, and a symbiotic node-tainting service handles the node side
- Limitations include a lack of monitoring tools for measuring the solution's success
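The two sides described above can be sketched in a few lines. This is a minimal Python sketch under assumptions the summary does not state: the taint key `example.com/node-uptime-days`, the `example.com/long-lived` annotation, and the 3-day threshold are all hypothetical. It also assumes one possible arrangement, in which only nodes past the threshold are tainted and the webhook adds a matching toleration to ordinary pods, so pods marked long-lived are steered onto fresh nodes. It is not Slack's implementation; only the `admission.k8s.io/v1` AdmissionReview response shape and the JSON Patch format are standard Kubernetes.

```python
import base64
import json

# Hypothetical taint key; the source does not say which key Slack uses.
UPTIME_TAINT_KEY = "example.com/node-uptime-days"
# Hypothetical annotation that marks a pod as long-lived.
LONG_LIVED_ANNOTATION = "example.com/long-lived"
# Hypothetical age (in days) after which a node no longer counts as fresh.
MAX_FRESH_DAYS = 3


def node_uptime_taint(uptime_seconds):
    """Node side: return the taint a tainting service might apply, or None.

    Fresh nodes stay untainted; older nodes get a NoSchedule taint whose
    value records their uptime in whole days.
    """
    days = int(uptime_seconds // 86400)
    if days < MAX_FRESH_DAYS:
        return None
    return {"key": UPTIME_TAINT_KEY, "value": str(days), "effect": "NoSchedule"}


def mutate(review):
    """Pod side: build an AdmissionReview response for a Pod CREATE request.

    Ordinary pods receive a toleration for the uptime taint, so they may
    still land on old nodes. Pods annotated as long-lived receive nothing,
    so the scheduler only places them on untainted (fresh) nodes.
    """
    request = review["request"]
    pod = request["object"]
    annotations = pod.get("metadata", {}).get("annotations") or {}
    response = {"uid": request["uid"], "allowed": True}

    if annotations.get(LONG_LIVED_ANNOTATION) != "true":
        toleration = {
            "key": UPTIME_TAINT_KEY,
            "operator": "Exists",  # tolerate any uptime value
            "effect": "NoSchedule",
        }
        if pod.get("spec", {}).get("tolerations"):
            # The pod already has tolerations: append to the list.
            patch = [{"op": "add", "path": "/spec/tolerations/-", "value": toleration}]
        else:
            # No tolerations yet: create the list.
            patch = [{"op": "add", "path": "/spec/tolerations", "value": [toleration]}]
        response["patchType"] = "JSONPatch"
        # The API server expects the JSON Patch base64-encoded.
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()

    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}
```

A real deployment would serve `mutate` over HTTPS behind a MutatingWebhookConfiguration and apply the taint through the Kubernetes API; both are left out of this sketch. Leaving fresh nodes untainted keeps the long-lived path simple: a pod needs no special toleration to reach a young node, only the absence of one for old nodes.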