Authors: Suneeta Mall
2021-10-13

tldr - powered by Generative AI

The presentation walks through an investigation of a Kubernetes cluster whose pods were being OOMKilled with exit code 137, and the steps taken to identify and mitigate the cause.
  • The investigation began when a new application was deployed onto a self-managed Kubernetes cluster and its pods started being OOMKilled with exit code 137 (128 + 9, meaning the process received SIGKILL).
  • The process was being killed repeatedly, most likely because it was a memory hog.
  • The process was being killed by the OS kernel's out-of-memory (OOM) killer; disabling memory overcommitment was only a temporary workaround.
  • The actual fix was to reduce the application's memory footprint and to specify the pod's resource requirements so that its quality of service (QoS) is guaranteed.
  • The presentation also covers the different layers of the container runtime and the role of Kubernetes in managing container processes across multiple hosts.
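
Two points in the summary above can be made concrete with a short sketch. This code is not from the talk. The helper names `describe_exit_code` and `qos_class` are hypothetical, and the pod is modeled as a plain list of dictionaries rather than a real Kubernetes API object. The first helper decodes an exit code such as 137 using the common "128 + signal number" convention. The second approximates the documented Kubernetes rules for assigning a pod's QoS class, on the assumption that "guaranteeing" quality of service refers to the Guaranteed class.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain an exit code using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return "exited normally" if code == 0 else f"exited with error status {code}"


def qos_class(containers: list) -> str:
    """Approximate the QoS class Kubernetes would assign to a pod.

    Each container is a dict with optional "requests" and "limits" dicts.
    Simplification (assumption): quantities are compared as plain strings,
    so "1Gi" and "1024Mi" are treated as different values here.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # An unset request defaults to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    print(describe_exit_code(137))
    pod = [{"requests": {"cpu": "500m", "memory": "1Gi"},
            "limits": {"cpu": "500m", "memory": "1Gi"}}]
    print(qos_class(pod))
```

Under node memory pressure, BestEffort pods are typically the first candidates for eviction and Guaranteed pods the last, which is one reason setting requests equal to limits can help a memory-sensitive workload.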