Authors: Arnaud Meukam, Davanum Srinivas
2023-04-21

tldr - powered by Generative AI

The presentation discusses the Kubernetes infrastructure project and its focus on cost optimization and multi-cloud approach to provide CI for the community.
  • The Kubernetes infrastructure project relies on donations from cloud providers such as GCP and AWS to bootstrap infrastructure.
  • The project is working on a multi-cloud approach to provide CI for the community and ensure compatibility and conformance with other projects in the CNCF landscape.
  • The project is also working with third parties such as Fastly to provide access to different services.
  • Contributor experience is handled by the SIG Contributor Experience, which has full ownership of moderation on different communication platforms.
  • The project is unable to directly talk to cloud providers and relies on the CNCF to interact with them.
Authors: Sanjay Pujare, Costin Manolache
2023-04-21

Kubernetes is well suited to stateless services, and one of its benefits is seamless autoscaling of infrastructure in response to varying load. The resulting elasticity enables users to optimize their infrastructure and minimize costs. But what do you do if your application is stateful (for example, it requires maintaining stateful sessions between clients and servers) and you are using a service mesh? In this talk we cover stateful applications where clients can create and maintain persistent sessions in the presence of load balancing, where Istio routes individual RPCs to various backends. A client creates a persistent session and expects all RPCs in that session to go to a particular backend, because that backend holds the “state” for the session and cannot share it with other backends. Note that consistent hashing and rendezvous load balancing don’t quite work here, because session persistence breaks when the set of backends changes. The feature uses HTTP cookies: persistent or stateful sessions are achieved by communicating with the load balancer via cookies, in both Istio and proxyless gRPC. We also cover the use case of “draining backends”, where backends are gradually removed as part of downsizing the infrastructure, without breaking session persistence.
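The cookie-based affinity described above can be sketched in a few lines. This is a toy model, not Istio's implementation: the class and method names are invented, and a real mesh sets the cookie via HTTP headers rather than return values. It shows why draining preserves persistence: a draining backend stops receiving new sessions, but existing cookies keep routing to it.

```python
import uuid

class CookieAffinityBalancer:
    """Toy load balancer illustrating cookie-based session persistence."""

    def __init__(self, backends):
        self.backends = list(backends)   # eligible for new sessions
        self.draining = set()            # only serving existing sessions
        self.sessions = {}               # session cookie -> backend
        self._next = 0

    def route(self, cookie=None):
        # Returning client: honor the cookie even if the backend is draining.
        if cookie in self.sessions:
            return self.sessions[cookie], cookie
        # New session: round-robin over non-draining backends, mint a cookie.
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        cookie = str(uuid.uuid4())
        self.sessions[cookie] = backend
        return backend, cookie

    def drain(self, backend):
        # Stop sending *new* sessions to this backend; existing ones persist.
        self.backends.remove(backend)
        self.draining.add(backend)
```

With two backends, a client's first request picks a backend and returns a cookie; later requests carrying that cookie reach the same backend even after it is drained, while new sessions go only to the remaining pool.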
Authors: Shivanshu Raj Shrivastava
2023-04-20

tldr - powered by Generative AI

The presentation discusses the implementation of structured and contextual logging in Kubernetes to make logs more queryable and provide essential information about Kubernetes objects.
  • Structured logging with key-value pairs and references to Kubernetes objects
  • Contextual logging to retain context from parent to leaf and share information between different goroutines
  • API changes required for implementation
  • The goal is not to remove klog; klog will remain in use
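Kubernetes implements this in Go with klog, but the underlying pattern of carrying key-value pairs in a context and emitting them with every log line can be illustrated with a small Python stdlib analogue. The function names below are illustrative, not klog's API:

```python
import contextvars

# Context variable holding accumulated key-value pairs; a rough analogue
# of storing a logger with values in a context and retrieving it later.
_log_ctx = contextvars.ContextVar("log_ctx", default=())

def with_values(**kv):
    """Attach key-value pairs to the current logging context."""
    return _log_ctx.set(_log_ctx.get() + tuple(kv.items()))

def log(msg, **kv):
    """Emit a structured line: message plus context and call-site pairs."""
    pairs = _log_ctx.get() + tuple(kv.items())
    return msg + "".join(f' {k}="{v}"' for k, v in pairs)
```

A caller high in the stack attaches `pod` and `namespace` once; every log call in that context then carries those pairs automatically, which is what makes the resulting logs queryable by object.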
Authors: Jerome Kuptz, Ameen Radwan
2022-10-28

tldr - powered by Generative AI

Cello is a cloud-agnostic tool that abstracts the nitty-gritty parts of deployment away from developers, allowing them to use tools like Jenkins and GitHub to create their resources or application code. It is designed to support multiple cloud providers but currently supports only AWS.
  • Cello was developed to support Intuit's 6,000 engineers who traditionally picked their own deployment mechanisms.
  • The tool abstracts the credential provider and token rotation processes away from developers.
  • Developers interact with Cello through tooling within Jenkins and an onboarding UI, and mostly interact with GitHub and their code.
  • Cello's long-term plan is to support multiple cloud providers; today it supports only AWS.
  • Future plans for Cello include a user interface for easier deployment and operation processes.
Authors: Shweta Vohra
2022-10-28

tldr - powered by Generative AI

PKI and Certificate management are critical security measures for communicating over networks within or outside an infrastructure. The presentation covers the basics of certificate infrastructure, a case study, and 5 must-knows about certificate management.
  • PKI and certificate management are essential for secure communication over networks
  • Certificates for Kubernetes clusters are relatively simple, but microservices and service meshes require more complex design and implementation
  • Certificate infrastructure involves trust establishment, certificate authority, registration authority, and verification authority
  • Design considerations include network proxy layer, TLS version mismatches, certificate revocation methods, certificate automation and monitoring
  • Tools like Spiffe/Spire and Grafana can be used for automation, monitoring, and analysis
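As a minimal illustration of the certificate automation and monitoring bullet, the sketch below checks how close a certificate's `notAfter` timestamp is to expiry using Python's stdlib `ssl` helper. The function names and the 30-day threshold are assumptions; production setups typically delegate this to dedicated automation and monitoring tools such as those named above.

```python
import ssl
import time

def days_until_expiry(not_after, now=None):
    """Days remaining before a certificate's notAfter timestamp.

    `not_after` uses the string format returned by ssl.getpeercert(),
    e.g. "Jun 26 21:41:46 2030 GMT".
    """
    expiry = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expiry - now) / 86400

def needs_renewal(not_after, threshold_days=30, now=None):
    """True if the certificate should be rotated soon."""
    return days_until_expiry(not_after, now) < threshold_days
```

Feeding the `notAfter` field of each peer certificate into a check like this (and exporting the result as a metric) is the simplest form of the expiry monitoring the talk recommends.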
Authors: Keith, Keshi Dai, Yuzhui Liu, Ed Shee
2022-10-27

Managing a machine learning infrastructure is a great challenge, as its scope covers both common infrastructure tasks – such as cluster management, network, security, container management, and observability – and ML-focused tasks – such as GPU compute, data exploration, distributed training, and model serving. Kubernetes and its prosperous open source ecosystem provides great infrastructure tools (e.g., Knative, Cloud Native Buildpacks, Argo, and Envoy), as well as ML-focused projects (e.g., Kubeflow, KServe, Seldon Core, and KubeRay) that enable infrastructure engineers to build a modern machine learning infrastructure. In this panel, you’ll hear from engineers at Bloomberg, Seldon, and Spotify about how they’re using the Kubernetes ecosystem to provide machine learning infrastructure and their current challenges. Panelists represent a variety of use cases, including end-users and infrastructure providers, as well as both on-prem and cloud-based infrastructures.
Authors: Kenneth DuMez
2022-10-24

This talk will focus on the problems of credentials for machines in modern infrastructure and why it’s imperative you treat your bots the same way you treat your humans. Typically when using automation for CI/CD or microservices, teams will have vaulted credentials shared between worker nodes. This introduces challenges as these credentials are often long-lived, requiring frequent rotation, introducing both toil and security threats. Open-source Teleport Machine ID mitigates these problems by assigning a unique identity with attached RBAC roles baked into unique, short-lived certificates enabling bot users to connect to remote hosts while centrally audit-logging all of the machine’s activity. This identity-based access control plane works seamlessly with all your cloud infrastructure including K8s clusters, databases, and any other remote compute resource. The talk will include an assessment of current legacy automated access solutions, an overview of Teleport, a Machine ID demo, and an in-depth discussion of the technology behind it. With open-source Teleport, managing and rotating shared credentials is a thing of the past. Give the machines rights! Secure your infrastructure.
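The short-lived-credential idea behind an approach like Machine ID can be illustrated with a toy HMAC-signed token binding an identity to roles and an expiry. This is not Teleport's actual format (Teleport issues real SSH and X.509 certificates); the secret, function names, and claim layout below are all illustrative:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-signing-key"  # stands in for an issuing authority; demo only

def issue(identity, roles, ttl_seconds, now=None):
    """Mint a short-lived signed token binding an identity to roles."""
    now = time.time() if now is None else now
    claims = {"sub": identity, "roles": roles, "exp": now + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig

def verify(token, now=None):
    """Return the claims if the signature is valid and not expired."""
    now = time.time() if now is None else now
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if now > claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Because the credential expires on its own, a leaked token is useful only briefly and no shared long-lived secret ever needs manual rotation, which is the core property the talk argues for.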
Authors: Lukáš Hejtmánek, Viktória Spišaková
2021-10-13

tldr - powered by Generative AI

Challenges and solutions in implementing Kubernetes infrastructure and moving scientific computing to containers in the academic environment
  • Introduction of efforts made at CERIT-SC/Institute of Computer Science of Masaryk University to implement Kubernetes infrastructure and move scientific computing to containers
  • Challenges of multi-tenancy assurance, deploying applications under users, resource sharing, and building trust towards containerization among the research community
  • Several created solutions, presentation of European open-source projects, and demonstration of how containers help in the academic environment
  • Other issues faced and proposed ideas on new features
Conference: Transform X 2021
Authors: Stephen Balaban
2021-10-07

In this session, Stephen Balaban, CEO of Lambda, shares a playbook for standing up machine learning infrastructure. This session is intended for any organization that has to scale up its infrastructure to support growing teams of Machine Learning (ML) practitioners. Stephen describes how large ML models are often built with on-premise infrastructure. He explores the pros and cons of this approach and how the workstations, servers, and other related resources could be scaled up to support larger workloads or numbers of users. How do you scale from a single workstation to a number of shared or dedicated servers for each ML practitioner? How can you use a single software stack across laptops, servers, clusters, and the cloud? What are the network, storage, and power considerations of each step in that journey? Join this session to hear some best practices for scaling up your machine learning platform, to serve the growing needs of your organization.