Authors: Adnan Hodzic
2023-04-19

This talk covers the 2+ year migration of ING’s MLP (Machine Learning Platform) to Kubernetes. As the biggest bank in the Netherlands and one of the biggest banks in the world, ING works in a highly regulated environment and is subject to rigorous policies governing control of the IT process lifecycle. For a data scientist in such an environment who wants to deploy pre-trained machine learning models to production without much (or any) SRE or deployment knowledge, this complicates things. That’s where MLP steps in: by serving as a model hosting platform, it takes care of all of the problems mentioned above. As an SRE, Adnan will cover the problems and limitations of the existing platform setup in the VM (Virtual Machine) world and how the idea of migrating to Kubernetes was born. He will describe the steps it took to start realizing that idea and the resulting migration plan, followed by the resistance it met, the inability to choose the ideal target destination, the platform’s growth, and the challenge of supporting the existing setup as its capacity grew, which ultimately led to scalability issues. All these factors created a perfect storm that led to the inevitable: the migration to Kubernetes, and how that process came to be.
Conference: CloudOpen 2022
Authors: Marcel Hild, Karsten Wade
2022-06-21

What are the benefits of running a project's code in an all-open source community cloud? What happens when a community of Site Reliability Engineering (SRE) practitioners decides to open source their craft? How does the Operate First concept help the nascent discipline of AIOps? There are many ways the Operate First concept can improve open source software development through operational insights. In this session you'll learn a few of those ways through stories and demonstrations. You'll see how the OS-Climate initiative has accelerated participation in the financial community via the Operate First community cloud. You'll explore the content and material from the SIG-SRE community that lets anyone see and learn how a real production cloud is run. You'll also get a look behind the scenes of the Operate First project's OpenShift-based community cloud.
Authors: Jacob Valdemar Andreasen
2022-05-20

TL;DR (powered by generative AI)

The importance of contributing to open source projects and continuously learning in the field of site reliability engineering
  • Contributing to open source projects can involve more than coding, for example contributing to documentation or sharing knowledge with the community
  • Continuously learning and staying up to date with new technologies is crucial in the field of site reliability engineering
  • Attending conferences and taking part in internships can provide opportunities to learn and discover new things
Authors: Kerim Satirli, Julie Gunderson
2022-05-19

Site Reliability Engineering (SRE) treats reliability as a software problem, but it is really an organizational problem that requires a different mindset. When the reliability of our service drops, so does our ability to create value for the organization we represent. In this talk, Julie and Kerim will take the audience on a guided journey, starting with how to determine whether and how workloads are misbehaving and ending with practical approaches to improving reliability. Through simulated outages (of all types!), observability, and analysis, Julie and Kerim will show attendees how to catch and prepare for service disruptions. Going beyond deployments, attendees will also learn how to combine OpenTelemetry and OpenTracing to instill reliability into their systems.