Authors: Diana Atanasova, Julius von Kohout

tldr - powered by Generative AI

The presentation discusses security issues in Kubeflow pipelines and proposes solutions to address them.
  • Rootless containers address the problem of containers running as root, but building OCI containers rootless is still limited
  • The controllers in Kubeflow pipelines run as cluster admin, which is a security risk
  • Namespace sharing can also be a security risk as collaborators gain access to service accounts
  • Solutions proposed include reducing the complexity of controllers and using reduced cluster roles
  • The presentation highlights the progress made in Kubeflow security, such as authentication and machine-to-machine authentication
Authors: Diogo Guerra, Diana Gaponcic

tldr - powered by Generative AI

The presentation discusses GPU utilization and benchmarking, focusing on time slicing and MIG (Multi-Instance GPU), and provides insights on their use cases and performance trade-offs.
  • Time slicing is useful for low priority jobs with idle time, but not suitable for latency-sensitive or performance-intensive tasks.
  • MIG enables GPU sharing but comes with a performance loss because fewer streaming multiprocessors are available.
  • Benchmarking shows that time slicing incurs a significant performance loss when context switching is required for long-running processes.
  • Doubling memory and bandwidth through MIG can improve performance, but using MIG without actually sharing the GPU incurs a performance loss for no benefit.
  • Monitoring pipeline utilization can help understand user jobs and optimize GPU usage.
Authors: Alejandro Saucedo

tldr - powered by Generative AI

The presentation discusses the need for collaboration and standardization in metadata operations for end-to-end data and machine learning platforms.
  • The goal is to achieve end-to-end interoperability at scale through collaboration and standardization.
  • Practitioners at every stage of the MLOps and DataOps lifecycle should collaborate to come up with standards.
  • The creation of bad standards is worse than having no standards at all.
  • Standardization should focus on interfaces, metrics, and operational considerations.
  • Tools like MLServer, Seldon Core, and Kubernetes can help abstract data science from operations.
Authors: Keith, Keshi Dai, Yuzhui Liu, Ed Shee

Managing a machine learning infrastructure is a great challenge, as its scope covers both common infrastructure tasks – such as cluster management, network, security, container management, and observability – and ML-focused tasks – such as GPU compute, data exploration, distributed training, and model serving. Kubernetes and its prosperous open source ecosystem provide great infrastructure tools (e.g., Knative, Cloud Native Buildpacks, Argo, and Envoy), as well as ML-focused projects (e.g., Kubeflow, KServe, Seldon Core, and KubeRay) that enable infrastructure engineers to build a modern machine learning infrastructure. In this panel, you’ll hear from engineers at Bloomberg, Seldon, and Spotify about how they’re using the Kubernetes ecosystem to provide machine learning infrastructure and their current challenges. Panelists represent a variety of use cases, including end-users and infrastructure providers, as well as both on-prem and cloud-based infrastructures.
Conference:  Transform X 2022
Authors: Daphne Koller

tldr - powered by Generative AI

Daphne Koller discusses how Insitro is using machine learning models to predict the outcome of drug development experiments and design novel, safe, and effective therapies.
  • Drug development is becoming more challenging and expensive due to the high failure rate of experiments.
  • Insitro is using high-quality data and machine learning models to predict the outcome of experiments and design novel therapies.
  • The focus is on learning meaningful representations of clinical state using self-supervised machine learning models.
  • Insitro has partnered with pharmaceutical companies to access data on liver disease and used machine learning to predict patient progression.
  • The ultimate goal is to develop a new approach to drug development that helps more people faster and at a lower cost.
Conference:  Transform X 2022
Authors: Mostafa Rohaninejad, Ariana Eisenstein, Louis Tremblay, Jack Guo, Russell Kaplan

Machine learning leaders from robotics (Covariant), home automation (Resideo), autonomous delivery (Nuro), and warehouse automation (Pickle Robot) sit down with Russell Kaplan, Scale’s Director of Engineering, to share their approaches to dataset management. Pickle Robot CTO Ariana Eisenstein will share how she thinks about modulating quantities from different data sources like synthetic and public open datasets with real-world data for training datasets. Mostafa Rohaninejad, Founding Research Scientist at Covariant, will describe how the object “picking” problem requires synthetic data for unsafe scenarios and how he also incorporates structured and time-series data: supervised and unsupervised learning should go hand-in-hand. Jack Guo, Head of Perception at Nuro, will explain how it’s essential to have tools and mechanisms to automatically highlight recorded data that deviates from the norm, especially if it was captured in a new location. Like Rohaninejad, he will stress the importance of simulation as a component of successful reinforcement learning. Louis Tremblay, AI/ML Engineering Leader at Resideo, will explain how security cameras in the home represent an even more unbounded environment than do warehouses. The group will also discuss why maintaining separate datasets and training pipelines for different customers is costly and incurs additional technical debt over time. Testing on fault-tolerant customers first before deploying to the wider fleet is also important. Scale’s Kaplan will share how, in his experience, when metrics and anecdotes seem at odds, it makes sense to re-think the metrics and establish new ones.
Conference:  Transform X 2022
Authors: Thomas Kurian, Alexandr Wang

Thomas Kurian, the CEO of Google Cloud, will join Alexandr Wang, CEO and Founder of Scale, to discuss how AI helps businesses across various industries and use cases. Google Cloud is well-known for developer adoption, helping machine learning teams to create production-grade machine learning models. With platforms like Vertex AI and TensorFlow, Google boasts the most popular machine learning platforms adopted by over 3 million developers globally. Google also has succeeded with the widespread adoption of machine learning capabilities in its consumer and business products, including Gmail smart replies and predictive search. Kurian will advise how to best roll out machine learning capabilities to many customers and ensure they are widely adopted. He will also discuss that, with the advent of foundation models, now is the time for all industries to more broadly adopt AI or risk falling behind the competition. He will detail practical use cases for retail, logistics, manufacturing, and healthcare. Kurian and Wang will also discuss the future of machine learning and what it will take to get there. Before Google, Kurian spent 22 years at Oracle; his nearly 30 years of experience have given him a deep knowledge of engineering, enterprise relationships, and leadership of large organizations. Throughout his career, he has demonstrated a unique capability to align the latest technological developments, including machine learning, with real business problems to provide practical solutions to customers.
Conference:  Transform X 2022
Authors: Dan Shiebler

Abnormal Security builds ML products that help protect systems against cyber attacks. Dan Shiebler, Head of Machine Learning at Abnormal Security, discusses best practices for building cybercrime detection algorithms. In this session, Shiebler covers how to design, monitor, and launch resilient ML systems and how to train ML models on production issues. He talks about the different types of problems that production ML systems can encounter, including features that become unavailable because of upstream data issues, distribution changes, or features that become stale. Shiebler addresses the different types of iteration loops in most companies (online vs. offline) and how that plays into testing and training, as well as the company’s ability to tolerate risk. Historical logs and data also play a key role. Before joining Abnormal, Shiebler worked at Twitter: first as an ML Researcher working on recommendation systems, and then as the Head of Web Ads Machine Learning. Before Twitter, he built smartphone sensor algorithms at TrueMotion.
Conference:  Transform X 2022
Authors: Dr. Will Roper

tldr - powered by Generative AI

Digital engineering is a transformative capability for hardware that enables fully digital design, testing, and certification of systems without physically building them.
  • Digital engineering is doing engineering digitally with the help of computer models and similar technology.
  • It is possible to fully digitally design, test, and certify systems without physically building them.
  • Computer models are capable of creating physics and structures to a high degree of reliability.
  • Digital engineering solves the problems of expensive, slow, and environmentally impactful physical prototyping.
  • The McLaren racing team's approach to digital engineering is an inspiring level of digital agility.
  • Digital engineering is a transformative capability for hardware that enables an agile approach to hardware development.
Conference:  Transform X 2022
Authors: Emad Mostaque, Alexandr Wang

tldr - powered by Generative AI

The speaker discusses the democratization of AI and the importance of diversity in data sets to ensure aligned artificial intelligence. They argue for the need to build smaller, more democratized models that impact a broader set of people and allow for adaptation to social issues. The speaker also emphasizes the importance of transparency in the development of AI models and the need for human feedback in reinforcement learning.
  • AI democratization and diversity in data sets are crucial for aligned artificial intelligence
  • Smaller, more democratized models are needed to impact a broader set of people and adapt to social issues
  • Transparency in AI model development is necessary
  • Human feedback is crucial in reinforcement learning