2022-06-21 ~ 2022-06-24

Presentations (with video): 23 (18)

This conference is for the open AI and Data community and provides a forum to drive open source innovation in the AI, ML, DL, and Data domains by enabling collaboration and learning among the community. This event is produced by the LF AI & Data Foundation and The Linux Foundation.

Authors: Jakub Piotr Cłapa, Marcus Edel

Data sets are the backbone of machine learning (ML), but some are more critical than others. There is a core set of them that researchers use to evaluate models as a way to track how ML capabilities are advancing over time. One of the best known is the ImageNet data set, which kicked off the modern ML revolution; there is also Lyft's data set meant to train self-driving cars, and many more. Over the years, studies have found that these data sets can contain serious flaws. ImageNet, for example, has several labels that are just flat-out wrong: a mushroom is labeled a spoon, a lion is labeled a monkey, and in the Lyft data set several cars are not annotated at all. All these data sets have one thing in common: they use a highly error-prone annotation pipeline with little or no quality checking. We worked on an open-source tool that combines novel unsupervised machine-learning pipelines to help annotators and machine-learning engineers identify and filter out potential label errors. In this talk, we will share our findings on how label errors affect the existing training process, discuss possible implications, and dive into how we leveraged unsupervised learning to filter out annotation errors, looking at real-world examples.
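The abstract above describes filtering suspect annotations before training. As a minimal sketch of the general idea (not the authors' tool, whose pipeline is not detailed here), one common approach is to rank each sample by its "self-confidence": the model's predicted probability for the label the annotator assigned. Low self-confidence flags candidates for human review.

```python
# Sketch: flag likely label errors by ranking each sample's
# "self-confidence" -- the model's predicted probability for the label
# it was annotated with. Low self-confidence suggests a possible error.

def flag_label_errors(pred_probs, labels, threshold=0.5):
    """pred_probs: per-sample lists of per-class probabilities.
    labels: integer class labels as annotated.
    Returns indices of suspect samples, most suspicious first."""
    suspects = []
    for i, (probs, label) in enumerate(zip(pred_probs, labels)):
        self_conf = probs[label]
        if self_conf < threshold:
            suspects.append((self_conf, i))
    return [i for _, i in sorted(suspects)]

# Toy example: sample 2 is annotated class 0, but the model is confident
# it is class 1, so it gets flagged for review.
probs = [[0.9, 0.1], [0.2, 0.8], [0.05, 0.95]]
labels = [0, 1, 0]
print(flag_label_errors(probs, labels))  # [2]
```

In practice the probabilities would come from cross-validated model predictions so the model never scores samples it was trained on.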
Authors: Neethu Elizabeth Simon, Scott Thomas

tldr - powered by Generative AI

Converting an old-school textile inspection machine into a smart system using AI/ML is effective and affordable even in the commodity fabric manufacturing industry.
  • Textile inspection is traditionally labor-intensive and error-prone.
  • Computer vision-based AI/ML solution using open source tools was developed for textile defect detection during the fabric inspection process.
  • The old-school manual fabric inspection machine was successfully integrated with cameras and open source AI/ML tools running on a high-performance compute device.
  • The reasonably priced system was applied affordably in a much lower-cost, labor-intensive industry without expensive retooling or excessively high-priced technology.
  • Implementation and integration challenges encountered during design and development of this unique solution were resolved.
  • Model worked but was not scalable enough and was sensitive to folds and creases.
  • Inferencing was good but the system was not robust enough to handle high motor speed.
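The bullets above outline a camera-plus-ML defect detector. As a hedged illustration of the underlying problem (not the talk's actual pipeline, which used learned models), a classic baseline is to flag pixels that deviate strongly from the mean intensity of a mostly uniform weave:

```python
# Hedged illustration: statistical outlier detection on a grayscale
# image, a simple baseline for spotting flaws in uniform fabric.

def find_defect_pixels(image, k=3.0):
    """image: 2D list of grayscale values. Returns (row, col) positions
    whose intensity deviates from the mean by more than k standard
    deviations."""
    flat = [v for row in image for v in row]
    n = len(flat)
    mean = sum(flat) / n
    var = sum((v - mean) ** 2 for v in flat) / n
    std = var ** 0.5
    return [(r, c)
            for r, row in enumerate(image)
            for c, v in enumerate(row)
            if std > 0 and abs(v - mean) > k * std]

# Uniform fabric with one dark flaw at row 2, column 1:
fabric = [[200, 201, 199, 200],
          [200, 200, 198, 201],
          [199,  50, 200, 200],
          [201, 200, 200, 199]]
print(find_defect_pixels(fabric))  # [(2, 1)]
```

This also hints at the limitation noted above: folds and creases shift intensities globally, which is exactly where a fixed statistical threshold breaks down and a learned model is needed.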
Authors: Charles Adetiloye, Keith Mattix

tldr - powered by Generative AI

Kubeflow Metal is a new way of deploying Kubeflow onto a Kubernetes cluster on bare metal servers, providing a low-friction, high-velocity way to stand up an ML platform in an easy, experimental on-prem environment.
  • Kubeflow Metal is a terraform module that deploys Kubeflow on a Kubernetes cluster on bare metal servers
  • It is a cheaper alternative to cloud infrastructure with a fixed cost
  • It allows for quick bootstrapping of an ML environment or infrastructure for a team
  • Deployment is elastic and easily scalable
  • It can be used for plugging into a CI/CD process
  • It is useful for cases where data cannot be moved to the cloud, such as financial or insurance data
  • Kubeflow Metal is looking for people to help improve the project
Authors: Jeff Zemerick

tldr - powered by Generative AI

Bringing NLP capabilities to Apache Solr through ONNX and OpenNLP
  • Apache OpenNLP is a Java-based NLP tool that has been around for over a decade and offers various capabilities such as tokenization, document classification, and named entity recognition
  • Apache Solr depends on Apache Lucene for search functionality, and Apache Lucene has a dependency on Apache OpenNLP for some NLP operations
  • The ONNX Runtime allows for the use of deep learning models across programming languages, architectures, and platforms, enabling the use of NLP services created in other languages
  • The speaker demonstrates how a deep learning model trained using PyTorch or TensorFlow can be used for inference from a Java search stack of Apache OpenNLP, Apache Lucene, and Apache Solr
  • The speaker discusses the challenges and relationships between OpenNLP, Lucene, and Solr, and provides resources for attendees to get started with these open source projects
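The bullets above describe serving a PyTorch/TensorFlow model from a Java stack via the ONNX Runtime. As a language-neutral sketch of the inference data flow (not OpenNLP's actual API, and using an assumed example tag set), here is the post-processing step such a pipeline performs: turning per-token logits from an NER model into BIO entity labels via argmax.

```python
# Sketch of NER post-processing: the ONNX model returns one score per
# label per token; the highest-scoring label is taken for each token.

LABELS = ["O", "B-PER", "I-PER"]  # example tag set, assumed for illustration

def decode_ner(logits):
    """logits: one list of per-label scores per token."""
    return [LABELS[max(range(len(row)), key=row.__getitem__)] for row in logits]

tokens = ["George", "Washington", "slept", "here"]
logits = [[0.1, 2.0, 0.3],   # highest score: B-PER
          [0.2, 0.1, 1.9],   # highest score: I-PER
          [3.0, 0.1, 0.1],   # highest score: O
          [2.5, 0.2, 0.1]]   # highest score: O
print(list(zip(tokens, decode_ner(logits))))
```

The same decode step runs identically whether the runtime host is Python or Java, which is the portability point the talk makes about ONNX.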
Authors: Chin Huang, Ted Chang

tldr - powered by Generative AI

Overview of KServe with ModelMesh and demo of model inference using online features
  • KServe is a standards-based model serving platform built on top of Kubernetes
  • ModelMesh in KServe is designed to address Kubernetes' resource limitations and allows for high density and scalability
  • The ModelMesh architecture includes serving runtime deployments, containers for the model mesh logic, adapters for retrieving models, and model servers for inference
  • A scalability test showed that 20k simple stream models could be deployed into two serving runtime pods in a small Kubernetes cluster
  • The demo showed integration of the open source ModelMesh model serving layer with Feast for multi-region model serving in a Kubernetes cluster
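The serving platform described above exposes the standard v2 (Open Inference) REST protocol. As a sketch of what a client request looks like, assuming a hypothetical model name and made-up tensor contents:

```python
import json

# Sketch of a v2 inference protocol request body, as served by
# KServe/ModelMesh. "example-model" and the data values are invented
# for illustration.

def build_v2_infer_request(model_name, input_name, data):
    body = {
        "inputs": [{
            "name": input_name,
            "shape": [1, len(data)],
            "datatype": "FP32",
            "data": data,
        }]
    }
    # This body would be POSTed to the path below on the serving endpoint.
    return f"/v2/models/{model_name}/infer", json.dumps(body)

path, payload = build_v2_infer_request("example-model", "input-0", [0.1, 0.2, 0.3])
print(path)
print(payload)
```

In the Feast integration shown in the demo, the `data` values would be feature vectors fetched from the online feature store just before the request is built.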
Authors: Bhakti Radharapu

How do I measure fairness? Is my ML model biased? How do I remediate bias in my model? This talk presents an overview of the main concepts of identifying, measuring and remediating bias in ML systems at scale. We begin by discussing how to measure fairness in production models and causes of algorithmic bias in systems. We then deep-dive into performing bias remediation at all steps of the ML life-cycle: data collection, pre-processing, in-training, and post-processing. We will focus on a gamut of open source tools and techniques in the ecosystem that can be used to create comprehensive fairness workflows. These have not only been vetted by the academic ML community but have also scaled very well for industry-level challenges. We hope that by the end of this talk, ML developers will not only be able to "flag" fairness issues in ML but also "fix" them by incorporating these tools and techniques in their ML workflows.
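The abstract above asks "how do I measure fairness?" As a minimal illustrative sketch (not one of the specific open source tools the talk surveys), one of the simplest production metrics is the demographic parity difference: the gap in positive-prediction rates between groups.

```python
# Sketch: demographic parity difference, a basic group-fairness metric.
# A value near 0 means the model predicts the positive class at similar
# rates across groups; larger values flag potential bias to investigate.

def demographic_parity_difference(preds, groups):
    """preds: 0/1 model predictions; groups: group id per sample."""
    rates = {}
    for g in set(groups):
        members = [p for p, gg in zip(preds, groups) if gg == g]
        rates[g] = sum(members) / len(members)
    vals = list(rates.values())
    return max(vals) - min(vals)

preds  = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(preds, groups))  # 0.5 (rates 0.75 vs 0.25)
```

The remediation techniques the talk covers (pre-processing, in-training, post-processing) are typically evaluated by tracking metrics like this one before and after intervention.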
Authors: Christian Kadner

tldr - powered by Generative AI

The Kubeflow Pipelines team proposes a new component registry to address problems with authoring, publishing, and maintaining pipeline components.
  • The registry will have a unified YAML format, versioning and tagging capabilities, and direct integration with the Kubeflow Pipelines SDK
  • Third-party registries can also implement the server side of the API
  • The Machine Learning Exchange is an example of a registry implementing the new protocol, and offers various asset types including pipelines, components, models, data sets, and notebooks
  • Watson Studio Pipelines is in open beta and provides a canvas for running experiments and integrating notebooks
Authors: Oita Coleman, Jon Stine

Conversational AI is at a crossroads. Adoption of proprietary platforms has slowed significantly; consumer usage has stalled at simple functionality. At the same time, enterprises (across nearly all industries) see a value in conversational AI, not only in the call center, but in business operations and customer insight. What will it take to unlock the value of conversational AI for users? How might a Linux Foundation community make not only a difference for enterprises, but open opportunity for open-source developers? Join the leaders of the Open Voice Network, the LF's voice-centric community, for an open discussion on why, what, and what's next.
Authors: Zeyno A Dodd

According to a CNCF survey, 85% of the participating organizations emphasize the importance of security modernization for their cloud native deployments, along with modernizing legacy infrastructure, adopting cloud-native security architectures, and moving to dynamic, standardized procedures and automation that go beyond traditional security measures. Cloud-native security follows cloud-native technology, and as the cloud-native space matures, 82% express willingness to adopt OSS for security. This inclination is all the more relevant given the challenge of sorting through a plethora of security and compliance products, frameworks, and tools, and the lack of shared standards in an ever-evolving threat landscape. The need for adaptability and timely response to the threat of cyber-attacks drives global, focused efforts to build technologies, OSINT integration strategies, models and capabilities capturing CVEs, cybersecurity risk management frameworks, and knowledge bases of adversary tactics and techniques.

Graph neural networks (GNNs) have received great attention due to their superior performance and ability to represent real-world complexity in applications ranging from recommender systems to drug discovery. We outline a security strategy leveraging a GNN inference framework that couples prevention with detection capabilities against real-time threats and violations. Our efforts focus on the development of Kubernetes security agent templates for real-time detection, attack emulation, and recommendation capabilities, implementing various GNN inferences including link prediction and node classification. Our preliminary graph models are built and trained on knowledge graphs derived from the MITRE ATT&CK framework's threat patterns and techniques and the Microsoft Security Threat Matrix for Kubernetes.
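The abstract above mentions link prediction over a threat knowledge graph. As a hedged sketch of the inference step only (the node names and embedding values below are invented; a real system would learn embeddings by training a GNN on the ATT&CK-derived graph), a candidate edge is scored from a pair of node embeddings, here with a dot product passed through a sigmoid:

```python
import math

# Sketch of GNN link-prediction scoring: a trained encoder yields node
# embeddings; edge likelihood is scored from the embedding pair.

def link_score(u, v):
    """Dot-product score squashed to (0, 1) via a sigmoid."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-dot))

# Invented toy embeddings; T1610 (Deploy Container) and T1486 (Data
# Encrypted for Impact) are real ATT&CK technique IDs used as labels.
embeddings = {
    "pod/api-server":          [0.9, 0.4, -0.1],
    "T1610-deploy-container":  [0.8, 0.5, 0.0],   # related asset/technique
    "T1486-data-encrypt":      [-0.7, -0.6, 0.2], # unrelated here
}
likely = link_score(embeddings["pod/api-server"], embeddings["T1610-deploy-container"])
unlikely = link_score(embeddings["pod/api-server"], embeddings["T1486-data-encrypt"])
print(round(likely, 3), round(unlikely, 3))
```

A security agent could rank such scores to recommend which techniques most plausibly apply to a given cluster asset.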
Authors: Bing HE

Unstructured data is flooding into businesses, yet the way data is processed has long been limited to structured approaches. Neural search creator Jina AI aims to bring a new way of accessing unstructured data in its original, unstructured form, helping businesses unlock the enormous value their unstructured data could bring. This talk will share the best learnings from the open source product ecosystem Jina AI has built to help developers easily create neural search applications, and how neural search can unlock new business opportunities.
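As a minimal sketch of the core neural-search idea the talk is built on (not Jina's actual API), documents and queries are encoded into vectors by a neural model, and documents are ranked by cosine similarity to the query. The toy vectors below stand in for a real encoder:

```python
import math

# Sketch: rank documents by cosine similarity of embedding vectors,
# the core retrieval step in neural search. Vectors are made up.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

docs = {
    "cat photo":   [0.9, 0.1, 0.0],
    "dog video":   [0.1, 0.9, 0.1],
    "kitten clip": [0.8, 0.2, 0.1],
}
query = [0.85, 0.15, 0.05]  # e.g., the embedding of a query about cats
ranked = sorted(docs, key=lambda d: cosine(docs[d], query), reverse=True)
print(ranked)
```

Because similarity is computed in embedding space rather than on keywords, "kitten clip" ranks near "cat photo" even though the strings share no words, which is what makes the approach work on unstructured data.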