OpenAI + Data Forum 2022

Labeling Tools are Great, but What About Quality Checks?

Conference: OpenAI + Data Forum 2022

Authors: Jakub Piotr Cłapa, Marcus Edel

2022-06-23

Data sets are the backbone of machine learning (ML), but some are more critical than others. Researchers use a core set of them to evaluate machine-learning models and track how ML capabilities advance over time. One of the best known is the ImageNet data set, which kicked off the modern ML revolution. Another is Lyft's data set for training self-driving cars. Over the years, studies have found that these data sets can contain serious flaws. ImageNet, for example, has several labels that are flat-out wrong: a mushroom is labeled a spoon and a lion is labeled a monkey. In the Lyft data set, several cars are not annotated at all. All these data sets have one thing in common: they use a highly error-prone annotation pipeline with few or no quality checks. We worked on an open-source tool that combines novel unsupervised machine-learning pipelines to help annotators and machine-learning engineers identify and filter out potential label errors. In this talk, we will share our findings on how label errors affect the existing training process, discuss possible implications, and dive into how we leveraged unsupervised learning to filter out annotation errors while looking at real-world examples.
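
One simple way to surface suspicious labels without extra supervision is a nearest-neighbor consistency check: embed each sample, then flag samples whose label disagrees with most of their neighbors. The sketch below illustrates that general idea only. It is not the tool presented in the talk, and the function name `flag_suspect_labels`, the toy 2-D "embeddings", and the thresholds are all hypothetical.

```python
import math
from collections import Counter


def flag_suspect_labels(features, labels, k=3, min_agreement=0.5):
    """Return indices whose label disagrees with most of their k nearest neighbors."""
    suspects = []
    for i, (x, y) in enumerate(zip(features, labels)):
        # Brute-force distances to every other sample (fine for a toy example).
        dists = sorted(
            (math.dist(x, other), j)
            for j, other in enumerate(features)
            if j != i
        )
        neighbor_labels = [labels[j] for _, j in dists[:k]]
        # Fraction of the k neighbors that share this sample's label.
        agreement = Counter(neighbor_labels)[y] / k
        if agreement < min_agreement:
            suspects.append(i)
    return suspects


# Hypothetical toy data: two well-separated clusters. Sample 3 sits in the
# "mushroom" cluster but carries the label "spoon".
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = ["mushroom", "mushroom", "mushroom", "spoon",
          "spoon", "spoon", "spoon", "spoon"]

suspects = flag_suspect_labels(features, labels)
print(suspects)
```

In practice the features would come from a learned embedding rather than hand-picked coordinates, and flagged samples would be sent back to annotators for review rather than dropped automatically.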


Lightning Talk: Demystifying Challenges/Learnings in Converting an Old-school Textile Inspection Machine into a Smart System Using AI/ML

Conference: OpenAI + Data Forum 2022

Authors: Neethu Elizabeth Simon, Scott Thomas

2022-06-23

tldr - powered by Generative AI

Converting an old-school textile inspection machine into a smart system using AI/ML is effective and affordable, even in the commodity fabric manufacturing industry.

Textile inspection is traditionally labor-intensive and error-prone.
A computer-vision AI/ML solution built with open source tools was developed to detect textile defects during the fabric inspection process.
An old-school manual fabric inspection machine was successfully integrated with cameras and open source AI/ML tools running on a high-performance compute device.
The reasonably priced system was applied to a low-cost, labor-intensive industry without expensive retooling or high-priced technology.
Implementation and integration challenges encountered during the design and development of this solution were resolved.
The model worked but was not scalable enough and was sensitive to folds and creases.
Inferencing was good, but the system was not robust enough to handle high motor speeds.

Tags:

textiledefectdetection


Lightning Talk: Introducing Kubeflow Metal: Your Machine Learning Platform on Baremetal Kubernetes

Conference: OpenAI + Data Forum 2022

Authors: Charles Adetiloye, Keith Mattix

2022-06-23

tldr - powered by Generative AI

Kubeflow Metal is a new way of deploying Kubeflow onto a Kubernetes cluster on bare metal servers, providing a low friction, high velocity way to deploy an ML platform in an easy, experimental on-prem environment.

Kubeflow Metal is a Terraform module that deploys Kubeflow on a Kubernetes cluster on bare metal servers
It is a cheaper alternative to cloud infrastructure with a fixed cost
It allows for quick bootstrapping of an ML environment or infrastructure for a team
Deployment is elastic and easily scalable
It can be used for plugging into a CI/CD process
It is useful for cases where data cannot be moved to the cloud, such as financial or insurance data
Kubeflow Metal is looking for people to help improve the project


Searching for the Right Words: Bringing NLP to Apache Solr through ONNX and OpenNLP

Conference: OpenAI + Data Forum 2022

Authors: Jeff Zemerick

2022-06-23

tldr - powered by Generative AI

Bringing NLP capabilities to Apache Solr through ONNX and OpenNLP

Apache OpenNLP is a Java-based NLP tool that has been around for over a decade and offers various capabilities such as tokenization, document classification, and named entity recognition
Apache Solr depends on Apache Lucene for search functionality, and Apache Lucene has a dependency on Apache OpenNLP for some NLP operations
The ONNX Runtime allows for the use of deep learning models across programming languages, architectures, and platforms, enabling the use of NLP services created in other languages
The speaker demonstrates how a deep learning model trained using PyTorch or TensorFlow can be used for inference from a Java search stack of Apache OpenNLP, Apache Lucene, and Apache Solr
The speaker discusses the challenges and relationships between OpenNLP, Lucene, and Solr, and provides resources for attendees to get started with these open source projects

Tags:

Natural language processing


Integrating High Performance Feature Stores with KServe Model Serving

Conference: OpenAI + Data Forum 2022

Authors: Chin Huang, Ted Chang

2022-06-23

tldr - powered by Generative AI

Overview of KServe with ModelMesh and demo of model inference using online features

KServe is a standards-based model serving platform built on top of Kubernetes
ModelMesh in KServe is designed to address Kubernetes' resource limitations and allows for high density and scalability
The ModelMesh architecture includes serving runtime deployments, containers for ModelMesh logic, adapters for retrieving models, and model servers for inference
A scalability test showed that 20k simple-string models could be deployed into two serving runtime pods in a small Kubernetes cluster
The demo showed integration of the open source ModelMesh model serving layer with Feast for multi-region model serving in a Kubernetes cluster
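
To make the "standards-based" serving and online-feature points above concrete, here is a minimal sketch of packing feature values into a request body for the V2 inference protocol (also known as the Open Inference Protocol), which KServe supports. The feature rows stand in for an online lookup from a feature store such as Feast. The feature names, the tensor name `input-0`, and the helper `build_v2_infer_request` are hypothetical, and no network call is made.

```python
import json


def build_v2_infer_request(feature_rows, feature_names, input_name="input-0"):
    """Pack rows of online features into a V2 inference protocol request body."""
    # Flatten the rows in row-major order; "shape" tells the server how to
    # interpret the flat list.
    data = [float(row[name]) for row in feature_rows for name in feature_names]
    return {
        "inputs": [
            {
                "name": input_name,  # assumed; must match the served model's input
                "shape": [len(feature_rows), len(feature_names)],
                "datatype": "FP32",
                "data": data,
            }
        ]
    }


# Hypothetical stand-in for features fetched from an online store, keyed by entity.
online_features = [{"driver_id": 1001, "conv_rate": 0.5, "acc_rate": 0.25}]

request_body = build_v2_infer_request(online_features, ["conv_rate", "acc_rate"])
# In the V2 protocol this JSON is POSTed to /v2/models/<model_name>/infer.
payload = json.dumps(request_body)
print(payload)
```

Keeping the feature lookup and the request construction separate means the same payload builder works regardless of which feature store supplies the values.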


Flagging and Fixing Bias in ML

Conference: OpenAI + Data Forum 2022

Authors: Bhakti Radharapu

2022-06-23

How do I measure fairness? Is my ML model biased? How do I remediate bias in my model? This talk presents an overview of the main concepts of identifying, measuring and remediating bias in ML systems at scale. We begin by discussing how to measure fairness in production models and causes of algorithmic bias in systems. We then deep-dive into performing bias remediation at all steps of the ML life-cycle: data collection, pre-processing, in-training, and post-processing. We will focus on a gamut of open source tools and techniques in the ecosystem that can be used to create comprehensive fairness workflows. These have not only been vetted by the academic ML community but have also scaled very well for industry-level challenges. We hope that by the end of this talk, ML developers will not only be able to "flag" fairness issues in ML but also "fix" them by incorporating these tools and techniques in their ML workflows.
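
As an illustration of what "measuring fairness" can mean in practice, the sketch below computes two common group-fairness gaps on toy data: the demographic parity difference (gap in positive-prediction rates between groups) and the equal opportunity difference (gap in true-positive rates). This is a minimal, assumption-laden example rather than the speaker's workflow; the data and the helper names `rate_gap`, `demographic_parity_difference`, and `equal_opportunity_difference` are hypothetical.

```python
def rate_gap(values, groups):
    """Return (max rate - min rate, per-group rates) for binary values."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    rates = {g: sum(v) / len(v) for g, v in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates


def demographic_parity_difference(y_pred, groups):
    """Gap in positive-prediction rate across groups."""
    return rate_gap(y_pred, groups)


def equal_opportunity_difference(y_true, y_pred, groups):
    """Gap in true-positive rate across groups (only rows where y_true == 1)."""
    positives = [(p, g) for t, p, g in zip(y_true, y_pred, groups) if t == 1]
    return rate_gap([p for p, _ in positives], [g for _, g in positives])


# Hypothetical toy predictions for two groups, "a" and "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dp_gap, selection_rates = demographic_parity_difference(y_pred, groups)
eo_gap, tpr_by_group = equal_opportunity_difference(y_true, y_pred, groups)
print(dp_gap, selection_rates)
print(eo_gap, tpr_by_group)
```

A gap of zero means the groups are treated identically under that metric; which metric matters, and what gap is acceptable, depends on the application.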


Lightning Talk: A Component Registry for Kubeflow Pipelines

Conference: OpenAI + Data Forum 2022

Authors: Christian Kadner

2022-06-23

tldr - powered by Generative AI

The Kubeflow Pipelines team proposes a new component registry to address problems with authoring, publishing, and maintaining components. The registry will have a unified YAML format, versioning and tagging capabilities, and direct integration with the Kubeflow Pipelines SDK. Third-party registries can also implement the server side of the API. The Machine Learning Exchange is an example of a registry that is implementing the new protocol. It offers various asset types, including pipelines, components, models, data sets, and notebooks. Watson Studio Pipelines is also in open beta and provides a canvas for running experiments and integrating notebooks.

Kubeflow Pipelines proposes a new component registry to address problems with authoring, publishing, and maintaining components
The registry will have a unified YAML format, versioning and tagging capabilities, and direct integration with the Kubeflow Pipelines SDK
Third-party registries can also implement the server side of the API
The Machine Learning Exchange is an example of a registry that is implementing the new protocol and offers various asset types
Watson Studio Pipelines is in open beta and provides a canvas for running experiments and integrating notebooks


BoF: An Open Future for Conversational AI -- Why is it Important, and How Do We Get There?

Conference: OpenAI + Data Forum 2022

Authors: Oita Coleman, Jon Stine

2022-06-23

Conversational AI is at a crossroads. Adoption of proprietary platforms has slowed significantly; consumer usage has stalled at simple functionality. At the same time, enterprises (across nearly all industries) see a value in conversational AI, not only in the call center, but in business operations and customer insight. What will it take to unlock the value of conversational AI for users? How might a Linux Foundation community make not only a difference for enterprises, but open opportunity for open-source developers? Join the leaders of the Open Voice Network, the LF's voice-centric community, for an open discussion on why, what, and what's next.


Lightning Talk: A GNN Based Framework for Kubernetes Security Agents: Threat and Vulnerability Detectors, Recommenders and Attack Simulators

Conference: OpenAI + Data Forum 2022

Authors: Zeyno A Dodd

2022-06-23

According to a CNCF survey, 85% of the participating organizations emphasize the importance of security modernization for their cloud-native deployments, along with the modernization of legacy infrastructure and the adoption of cloud-native security architectures, dynamic and standardized procedures, and automation that goes beyond traditional security measures. Cloud-native security follows cloud-native technology, and as the cloud-native space matures, 82% express willingness to adopt OSS for security. This inclination is all the more relevant given the challenge of sorting through a plethora of security and compliance products, frameworks, and tools, and the lack of shared standards in an ever-evolving threat landscape. The need for adaptability and timely response to the threat of cyber-attacks drives global and focused efforts to build technologies, OSINT integration strategies, models and capabilities capturing CVEs, cybersecurity risk management frameworks, and knowledge bases of adversary tactics and techniques.

Graph neural networks (GNNs) have received great attention due to their superior performance and ability to represent real-world complexity in a variety of applications ranging from recommender systems to drug discovery. We outline a security strategy leveraging a GNN inference framework that couples prevention with detection capabilities against real-time threats and violations. Our efforts focus on the development of Kubernetes security agent templates for real-time detection, attack emulation, and recommendation capabilities, implementing various GNN inferences including link prediction and node classification. Our preliminary graph models are built and trained leveraging knowledge graphs from MITRE ATT&CK framework threat patterns and techniques, and the Microsoft Security Threat Matrix for Kubernetes.
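
For readers unfamiliar with the GNN operations named above, the following sketch shows one round of neighbor aggregation (the core of message passing) followed by a dot-product link-prediction score on a toy graph. It is illustrative only: the features are hand-picked rather than learned, and the node names and the functions `aggregate` and `link_score` are hypothetical, not part of the framework described in the abstract.

```python
def aggregate(features, edges):
    """One message-passing round: average each node's own and its neighbors' features."""
    neighbors = {node: [node] for node in features}  # self-loop keeps own features
    for u, v in edges:  # treat edges as undirected
        neighbors[u].append(v)
        neighbors[v].append(u)
    dim = len(next(iter(features.values())))
    return {
        node: [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
        for node, nbrs in neighbors.items()
    }


def link_score(embeddings, u, v):
    """Dot-product score; higher suggests a more plausible (or missing) edge."""
    return sum(a * b for a, b in zip(embeddings[u], embeddings[v]))


# Hypothetical toy graph of cluster resources with 2-D input features.
features = {
    "pod-a": [1.0, 0.0],
    "pod-b": [1.0, 0.0],
    "secret": [0.0, 1.0],
    "node": [0.0, 1.0],
}
edges = [("pod-a", "pod-b"), ("pod-b", "secret"), ("secret", "node")]

emb = aggregate(features, edges)
near = link_score(emb, "pod-a", "pod-b")
far = link_score(emb, "pod-a", "node")
print(emb)
print(near, far)
```

A trained GNN would stack several such rounds with learned weight matrices and nonlinearities; node classification would then apply a classifier to each node's final embedding instead of scoring pairs.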


The Unlimited Potential of Neural Search to Unlock the New Way of Data Comprehension

Conference: OpenAI + Data Forum 2022

Authors: Bing HE

2022-06-23

Unstructured data is flooding businesses, yet data processing has traditionally been limited to structured approaches. Neural search creator Jina AI aims to offer a new way of accessing unstructured data in its original form, helping businesses unlock the value that their unstructured data can bring. This talk shares what Jina AI has learned while building its open source product ecosystem, which helps developers easily build applications powered by neural search, and explains how this can unlock new business opportunities.

Tags:

business opportunities
