logo

Searching for the Right Words: Bringing NLP to Apache solr through ONNX and OpenNLP

2022-06-23

Authors:   Jeff Zemerick


Summary

Bringing NLP capabilities to Apache Solr through ONNX and OpenNLP
  • Apache OpenNLP is a Java-based NLP tool that has been around for over a decade and offers various capabilities such as tokenization, document classification, and named entity recognition
  • Apache Solr depends on Apache Lucene for search functionality, and Apache Lucene has a dependency on Apache OpenNLP for some NLP operations
  • The ONNX Runtime allows for the use of deep learning models across programming languages, architectures, and platforms, enabling the use of NLP services created in other languages
  • The speaker demonstrates how a deep learning model trained using PyTorch or Tensorflow can be used for inference from a Java search stack of Apache OpenNLP, Apache Lucene, and Apache Solr
  • The speaker discusses the challenges and relationships between OpenNLP, Lucene, and Solr, and provides resources for attendees to get started with these open source projects
The speaker mentions that Apache OpenNLP has no dependencies, not even on the recent Log4j vulnerability, making it a lightweight and secure option for NLP operations

Abstract

Natural language processing capabilities have exploded in the past few years, with most of the work done in Python. The ONNX Runtime provides a means for using deep learning models across programming languages, architectures, and platforms, promising to further democratize advancements in machine learning. With the ONNX Runtime, developers no longer have to rely on remote services to access NLP services created in other languages. In this session we will show how a deep learning model trained using PyTorch or Tensorfow can be used for inference from a Java search stack of Apache OpenNLP, Apache Lucene, and Apache Solr. We will demonstrate how these state-of-the-art NLP capabilities can be realized in Apache Solr to offer search users a more impactful search experience. We will discuss the challenges, the relationships between OpenNLP, Lucene, and Solr, and how attendees can get started in these open source projects.

Materials: