Bringing NLP capabilities to Apache Solr through ONNX and OpenNLP
- Apache OpenNLP is a Java-based NLP tool that has been around for over a decade and offers various capabilities such as tokenization, document classification, and named entity recognition
- Apache Solr depends on Apache Lucene for search functionality, and Apache Lucene has a dependency on Apache OpenNLP for some NLP operations
- The ONNX Runtime allows for the use of deep learning models across programming languages, architectures, and platforms, enabling the use of NLP services created in other languages
- The speaker demonstrates how a deep learning model trained using PyTorch or Tensorflow can be used for inference from a Java search stack of Apache OpenNLP, Apache Lucene, and Apache Solr
- The speaker discusses the challenges and relationships between OpenNLP, Lucene, and Solr, and provides resources for attendees to get started with these open source projects