Introduction to Text and Code Embeddings in the OpenAI API

Conference: Transform X 2022

2022-10-19

Authors: Arvind Neelakantan


Abstract

Text embeddings are useful features in many applications, including semantic search, code completion, natural language tasks, topic modeling, classification, and computing text similarity. Arvind Neelakantan, Research Lead and Manager at OpenAI, introduces embeddings, a new endpoint in the OpenAI API. When OpenAI originally introduced the API two years ago, it was based on the GPT-3 model, which was useful for many tasks. But, as Neelakantan explains, GPT-3 is not explicitly optimized to produce a single vector, or embedding, of the input. The OpenAI team determined that such a condensed representation of the input would help programmers and others who need features for downstream applications. They set about building an unsupervised model that is good at producing this kind of single embedding, and created a model based on contrastive pre-training, which Neelakantan describes. He also covers use cases for embeddings and how the API is used in the real world, including at JetBrains Research for astronomical research and at FineTune Learning, which builds education systems. FineTune is using text embeddings to more accurately find textbook content based on learning objectives.
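
To make the idea of using a single embedding vector as a feature for semantic search more concrete, the following is a minimal sketch rather than anything taken from the talk. It assumes the embeddings have already been obtained from some model; the three-dimensional vectors, the chapter names, and the helper functions `cosine_similarity` and `rank_by_similarity` are all hypothetical and chosen only for illustration. Real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings standing in for textbook passages.
doc_vecs = {
    "photosynthesis chapter": [0.9, 0.1, 0.0],
    "cell division chapter": [0.2, 0.8, 0.1],
    "plate tectonics chapter": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a learning objective used as the search query.
query_vec = [0.8, 0.2, 0.1]

ranking = rank_by_similarity(query_vec, doc_vecs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this sketch the passage whose vector points in nearly the same direction as the query ranks first, which is the basic mechanism behind matching textbook content to a learning objective.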

Materials: