From `docker push` to Bytes on Disk: Inside Distribution


Authors: Adam Wolfe Gordon, Wayne Warren


The presentation covers the HTTP interactions involved in image pushes and pulls and gives a high-level overview of Distribution's internals. It also highlights the interface methods and objects involved in storing a layer chunk in the configured backend, focusing on the handler for HTTP PATCH requests to the blob uploads endpoint.
  • HTTP interactions involved in image pushes and pulls
  • High-level overview of Distribution's internals
  • Interface methods and objects involved in storing a layer chunk in the configured backend
  • The authentication, session-resumption, and data-upload phases of the PATCH request
  • Use of the S3 API as the configured backend
  • Implementation of the data-upload phase using an `io.Copy` that reads from the PATCH request body and writes to the S3 blob writer
  • Use of multipart upload for buffering in the S3 backend
The presentation explains that the S3 API is used as the configured backend for storing image layers. During the data-upload phase, the incoming PATCH request body is streamed to the S3 blob writer, which buffers data using a multipart upload. This avoids buffering the entire request in memory, which can be a security risk. Instead, the copy repeatedly calls `Write` on the S3 blob writer as bytes stream in from the PATCH request body.


If you use containers, at some point you've probably done a `docker pull` or a `docker push`. But, have you ever thought about how those operations work? How does a container image travel to persistent storage in the cloud? What does it look like when it gets there? We hadn't thought much about these questions until we started building DigitalOcean Container Registry (DOCR) on top of the CNCF Distribution codebase in 2019. Working on DOCR required us to learn a lot of the answers and we're excited to share them. In this talk we'll pull back the curtain on how Distribution works. From your registry client, to the OCI Distribution API, to the CNCF Distribution codebase, to bytes on disk, we'll explain exactly how a container image makes it from your computer to the cloud, what it looks like when it gets there, and what happens when you ask for it back. We'll also touch on less-standardized topics such as authentication and the evolving garbage collection implementation in Distribution.