Issue #3128: Clarify image tagging vs. image naming - Docker Support - Pulp

Issue #3128

closed

Clarify image tagging vs. image naming

Added by mihai.ibanescu@gmail.com about 7 years ago. Updated almost 6 years ago.

Status: CLOSED - NOTABUG
Priority: Normal
Severity: 2. Medium
Tags: Pulp 2

Description

As implemented upstream, within the REST API, a Docker registry refers to an image "name" as described in this doc:

https://docs.docker.com/registry/spec/api/#detail

For instance, for fetching a manifest:

GET /v2/<name>/manifests/<reference>

Their docs say:

Fetch the manifest identified by name and reference where reference can be a tag or digest. A HEAD request can also be issued to this endpoint to obtain resource information without receiving all data.

The <name> portion is typically a two-level hierarchy, essentially <repo name>/<image name>. (It may have more than two levels, but I am documenting what I typically see in a registry implemented by Docker.)

Using the Docker client (from Fedora 26), if I issue a "docker pull" command like:

docker pull registry.example.com/test-docker/app-nginx:latest

(general syntax is docker pull <registry>/<repo_id>/<image_name>:<tag>)

the repository is test-docker and the image name is app-nginx. I am fetching the tag named latest.

This results in a REST call like:

GET /v2/test-docker/app-nginx/manifests/latest
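
The mapping from a pull reference to the manifest request can be sketched as follows. This is a simplified illustration, not the Docker client's actual parsing: the function name manifest_path is hypothetical, and the sketch assumes the first path segment is the registry host and that the reference carries a tag rather than a digest.

```python
def manifest_path(image_ref):
    """Map a 'docker pull' style reference to a registry v2 manifest path.

    Simplified on purpose: the first path segment is assumed to be the
    registry host, and digests and registry ports are not handled.
    """
    registry, _, remainder = image_ref.partition("/")
    name, _, tag = remainder.rpartition(":")
    if not name:
        # No ':' present, so the whole remainder is the name and the
        # tag falls back to 'latest', as the Docker client does.
        name, tag = remainder, "latest"
    return registry, "/v2/{}/manifests/{}".format(name, tag)


registry, path = manifest_path(
    "registry.example.com/test-docker/app-nginx:latest")
print("GET", path)
```

Note that the whole test-docker/app-nginx string ends up in the <name> slot; the registry API itself does not distinguish a "repo" segment from an "image name" segment.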

I have attempted to replicate this behavior. Since Tag units in pulp only have a name (the tag value), and since Manifest units don't have a name (or at least Skopeo doesn't produce one, even though there is a "name" property that is available from querying a v2 manifest directly from a Docker registry), what I've tried was to name the tag "app-nginx:latest".

Question 1: was this ever considered as a use case? If so, is my attempt completely wrong?

Crane has no concept of repo, name and tag. I am in the process of fixing that, with a PR coming shortly if that is the case.

Assuming what I tried above is worthwhile: pulp's docker plugin has no concept of name + tag either, so the distributor produces a json document in the publish directory for v2, named <repo_id>/tags/list, containing:

{"name": "test-docker", "tags": ["app-nginx:latest", "latest"]}

The name here should have been "test-docker/app-nginx" and the tags should have been ["latest"]
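
To make the difference concrete, here is a minimal sketch of the regrouping that would be needed to get from the document the distributor produces today to the one described above. It is not pulp_docker code: the function split_tags_list and the "<image name>:<tag>" convention for tag names are assumptions taken from my experiment.

```python
from collections import defaultdict


def split_tags_list(repo_id, tags):
    """Regroup merged 'image:tag' tag names into one tags list per name.

    Hypothetical helper: assumes a tag name of the form
    '<image name>:<tag>' carries an embedded image name.
    """
    by_name = defaultdict(list)
    for tag in tags:
        image_name, sep, tag_value = tag.partition(":")
        if sep:
            # Embedded image name: file the tag under <repo_id>/<image name>.
            by_name["{}/{}".format(repo_id, image_name)].append(tag_value)
        else:
            # Plain tag: it stays at the repository level.
            by_name[repo_id].append(tag)
    return [{"name": n, "tags": t} for n, t in sorted(by_name.items())]


for doc in split_tags_list("test-docker", ["app-nginx:latest", "latest"]):
    print(doc)
```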

So clearly my approach doesn't produce the right tags, and would require patching pulp_docker as well.

After talking to Thomas McKay on #pulp-dev, maybe I should ask the second question: how is image management intended to work in Pulp? For our use case, I need a pulp repo to contain different docker image types (i.e. different names) and their updates. Hypothetically, a deployment would consist of the following containers: app-nginx, app-mysql, app-server. Because they are all related, I want them all entitled the same way, so it would make sense to put them all in the same pulp repository (as well as their updates).

Updated by mihai.ibanescu@gmail.com about 7 years ago

To summarize a discussion on #pulp-dev with @ipanova:

With a Docker registry at registry.example.com, I could create a "myproduct" repository, and store docker images for two containers, "app-nginx" and "app-mysql". Assuming that the image name matches the container name, and we model versioning with tags (as is typical in Docker):

registry.example.com/myproduct/app-nginx:1.0.0
registry.example.com/myproduct/app-mysql:1.2.3

Typically there is a :latest tag that points to the latest tag of each image (but that's just a convention). So those two images can also be addressed as:

registry.example.com/myproduct/app-nginx:latest
registry.example.com/myproduct/app-mysql:latest

which would fetch versions 1.0.0 of app-nginx and 1.2.3 of app-mysql, respectively.

This scenario cannot currently be implemented in pulp.

The best one can implement (at least based on my understanding) is to create a "myproduct" pulp repo, and merge the image name and the tag together. The four tags would become:

app-nginx-1.0.0 and app-nginx-latest (pointing to the same unit)
app-mysql-1.2.3 and app-mysql-latest (pointing to the same unit)
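
A minimal sketch of that workaround, assuming a "-" separator between image name and tag (the helper name merged_tag is hypothetical):

```python
def merged_tag(image_name, tag):
    """Fold the image name into the tag, since a pulp repo only has tags."""
    return "{}-{}".format(image_name, tag)


# Image names and their tags, as in the example above.
images = {
    "app-nginx": ["1.0.0", "latest"],
    "app-mysql": ["1.2.3", "latest"],
}

repo_tags = sorted(
    merged_tag(name, tag) for name, tags in images.items() for tag in tags)
print(repo_tags)
```

The merge is lossy: given only "app-nginx-1.0.0", nothing says whether the image name is "app-nginx" or "app", so a client cannot reliably recover the original name and tag.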

Updated by ipanova@redhat.com about 7 years ago

In Docker:
You create a namespace 'myproduct', under which you push images like "app-nginx" and "app-mysql".
You have an image with id 123 (which is app-nginx); you tag it 1.0.0 and push it to myproduct/app-nginx (the concatenation of myproduct and app-nginx forms the concept of a Docker repository).
You tag the same image id 123 (app-nginx) with the latest tag and push it to myproduct/app-nginx.

You have an image with id 321 (which is app-mysql); you tag it 1.2.3 and push it to myproduct/app-mysql.
You tag the same image id 321 (app-mysql) with the latest tag and push it to myproduct/app-mysql.

In Pulp, you create repo A and sync content from myproduct/app-nginx (you cannot sync from just myproduct; there is not even such an endpoint in the Docker registry API). As a result you would have 2 tags in Pulp (latest and 1.0.0).
In Pulp, you create repo B and sync content from myproduct/app-mysql. As a result you would have 2 tags in Pulp (latest and 1.2.3).

Then docker pull A:latest or docker pull A:1.0.0 would give you the same image id,
and docker pull B:latest or docker pull B:1.2.3 would give you the same image id.

You cannot have both app-nginx and app-mysql in Pulp under one repo name 'myproduct'.
Yes, in Docker they are under the same namespace, but they are still considered separate things, because when you list the tags of a repo you don't access 'myproduct' but 'myproduct/app-nginx': https://docs.docker.com/registry/spec/api/#tags
The catalog (https://docs.docker.com/registry/spec/api/#catalog) will return 2 different repos. So, as a conclusion, myproduct cannot be considered a repo, because again there is not even such an endpoint in the registry API. A repo is called myproduct/foo; it can contain multiple tags and images (which are image manifests or manifest lists), but all of them will be related to foo, probably different versions of foo, but just foo, not bar. This is how Docker works.

https://docs.docker.com/registry/spec/api/#overview
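
As a rough illustration of the point above, here is a toy in-memory model of the two registry responses. The REGISTRY dict and both functions are hypothetical stand-ins rather than a real client; only the response shapes ("repositories" for the catalog, "name" and "tags" for a tags list) follow the registry API docs linked above.

```python
# Toy registry: full repository name -> {tag: image id}.
REGISTRY = {
    "myproduct/app-nginx": {"1.0.0": "123", "latest": "123"},
    "myproduct/app-mysql": {"1.2.3": "321", "latest": "321"},
}


def catalog():
    """Shape of GET /v2/_catalog: every full name is its own repository."""
    return {"repositories": sorted(REGISTRY)}


def tags_list(name):
    """Shape of GET /v2/<name>/tags/list for one full repository name."""
    if name not in REGISTRY:
        # A bare namespace such as 'myproduct' is not a repository.
        raise KeyError("unknown repository: {}".format(name))
    return {"name": name, "tags": sorted(REGISTRY[name])}


print(catalog())
print(tags_list("myproduct/app-nginx"))
```

In this model, asking for tags_list("myproduct") fails, which mirrors the statement that the namespace alone is not addressable as a repo.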

I don't know how to explain it in other words; maybe mhrivnak will give me a hand here. Or maybe I am wrong, but then my Docker world will probably just collapse :D

Updated by mihai.ibanescu@gmail.com about 7 years ago

I guess there are different ways of looking at the problem.

I understand why you reached the decision to map "image names" at the repo level, when the criterion is that, using a native API, I can enumerate all images in a native repo (e.g. yum has the repomd to accomplish that). Docker's /v2/_catalog API is not quite a match for that.

On the other hand, a Docker image is supporting a Docker container, which in a typical use is one process. A standalone Docker container typically has little value, and you would have more than one container in a deployment. From this perspective, being able to place disparate, but related, Docker images in the same pulp repo is valuable at least to me, because it brings Docker in line with all the other units we (SAS) manage in pulp: rpm, msi, debian, tar (forget I mentioned it), bosh releases.

Updated by mhrivnak about 7 years ago

I think @ipanova is on the right track here. As far as I understand it, a "repository" in a docker registry is referenced by a path with two or more segments. The conceptual difference you are suggesting, where the first segment is a repository name, and the second is an image name within that repository, to my understanding does not line up with how I see registries getting used, nor have I seen such a semantic intent expressed from the docker project. It is certainly a valid idea, and the semantics you suggest could be a reasonable way for you to utilize the path namespace of your repositories.

I see docker repositories as more similar to a GitHub repository than a yum repository. On GitHub, a repository can be referenced as a path with two segments. The first segment is a user or organization, and the second segment identifies the name of an individual source repository. With your examples, I would expect to have github.com/mhrivnak/app-nginx/ and github.com/mhrivnak/app-rest/. Within each repository can be any number of branches and tags to track the lifecycle of that specific app.

In both a github repo and a docker registry repo, you can have tags/branches that contain completely different software and identify that by name, but it's not usually recommended.

In sum, I don't think docker has given us concrete primitives for differentiating a "repository" from an "image name". I believe those two concepts are generally considered to be equivalent within the ecosystem.

That said, if you know of any discussion within the docker ecosystem that supports a semantic differentiation between the path segments that identify a repository, we would love to see it and figure out how to support that.

Updated by mihai.ibanescu@gmail.com about 7 years ago

Hopefully this doesn't sound like arguing for the purpose of arguing...

Let's let the Docker registry semantics live their painful, sad life for a second.

Our pipeline is currently, for all other unit types (rpm/deb/msi/msm/boshrelease):

1) build unit (koji or equivalent)
2) upload unit to existing repository in pulp
3) copy unit to its customer-facing location

Step 1 can produce any unit name, we are not in control of it.

With the current docker implementation, step 2 is impossible to implement, because, in order to not lose the image name, I would need to create one repo per image name, and I cannot create them ahead of time (see previous comment on step 1).

The problems:

1) Right now the pulp users performing step 2 are not allowed to create repositories, only to upload to existing repositories. We made that choice some time ago. It can be changed, of course, but it is a change.
2) It makes Docker conceptually very different from everything else, and for no good reason.

One more data point: the Debian plugin had to implement a "repository in a repository" concept already, because of how Apt repositories are structured. So even with your semantic interpretation of Docker repositories, there is a precedent in making "sub-repos".

Updated by ipanova@redhat.com about 7 years ago

I suggest closing this issue:
1) the substance of the technical discussion has moved to #3136
2) there is nothing to clarify here; we use regular Docker concepts, in conformance with the Docker specs and the terms found on their official docs pages.
