Task #2883

Updated by bizhang about 2 years ago


A content model, content serializer, and content viewset have already been created in https://pulp.plan.io/issues/2882

This task is to finish those classes, adding any Python specific fields.

This task will be complete when a Django shell user can CRUD full representations of Python Package "releases". A REST API user should be able to read a list of all Python units at `/v3/content/python/` as well as retrieve data on a specific unit (the URL is not yet decided).

All unit metadata is provided by the shell user at this point. It is not expected that the plugin extract the metadata from a package or scrape it from upstream.

After discussion, we will go with the Python "distribution package" as the content unit model.

The model would be named PythonPackageContent (because it's not really a PythonContent, and DistributionContent would overload the term 'distribution' too much) and would contain the following fields:

h4. Pulp-related

| packagetype |
| path |
| filename (primary key) |

h4. Python-related

| name |
| version |
| metadata_version |
| summary |
| description |
| keywords |
| home_page |
| download_url |
| author |
| author_email |
| maintainer |
| maintainer_email |
| license |
| classifier |
| requires_python |
| project_url |
| platform |
| supported_platform |
| requires_dist |
| provides_dist |
| obsoletes_dist |
| requires_external |

This is the way Pulp 2 currently models Python content. Each content unit would contain one artifact, corresponding to the distribution package file with that filename on PyPI.
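As a rough illustration of the shape of one content unit under this proposal, here is a plain-Python sketch. It is not the Django model itself: the class name `PythonPackageContentSketch`, the choice of lists for multi-valued fields, and the example `path` value are all assumptions made for illustration, and only a subset of the fields is shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PythonPackageContentSketch:
    """Hypothetical stand-in for one content unit (one distribution package)."""

    # Pulp-related fields
    filename: str      # proposed primary key: one unit per distribution file
    packagetype: str   # e.g. "sdist" or "bdist_wheel"
    path: str
    # Python-related fields (subset; the others follow the same pattern)
    name: str = ""
    version: str = ""
    metadata_version: str = ""
    summary: str = ""
    requires_python: str = ""
    # Assumption: multi-valued metadata is kept as lists of strings
    classifier: List[str] = field(default_factory=list)
    requires_dist: List[str] = field(default_factory=list)


# All metadata is supplied by hand, as a shell user would at this stage.
unit = PythonPackageContentSketch(
    filename="scipy-0.9.0.tar.gz",
    packagetype="sdist",
    path="scipy-0.9.0.tar.gz",  # illustrative value only
    name="scipy",
    version="0.9.0",
)
```

In the real plugin these would be Django model fields, and the storage type for the multi-valued fields is still an open choice.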

h3. Disadvantages

The disadvantage of modeling a Python distribution package as a content unit is that it is not the level users care most about. We would have multiple content units for the same release, one per target system or archive format, e.g.:
scipy-0.9.0-cp26-cp26mu-manylinux1_x86_64.whl
scipy-0.9.0-cp27-cp27m-manylinux1_x86_64.whl
scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl
scipy-0.9.0.tar.gz
scipy-0.9.0.zip
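To make the duplication concrete, the sketch below groups those filenames by release. The helpers `release_of` and `group_by_release` are hypothetical names, and the parsing is deliberately simplified: it assumes the project name contains no hyphen in wheel filenames and that an sdist's version is whatever follows the last hyphen.

```python
from collections import defaultdict

SDIST_SUFFIXES = (".tar.gz", ".zip")


def release_of(filename):
    """Return (name, version) for a wheel or sdist filename (simplified)."""
    if filename.endswith(".whl"):
        # Wheel names look like name-version(-build)?-python-abi-platform.whl
        name, version = filename[: -len(".whl")].split("-")[:2]
        return name, version
    for suffix in SDIST_SUFFIXES:
        if filename.endswith(suffix):
            # Assumption: the version is the text after the last hyphen
            name, version = filename[: -len(suffix)].rsplit("-", 1)
            return name, version
    raise ValueError("unrecognized distribution filename: " + filename)


def group_by_release(filenames):
    """Map each (name, version) release to its distribution filenames."""
    releases = defaultdict(list)
    for filename in filenames:
        releases[release_of(filename)].append(filename)
    return dict(releases)


FILES = [
    "scipy-0.9.0-cp26-cp26mu-manylinux1_x86_64.whl",
    "scipy-0.9.0-cp27-cp27m-manylinux1_x86_64.whl",
    "scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl",
    "scipy-0.9.0.tar.gz",
    "scipy-0.9.0.zip",
]

grouped = group_by_release(FILES)
```

Here `grouped` has a single key, `("scipy", "0.9.0")`, holding all five filenames: five content units for what a user thinks of as one release.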

As a user, I do not want to see all of these distribution packages when I query a repository. The only thing I care about is the release, and I will let pip decide which distribution package to install. PyPI in particular makes the release, rather than the distribution package, a first-class citizen.

Metadata that belongs to a release (i.e. additional metadata) would be repeated across content units. PyPI stores these metadata fields as a part of the release [0], and these fields could be updated in PyPI outside of a release. The metadata we store would be the metadata in a distribution package, which is immutable, so if a user updates metadata in PyPI, we would not sync the metadata updates.

h3. Glossary

+*Release*+
A snapshot of a Project at a particular point in time, denoted by a version identifier.
Making a release may entail the publishing of multiple "distribution packages". For example, if version 1.0 of a project was released, it could be available in both a source distribution format and a Windows installer distribution format.

+*Distribution Package*+
A versioned archive file that contains Python packages, modules, and other resource files that are used to distribute a Release. The archive file is what an end-user will download from the internet and install. A project may contain many releases, and a release may contain many distribution packages. A distribution package can be of type sdist, bdist, etc. "Distribution package" is used instead of "package" to avoid confusion with "import packages" or Linux "distributions".

[0] https://warehouse.pypa.io/api-reference/xml-rpc/