Task #2883
Updated by bizhang about 7 years ago
A content model, content serializer and content ViewSet will already have been created by https://pulp.plan.io/issues/2882. This task is to finish those classes, adding any Python-specific fields.

This task will be complete when a Django shell user can CRUD full representations of Python Package Releases. A REST API user should be able to read a list of all Python units at `/v3/content/python/` as well as retrieve data on a specific unit (URL is not yet decided).

All unit metadata is provided by the shell user at this point. It is not expected that the plugin extract the metadata from a package or scrape it from upstream.

h2. Content Model

There are two ways of content modeling being discussed right now. For compactness I'm going to call the following metadata fields _additional metadata_:

|summary|
|Description|
|Keywords|
|Home-page|
|Download-URL|
|Author|
|Author-email|
|Maintainer|
|Maintainer-email|
|License|
|Classifier|
|Requires-Python|
|Project-URL|
|platform|
|download_url|

h3. Python Distribution Package as Each Content Unit

This is the way Pulp2 is modeled currently. A content unit would look like:

|filename (primary key)|
|name|
|version|
|metadata_version|
|packagetype|
|path|
|additional metadata|

Each content unit would contain one artifact, corresponding to the distribution package with that filename on PyPI.

h4. Disadvantages

The disadvantage of modeling a Python distribution package as a content unit is that this is something the user would not care as much about. We would have multiple content units for the same release, but for different systems, e.g.:

scipy-0.9.0-cp26-cp26mu-manylinux1_x86_64.whl
scipy-0.9.0-cp27-cp27m-manylinux1_x86_64.whl
scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl
scipy-0.9.0.tar.gz
scipy-0.9.0.zip

As a user I do not want to view all these distribution packages when I query a repository. The only thing I would care about is the release, and I will let pip take care of which distribution package to install. PyPI in particular makes the release a first-class citizen instead of the distribution packages.

Metadata that belongs to a release (i.e. additional metadata) would be repeated across content units. PyPI stores these metadata fields as a part of the release [0], and these fields could be updated in PyPI outside of a release. The metadata we store would be the metadata in a distribution package, which is immutable, so if a user updates metadata in PyPI, we would not sync the metadata updates.

h3. Python Release as a Content Unit

The alternative is to model a Python release as a content unit. A content unit would look like:

|name|
|version|
|metadata_version|
|additional metadata|

The primary key would be (name, version). The distribution package would map to an artifact, and it would have the following fields:

|packagetype|
|path|
|filename|

Each content unit would contain all artifacts from the distribution packages associated with it.

h4. Disadvantages

This way of modeling works really well for the super easy use case of "I just want to sync everything", but it begins to break down when we consider the "I only want the wheels" use case. If I want one repository to contain all the sdists and one for all the wheels, this implementation as is would not allow that (since content units are immutable, we can't have one scipy 0.9.0 content unit that is sdist only and another that is wheels only).

@asmacdo has proposed a partial sync workaround, which he will update this issue with. I will note however that even with partial sync, we will have to display the metadata for all the artifacts, otherwise the pulp <-> pulp sync will leave out known artifacts.

h3. Glossary

Release
A snapshot of a Project at a particular point in time, denoted by a version identifier. Making a release may entail the publishing of multiple Distributions. For example, if version 1.0 of a project was released, it could be available in both a source distribution format and a Windows installer distribution format.

Distribution Package
A versioned archive file that contains Python packages, modules, and other resource files that are used to distribute a Release. The archive file is what an end-user will download from the internet and install. A project may contain many releases, and releases may contain many distribution packages. Can be type sdist, bdist, etc. "Distribution package" is used instead of "package" to avoid confusion with "import packages" or Linux "distributions".

[0] https://warehouse.pypa.io/api-reference/xml-rpc/
[1] https://github.com/pypa/warehouse/blob/master/warehouse/packaging/models.py#L358

h2. Content Model (previous revision)

The Pulp Content should map to a Python Release and should contain the following required fields:

|name|
|version|
|metadata_version|

And the following optional fields:

|summary|
|Description|
|Keywords|
|Home-page|
|Download-URL|
|Author|
|Author-email|
|Maintainer|
|Maintainer-email|
|License|
|Classifier|
|Requires-Python|
|Project-URL|

Each content unit will contain 1 or more artifacts that correspond to the Python Distribution Release (sdist, wheel, egg).

h2. Clarifications (previous revision)

It appears that the PyPI model keeps a copy of each of the fields in the Pulp content unit separately [0], so we do not worry about shared fields. In the model, no duplicate metadata copy is kept [1]. This means that if the user has a different set of metadata in the distribution release, it should not be read by Pulp; Pulp will only get the metadata from the release, and it would generate the release without parsing the metadata within the distribution release.

[0] https://github.com/pypa/warehouse/blob/master/warehouse/packaging/models.py#L215
[1] https://github.com/pypa/pypi-legacy/blob/master/tools/sqlite_create.py#L18
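
To make the trade-off between the two modeling options concrete, here is a minimal Python sketch built on the scipy filenames quoted in the description. It uses plain dataclasses rather than Pulp's platform classes. `DistributionPackage`, `Release`, `parse_filename` and `group_by_release` are hypothetical names introduced only for illustration, and the filename parser is deliberately naive: it is an assumption that names look like the five samples.

```python
from dataclasses import dataclass, field

# Filenames quoted in the issue description.
FILENAMES = [
    "scipy-0.9.0-cp26-cp26mu-manylinux1_x86_64.whl",
    "scipy-0.9.0-cp27-cp27m-manylinux1_x86_64.whl",
    "scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl",
    "scipy-0.9.0.tar.gz",
    "scipy-0.9.0.zip",
]

SDIST_SUFFIXES = (".tar.gz", ".zip")


@dataclass(frozen=True)
class DistributionPackage:
    """Hypothetical stand-in for one file on PyPI (option 1: one content unit each)."""
    filename: str
    packagetype: str  # "sdist" or "bdist_wheel"
    name: str
    version: str


@dataclass
class Release:
    """Hypothetical stand-in for a release-level content unit (option 2)."""
    name: str
    version: str
    artifacts: list = field(default_factory=list)


def parse_filename(filename):
    """Naive parser (assumption): only handles names shaped like the samples above."""
    if filename.endswith(".whl"):
        name, version = filename[: -len(".whl")].split("-")[:2]
        return DistributionPackage(filename, "bdist_wheel", name, version)
    for suffix in SDIST_SUFFIXES:
        if filename.endswith(suffix):
            name, version = filename[: -len(suffix)].rsplit("-", 1)
            return DistributionPackage(filename, "sdist", name, version)
    raise ValueError(f"unrecognized distribution filename: {filename}")


def group_by_release(filenames):
    """Group distribution packages under a (name, version) key, as in option 2."""
    releases = {}
    for filename in filenames:
        package = parse_filename(filename)
        key = (package.name, package.version)
        release = releases.setdefault(key, Release(package.name, package.version))
        release.artifacts.append(package)
    return releases


# Option 1: every file is its own content unit.
packages = [parse_filename(f) for f in FILENAMES]

# Option 2: one content unit per release, with the files attached as artifacts.
releases = group_by_release(FILENAMES)
```

Under the first option a repository query for scipy 0.9.0 returns five units; under the second it returns one unit carrying five artifacts. In this sketch, a wheels-only repository would therefore need filtering at the artifact level rather than the content-unit level.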