Task #2883

Create model(s) for Python's Releases

Added by amacdona@redhat.com over 2 years ago. Updated 6 months ago.

Status:
MODIFIED
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
% Done:

100%

Platform Release:
Blocks Release:
Target Release - Python:
Backwards Incompatible:
No
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
QA Contact:
Complexity:
Smash Test:
Verified:
No
Verification Required:
No
Sprint:
Sprint 29

Description

A content model, content serializer and content viewset have already been created by https://pulp.plan.io/issues/2882

This task is to finish those classes, adding any Python specific fields.

This task will be complete when a django shell user can CRUD full representations of Python Package "releases". A REST API user should be able to read a list of all Python units at `/v3/content/python/` as well as retrieve data on a specific unit (url is not yet decided).

All unit metadata is provided by the shell user at this point. It is not expected that the plugin extract the metadata from a package or scrape it from upstream.

After discussion we will go with the Python "distribution package" as content unit model.

The PythonPackageContent (because it's not really a PythonContent, and DistributionContent would overload the term 'distribution' too much) would contain the following fields:

Pulp-related

packagetype
path
filename (primary key)

Python-related

name
version
metadata_version
summary
description
keywords
home_page
download_url
author
author_email
maintainer
maintainer_email
license
classifier
requires_python
project_url
platform
supported_platform
requires_dist
provides_dist
obsoletes_dist
requires_external

This is the way Pulp 2 is modeled currently. Each content unit would contain one artifact, corresponding to one distribution package file on PyPI (identified by its filename).
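As a rough illustration of the field list above, here is a plain-Python sketch of the proposed unit. It is not the actual Django model: the dataclass, the choice of which fields are required, and the use of empty strings as defaults are all assumptions made so the example runs on its own. Multi-valued metadata (classifier, requires_dist, and so on) is kept as text here.

```python
from dataclasses import dataclass, fields


@dataclass
class PythonPackageContent:
    """Stand-in for the proposed content unit (assumption: not the real model)."""

    # Pulp-related fields
    packagetype: str  # e.g. "sdist" or "bdist_wheel"
    path: str
    filename: str  # proposed primary key

    # Python-related fields
    name: str
    version: str
    metadata_version: str = ""
    summary: str = ""
    description: str = ""
    keywords: str = ""
    home_page: str = ""
    download_url: str = ""
    author: str = ""
    author_email: str = ""
    maintainer: str = ""
    maintainer_email: str = ""
    license: str = ""
    classifier: str = ""
    requires_python: str = ""
    project_url: str = ""
    platform: str = ""
    supported_platform: str = ""
    requires_dist: str = ""
    provides_dist: str = ""
    obsoletes_dist: str = ""
    requires_external: str = ""


# All metadata is supplied by hand, as the description expects of a shell user.
unit = PythonPackageContent(
    packagetype="sdist",
    path="4d/ed/08313eb178d8710c2f29bdf4e1efac55f716a6bcd40852cf80bc5d3d117f/scipy-0.9.0.tar.gz",
    filename="scipy-0.9.0.tar.gz",
    name="scipy",
    version="0.9.0",
)
field_names = [f.name for f in fields(PythonPackageContent)]
```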

Disadvantages

The disadvantage of modeling a Python distribution package as a content unit is that this is something the user would not care as much about. We would have multiple content units for the same release, but for different systems, e.g.:
scipy-0.9.0-cp26-cp26mu-manylinux1_x86_64.whl
scipy-0.9.0-cp27-cp27m-manylinux1_x86_64.whl
scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl
scipy-0.9.0.tar.gz
scipy-0.9.0.zip

As a user I do not want to view all these distribution packages when I query a repository. The only thing I would care about is the release, and I will let pip take care of which distribution package to install. PyPI in particular makes the release a first class citizen instead of the distribution packages.
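To make the "many content units per release" point concrete, the sketch below groups the filenames listed above by release. The helper names are hypothetical, and the parsing is deliberately naive: it assumes the project name contains no "-", which is not guaranteed for every sdist.

```python
from collections import defaultdict

ARCHIVE_SUFFIXES = (".tar.gz", ".zip", ".whl")


def release_key(filename):
    """Return (name, version) for a distribution package filename.

    Naive on purpose: assumes the project name has no "-" in it.
    """
    for suffix in ARCHIVE_SUFFIXES:
        if filename.endswith(suffix):
            stem = filename[: -len(suffix)]
            break
    else:
        raise ValueError(f"unrecognized archive type: {filename}")
    name, version = stem.split("-")[:2]
    return name, version


def group_by_release(filenames):
    """Map each (name, version) release to its distribution package filenames."""
    releases = defaultdict(list)
    for filename in filenames:
        releases[release_key(filename)].append(filename)
    return dict(releases)


FILENAMES = [
    "scipy-0.9.0-cp26-cp26mu-manylinux1_x86_64.whl",
    "scipy-0.9.0-cp27-cp27m-manylinux1_x86_64.whl",
    "scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl",
    "scipy-0.9.0.tar.gz",
    "scipy-0.9.0.zip",
]
grouped = group_by_release(FILENAMES)
```

All five files collapse into the single release `("scipy", "0.9.0")`, which is the view a user querying a repository would probably prefer.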

Metadata that belongs to a release (i.e. additional metadata) would be repeated across content units. PyPI stores these metadata fields as a part of the release [0], and these fields could be updated in PyPI outside of a release. The metadata we store would be the metadata in a distribution package, which is immutable, so if a user updates metadata in PyPI, we would not sync the metadata updates.

Glossary

Release
A snapshot of a Project at a particular point in time, denoted by a version identifier.
Making a release may entail the publishing of multiple "distribution packages". For example, if version 1.0 of a project was released, it could be available in both a source distribution format and a Windows installer distribution format.

Distribution Package
A versioned archive file that contains Python packages, modules, and other resource files that are used to distribute a Release. The archive file is what an end-user will download from the internet and install. A project may contain many releases, and releases may contain many distribution packages. Can be type sdist, bdist, etc. "Distribution package" is used instead of "package" to avoid confusion with "import packages" or linux "distributions".

[0] https://warehouse.pypa.io/api-reference/xml-rpc/


Related issues

Blocked by Python Support - Task #2882: bootstrap pulp_python for Pulp 3 MODIFIED
Blocks Python Support - Story #2885: As a User I can Publish Python packages in a way that is consumable by pip MODIFIED
Blocks Python Support - Story #2884: As a user I can sync from PyPI MODIFIED

Associated revisions

Revision 6a055226
Added by werwty almost 2 years ago

Create models for PythonPackageContent

closes #2883
https://pulp.plan.io/issues/2883

History

#1 Updated by amacdona@redhat.com over 2 years ago

  • Blocked by Task #2882: bootstrap pulp_python for Pulp 3 added

#2 Updated by amacdona@redhat.com over 2 years ago

  • Project changed from Pulp to Python Support

#3 Updated by amacdona@redhat.com over 2 years ago

  • Blocks Story #2885: As a User I can Publish Python packages in a way that is consumable by pip added

#4 Updated by amacdona@redhat.com over 2 years ago

  • Blocks Story #2888: As a User, I can migrate my Pulp 2 Python content to Pulp 3 added

#5 Updated by mhrivnak over 2 years ago

How exactly would a user CRUD these? Will we give direct REST API access to the content via a viewset? I think that's reasonable, even desirable, but I'm not sure we've talked directly about that particular change in behavior from Pulp 2.

The things to think through are probably around locking.

C and R seem fine to do any time, outside of a task.

U is more iffy. We like content to be as immutable as possible. We need to think through specifics.

D starts to get interesting with regard to race conditions. Is it ok to allow a user to directly delete content via the API? Would we disallow that if the content was still associated with a repository? Are there any other restrictions? Maybe we should just not allow D at all for starters.

#6 Updated by amacdona@redhat.com over 2 years ago

  • Description updated (diff)

@mhrivnak, good question, have clarified in the description.

#7 Updated by bmbouter over 2 years ago

I'm confused by how this is written. Is this story to create a FileContent object in pulp_python as a subclass of ContentUnit from core? The way it reads now it seems that we'll be adding fields to core.

#8 Updated by amacdona@redhat.com over 2 years ago

  • Subject changed from Extend ContentUnit with all python specific fields to Finish the Content Model
  • Description updated (diff)

updated for clarity @bmbouter

#9 Updated by amacdona@redhat.com over 2 years ago

  • Description updated (diff)

#10 Updated by bmbouter over 2 years ago

@asmacdo, thanks for the revisions, this is clear now.

Do we know what these python fields are? Enumerating the names would be some good planning I think. What do you think about that? Is that even possible without doing all the work?

#11 Updated by amacdona@redhat.com over 2 years ago

@bmbouter, determining which fields to include will be a lot of the work.

Whoever takes this story should also check into warehouse and its expected release timeline. Specifically, we need to know what kind of metadata to support. PEP-0426 has not been accepted yet, but it outlines metadata version 3 (and previously documented version 2, which was never adopted).

https://www.python.org/dev/peps/pep-0426/

#12 Updated by amacdona@redhat.com about 2 years ago

Some complexity of modeling is that PyPI still supports old versions of Metadata, and so we have to be as backwards compatible as PyPI is.

Relevant PEPs from a planning etherpad:
https://www.python.org/dev/peps/pep-0241/ Final (Original Metadata)
https://www.python.org/dev/peps/pep-0301/ Final (HTTP API)
https://www.python.org/dev/peps/pep-0314/ Final (Metadata 1.1)
https://www.python.org/dev/peps/pep-0345/ Accepted (Metadata 1.2)
https://www.python.org/dev/peps/pep-0427/ Accepted (Wheel 1.0)
https://www.python.org/dev/peps/pep-0491/ Draft (Wheel 1.9)
https://www.python.org/dev/peps/pep-0426/ Deferred (Metadata 2.0)
https://www.python.org/dev/peps/pep-0459/ Deferred (Standard Metadata Extensions)
https://www.python.org/dev/peps/pep-0503/ Accepted (Simple API)
https://www.python.org/dev/peps/pep-0508/ Active (Dep specification)

Another thing to consider is that in the PyPI metadata, some fields like description are shared between all releases, even though the metadata in the distribution packages might be different.

To keep it simple in Pulp 2, I opted to keep very little metadata for distribution packages (which are called a Package in Pulp 2). I'm not sure if this is the right approach.

#13 Updated by amacdona@redhat.com about 2 years ago

  • Subject changed from Finish the Content Model to Create model(s) for Python's DistributionPackages

#14 Updated by bizhang almost 2 years ago

  • Subject changed from Create model(s) for Python's DistributionPackages to Create model(s) for Python's Releases
  • Description updated (diff)

#15 Updated by bizhang almost 2 years ago

  • Description updated (diff)

#16 Updated by amacdona@redhat.com almost 2 years ago

For the "Release as Content Unit" model, the use case "I can publish a repo that contains only X Package type" wouldn't work without some other mechanism. I see 3 ways the plugin could work around this: (1) Filtered sync, (2) Rich Copy, (3) Filtered Publish. 1 and 2 both control the way that units are put into a repository; 3 would only change how the metadata is published.

Using a single release of scipy as an example: https://pypi.python.org/pypi/scipy/json/


"releases": {
        "0.9.0": [
            {
                "has_sig": true, 
                "upload_time": "2016-04-20T05:05:36", 
                "comment_text": "", 
                "python_version": "cp26", 
                "url": "https://pypi.python.org/packages/19/b1/a3ea10ee5425ca3c04f63aba1bb72e4d8f5535db99389016e980063238ac/scipy-0.9.0-cp26-cp26mu-manylinux1_x86_64.whl", 
                "md5_digest": "234d2300c7654b86cecaacfabc10812f", 
                "downloads": 125, 
                "filename": "scipy-0.9.0-cp26-cp26mu-manylinux1_x86_64.whl", 
                "packagetype": "bdist_wheel", 
                "path": "19/b1/a3ea10ee5425ca3c04f63aba1bb72e4d8f5535db99389016e980063238ac/scipy-0.9.0-cp26-cp26mu-manylinux1_x86_64.whl", 
                "size": 23438891
            }, 
            {
                "has_sig": true, 
                "upload_time": "2016-04-20T05:05:51", 
                "comment_text": "", 
                "python_version": "cp27", 
                "url": "https://pypi.python.org/packages/e4/81/df8b2598e99fe1651d70a80760732c95472d8cc3204d0750b8cb47e54525/scipy-0.9.0-cp27-cp27m-manylinux1_x86_64.whl", 
                "md5_digest": "77d0ff60d18961256f6a754e27cd435a", 
                "downloads": 260, 
                "filename": "scipy-0.9.0-cp27-cp27m-manylinux1_x86_64.whl", 
                "packagetype": "bdist_wheel", 
                "path": "e4/81/df8b2598e99fe1651d70a80760732c95472d8cc3204d0750b8cb47e54525/scipy-0.9.0-cp27-cp27m-manylinux1_x86_64.whl", 
                "size": 23468911
            }, 
            {
                "has_sig": true, 
                "upload_time": "2016-04-20T05:06:25", 
                "comment_text": "", 
                "python_version": "cp27", 
                "url": "https://pypi.python.org/packages/12/62/ee2b48d5117d6897f13d1e3e01cae50b479ab73f9d7131686808edcac5c1/scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl", 
                "md5_digest": "d640d7fb80614a959b3bdaf3ffb22673", 
                "downloads": 314, 
                "filename": "scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl", 
                "packagetype": "bdist_wheel", 
                "path": "12/62/ee2b48d5117d6897f13d1e3e01cae50b479ab73f9d7131686808edcac5c1/scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl", 
                "size": 23428517
            }, 
            {
                "has_sig": false, 
                "upload_time": "2011-02-28T07:17:23", 
                "comment_text": "", 
                "python_version": "source", 
                "url": "https://pypi.python.org/packages/4d/ed/08313eb178d8710c2f29bdf4e1efac55f716a6bcd40852cf80bc5d3d117f/scipy-0.9.0.tar.gz", 
                "md5_digest": "ebfef6e8e82d15c875a4ee6a46d4e1cd", 
                "downloads": 15180, 
                "filename": "scipy-0.9.0.tar.gz", 
                "packagetype": "sdist", 
                "path": "4d/ed/08313eb178d8710c2f29bdf4e1efac55f716a6bcd40852cf80bc5d3d117f/scipy-0.9.0.tar.gz", 
                "size": 6084552
            }
     ]
}

The "only X type" use case could apply to:
  • only bdist_wheel (packagetype: bdist_wheel)
  • only sdist (packagetype: sdist)
  • only cpython 2.7 (python_version: cp27)
  • only cpython 2.6 (python_version: cp26)
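Each "only X type" case amounts to matching on a key of the per-file records in the PyPI JSON shown earlier. A minimal sketch, using trimmed copies of those records and a hypothetical `select_files` helper:

```python
# Trimmed copies of the scipy 0.9.0 file records from the PyPI JSON.
FILES = [
    {
        "filename": "scipy-0.9.0-cp26-cp26mu-manylinux1_x86_64.whl",
        "packagetype": "bdist_wheel",
        "python_version": "cp26",
    },
    {
        "filename": "scipy-0.9.0-cp27-cp27m-manylinux1_x86_64.whl",
        "packagetype": "bdist_wheel",
        "python_version": "cp27",
    },
    {
        "filename": "scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl",
        "packagetype": "bdist_wheel",
        "python_version": "cp27",
    },
    {
        "filename": "scipy-0.9.0.tar.gz",
        "packagetype": "sdist",
        "python_version": "source",
    },
]


def select_files(files, **criteria):
    """Keep only the file records whose keys match every given criterion."""
    return [
        record
        for record in files
        if all(record.get(key) == value for key, value in criteria.items())
    ]


only_sdist = select_files(FILES, packagetype="sdist")
only_cp27 = select_files(FILES, python_version="cp27")
only_cp27_wheels = select_files(FILES, packagetype="bdist_wheel", python_version="cp27")
```

The same predicate could in principle be applied at sync, copy, or publish time; which of those is appropriate is the question the rest of this comment works through.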

Filter "in": 1 (filtered sync) and 2 (filtered copy)

Both control the artifacts that are in a repository by creating FilteredContentUnits.

Content Units are immutable. If a repo contained only the sdist for the scipy-0.9.0 release, then a second repo that needed all DistributionPackages could not update scipy-0.9.0 to contain the wheels. This is simple enough to work around: a "filtered sync" or a "filtered/rich copy" could create a FilteredContentUnit, explained below. ContentUnits can share artifacts already, so we don't have to worry about duplication of artifacts.

This would require the importer to have at least one new field that indicates which artifacts/DistributionPackages will be included. For this example, let's say `importer.whitelist_package_type="sdist"`. At sync/upload/copy time, the importer processes the metadata from upstream and creates FilteredContentUnits like "scipy-0.9.0-sdist" which have only 1 artifact: "filename": "scipy-0.9.0.tar.gz". To reiterate, this artifact would be shared by the scipy-0.9.0 ContentUnit (which contains the wheels too) if another repository had all package types.
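The "filter in" idea can be sketched with two immutable units that point at the same artifact object. The classes and the `filtered_unit` helper are hypothetical stand-ins, not Pulp's models; only the `whitelist_package_type` name and the "scipy-0.9.0-sdist" key come from the text above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Artifact:
    filename: str
    packagetype: str


@dataclass(frozen=True)
class ContentUnit:
    key: str
    artifacts: Tuple[Artifact, ...]


def filtered_unit(unit, whitelist_package_type):
    """Build a new unit holding only the artifacts of the whitelisted type.

    The artifact objects are reused, not copied, so nothing is duplicated.
    """
    kept = tuple(a for a in unit.artifacts if a.packagetype == whitelist_package_type)
    return ContentUnit(key=f"{unit.key}-{whitelist_package_type}", artifacts=kept)


full = ContentUnit(
    key="scipy-0.9.0",
    artifacts=(
        Artifact("scipy-0.9.0-cp26-cp26mu-manylinux1_x86_64.whl", "bdist_wheel"),
        Artifact("scipy-0.9.0-cp27-cp27m-manylinux1_x86_64.whl", "bdist_wheel"),
        Artifact("scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl", "bdist_wheel"),
        Artifact("scipy-0.9.0.tar.gz", "sdist"),
    ),
)
sdist_only = filtered_unit(full, "sdist")
```

Here `sdist_only` is a separate unit keyed "scipy-0.9.0-sdist", and its single artifact is the very same object held by the full unit.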

Filter "out": 3 (filtered publish)

Another approach would be to include all the content at import time, and filter the publish. The repository would contain a vanilla scipy-0.9.0 ContentUnit. At publish time, metadata is generated, simply leaving out anything unwanted. This would require a new field, ex. `publisher.whitelist_package_type`. Essentially this means that Pulp knows about all the types, but Pulp only tells clients about the specified type.

"releases": {
        "0.9.0": [

            {
                "has_sig": false, 
                "upload_time": "2011-02-28T07:17:23", 
                "comment_text": "", 
                "python_version": "source", 
                "url": "https://pypi.python.org/packages/4d/ed/08313eb178d8710c2f29bdf4e1efac55f716a6bcd40852cf80bc5d3d117f/scipy-0.9.0.tar.gz", 
                "md5_digest": "ebfef6e8e82d15c875a4ee6a46d4e1cd", 
                "downloads": 15180, 
                "filename": "scipy-0.9.0.tar.gz", 
                "packagetype": "sdist", 
                "path": "4d/ed/08313eb178d8710c2f29bdf4e1efac55f716a6bcd40852cf80bc5d3d117f/scipy-0.9.0.tar.gz", 
                "size": 6084552
            }
     ]
}

To expand this into a nice workflow, the user might create multiple Publishers for a given repository, e.g. `Python3LinuxWheelsPublisher` and `SourcePublisher`, each of which would get its own Distribution(Pulp).

This meets the "curated type" use case, but doesn't help with using less disk space or syncing faster. These use cases could be met by using the on_demand feature. To create an "sdist only" Distribution(Pulp), the user could sync the repo with importer.download_policy="on_demand". They could then trigger immediate downloads for "sdists" only (/me waves my hands magically). The repo would contain the wheels, but wouldn't download them unless requested, which would never happen because they are left out of the metadata published to pip (or whatever client).

This would cause problems with Pulp-Pulp syncs. Syncing from a "sdist only" publication would cause the creation of partial ContentUnits, leading to the problems discussed in the "Filter in" method.

Upload fail

This could be the final nail. The user wants to upload a sdist of their project to their repository so QE can install to test their code. They upload "pulp-3.0.0-alpha" sdist to their "pulp-testing" repository. QE says "it works!" so the user creates a wheel and uploads that too. Now, we have broken the immutable content unit.
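The upload problem can be shown with frozen dataclasses standing in for immutable content units. Both classes and both filenames are hypothetical; the point is only the contrast between the two models.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReleaseUnit:
    """Release as content unit: one immutable unit holds every file."""

    name: str
    version: str
    filenames: Tuple[str, ...]


@dataclass(frozen=True)
class DistributionPackageUnit:
    """Distribution package as content unit: one immutable unit per file."""

    filename: str


# Hypothetical filenames for the "pulp-3.0.0-alpha" example.
SDIST = "pulp-3.0.0-alpha.tar.gz"
WHEEL = "pulp-3.0.0_alpha-py3-none-any.whl"

# Release model: the later wheel upload has to change an existing unit.
release = ReleaseUnit("pulp", "3.0.0-alpha", (SDIST,))
try:
    release.filenames = release.filenames + (WHEEL,)
    release_mutated = True
except FrozenInstanceError:
    release_mutated = False

# Distribution package model: the wheel is simply a second unit.
repository = {DistributionPackageUnit(SDIST)}
repository.add(DistributionPackageUnit(WHEEL))
```

Under the release model the second upload is rejected unless immutability is given up, while under the distribution package model the repository just gains a unit.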

IMHO

I really like the Release-ContentUnit concept, since DistributionPackages are always a single file, and they are just different distributions of the same code (Release). It aligns well with the way PyPI publishes, and allows users to think "versions" instead of "distributions". However, I think it is too "against the grain" with Pulp, and causes too many issues. I think I've demonstrated that it is possible to work around the problems that come out of "sync/copy" and "filtered publish", but I haven't thought of a way to work around "upload 2 DistributionPackages from the same release at different times" or "filtered publish + Pulp-Pulp sync". I'm certainly willing to continue thinking about it, but my thinking is that we should use the DistributionPackage-ContentUnit model.

#17 Updated by bizhang almost 2 years ago

  • Description updated (diff)
  • Sprint Candidate changed from No to Yes

#18 Updated by dalley almost 2 years ago

  • Description updated (diff)

I made a few changes directly (see diff) but want to ask a few questions before marking groomed.

Would it make sense to change s/PackageContent/PythonPackageContent ? It would be good to avoid generic-sounding names inside of plugins.

I crosschecked the list against the metadata PEPs, and I just want to double-check that all of the fields from the PEPs that are not present up above were left off intentionally (as opposed to overlooked). Here are the ones in question:

  • Supported-Platform
  • Keywords
  • Requires-Dist
  • Provides-Dist
  • Obsoletes-Dist
  • Requires-External

#19 Updated by bizhang almost 2 years ago

+1 PythonPackageContent

Supported-Platform

This is present only in PEP-0345, not in legacy PyPI or warehouse. +1 adding it as a pulp metadata field, but ignore it on publish to PyPI

Keywords

+1 adding this

Requires-Dist
Provides-Dist
Obsoletes-Dist
Requires-External

I was thinking of modeling these as dependencies (with links to their dependents) but that should probably be a POST MVP thing.
+1 adding these as text fields for now

#20 Updated by bizhang almost 2 years ago

  • Description updated (diff)

#21 Updated by dalley almost 2 years ago

  • Groomed changed from No to Yes

#22 Updated by bizhang almost 2 years ago

  • Description updated (diff)

#23 Updated by bmbouter almost 2 years ago

  • Blocks Story #2884: As a user I can sync from PyPI added

#24 Updated by rchan almost 2 years ago

  • Sprint/Milestone set to 48

#25 Updated by daviddavis almost 2 years ago

  • Tags Pulp 3 MVP added

#26 Updated by bizhang almost 2 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to bizhang

#27 Updated by bizhang almost 2 years ago

  • Status changed from ASSIGNED to POST

#28 Updated by werwty almost 2 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

#29 Updated by bizhang almost 2 years ago

  • Sprint/Milestone changed from 48 to 51

#30 Updated by bizhang almost 2 years ago

  • Sprint/Milestone changed from 51 to 48

Removing python milestone since it prevented it from being on the sprint

#31 Updated by bmbouter over 1 year ago

  • Sprint set to Sprint 29

#32 Updated by bmbouter over 1 year ago

  • Sprint/Milestone deleted (48)

#33 Updated by bmbouter 6 months ago

  • Blocks deleted (Story #2888: As a User, I can migrate my Pulp 2 Python content to Pulp 3)

#34 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3, Pulp 3 MVP)
