For the "Release as Content Unit" model, the use case "I can publish a repo that contains only X Package type" wouldn't work without some other mechanism. I see 3 ways the plugin could work around this: (1) Filtered sync, (2) Rich Copy, (3) Filtered Publish. 1 and 2 both control the way that units are put into a repository; 3 would only change how the metadata is published.
Using a single release of scipy as an example: https://pypi.python.org/pypi/scipy/json/

    "releases": {
      "0.9.0": [
        {
          "has_sig": true,
          "upload_time": "2016-04-20T05:05:36",
          "comment_text": "",
          "python_version": "cp26",
          "url": "https://pypi.python.org/packages/19/b1/a3ea10ee5425ca3c04f63aba1bb72e4d8f5535db99389016e980063238ac/scipy-0.9.0-cp26-cp26mu-manylinux1_x86_64.whl",
          "md5_digest": "234d2300c7654b86cecaacfabc10812f",
          "downloads": 125,
          "filename": "scipy-0.9.0-cp26-cp26mu-manylinux1_x86_64.whl",
          "packagetype": "bdist_wheel",
          "path": "19/b1/a3ea10ee5425ca3c04f63aba1bb72e4d8f5535db99389016e980063238ac/scipy-0.9.0-cp26-cp26mu-manylinux1_x86_64.whl",
          "size": 23438891
        },
        {
          "has_sig": true,
          "upload_time": "2016-04-20T05:05:51",
          "comment_text": "",
          "python_version": "cp27",
          "url": "https://pypi.python.org/packages/e4/81/df8b2598e99fe1651d70a80760732c95472d8cc3204d0750b8cb47e54525/scipy-0.9.0-cp27-cp27m-manylinux1_x86_64.whl",
          "md5_digest": "77d0ff60d18961256f6a754e27cd435a",
          "downloads": 260,
          "filename": "scipy-0.9.0-cp27-cp27m-manylinux1_x86_64.whl",
          "packagetype": "bdist_wheel",
          "path": "e4/81/df8b2598e99fe1651d70a80760732c95472d8cc3204d0750b8cb47e54525/scipy-0.9.0-cp27-cp27m-manylinux1_x86_64.whl",
          "size": 23468911
        },
        {
          "has_sig": true,
          "upload_time": "2016-04-20T05:06:25",
          "comment_text": "",
          "python_version": "cp27",
          "url": "https://pypi.python.org/packages/12/62/ee2b48d5117d6897f13d1e3e01cae50b479ab73f9d7131686808edcac5c1/scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl",
          "md5_digest": "d640d7fb80614a959b3bdaf3ffb22673",
          "downloads": 314,
          "filename": "scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl",
          "packagetype": "bdist_wheel",
          "path": "12/62/ee2b48d5117d6897f13d1e3e01cae50b479ab73f9d7131686808edcac5c1/scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl",
          "size": 23428517
        },
        {
          "has_sig": false,
          "upload_time": "2011-02-28T07:17:23",
          "comment_text": "",
          "python_version": "source",
          "url": "https://pypi.python.org/packages/4d/ed/08313eb178d8710c2f29bdf4e1efac55f716a6bcd40852cf80bc5d3d117f/scipy-0.9.0.tar.gz",
          "md5_digest": "ebfef6e8e82d15c875a4ee6a46d4e1cd",
          "downloads": 15180,
          "filename": "scipy-0.9.0.tar.gz",
          "packagetype": "sdist",
          "path": "4d/ed/08313eb178d8710c2f29bdf4e1efac55f716a6bcd40852cf80bc5d3d117f/scipy-0.9.0.tar.gz",
          "size": 6084552
        }
      ]
    }
The "only X type" use case could apply to:
- only bdist_wheel (packagetype: bdist_wheel)
- only sdist (packagetype: sdist)
- only cpython 2.7 (python_version: cp27)
- only cpython 2.6 (python_version: cp26)
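
Each of these criteria amounts to matching one field of a DistributionPackage entry in the PyPI JSON. A minimal sketch, assuming a hypothetical helper `filter_packages` (not part of Pulp or PyPI) and entries trimmed to the fields that matter for filtering:

```python
# Entries trimmed from the scipy 0.9.0 metadata; only the fields used for
# filtering are kept.
SCIPY_0_9_0 = [
    {"filename": "scipy-0.9.0-cp26-cp26mu-manylinux1_x86_64.whl",
     "packagetype": "bdist_wheel", "python_version": "cp26"},
    {"filename": "scipy-0.9.0-cp27-cp27m-manylinux1_x86_64.whl",
     "packagetype": "bdist_wheel", "python_version": "cp27"},
    {"filename": "scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl",
     "packagetype": "bdist_wheel", "python_version": "cp27"},
    {"filename": "scipy-0.9.0.tar.gz",
     "packagetype": "sdist", "python_version": "source"},
]


def filter_packages(packages, **criteria):
    """Hypothetical helper: keep entries whose fields equal every criterion."""
    return [
        pkg for pkg in packages
        if all(pkg.get(field) == value for field, value in criteria.items())
    ]


sdists = filter_packages(SCIPY_0_9_0, packagetype="sdist")
cp27_wheels = filter_packages(SCIPY_0_9_0, python_version="cp27")
print([p["filename"] for p in sdists])
print([p["filename"] for p in cp27_wheels])
```

The same predicate could be applied at sync, copy, or publish time; the three options differ in where it runs, not in what it matches.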
Filter "in". 1 (filtered sync) and 2 (filtered copy) both control the artifacts that are in a repository by creating FilteredContentUnits.
Content Units are immutable. If a repo contained only the sdist for the scipy-0.9.0 release, then a second repo that needed all DistributionPackages could not update scipy-0.9.0 to contain the wheels. This is simple enough to work around: a "filtered sync" or a "filtered/rich copy" could create a FilteredContentUnit, explained below. ContentUnits can already share artifacts, so we don't have to worry about duplication of artifacts.
This would require the importer to have at least one new field that indicates which artifacts/DistributionPackages will be included. For this example, let's say `importer.whitelist_package_type="sdist"`. At sync/upload/copy time, the importer processes the metadata from upstream and creates FilteredContentUnits like "scipy-0.9.0-sdist", which have only 1 artifact: "filename": "scipy-0.9.0.tar.gz". To reiterate, this artifact would be shared by the scipy-0.9.0 ContentUnit (which contains the wheels too) if another repository had all package types.
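
To make the sharing concrete, here is a rough sketch of how a FilteredContentUnit could be derived. The `Artifact` and `ContentUnit` classes and the `make_filtered_unit` function are hypothetical stand-ins for illustration, not Pulp's actual models:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    filename: str
    packagetype: str


@dataclass(frozen=True)
class ContentUnit:
    # Hypothetical stand-in for a Release content unit; not Pulp's real model.
    name: str
    artifacts: tuple


def make_filtered_unit(unit, whitelist_package_type):
    """Build a new unit such as "scipy-0.9.0-sdist" that reuses the
    matching Artifact objects instead of copying them."""
    kept = tuple(a for a in unit.artifacts
                 if a.packagetype == whitelist_package_type)
    return ContentUnit(f"{unit.name}-{whitelist_package_type}", kept)


sdist = Artifact("scipy-0.9.0.tar.gz", "sdist")
wheel = Artifact("scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl", "bdist_wheel")
full = ContentUnit("scipy-0.9.0", (wheel, sdist))
filtered = make_filtered_unit(full, "sdist")

print(filtered.name)
print(filtered.artifacts[0] is sdist)  # same object, so nothing is duplicated
```

Neither unit is modified after creation; the full unit and the filtered unit simply coexist and point at the same sdist artifact.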
Filter "out". 3 (filtered publish)
Another approach would be to include all the content at import time, and filter the publish. The repository would contain a vanilla scipy-0.9.0 ContentUnit. At publish time, metadata is generated, simply leaving out anything unwanted. This would require a new field, e.g. `publisher.whitelist_package_type`. Essentially this means that Pulp knows about all the types, but Pulp only tells clients about the specified type.

    "releases": {
      "0.9.0": [
        {
          "has_sig": false,
          "upload_time": "2011-02-28T07:17:23",
          "comment_text": "",
          "python_version": "source",
          "url": "https://pypi.python.org/packages/4d/ed/08313eb178d8710c2f29bdf4e1efac55f716a6bcd40852cf80bc5d3d117f/scipy-0.9.0.tar.gz",
          "md5_digest": "ebfef6e8e82d15c875a4ee6a46d4e1cd",
          "downloads": 15180,
          "filename": "scipy-0.9.0.tar.gz",
          "packagetype": "sdist",
          "path": "4d/ed/08313eb178d8710c2f29bdf4e1efac55f716a6bcd40852cf80bc5d3d117f/scipy-0.9.0.tar.gz",
          "size": 6084552
        }
      ]
    }
To expand this into a nice workflow, the user might create multiple Publishers for a given repository, e.g. `Python3LinuxWheelsPublisher` and `SourcePublisher`, which would each get their own Distribution(Pulp).
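
A rough sketch of what two such Publishers could look like. The `FilteredPublisher` base class, the `keep`/`publish` methods, and the matching rule for Python 3 Linux wheels are all assumptions made for illustration; none of this is Pulp's actual publisher API:

```python
class FilteredPublisher:
    """Hypothetical base class: publishes only entries accepted by keep()."""

    def keep(self, package):
        raise NotImplementedError

    def publish(self, releases):
        # The repository content is untouched; only the metadata is filtered.
        return {
            version: [pkg for pkg in packages if self.keep(pkg)]
            for version, packages in releases.items()
        }


class SourcePublisher(FilteredPublisher):
    def keep(self, package):
        return package["packagetype"] == "sdist"


class Python3LinuxWheelsPublisher(FilteredPublisher):
    def keep(self, package):
        # Assumed rule: a wheel with a CPython 3 tag and "linux" in its filename.
        return (package["packagetype"] == "bdist_wheel"
                and package["python_version"].startswith("cp3")
                and "linux" in package["filename"])


releases = {"0.9.0": [
    {"filename": "scipy-0.9.0-cp27-cp27mu-manylinux1_x86_64.whl",
     "packagetype": "bdist_wheel", "python_version": "cp27"},
    {"filename": "scipy-0.9.0.tar.gz",
     "packagetype": "sdist", "python_version": "source"},
]}

source_metadata = SourcePublisher().publish(releases)
py3_metadata = Python3LinuxWheelsPublisher().publish(releases)
print(source_metadata)
print(py3_metadata)
```

In this sketch a release with no matching packages is still listed with an empty list; whether a real publisher should drop such releases entirely is an open design question.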
This meets the "curated type" use case, but doesn't help with using less disk space or syncing faster. These use cases could be met by using the on_demand feature. To create an "sdist only" Distribution(Pulp), the user could sync the repo with importer.download_policy="on_demand". They could then trigger immediate downloads for "sdists" only (/me waves my hands magically). The repo would contain the wheels, but wouldn't download them unless requested, which would never happen because they are left out of the published metadata that pip (or whatever client) reads.
This would cause problems with Pulp-Pulp syncs. Syncing from an "sdist only" publication would cause the creation of partial ContentUnits, leading to the problems discussed in the "Filter in" method.
Upload fail
This could be the final nail. The user wants to upload an sdist of their project to their repository so QE can install it to test their code. They upload the "pulp-3.0.0-alpha" sdist to their "pulp-testing" repository. QE says "it works!" so the user creates a wheel and uploads that too. Now, we have broken the immutable content unit.
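
The conflict can be sketched with a frozen dataclass standing in for an immutable Release content unit. The `ReleaseUnit` class and both filenames are hypothetical, chosen only to mirror the scenario above:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ReleaseUnit:
    # Hypothetical model of a Release content unit with a fixed artifact set.
    name: str
    artifacts: tuple


# First upload: only the sdist exists (hypothetical filename).
unit = ReleaseUnit("pulp-3.0.0-alpha", ("pulp-3.0.0-alpha.tar.gz",))

try:
    # Second upload: the wheel for the same release arrives later
    # (hypothetical filename) and would have to be added to the same unit.
    unit.artifacts = unit.artifacts + ("pulp-3.0.0a0-py3-none-any.whl",)
    mutated = True
except FrozenInstanceError:
    mutated = False

print(mutated)
```

Either the unit stays immutable and the wheel cannot join its release, or the unit is mutated and every repository that already contains it changes underneath its users.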
IMHO
I really like the Release-ContentUnit concept, since DistributionPackages are always a single file, and they are just different distributions of the same code (Release). It aligns well with the way PyPI publishes, and allows users to think "versions" instead of "distributions". However, I think it is too "against the grain" with Pulp, and causes too many issues. I think I've demonstrated that it is possible to work around the problems that come out of "sync/copy" and "filtered publish", but I haven't thought of a way to work around "upload 2 DistributionPackages from the same release at different times" or "filtered publish + Pulp-Pulp sync". I'm certainly willing to continue thinking about it, but **my thinking is that we should use the DistributionPackage-ContentUnit model.**
Create models for PythonPackageContent
closes #2883 https://pulp.plan.io/issues/2883