Story #1882
closed
Rebuild model to support all package types
100%
Description
The Python model is currently built from the PKG-INFO file. Its unit key is (name, version), which is a problem for wheels because there can be many files for each name and version. Filename is a better key because name, version, platform, pyversion, and filetype are all present in a predictable, collision-proof way, defined for wheels in https://www.python.org/dev/peps/pep-0427/. The other formats should only have one file each (double check this).
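As a rough illustration of why the filename works as a key, the sketch below splits a wheel filename into the components PEP 427 defines ({distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl). The function name, the namedtuple, and the example filenames are hypothetical and are not taken from the plugin code.

```python
from collections import namedtuple

# Hypothetical container for the PEP 427 filename components.
WheelName = namedtuple("WheelName", "name version build pyversion abi platform")


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        # No optional build tag present.
        name, version, pyversion, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, pyversion, abi, platform = parts
    else:
        raise ValueError("unexpected wheel filename layout: %r" % filename)
    return WheelName(name, version, build, pyversion, abi, platform)


# Two made-up wheels that share (name, version) but differ in filename.
first = parse_wheel_filename("example_pkg-1.0-cp27-cp27m-linux_x86_64.whl")
second = parse_wheel_filename("example_pkg-1.0-py2.py3-none-any.whl")
same_name_version = (first.name, first.version) == (second.name, second.version)
```

Here same_name_version is true even though the two filenames differ, which is the collision the (name, version) key cannot express.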
The new model must:
- be able to be instantiated (and later saved, to support lazy sync) from the JSON
- be able to update fields that are unavailable from the JSON but can be populated from each format (except .exe) using twine at download_success time
- be able to be instantiated from a package's metadata (necessary for upload).
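The three requirements above could be sketched roughly as follows. This is a framework-free stand-in, not the plugin's actual model: the class layout, the method names, and the split of JSON keys between a per-file dict and a per-project dict are all assumptions made for illustration. The metadata dict is assumed to be whatever a tool such as twine extracts from the package file; no twine calls are shown.

```python
class Package(object):
    """Hypothetical stand-in for the unit model, keyed on filename."""

    def __init__(self, filename, name, version, packagetype,
                 md5_digest=None, url=None, summary=None):
        self.filename = filename
        self.name = name
        self.version = version
        self.packagetype = packagetype
        self.md5_digest = md5_digest
        self.url = url
        self.summary = summary

    @property
    def unit_key(self):
        # Filename alone identifies the unit.
        return {"filename": self.filename}

    @classmethod
    def from_json(cls, version, file_info, project_info):
        """Build a unit from JSON (assumed per-file and per-project dicts)."""
        return cls(
            filename=file_info["filename"],
            name=project_info["name"],
            version=version,
            packagetype=file_info["packagetype"],
            md5_digest=file_info.get("md5_digest"),
            url=file_info.get("url"),
            summary=project_info.get("summary"),
        )

    @classmethod
    def from_metadata(cls, filename, metadata):
        """Build a unit from metadata read out of an uploaded package."""
        return cls(
            filename=filename,
            name=metadata["name"],
            version=metadata["version"],
            packagetype=metadata["filetype"],
            md5_digest=metadata.get("md5_digest"),
            summary=metadata.get("summary"),
        )

    def update_from_metadata(self, metadata):
        """Fill in fields the JSON did not provide, e.g. after download."""
        for field in ("md5_digest", "summary"):
            if getattr(self, field) is None and metadata.get(field) is not None:
                setattr(self, field, metadata[field])
```

Only empty fields are overwritten in update_from_metadata, so values already taken from the JSON are left alone; whether that is the right precedence is a design choice this sketch does not settle.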
Additionally, it is important to consider that there are multiple versions of Python metadata currently supported on PyPI (1.0, 1.1, and 2.0). I have written a script to pull down packages and list the metadata keys (including the nested keys for each file) for each metadata version; the results are below.
To complete this task:
- determine which fields are necessary for sync
- determine which fields may not play nice with all package types
- implement the model
- implement instantiation from a package
- add twine to requirements
{ u'1.0': set([ u'author',
u'author_email',
u'classifiers',
u'comment',
u'description',
u'download_url',
u'filetype',
u'home_page',
u'keywords',
u'license',
u'maintainer',
u'maintainer_email',
u'md5_digest',
u'metadata_version',
u'name',
u'obsoletes',
u'obsoletes_dist',
u'platform',
u'project_urls',
u'provides',
u'provides_dist',
u'pyversion',
u'requires',
u'requires_dist',
u'requires_external',
u'requires_python',
u'summary',
u'supported_platform',
u'version']),
u'1.1': set([ u'author',
u'author_email',
u'classifiers',
u'comment',
u'description',
u'download_url',
u'filetype',
u'home_page',
u'keywords',
u'license',
u'maintainer',
u'maintainer_email',
u'md5_digest',
u'metadata_version',
u'name',
u'obsoletes',
u'obsoletes_dist',
u'platform',
u'project_urls',
u'provides',
u'provides_dist',
u'pyversion',
u'requires',
u'requires_dist',
u'requires_external',
u'requires_python',
u'summary',
u'supported_platform',
u'version']),
u'2.0': set([ u'author',
u'author_email',
u'classifiers',
u'comment',
u'description',
u'download_url',
u'filetype',
u'home_page',
u'keywords',
u'license',
u'maintainer',
u'maintainer_email',
u'md5_digest',
u'metadata_version',
u'name',
u'obsoletes',
u'obsoletes_dist',
u'platform',
u'project_urls',
u'provides',
u'provides_dist',
u'pyversion',
u'requires',
u'requires_dist',
u'requires_external',
u'requires_python',
u'summary',
u'supported_platform',
u'version']),
'json': set([ u'_pypi_hidden',
u'_pypi_ordering',
u'author',
u'author_email',
u'bugtrack_url',
u'cheesecake_code_kwalitee_id',
u'cheesecake_documentation_id',
u'cheesecake_installability_id',
u'classifiers',
u'comment_text',
u'description',
u'docs_url',
u'download_url',
u'downloads',
u'filename',
u'has_sig',
u'home_page',
u'keywords',
u'license',
u'maintainer',
u'maintainer_email',
u'md5_digest',
u'name',
u'package_url',
u'packagetype',
u'path',
u'platform',
u'python_version',
u'release_url',
u'requires_dist',
u'requires_python',
u'size',
u'summary',
u'upload_time',
u'url',
u'version'])}
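One way to start on "determine which fields are necessary for sync" is to compare the key sets in the dump above. In that dump the 1.0, 1.1, and 2.0 metadata versions report identical keys, so a single set stands in for all three. The variable names are hypothetical; the keys are copied from the dump.

```python
# Keys reported for package metadata (identical for 1.0, 1.1, and 2.0 above).
METADATA_KEYS = {
    "author", "author_email", "classifiers", "comment", "description",
    "download_url", "filetype", "home_page", "keywords", "license",
    "maintainer", "maintainer_email", "md5_digest", "metadata_version",
    "name", "obsoletes", "obsoletes_dist", "platform", "project_urls",
    "provides", "provides_dist", "pyversion", "requires", "requires_dist",
    "requires_external", "requires_python", "summary", "supported_platform",
    "version",
}

# Keys reported for the PyPI JSON.
JSON_KEYS = {
    "_pypi_hidden", "_pypi_ordering", "author", "author_email",
    "bugtrack_url", "cheesecake_code_kwalitee_id",
    "cheesecake_documentation_id", "cheesecake_installability_id",
    "classifiers", "comment_text", "description", "docs_url",
    "download_url", "downloads", "filename", "has_sig", "home_page",
    "keywords", "license", "maintainer", "maintainer_email", "md5_digest",
    "name", "package_url", "packagetype", "path", "platform",
    "python_version", "release_url", "requires_dist", "requires_python",
    "size", "summary", "upload_time", "url", "version",
}

common = METADATA_KEYS & JSON_KEYS          # available from either source
json_only = JSON_KEYS - METADATA_KEYS       # only available from the JSON
metadata_only = METADATA_KEYS - JSON_KEYS   # only available from the package

for label, keys in (("common", common), ("json only", json_only),
                    ("metadata only", metadata_only)):
    print("%s: %s" % (label, ", ".join(sorted(keys))))
```

Note that some apparent gaps look like the same information under different names (filetype and packagetype, pyversion and python_version, comment and comment_text); a plain set comparison does not detect that, and whether the values actually match would need to be checked.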
Updated by amacdona@redhat.com over 8 years ago
- Blocked by Story #1883: As a user, I can sync and publish all package types added
Updated by amacdona@redhat.com over 8 years ago
- Blocked by deleted (Story #1883: As a user, I can sync and publish all package types)
Updated by amacdona@redhat.com over 8 years ago
- Blocks Story #1883: As a user, I can sync and publish all package types added
Updated by amacdona@redhat.com over 8 years ago
- Blocks Story #1884: As a user, I can lazily sync python packages added
Updated by mhrivnak over 8 years ago
Did this come from breaking up #135 into smaller pieces? Or is this work that has not previously been put on a sprint? One way or another, we need this to go through the sprint planning process and get it accepted onto sprint 2.
Updated by amacdona@redhat.com over 8 years ago
- Tracker changed from Task to Story
- Sprint Candidate changed from No to Yes
This is the result of the reorganization of python stories.
Added by Austin Macdonald over 8 years ago
Updated by Anonymous over 8 years ago
- Status changed from ASSIGNED to MODIFIED
- % Done changed from 0 to 100
Applied in changeset 54687513a426a1bdec567452a52ff103555f7efe.
Updated by amacdona@redhat.com over 8 years ago
- Target Release - Python set to 2.0.0
Added by Austin Macdonald over 8 years ago
Revision 2b951967 | View on GitHub
Create new model for python packages
This commit does 4 things:
1. Rename _filename to filename.
2. Switch unit key of Package to filename.
3. Remove unnecessary fields from the model.
4. Add the machinery to create instances of new model from the PyPI
JSON metadata and directly from the packages.
The filename contains the name, version, platform, and python versions.
This means that filename is guaranteed to be unique on PyPI. This is
necessary because name and version will no longer guarantee uniqueness
when we are supporting multiple package types. For instance, two
separate packages, a wheel and an sdist, can and would share the same
name and version.
There are two reasons why I chose to drop the extra fields. Firstly, the
expected use case is that users will specify which packages they want
to install from a pulp python repository; they will not need to search
the repo to figure out which packages to install. This means that the
fields on the model that are not necessary for sync or publish are not
very useful. The second problem is that if we are to implement lazy
sync for python, we need to be able to create the units from the
metadata available on PyPI. This metadata combines some fields, like
license and home_page, for all versions, platforms, and package types
of a given package name. This could present an issue where some of this
information changed over time, but the newest version of that data is
applied to all units. Since it is not really necessary in the first
place, it is preferable to leave the information out rather than
possibly present inaccurate information.
The field summary, though not strictly necessary, provides helpful
information and was left for convenience, despite the possibility that
it will change over time.
closes #1882
more minimal model
more docs
Updated by Anonymous over 8 years ago
- Status changed from POST to MODIFIED
Applied in changeset 2b9519671e44ebf36fbdb5eb1ec53fd98fc37700.
Updated by semyers almost 8 years ago
- Status changed from 5 to MODIFIED
- Platform Release deleted (2.12.0)
This issue has been removed from the 2.12.0 release, and returned to its MODIFIED state for inclusion in a future release of Pulp.
Updated by semyers almost 8 years ago
- Status changed from MODIFIED to POST
I'm moving this back to POST to flag this bug as untestable in pulp-smash. This status is also appropriate because while the changes are merged to the 2.0-dev branch of pulp_python, that entire branch was deemed unreleasable for pulp 2.12, and the situation has not improved since then.
Updated by semyers almost 8 years ago
- Status changed from POST to MODIFIED
2.0-dev is now being included in Platform 2.13 builds again, returning this to a testable state. Note that any smash tests related to this issue should not be run prior to 2.13.0; this is tracked in pulp-smash issue https://github.com/PulpQE/pulp-smash/issues/588.
Updated by pcreech over 7 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE