Story #1882
closed
Rebuild model to support all package types
100%
Description
The Python model is currently built from the PKG-INFO file. Its unit key is (name, version), which is a problem for wheels because there can be many files for each name and version. Filename is a better key because name, version, platform, pyversion, and filetype are all present in a predictable, collision-proof way, defined for wheels in https://www.python.org/dev/peps/pep-0427/. The other formats should only have one file each (double check this).
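To illustrate why the filename carries enough information to act as a key, here is a minimal sketch that splits a wheel filename into the components PEP 427 defines. The names `WheelName` and `parse_wheel_filename` are hypothetical and not part of the plugin; the regular expression is a simplified reading of the PEP's naming convention, not a full validator.

```python
import re
from collections import namedtuple

# Hypothetical container for the PEP 427 filename components.
WheelName = namedtuple(
    "WheelName", "distribution version build python abi platform"
)

# {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
# Components never contain "-", and an optional build tag must start with a digit.
_WHEEL_RE = re.compile(
    r"^(?P<distribution>[^-]+)-(?P<version>[^-]+)"
    r"(?:-(?P<build>\d[^-]*))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    match = _WHEEL_RE.match(filename)
    if match is None:
        raise ValueError("not a PEP 427 wheel filename: %r" % filename)
    return WheelName(**match.groupdict())


if __name__ == "__main__":
    print(parse_wheel_filename("requests-2.9.1-py2.py3-none-any.whl"))
```

Other package types (sdist, egg, .exe) do not follow this convention, so a sketch like this would only cover the wheel case.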
The new model must:
- be able to be instantiated (and later saved, to support lazy sync) from the JSON
- be able to update fields that are unavailable from the JSON but can be populated from the package file itself (any format except .exe) using twine at download_success time
- be able to be instantiated from a package's metadata (necessary for upload).
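The first requirement could be sketched roughly as follows. This is not the plugin's model: the `Package` class, its field selection, and the helper `packages_from_project_json` are hypothetical, and the sketch assumes the project JSON has an `info` mapping plus a `releases` mapping from version to a list of per-file entries.

```python
class Package(object):
    """Hypothetical minimal unit keyed on filename."""

    def __init__(self, filename, name, version, packagetype,
                 summary=None, md5_digest=None, url=None):
        self.filename = filename
        self.name = name
        self.version = version
        self.packagetype = packagetype
        self.summary = summary
        self.md5_digest = md5_digest
        self.url = url

    @property
    def unit_key(self):
        # The filename alone identifies the unit.
        return {"filename": self.filename}

    @classmethod
    def from_json(cls, info, version, file_entry):
        """Build one unit from project-level info and one per-file entry."""
        return cls(
            filename=file_entry["filename"],
            name=info["name"],
            version=version,
            packagetype=file_entry["packagetype"],
            summary=info.get("summary"),
            md5_digest=file_entry.get("md5_digest"),
            url=file_entry.get("url"),
        )


def packages_from_project_json(project_json):
    """Return one Package per file listed under every release."""
    info = project_json["info"]
    packages = []
    for version, files in project_json["releases"].items():
        for file_entry in files:
            packages.append(Package.from_json(info, version, file_entry))
    return packages
```

Fields that the JSON does not provide per file would stay unset here and would have to be filled in once the file itself has been downloaded.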
Additionally, it is important to consider that there are multiple versions of Python metadata currently supported on PyPI (1.0, 1.1, and 2.0). I have written a script to pull down packages and list the metadata keys (including the nested keys for each file) for each metadata version; the results are below.
To complete this task:
- determine which fields are necessary for sync
- determine which fields may not play nice with all package types
- implement the model
- implement instantiation from a package
- add twine to requirements
{ u'1.0': set([ u'author',
u'author_email',
u'classifiers',
u'comment',
u'description',
u'download_url',
u'filetype',
u'home_page',
u'keywords',
u'license',
u'maintainer',
u'maintainer_email',
u'md5_digest',
u'metadata_version',
u'name',
u'obsoletes',
u'obsoletes_dist',
u'platform',
u'project_urls',
u'provides',
u'provides_dist',
u'pyversion',
u'requires',
u'requires_dist',
u'requires_external',
u'requires_python',
u'summary',
u'supported_platform',
u'version']),
u'1.1': set([ u'author',
u'author_email',
u'classifiers',
u'comment',
u'description',
u'download_url',
u'filetype',
u'home_page',
u'keywords',
u'license',
u'maintainer',
u'maintainer_email',
u'md5_digest',
u'metadata_version',
u'name',
u'obsoletes',
u'obsoletes_dist',
u'platform',
u'project_urls',
u'provides',
u'provides_dist',
u'pyversion',
u'requires',
u'requires_dist',
u'requires_external',
u'requires_python',
u'summary',
u'supported_platform',
u'version']),
u'2.0': set([ u'author',
u'author_email',
u'classifiers',
u'comment',
u'description',
u'download_url',
u'filetype',
u'home_page',
u'keywords',
u'license',
u'maintainer',
u'maintainer_email',
u'md5_digest',
u'metadata_version',
u'name',
u'obsoletes',
u'obsoletes_dist',
u'platform',
u'project_urls',
u'provides',
u'provides_dist',
u'pyversion',
u'requires',
u'requires_dist',
u'requires_external',
u'requires_python',
u'summary',
u'supported_platform',
u'version']),
'json': set([ u'_pypi_hidden',
u'_pypi_ordering',
u'author',
u'author_email',
u'bugtrack_url',
u'cheesecake_code_kwalitee_id',
u'cheesecake_documentation_id',
u'cheesecake_installability_id',
u'classifiers',
u'comment_text',
u'description',
u'docs_url',
u'download_url',
u'downloads',
u'filename',
u'has_sig',
u'home_page',
u'keywords',
u'license',
u'maintainer',
u'maintainer_email',
u'md5_digest',
u'name',
u'package_url',
u'packagetype',
u'path',
u'platform',
u'python_version',
u'release_url',
u'requires_dist',
u'requires_python',
u'size',
u'summary',
u'upload_time',
u'url',
u'version'])}
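As a starting point for deciding which fields are safe to rely on, the key sets above can be compared directly. The sketch below copies the listed keys (the three metadata versions report identical sets, so one set stands in for all of them) and computes which keys are available from both sources and which from only one. The variable names are illustrative only.

```python
# Keys reported for package metadata (identical for 1.0, 1.1, and 2.0 above).
METADATA_KEYS = set("""
    author author_email classifiers comment description download_url filetype
    home_page keywords license maintainer maintainer_email md5_digest
    metadata_version name obsoletes obsoletes_dist platform project_urls
    provides provides_dist pyversion requires requires_dist requires_external
    requires_python summary supported_platform version
""".split())

# Keys reported for the PyPI JSON (top-level and per-file keys combined).
JSON_KEYS = set("""
    _pypi_hidden _pypi_ordering author author_email bugtrack_url
    cheesecake_code_kwalitee_id cheesecake_documentation_id
    cheesecake_installability_id classifiers comment_text description docs_url
    download_url downloads filename has_sig home_page keywords license
    maintainer maintainer_email md5_digest name package_url packagetype path
    platform python_version release_url requires_dist requires_python size
    summary upload_time url version
""".split())

common = METADATA_KEYS & JSON_KEYS          # available from either source
metadata_only = METADATA_KEYS - JSON_KEYS   # need the package file itself
json_only = JSON_KEYS - METADATA_KEYS       # need the PyPI JSON

if __name__ == "__main__":
    print("common:", sorted(common))
    print("metadata only:", sorted(metadata_only))
    print("json only:", sorted(json_only))
```

Note that some keys appear to describe the same thing under different names in the two sources (for example `filetype` and `packagetype`, or `pyversion` and `python_version`); treating those as equivalent is an assumption that would need checking.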
Create new model for python packages
This commit does 4 things:
1. Rename _filename to filename.
2. Switch unit key of Package to filename.
3. Remove unnecessary fields from the model.
4. Add the machinery to create instances of the new model from the PyPI JSON metadata and directly from the packages.

The filename contains the name, version, platform, and python versions. This means that filename is guaranteed to be unique on PyPI. This is necessary because name and version will no longer guarantee uniqueness when we are supporting multiple package types. For instance, two separate packages, a wheel and an sdist, can and would share the same name and version.

There are two reasons why I chose to drop the extra fields. Firstly, the expected use case is that users will specify which packages they want to install from a pulp python repository; they will not need to search the repo to figure out which packages to install. This means that the fields on the model that are not necessary for sync or publish are not very useful. Secondly, if we are to implement lazy sync for python, we need to be able to create the units from the metadata available on PyPI. This metadata combines some fields like license and home_page for all versions, platforms, and package types of a given package name. This could present an issue where some of this information changed over time, but the newest version of that data is applied to all units. Since it is not really necessary in the first place, it is preferable to leave the information out rather than possibly present inaccurate information.

The field summary, though not strictly necessary, provides helpful information and was left in for convenience, despite the possibility that it will change over time.

closes #1882
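The uniqueness argument can be illustrated with a small sketch. The two file records and the `index_by` helper are made up for illustration: indexing on (name, version) collapses a wheel and an sdist of the same release into one key, while indexing on filename keeps them apart.

```python
# Two hypothetical files from the same release of one project.
files = [
    {"name": "example", "version": "1.0",
     "filename": "example-1.0.tar.gz"},
    {"name": "example", "version": "1.0",
     "filename": "example-1.0-py2.py3-none-any.whl"},
]


def index_by(units, key_fields):
    """Group units by the tuple of values found in key_fields."""
    index = {}
    for unit in units:
        key = tuple(unit[field] for field in key_fields)
        index.setdefault(key, []).append(unit)
    return index


by_name_version = index_by(files, ("name", "version"))
by_filename = index_by(files, ("filename",))

if __name__ == "__main__":
    # One key holding two files versus two keys holding one file each.
    print(len(by_name_version), len(by_filename))
```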
more minimal model
more docs