Task #1940

closed

Determine if package metadata should be updated after download

Added by mhrivnak almost 8 years ago. Updated about 5 years ago.

Status: CLOSED - NOTABUG
Priority: Normal
Sprint/Milestone: -
Start date:
Due date:
% Done: 0%
Estimated time:
Platform Release:
Target Release - Python:
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 3
Quarter:

Description

Credit to @amacdona, who actually wrote this in an email. I'm just moving it to Redmine.

I (@amacdona) have discussed this issue with a few of you, but I want to open it up
to the entire team.

The Python plugin used to crack open the metadata files and use the
information there to populate the unit's metadata. For the upload case,
we should be able to generate all the necessary information from the
packages themselves using the twine library.

Since we are moving towards the ability to lazily sync Python
repositories, there is a requirement that we can generate units using
only metadata that is available before downloading the bits.

Option 1. Continue to use only the metadata from PyPI. This leads
nicely into the lazy sync work, and the model will continue to be minimal.
Option 2. Create minimal units from PyPI and, when the packages are
downloaded, inspect them (using the same twine library that will be used
in upload) and use that metadata to populate a more informative model.
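
As a rough illustration of the inspection step in Option 2, the sketch below reads the PKG-INFO file from a downloaded sdist using only the standard library. It is a stand-in for the twine-based inspection described above, not twine's API; the function name `read_sdist_metadata` and the choice of fields are hypothetical.

```python
import tarfile
from email.parser import Parser


def read_sdist_metadata(path):
    """Return selected PKG-INFO fields from a .tar.gz sdist as a dict."""
    with tarfile.open(path, "r:gz") as archive:
        # Assumes the usual sdist layout: <name>-<version>/PKG-INFO
        candidates = [
            member for member in archive.getmembers()
            if member.name.count("/") == 1 and member.name.endswith("/PKG-INFO")
        ]
        if not candidates:
            raise ValueError("no PKG-INFO found in %s" % path)
        raw = archive.extractfile(candidates[0]).read().decode("utf-8")

    # PKG-INFO uses RFC 822 style headers, so the email parser can read it.
    headers = Parser().parsestr(raw)
    return {
        "name": headers["Name"],
        "version": headers["Version"],
        "summary": headers["Summary"],
        "author": headers["Author"],
        "license": headers["License"],
    }
```

Under Option 2, a dict like this would be merged into the minimal unit once the file is on disk. Under Option 1, this step is skipped entirely.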

Some background is needed to explain why this choice matters. The PyPI
model is structured differently from ours, and because of this, some of
the information on each package is lost when packages are grouped into
projects. I have a more detailed explanation in our python model
docs. [0] The point of all of this is that an older release may have
different metadata than a new release, but this information is not
accessible through the PyPI API. It is only accessible by inspecting
the files.
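
To make the loss concrete, the sketch below builds minimal units from a dictionary shaped like a project-level response from PyPI's JSON API, where `info` describes the project as a whole and `releases` maps each version to its files. The sample data, the `units_from_project` helper, and the unit fields are hypothetical, and the response shape is an assumption trimmed to a few keys.

```python
def units_from_project(project):
    """Build one minimal unit per file using only index metadata."""
    info = project["info"]
    units = []
    for version, files in sorted(project["releases"].items()):
        for entry in files:
            units.append({
                "name": info["name"],
                "version": version,
                "filename": entry["filename"],
                # Project-level fields: every version gets the same values,
                # even if an older release shipped different metadata.
                "summary": info["summary"],
                "author": info["author"],
            })
    return units


# Hypothetical sample data, not a real PyPI response.
sample = {
    "info": {
        "name": "demo",
        "summary": "Rewritten in 2.0",
        "author": "New Maintainer",
    },
    "releases": {
        "1.0": [{"filename": "demo-1.0.tar.gz"}],
        "2.0": [{"filename": "demo-2.0.tar.gz"}],
    },
}

for unit in units_from_project(sample):
    print(unit["version"], unit["summary"])
```

Both units end up with the same summary and author, which is the consistency that Option 1 accepts and that Option 2 would correct by reading each file.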

Benefits of Option 1:
1. Faster. We no longer need to touch the files.
2. Metadata is consistent with PyPI's API.
3. Packages are consistent from the time they are created.

Benefits of Option 2:
1. We can include more metadata in the model.
2. The metadata of the package is consistent with the snapshot of
metadata at the time of package release.
3. Metadata will be consistent with the same package if it were
uploaded rather than synced.

I am leaning toward Option 1, but I would like to hear everyone's
feedback first.

[0]
https://github.com/asmacdo/pulp_python/blob/08f8f76656de89818fc7429b2c022f4634eaea77/plugins/pulp_python/plugins/models.py#L24
