Story #985
Story #1883 (closed): As a user, I can sync and publish all package types
As a user, I can sync all packages from PyPI (complete mirror)
100%
Description
To sync all packages from PyPI, bandersnatch[0] (the PyPI mirror tool) will be a good reference.
We will need 2 workflows.
initial sync / forced full sync
Roughly, the workflow is the same as a sync with a whitelist of project names, except that it requires an additional call to the simple index to retrieve a list of all projects and their URLs.
incremental sync
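As a rough illustration of that extra step, the sketch below extracts project names and URLs from a simple index page using only the standard library. It assumes the index is an HTML page with one anchor per project; `SimpleIndexParser`, `list_all_projects`, and the default `base_url` are hypothetical names chosen for this example, not part of Pulp or bandersnatch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SimpleIndexParser(HTMLParser):
    """Collect (project name, absolute URL) pairs from a simple index page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.projects = []
        self._href = None   # href of the anchor currently being read
        self._text = []     # text fragments inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            # Resolve relative links against the index URL.
            self.projects.append((name, urljoin(self.base_url, self._href)))
            self._href = None


def list_all_projects(index_html, base_url="https://pypi.org/simple/"):
    """Return every project listed in the index; this becomes the whitelist."""
    parser = SimpleIndexParser(base_url)
    parser.feed(index_html)
    parser.close()
    return parser.projects
```

The resulting list could then be handed to the existing whitelist-based sync unchanged, which is what keeps the full sync close to the current workflow.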
The XML-RPC PyPI API has a call `changelog_since_serial(since_serial)` which returns the change events recorded since the given serial, and therefore all of the projects that have been updated since the last sync. Once we have this, we essentially have our whitelist and sync can proceed as it does in the other cases.
This does present a problem, though: the repository would need a "latest_serial" field or something similar. Currently, this could be stored in repository.notes['latest_serial'], but if possible, I would prefer to avoid using the notes field like this. The alternative, typed repositories, would require a significant change to pulpcore.
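The incremental workflow can be sketched roughly as follows. This assumes each changelog event is a `(name, version, timestamp, action, serial)` sequence, as the PyPI XML-RPC documentation describes; `fetch_changelog` and `changed_projects` are hypothetical helper names, and where the returned serial gets persisted is left open, as discussed above.

```python
import xmlrpc.client


def fetch_changelog(since_serial, url="https://pypi.org/pypi"):
    """Ask PyPI for all change events after since_serial (network call)."""
    client = xmlrpc.client.ServerProxy(url)
    return client.changelog_since_serial(since_serial)


def changed_projects(events, since_serial):
    """Reduce changelog events to a project whitelist and the newest serial.

    Assumes each event is (name, version, timestamp, action, serial).
    """
    names = set()
    latest = since_serial
    for name, _version, _timestamp, _action, serial in events:
        names.add(name)
        latest = max(latest, serial)
    # Sorted for deterministic ordering; the serial is what the
    # repository would need to remember for the next sync.
    return sorted(names), latest
```

A sync would call `changed_projects(fetch_changelog(last_serial), last_serial)`, sync the returned names as a whitelist, and store the returned serial only after the sync succeeds.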
Related issues
Pulp now uses Bandersnatch to perform metadata syncing
Sync uses Bandersnatch to perform metadata fetching and filtering, enabling Pulp to sync all of PyPI.
closes: #6930 closes: #6875 closes: #985 https://pulp.plan.io/issues/6930 https://pulp.plan.io/issues/6875 https://pulp.plan.io/issues/985