Story #985

closed

Story #1883: As a user, I can sync and publish all package types

As a user, I can sync all packages from pypi (complete mirror)

Added by ashbyj@imsweb.com over 9 years ago. Updated almost 4 years ago.

Status: MODIFIED
Priority: High
Assignee: -
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Estimated time:
Platform Release: 3.0.0
Target Release - Python:
Groomed: No
Sprint Candidate: No
Tags:
Sprint:
Quarter:

Description

To sync all packages from PyPI, bandersnatch[0] (the PyPI mirror tool) will be a good reference.

We will need two workflows.

Initial sync / forced full sync

Roughly, the workflow is the same as a sync with a whitelist of project names, except that it requires an additional call to the simple index to retrieve the list of all projects and their URLs.
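
As a rough illustration of that extra step, the sketch below extracts project names and URLs from a simple index page. It assumes the index is a plain HTML page with one anchor per project (the PEP 503 layout); `SimpleIndexParser` and `list_all_projects` are hypothetical names, not part of pulp_python or bandersnatch, and fetching the page is left to the caller.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SimpleIndexParser(HTMLParser):
    """Collect (project name, project URL) pairs from a simple index page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.projects = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []     # text fragments inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            # Resolve relative links against the index URL.
            self.projects.append((name, urljoin(self.base_url, self._href)))
            self._href = None


def list_all_projects(index_html, base_url="https://pypi.org/simple/"):
    """Return every (name, url) pair found in the index HTML (hypothetical helper)."""
    parser = SimpleIndexParser(base_url)
    parser.feed(index_html)
    parser.close()
    return parser.projects
```

The resulting list of names could then be handed to the existing whitelist-based sync unchanged.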

Incremental sync

The XML-RPC PyPI API has a call `changelog_since_serial(since_serial)`, which returns the change events recorded since the given serial; the project names in those events are the projects that have been updated since the last sync. Once we have this list, we essentially have our whitelist, and the sync can proceed as it does in the other cases.
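
A minimal sketch of turning the changelog into a whitelist follows. It assumes each event is a `(name, version, timestamp, action, serial)` sequence, which is my understanding of the XML-RPC return format; `projects_changed_since` and `make_pypi_client` are hypothetical helpers, and the client is passed in so the reduction can be exercised without network access.

```python
import xmlrpc.client


def projects_changed_since(client, since_serial):
    """Return (sorted project names, newest serial seen) from the changelog.

    `client` is any object exposing changelog_since_serial(since_serial).
    """
    events = client.changelog_since_serial(since_serial)
    names = set()
    latest = since_serial
    # Assumed event shape: (name, version, timestamp, action, serial).
    for name, _version, _timestamp, _action, serial in events:
        names.add(name)
        latest = max(latest, serial)
    return sorted(names), latest


def make_pypi_client(url="https://pypi.org/pypi"):
    """Build an XML-RPC proxy for PyPI (endpoint URL is an assumption)."""
    return xmlrpc.client.ServerProxy(url)
```

Returning the newest serial alongside the names means the caller can persist it once the sync succeeds and use it as `since_serial` next time.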

This does present a problem, though: the repository would need a "latest_serial" field or something similar. Currently, this could be stored in repository.notes['latest_serial'], but if possible, I would prefer to avoid using the notes field like this. An alternative would require a significant change to pulpcore: typed repositories.
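
To make the notes-based option concrete, here is a small sketch of how a stored serial could select between the two workflows. It treats `notes` as a plain dict; `plan_sync` and `record_serial` are hypothetical names, and nothing here reflects an actual pulpcore API.

```python
def plan_sync(notes, force_full=False):
    """Choose a workflow from the stored serial (hypothetical helper).

    Returns ("full", None) when no serial is stored or a full sync is
    forced, otherwise ("incremental", serial).
    """
    serial = notes.get("latest_serial")
    if force_full or serial is None:
        return ("full", None)
    return ("incremental", int(serial))


def record_serial(notes, serial):
    """Persist the newest serial after a successful sync."""
    notes["latest_serial"] = serial
    return notes
```

A typed repository would replace the dict lookup with a real field, but the decision logic would stay the same.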

[0]: https://pypi.org/project/bandersnatch/


Related issues

Related to Python Support - Refactor #6930: Use Bandersnatch to perform package metadata fetching and filtering (MODIFIED, gerrod)
