Story #985
Updated by amacdona@redhat.com over 6 years ago
Basically, I'd like to be able to set up an internal pypi mirror. From our list discussion:

<pre>
From: pulp-list On Behalf Of Randy Barlow
Sent: Wednesday, May 13, 2015 9:00 AM
To: pulp-list
Subject: Re: [Pulp-list] Sync all packages from PyPi with pulp_python plugin

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 05/13/2015 08:21 AM, Ashby, Jason (IMS) wrote:
> I’m looking to set up a pypi mirror with pulp. I’m currently
> using Bandersnatch for this, but it’d be nice to drop it and use
> pulp instead. Per the docs*, I see that you can sync specific
> packages from pypi, e.g.
>
> pulp-admin python repo create --repo-id pypi --feed
> https://pypi.python.org/ --package-names numpy,scipy
>
> but I can’t seem to sync ALL packages. I tried leaving off the
> --package-names option, but a sync downloads 0 packages. Should
> I submit an issue/feature request at
> https://pulp.plan.io/projects/pulp_python/issues?

Hi Jason!

The problem is that PyPI does not have one single manifest file for
available package versions, but rather one manifest per package name.
Due to this, in order to sync all packages from PyPI it would be
necessary to make around 45-50,000 web requests just to find out what
would need to be downloaded, and then of course we would need to
perform the actual package downloads.

That said, we are working on a plan to have Pulp be able to lazy fetch
packages as they are requested. This plan will take a long time to
implement (so don't expect it in any of our close releases) but I
think it will solve this problem in a performant way.

Another possible solution may be Warehouse[0]. I've been talking to
the PyPA developers about this problem, and they are aware that it
needs to be solved. They may fix it there, in which case we can get
the Python importer to be aware of all the packages.

I have also considered just doing the 50k requests anyway. I suspect
that PyPI won't like if we do that, but it is technically possible as
well.

I say go ahead and file an RFE. I'll think some more about how we
might be able to get it working. Thanks for the note, and I hope you
enjoy the plugin otherwise!

[0] https://warehouse.python.org/
</pre>

To sync all packages from PyPI, bandersnatch[0] (the PyPI mirror tool) will be a good reference. We will need 2 workflows.

h3. initial sync / force-full sync

Roughly, the workflow is the same as a sync with a whitelist of project names, except this will require an additional call to the simple index to retrieve a list of all projects and their URLs.

h3. incremental sync

The XML-RPC PyPI API has a call `changelog_since_serial(since_serial)` which will return all of the projects that have been updated since the last sync. Once we have this, we essentially have our whitelist and the sync can proceed as it does in the other cases.

This does present a problem though. The repository would need a "latest_serial" or something similar. Currently, this could be stored in repository.notes['latest_serial'], but if possible, I would prefer to avoid using the notes field like this. An alternative would require a significant change to pulpcore: typed repositories.

[0]: https://pypi.org/project/bandersnatch/
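
As a rough illustration of the two workflows described in this story (a full sync that lists every project from the simple index, and an incremental sync driven by `changelog_since_serial(since_serial)`), here is a minimal Python sketch. It is not the pulp_python implementation. The names `SimpleIndexParser`, `list_all_projects` and `incremental_whitelist` are hypothetical, the simple index is assumed to be a plain HTML page of anchors, and each changelog event is assumed to have the shape `(name, version, timestamp, action, serial)`. Where the returned serial is persisted (for example `repository.notes['latest_serial']`) is left to the caller.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SimpleIndexParser(HTMLParser):
    """Collect {project name: url} from a simple index page (assumed to be anchors)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.projects = {}
        self._href = None  # href of the <a> tag we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # The anchor text is taken to be the project name.
        if self._href is not None and data.strip():
            self.projects[data.strip()] = urljoin(self.base_url, self._href)

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def list_all_projects(index_html, base_url):
    """Initial / forced full sync: build the whitelist of all projects and their URLs."""
    parser = SimpleIndexParser(base_url)
    parser.feed(index_html)
    return parser.projects


def incremental_whitelist(client, since_serial):
    """Incremental sync: return (names changed since since_serial, new latest serial).

    `client` is any object exposing changelog_since_serial(since_serial), e.g. an
    xmlrpc.client.ServerProxy pointed at the index's XML-RPC endpoint (assumption).
    """
    names = set()
    latest = since_serial
    for name, _version, _timestamp, _action, serial in client.changelog_since_serial(since_serial):
        names.add(name)
        latest = max(latest, serial)
    return names, latest
```

In both cases the result is just a whitelist of project names, so the rest of the sync could proceed as it does for a repository created with `--package-names`. If nothing has changed, `incremental_whitelist` returns an empty set and the serial it was given.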