https://pulp.plan.io/https://pulp.plan.io/favicon.ico2015-05-14T13:30:39ZPulpPython Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=43362015-05-14T13:30:39Zashbyj@imsweb.comashbyj@imsweb.com
<p>Warehouse looks great. How stable is it? I see they have a list_packages() query that hopefully can grab a list of packages in one query, but pulp_python may need some refactoring to download multiple packages in one shot instead of looping and downloading each package as a separate request. The mirroring support in Warehouse looks helpful as well.</p>
<p><a href="https://warehouse.readthedocs.org/api-reference/xml-rpc/#package-querying" class="external">https://warehouse.readthedocs.org/api-reference/xml-rpc/#package-querying</a></p> Python Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=43442015-05-14T22:11:21Zrbarlow
<ul><li><strong>Tracker</strong> changed from <i>Issue</i> to <i>Story</i></li><li><strong>Category</strong> deleted (<del><i>21</i></del>)</li><li><strong>Groomed</strong> set to <i>No</i></li><li><strong>Sprint Candidate</strong> set to <i>No</i></li></ul><p><a href="mailto:ashbyj@imsweb.com" class="email">ashbyj@imsweb.com</a> wrote:</p>
<blockquote>
<p>Warehouse looks great. How stable is it? I see they have a list_packages() query that hopefully can grab a list of packages in one query, but pulp_python may need some refactoring to download multiple packages in one shot instead of looping and downloading each package as a separate request. The mirroring support in Warehouse looks helpful as well.</p>
</blockquote>
<p>Yeah it really does look nice. I'm not sure how stable it is yet, other than that PyPI is not using it yet and they are still developing it. I have considered starting a branch to test the Python plugin against the current deployment, but I'd like to know the answer to your question before going too far with that ☺</p>
<p>As for the refactor - I think that's a good idea!</p> Python Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=113592016-05-06T16:33:37Zamacdona@redhat.comaustin@redhat.com
<ul><li><strong>Parent issue</strong> set to <i>#1883</i></li></ul> Python Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=282152018-05-15T03:23:32Zamacdona@redhat.comaustin@redhat.com
<ul><li><strong>Subject</strong> changed from <i>As a user, I can sync all packages from pypi</i> to <i>As a user, I can sync all packages from pypi (complete mirror)</i></li><li><strong>Tags</strong> <i>Pulp 3</i> added</li></ul> Python Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=282252018-05-15T03:46:53Zamacdona@redhat.comaustin@redhat.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/28225/diff?detail_id=28885">diff</a>)</li></ul><p>From the original post: Basically, I'd like to be able to set up an internal pypi mirror. From our list discussion:</p>
<pre><code>From: pulp-list On Behalf Of Randy Barlow
Sent: Wednesday, May 13, 2015 9:00 AM
To: pulp-list
Subject: Re: [Pulp-list] Sync all packages from PyPi with pulp_python plugin

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 05/13/2015 08:21 AM, Ashby, Jason (IMS) wrote:
> I’m looking to set up a pypi mirror with pulp. I’m currently
> using Bandersnatch for this, but it’d be nice to drop it and use
> pulp instead. Per the docs*, I see that you can sync specific
> packages from pypi, e.g.
>
> pulp-admin python repo create --repo-id pypi --feed
> https://pypi.python.org/ --package-names numpy,scipy
>
> but I can’t seem to sync ALL packages. I tried leaving off the
> --package-names option, but a sync downloads 0 packages. Should
> I submit an issue/feature request at
> https://pulp.plan.io/projects/pulp_python/issues?

Hi Jason!

The problem is that PyPI does not have one single manifest file for
the available package versions, but rather one manifest per package
name. Due to this, in order to sync all packages from PyPI it would be
necessary to make around 45-50,000 web requests just to find out what
would need to be downloaded, and then of course we would need to
perform the actual package downloads.

That said, we are working on a plan to have Pulp be able to lazy fetch
packages as they are requested. This plan will take a long time to
implement (so don't expect it in any of our close releases) but I
think it will solve this problem in a performant way.

Another possible solution may be Warehouse[0]. I've been talking to
the PyPA developers about this problem, and they are aware that it
needs to be solved. They may fix it there, in which case we can get
the Python importer to be aware of all the packages.

I have also considered just doing the 50k requests anyway. I suspect
that PyPI won't like if we do that, but it is technically possible as
well.

I say go ahead and file an RFE. I'll think some more about how we
might be able to get it working. Thanks for the note, and I hope you
enjoy the plugin otherwise!

[0] https://warehouse.python.org/
</code></pre> Python Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=282272018-05-15T03:56:15Zamacdona@redhat.comaustin@redhat.com
<p>Many of us have expressed concern about whether this story can be implemented while staying polite to PyPI. Some notes from PyCon:</p>
<ol>
<li>PyPI makes good use of caching</li>
<li>There are a lot of mirrors (bandersnatch) that regularly sync, so PyPI can handle it. In theory, Pulp could actually reduce the load on PyPI, especially after we work in the lazy feature.</li>
<li>Using changelog_since_serial will allow us to download new metadata only for projects that have changed</li>
</ol> Python Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=284902018-05-30T14:16:46Zbizhangbizhang@redhat.com
<ul><li><strong>Sprint/Milestone</strong> set to <i>3.0 GA</i></li></ul> Python Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=431442019-04-26T20:40:13Zbmbouterbmbouter@redhat.com
<ul><li><strong>Tags</strong> deleted (<del><i>Pulp 3</i></del>)</li></ul> Python Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=456102019-07-12T13:24:54ZCodeHeeler
<ul><li><strong>Sprint</strong> set to <i>Sprint 56</i></li></ul> Python Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=462082019-08-02T13:14:45Zrchan
<ul><li><strong>Sprint</strong> changed from <i>Sprint 56</i> to <i>Sprint 57</i></li></ul> Python Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=462622019-08-02T13:18:20Zrchan
<ul><li><strong>Sprint</strong> deleted (<del><i>Sprint 57</i></del>)</li></ul> Python Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=562622020-05-12T21:19:54Zdalleydalley@redhat.com
<ul><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li></ul> Python Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=574102020-06-02T04:28:35Zdalleydalley@redhat.com
<p>A little bit of additional context:</p>
<ul>
<li>The XML-RPC APIs mentioned above are considered "deprecated" and not recommended for use, but plenty of projects, including bandersnatch, still use them</li>
<li>If possible, it would be great if we could utilize bandersnatch as a library, but I haven't evaluated this at all</li>
</ul>
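<p>For illustration, here is a minimal sketch of how an incremental sync could use the XML-RPC changelog to find which projects need fresh metadata. It assumes that <code>changelog_since_serial</code> returns rows of (name, version, timestamp, action, serial), as PyPI's XML-RPC documentation describes. The endpoint URL and the helper names <code>make_client</code> and <code>changed_projects</code> are hypothetical and are not taken from pulp_python or bandersnatch.</p>

```python
import xmlrpc.client

# Assumption: PyPI's XML-RPC endpoint; adjust for another index.
PYPI_XMLRPC_URL = "https://pypi.org/pypi"


def make_client(url=PYPI_XMLRPC_URL):
    """Build an XML-RPC proxy (hypothetical helper; no request is sent here)."""
    return xmlrpc.client.ServerProxy(url)


def changed_projects(client, since_serial):
    """Return (sorted project names, newest serial seen) for events after since_serial.

    `client` only needs a changelog_since_serial(serial) method, so a stub
    can stand in for the real proxy.
    """
    names = set()
    newest = since_serial
    for name, _version, _timestamp, _action, serial in client.changelog_since_serial(since_serial):
        names.add(name)                # one metadata refresh per project, not per event
        newest = max(newest, serial)   # remember where the next sync should resume
    return sorted(names), newest
```

<p>A caller would store the returned serial after each sync and pass it back on the next run, so that only projects changed in between are refetched.</p>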
<p>Upstream issue to track for JSON APIs that would replace XML-RPC: <a href="https://github.com/pypa/warehouse/issues/284" class="external">https://github.com/pypa/warehouse/issues/284</a></p> Python Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=577652020-06-08T21:11:04Zdalleydalley@redhat.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-4 priority-6 priority-default closed child" href="/issues/6930">Refactor #6930</a>: Use Bandersnatch to perform package metadata fetching and filtering</i> added</li></ul> Python Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=626842020-09-23T01:08:51Zgerrod
<ul><li><strong>Status</strong> changed from <i>NEW</i> to <i>MODIFIED</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>Applied in changeset <a class="changeset" title="Pulp now uses Bandersnatch to perform metadata syncing Sync uses Bandersnatch to perform metadat..." href="https://pulp.plan.io/projects/pulp_python/repository/18/revisions/5270947abc578d13c942f5cc64bf27556c212ebc">5270947abc578d13c942f5cc64bf27556c212ebc</a>.</p> Python Support - Story #985: As a user, I can sync all packages from pypi (complete mirror)https://pulp.plan.io/issues/985?journal_id=661962021-01-12T23:37:05Zdalleydalley@redhat.com
<ul><li><strong>Platform Release</strong> set to <i>3.0.0</i></li></ul>