Task #2168

Task #1864: Plan a new plugin API

Plan a download API to replace Nectar

Added by jcline@redhat.com about 3 years ago. Updated 6 months ago.

Status: MODIFIED
Priority: Normal
Category: -
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Platform Release:
Blocks Release:
Backwards Incompatible: No
Groomed: Yes
Sprint Candidate: Yes
Tags:
QA Contact:
Complexity:
Smash Test:
Verified: No
Verification Required: No
Sprint: Sprint 21

Description

Nectar is a wrapper around requests that attempts to provide an asynchronous API while hiding the underlying requests API. Its aim is to make downloading easier, but it causes a great many problems in both ease of use and error handling/reporting. Furthermore, because it tries to be general-purpose, it offers very few features that are useful to Pulp developers.

Below I include a rough plan for a download API. It needs to be fleshed out and a few choices need to be made, which is what this task is about. It then needs to be implemented as part of a plugin API.

Feature Set

  • An easy-to-use synchronous API that can handle HTTP/HTTPS and provides features like connection pooling, granular TLS configuration, etc.
  • An easy-to-use asynchronous API that can handle HTTP/HTTPS and provides features like connection pooling, granular TLS configuration, a callback system, etc.
  • Automatic handling of content validation (checksums, size, whatever)
  • Automatic handling of storage to an appropriate backend (local storage, some object store, a temporary file, etc).
  • Automatic creation of unit files that track where it is stored, how to validate it (in case of bitrot), all the places it can come from with optional priority weighting (this replaces both the lazy catalog and the alternate content source catalog) and network authentication information, etc.
  • Automatic progress reporting
  • Optionally creating the content units (maybe even associating them with a provided repository?)
  • Shared connection pooling across all repositories and plugins
  • Optional global concurrent connection configuration
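
As an illustration of the content validation item above, here is a minimal sketch that checks size and checksum while the data is being written. The names `copy_and_validate` and `ValidationError` are hypothetical, and SHA-256 is only an assumed choice of digest; nothing here is an existing Pulp API.

```python
import hashlib
import io


class ValidationError(Exception):
    """Raised when downloaded content does not match what was expected."""


def copy_and_validate(chunks, sink, expected_size=None, expected_sha256=None):
    """Write each chunk to sink, then check the total size and SHA-256 digest.

    Returns (size, hexdigest) for the bytes that were written.
    """
    digest = hashlib.sha256()
    size = 0
    for chunk in chunks:
        sink.write(chunk)
        digest.update(chunk)  # hash incrementally so large files never sit in memory
        size += len(chunk)
    if expected_size is not None and size != expected_size:
        raise ValidationError("expected %d bytes, got %d" % (expected_size, size))
    actual = digest.hexdigest()
    if expected_sha256 is not None and actual != expected_sha256:
        raise ValidationError("expected sha256 %s, got %s" % (expected_sha256, actual))
    return size, actual


if __name__ == "__main__":
    buf = io.BytesIO()
    print(copy_and_validate([b"ab", b"c"], buf, expected_size=3))
```

Because the check works on any iterable of byte chunks and any object with `write()`, the same routine could sit behind either the synchronous or the asynchronous API and in front of any storage backend.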

Synchronous API

I think that this should simply be requests, with some wrapping code to handle where it's stored, post-download validation, and model creation. We should expose the session API as-is to the user.
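
A rough sketch of what that wrapping code might look like, assuming Python 3. The `download` function, its parameters, and the choice to name the stored file after its SHA-256 digest are all hypothetical. The session is passed in untouched, so the caller configures TLS, auth, and proxies on it exactly as they would with plain requests; the only methods relied on are `Session.get(url, stream=True)`, `Response.raise_for_status()`, and `Response.iter_content()`.

```python
import hashlib
import os
import tempfile


def download(session, url, dest_dir, expected_sha256=None, chunk_size=64 * 1024):
    """Stream url through session into dest_dir and return the final path.

    session is a requests.Session (or anything with the same get() method).
    """
    response = session.get(url, stream=True)
    response.raise_for_status()
    digest = hashlib.sha256()
    # Write to a temporary file first so a failed download never leaves
    # a partial file under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in response.iter_content(chunk_size=chunk_size):
                tmp.write(chunk)
                digest.update(chunk)
        if expected_sha256 is not None and digest.hexdigest() != expected_sha256:
            raise ValueError("checksum mismatch for %s" % url)
        final_path = os.path.join(dest_dir, digest.hexdigest())
        os.replace(tmp_path, final_path)
        return final_path
    except BaseException:
        # Clean up the temporary file on any failure, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With requests installed, a call would look something like `download(requests.Session(), url, "/some/dir", expected_sha256=...)`. Model creation is left out here; it would presumably hook in after the rename.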

Asynchronous API

This is the API that requires some research and choices. There are several asynchronous HTTP clients that I am aware of, and there may be others.

  1. grequests
  2. requests-futures (Python 3, or Python 2.6+ with the futures backport)
  3. Twisted's web client

grequests

grequests (https://github.com/kennethreitz/grequests) is a replica of the requests API powered by gevent. It is very simple and does not provide any sort of callback system to handle certain events. It is not actively developed.

requests-futures

This is a small add-on to the requests library that uses concurrent.futures from the Python 3 standard library (https://docs.python.org/3.5/library/concurrent.futures.html) or its backport for Python 2.6+. It provides one additional kwarg, ``background_callback``, which lets you work with the Response objects requests generates in the background thread, for example to write the data to disk.
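
To make the mechanism concrete, here is a small sketch of the same idea using only the standard library's concurrent.futures, not requests-futures itself. `fetch_all` and its `on_response` hook are hypothetical names; `on_response` plays a role similar to ``background_callback`` in that it runs in the worker thread right after a fetch succeeds. The `fetch` argument is any callable that takes a URL, such as a bound `session.get`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(fetch, urls, max_workers=4, on_response=None):
    """Run fetch(url) for every URL on a thread pool.

    Returns (results, errors): two dicts keyed by URL.
    """
    def job(url):
        result = fetch(url)
        if on_response is not None:
            # Runs in the worker thread, e.g. to write the body to disk.
            on_response(url, result)
        return result

    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(job, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # report failures per URL instead of aborting
                errors[url] = exc
    return results, errors
```

Here `max_workers` doubles as a crude cap on concurrent connections, and collecting errors per URL addresses the error reporting problem described for Nectar above.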

Twisted

Twisted's web client provides a different API than the requests-based options. Rather than mirroring requests, it uses the standard Twisted Deferreds and Failures (https://twistedmatrix.com/documents/current/core/howto/defer.html). Streaming is built-in and is handled by Twisted's Producers and Consumers (https://twistedmatrix.com/documents/current/core/howto/producers.html).

Twisted supports

  • Connection pools (with configurable size and timeouts)
  • Automatic retries (limited to one retry according to the documentation)
  • Automatic redirect handling
  • HTTP proxy support

Conclusion

Of the three, Twisted is probably the most robust. It provides a well-documented callback system (which is already used in the Pulp streamer) and it looks like it offers all the features we need, although the configuration of timeouts and retries looks light currently (they, of course, accept pull requests!).

Twisted's callback chaining system also offers a handy way for users of the asynchronous download API to hook additional functionality into the download process.

One downside to Twisted is that it requires a great deal of configuration to match the feature set requests offers by default. However, this only needs to be done once and is, in the grand scheme of things, not terribly complicated. Another downside is that users need to understand Twisted's callback system to add or modify download behavior, although this is probably not a common use case.

Associated revisions

Revision bd83e3f6 View on GitHub
Added by jortel@redhat.com over 2 years ago

Add download and changeset.
closes #2168

History

#1 Updated by jcline@redhat.com about 3 years ago

  • Description updated (diff)

#2 Updated by jcline@redhat.com about 3 years ago

  • Description updated (diff)

#3 Updated by jcline@redhat.com about 3 years ago

  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes

#4 Updated by jortel@redhat.com about 3 years ago

  • Sprint/Milestone set to 26
  • Tags Pulp 3 added

#5 Updated by jcline@redhat.com about 3 years ago

  • Description updated (diff)

#6 Updated by jcline@redhat.com about 3 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to jcline@redhat.com

#7 Updated by jcline@redhat.com about 3 years ago

After quite a bit of playing with Twisted, I think we might be better off not going in that direction. The Twisted API, while nice, is not necessarily easy to dive right into. Also, while there is support for redirects and proxies, it seems that (right now) you can't have both at the same time. I think we would be better off having a unified requests-style API.

#8 Updated by bmbouter about 3 years ago

+1 to comment 7

#9 Updated by jcline@redhat.com about 3 years ago

https://github.com/pulp/pulp/pull/2757 has a rough outline of what I think the API could look like.

#10 Updated by ipanova@redhat.com about 3 years ago

  • Status changed from ASSIGNED to NEW
  • Assignee deleted (jcline@redhat.com)

#11 Updated by jortel@redhat.com about 3 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to jortel@redhat.com

#12 Updated by mhrivnak about 3 years ago

  • Sprint/Milestone changed from 26 to 27

#13 Updated by jortel@redhat.com almost 3 years ago

  • Sprint/Milestone changed from 27 to 28

#14 Updated by mhrivnak almost 3 years ago

  • Sprint/Milestone changed from 28 to 29

#15 Updated by mhrivnak almost 3 years ago

  • Sprint/Milestone changed from 29 to 30

#16 Updated by mhrivnak over 2 years ago

  • Sprint/Milestone changed from 30 to 36

#17 Updated by mhrivnak over 2 years ago

  • Sprint/Milestone changed from 36 to 37

#18 Updated by jortel@redhat.com over 2 years ago

  • Sprint/Milestone changed from 37 to 38

#19 Updated by mhrivnak over 2 years ago

  • Sprint/Milestone changed from 38 to 39

#20 Updated by mhrivnak over 2 years ago

  • Sprint/Milestone changed from 39 to 40

#21 Updated by jortel@redhat.com over 2 years ago

  • Status changed from ASSIGNED to MODIFIED
  • % Done changed from 0 to 100

#22 Updated by bmbouter over 1 year ago

  • Sprint set to Sprint 21

#23 Updated by bmbouter over 1 year ago

  • Sprint/Milestone deleted (40)

#24 Updated by daviddavis 6 months ago

  • Sprint/Milestone set to 3.0

#25 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3)
