Project

Profile

Help

Task #2917

Add keyword arguments to the Plugin API and switch some attributes to properties

Added by bmbouter about 2 years ago. Updated 6 months ago.

Status:
MODIFIED
Priority:
Normal
Category:
-
Sprint/Milestone:
Start date:
Due date:
% Done:

0%

Platform Release:
Blocks Release:
Backwards Incompatible:
No
Groomed:
Yes
Sprint Candidate:
No
Tags:
QA Contact:
Complexity:
Smash Test:
Verified:
No
Verification Required:
No
Sprint:
Sprint 25

Description

The following kwargs should be added to the plugin API constructors:

pulpcore.plugin.changeset.main.ChangeSet(self, importer, additions=(), removals=(), deferred=False):  # adds 'deferred'

pulpcore.plugin.changeset.model.RemoteContent(self, model, artifacts=None):  # adds 'artifacts'

pulpcore.plugin.changeset.model.RemoteArtifact(self, model, download, content=None):  # adds 'content'

pulpcore.plugin.changeset.report.ChangeReport(self, action, content, error=None):  # adds 'error'

pulpcore.plugin.download.HttpDownload(self, url, writer, method='GET', timeout=None, user=None, ssl=None, proxy_url=None, headers=None):  # adds 'timeout', 'user', 'ssl', 'proxy_url', and 'headers'

pulpcore.plugin.download.FtpDownload(self, url, writer, user=None):   # adds 'user'.

pulpcore.plugin.download.Batch.__init__(self, downloads, concurrent=CONCURRENT, backlog=BACKLOG, context=None):  # adds 'context'


Checklist


Related issues

Duplicated by Pulp - Task #2918: Add keyword arguments for all options to all objects in pulpcore.plugin.download CLOSED - DUPLICATE Actions

History

#1 Updated by jortel@redhat.com about 2 years ago

I think it would be useful to list specific classes in pulpcore.plugin.changeset. Although, I thought the discussion on the PR was regarding pulpcode.download.http.HttpDownload.

In general, most classes in the standard library don't take large numbers arguments to .__init__() and few have many optional arguments. So not sure how much can be inferred from the standard libs as typical in this matter. Also the 'requests' and 'celery' libraries are popular but too small a sample to define what is typical.

I did some digging and hoped to find this addressed in some best practices PEP but did not find anything. Best I can do is describe my philosophy on this. At minimum, .__init__() must take arguments to fully initialize the object. The object must be functional and useful. The arguments need to satisfy this criteria are required. Optional arguments pertain to attributes used to configure the object or provide supplemental data. There are a few things to consider when including optional arguments in the .__init__() signature.

  • How likely is the argument likely to be known when the object is constructed?
  • How many optional arguments are we talking about?

In general the .__init__() signature should be kept as simple as possible yet provide convenience. There is a cost associated with each argument added:

  • The signature becomes noisier and harder to understand.
  • The code required to implement .__init__() increases and contains more boiler-plate code that adds no value.
  • The code required to override .__init__() in a subclass increases and contains more boiler-plate code that adds no value and must be maintained.

The following example of a Shape class with lots of optional values.

class Shape:

    def __init__(x, y, border=None, shadow=None, background=0x00, foreground=0xFFFF, tilt=0, angle=0, transparency=0,
                 fade=0, motion=None):
        self.x = x
        self.y = y
        self.border = border
        self.shadow = shadow
        self.background = background
        self.foreground = foreground
        self.tilt = tilt
        self.angle = angle
        self.transparency = transparency
        self.fade = fade
        self.motion = motion

    def draw():
        ...

Now let's subclass Shape <- Triangle and override .__init__() to add a required attribute equilateral.
This requires a lot of extra boiler-plate-like code that needs to be kept in sync with changes in Shape.__init__(). This is my primary objection to this practice.

class Triangle(Shape):

    def __init__(x, y, equilateral=True, border=None, shadow=None, background=0x00, foreground=0xFFFF, tilt=0, angle=0,
                 transparency=0, fade=0, motion=None):
        super(Triangle, self).__init__(
            x,
            y,
            border=border,
            shadow=shadow,
            background=background,
            foreground=foreground,
            tilt=tilt,
            angle=angle,
            transparency=transparency,
            fade=fade,
            motion=motion)
        self.equilateral = equilateral

Having all of the optional arguments in .__init__() does support:

triangle = Triangle(
    x=10,
    y=20,
    equilateral=False,
    border=(0, 0, 0, 0),
    shadow=True,
    background=0x0AFA,
    foreground=0xFAFA)

which does not seem any easier or readable (IMHO, less readable) than:

triangle = Triangle(10, 10, False)
triangle.border = (0, 0, 0, 0)
triangle.shadow = True
triangle.background = 0x0AFA
triangle.foreground=0xFAFA

Regardless of whether other projects include all possible optional arguments to .__init__(), I don't think Pulp should do this as a blanket practice. Optional arguments to .__init__() make sense in some cases but not always. Let's do what makes sense for each class.

#2 Updated by bmbouter about 2 years ago

I read comment 1, and the points are well made. The think the main concerns are readability and boiler plate code in subclassing. I also want readability and less boiler plate code. I come back to these concerns at the end of this post.

Additionally though, I want a design that adheres to common Python practices so that a user who is familiar with the Python Standard Library will feel right at home with Pulp's plugin API. I wanted to make sure I'm on solid ground that the Python standard library uses kwargs as the primary means of configuring instantiated objects, so I checked my understanding of this topic in #python on freenode. Here is a snippet from our convo with two immediate replies.

07:12:07 bmbouter | so I'm having a python debate w/ a co-worker about a python API we are making                                                          
07:12:31 bmbouter | for objects our users will instantiate I think all data should be passed in as kwargs                                                  
07:13:03 bmbouter | versus instantiating the object with nearly no kwargs and then binding all of the "options" as attributes on the instantiated object   
07:13:27 bmbouter | my claim is that almost nothing in python's standard library or it's major userspace libraries, e.g. requests work like that           
07:14:02   nedbat | bmbouter: if you have named arguments in the __init__, you can document them in the docstring nicely.                                  
07:14:37    Yhg1s | bmbouter: your claim is correct, yes.

The majority of the standard library works this way so we can find lots of examples. Here is one:

Consider a timedelta object which has an __init__ signature like:

timedelta.__init__(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)

If Python worked the way that the Pulp plugin API currently works, it's signature could look like: timedelta.__init__() and you would use it like:

my_timedelta = timedelta()
my_timedelta.days = 3
my_timedelta.seconds = 0
my_timedelta.minutes = 34
my_timedelta.hours = 6
my_timedelta.weeks = 3

So ^ is my reasoning that Python uses kwargs and the plugin API should too.

Regarding the boilerplate code problem for subclassing using *args and **kwargs. Say we subclass the Shape to make a Triangle object and that object only cares about x, y, or the optional argument equilateral.

class Triangle(object):
    def __init__(self, x, y, *args, equilateral=True, **kwargs):
        # Do something specific w/ kwargs that this subclass cares about
        super(x, y, *args, equilateral=equilateral, **kwargs)

Regarding the readability concern with kwargs, the docstring, not the signature itself, is where most reading will occur. With Napoleon and Sphinx reading the docstring for kwargs either rendered or un-rendered should be very readable.

#3 Updated by pcreech about 2 years ago

I do not think a hard and fast 'always do this' rule will apply in this case. Both options have their valid use cases, and have their place in object oriented programming.

The main beneifit of constructor parameters are for parameters that will not change over the course of the life of the object. A perfect example is the timedelta example provided earlier. timedelta is supposed to be an immutable object, therefore it's attributes are provided at constructor time. Timedelta properties are read only.

Attributes are usefull for attributes that are decided upon or modified after instantiation, or are not needed upon immediate construction of the object. If it's not expicilty needed in the init function, we don't neccessarily need a constructor argument to instantiate it. This could artificially inflate the length of our constructor to something that becomes unamangeable. It's a common concept that if you have too many constructor arguments, you should've wrapped it up into another object and passed that object into the constructor instead.

I guess my end point is, we should judicially allow and apply both with adherance to current software development best practices, instead of creating a hard and fast rule of 'always x' or 'always y'. Each will make sense when it is used correctly.

#4 Updated by mhrivnak about 2 years ago

All the points made here are generally reasonable. I also think this is circumstantial, and we don't need a policy on it.

I agree that the timedelta's init arguments are an example that falls under the category Jeff described as required for the object to be "functional and useful." It just doesn't make sense to have a timedelta object without those attributes set. I think we all agree on that.

I also see diversity in the ecosystem on this question. For additional examples of objects with public attributes that aren't settable in init, we need only look as far as django. The Request and Response objects provide plenty of examples, as do many other familiar objects, based on a quick survey.

As another interesting data point, I looked at the extensive quickstart guide to "requests" to see what behavior they promote.

http://docs.python-requests.org/en/master/user/quickstart/

Interestingly, on that very long page, there is only one place where they demonstrate instantiating a class, and it's an io.BytesIO() from the standard library. They do not show a single example of instantiating their own classes. The interaction happens entirely through functions.

Moving on to their "advanced" docs:

http://docs.python-requests.org/en/master/user/advanced/

This page has several examples of instantiating a class with no args, and then setting attributes. The second code block on the page reads like this:

s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})

# both 'x-test' and 'x-test2' are sent
s.get('http://httpbin.org/headers', headers={'x-test2': 'true'})

If we go look at that class, the init method takes no arguments at all.

https://github.com/requests/requests/blob/v2.18.1/requests/sessions.py#L334

I appreciate and share the desire to provide a familiar and low-friction API. I don't see that the pattern Jeff has put forward is particularly uncommon, nor that it would necessarily make the plugin API less familiar to developers.

#5 Updated by bmbouter about 2 years ago

I want to reiterate that most of the standard library uses kwargs. The discussion in #python confirmed that assertion. While it's not "uncommon", why can't we also have kwargs, especially given that:

1) It's the most common standard library method for instantiating objects
2) I have expressed this concern pre-merge and if I had known this would have been an issue I would have blocked the merge

#6 Updated by mhrivnak about 2 years ago

I really don't feel too strongly either way. Brian, I had the same initial reaction you did, but Jeff talked me into the merits of the other approach, and I'm willing to give it a try.

For what it's worth, looking through the standard library and thinking about how I usually interact with it, it struck me that far and away the most common way to instantiate a class with the standard library is to use a factory function. I'm actually surprised by how pervasive that is, never having thought much specifically about how "classy" the standard library is. I wonder if this is because the standard library is trying to be flexible, and allow developers to use it either in an object-oriented way or in a more functional way.

You can certainly find a few parts of the standard library that include classes you can instantiate. But it's not clear to me that the standard library should be used as an example of object oriented best practices.

I do find it interesting to go back to basics and read a bit about constructors as a concept within object oriented programming. Maybe others would find this interesting too, so here's a starting point: https://en.wikipedia.org/wiki/Constructor_(object-oriented_programming)

#7 Updated by daviddavis about 2 years ago

FWIW, I kind of lean toward the kwargs approach. It seems more readable and allows for better documentation--especially in this particular case where plugin authors will have to read our code.

I do agree though that it should not maybe be a hard-and-fast rule but rather something we prefer in most cases.

#8 Updated by dkliban@redhat.com about 2 years ago

I agree that the the Python standard library should not be used as a reference for object oriented programming in general, however, it is a good reference for accepted Python practices.

I also think that as a consumer of the plugin API, I prefer to create objects with one call to the constructor. I don't want to be forced to bind data post creation.

#9 Updated by jortel@redhat.com about 2 years ago

I appreciate the effort to reduce the boiler-plate code here.

class Triangle(object):
    def __init__(self, x, y, *args, equilateral=True, **kwargs):
        # Do something specific w/ kwargs that this subclass cares about
        super(x, y, *args, equilateral=equilateral, **kwargs)

But, I'd rather we didn't use *args, **kwargs in signatures as suggested. Although, the docstring can provide guidance, any signature with *args and **kwargs creates a less explicit interface. I think it would be better to be explicit and cut-n-paste all the optional arguments from the superclass and include them here.

#10 Updated by jortel@redhat.com about 2 years ago

After good discussion, Brian and I are not likely to reach agreement on the broader topic of keyword arguments in .__ini__(). However, given the lack of clear consensus (here) and that Brian feels strongly about this, I suggest that the most productive path forward is to update/rewrite this issue with specific classes and keyword arguments to be added so that it can be groomed and added to the sprint. Let's also consolidate this and https://pulp.plan.io/issues/2918.

#11 Updated by bmbouter about 2 years ago

  • Duplicated by Task #2918: Add keyword arguments for all options to all objects in pulpcore.plugin.download added

#12 Updated by bmbouter about 2 years ago

  • Subject changed from Add keyword arguments for all options to all objects in pulpcore.plugin.changeset to Add keyword arguments to the Plugin API
  • Description updated (diff)

Rewriting to make specific recommendations:

#13 Updated by jortel@redhat.com about 2 years ago

@bmbouter, thanks for adding the specifics.

bmbouter wrote:

pulpcore.plugin.changeset.model.RemoteContent(self, model, artifacts=set()): # adds 'artifacts'

The set of artifacts must be unique for each instance. This default would result in all instances sharing the same global set.

pulpcore.plugin.changeset.model.RemoteArtifact(self, model, download, content=None, path=None): # adds 'content' and 'path'

The 'content' reverse relationship is stitched-up by the ContentIterator. The plugin writer specifies this relationship by adding the artifact to Content.artifacts. Providing another way for the plugin writer to do this will be confusing and could cause anomalies when they don't agree.

The RemoteArtifact.path is set based on where the downloader has (optionally) written the file. A not-none value indicates that a file was actually downloaded. Allowing the plugin writer to initialize this will very likely cause problems.

pulpcore.plugin.download.Batch(self, downloads, concurrent=CONCURRENT, backlog=BACKLOG, iterator=None, context=None, feeder=None, is_shutdown=False): # adds 'iterator', 'context', 'feeder', and 'batch'

The 'iterator', 'feeder' and 'is_shutdown' should not be initialized by the plugin writer.

#14 Updated by bmbouter about 2 years ago

  • Description updated (diff)

I adjusted the artifacts to None on RemoteContent so that not all instances will share the set().

Any attribute that shouldn't be touched by the plugin writer because they'll break something should become either a property that they can only read or a private attribute. How should we track the privatizing of those attributes you mentioned, in this ticket or a different one?

#15 Updated by jortel@redhat.com about 2 years ago

bmbouter wrote:

Any attribute that shouldn't be touched by the plugin writer because they'll break something should become either a property that they can only read or a private attribute. How should we track the privatizing of those attributes you mentioned, in this ticket or a different one?

Really just a few to make protected with leading "_" and add public properties for.

  • Remote.settled
  • Batch.is_shutdown
  • RemoteArtifact.path

I don't think we need another ticket.

The other attributes I described as "should not be touched" by plugin writers or otherwise requested they be removed from the list of changes may be left public but should not be included in the .__init__() signature because there isn't a use case for them to be initialized by a plugin writer.

#16 Updated by bmbouter about 2 years ago

  • Subject changed from Add keyword arguments to the Plugin API to Add keyword arguments to the Plugin API and switch some attributes to properties
  • Description updated (diff)

I added some checklist items to update the properties.

I updated the description to remove the kwargs that are becoming protected and exposed via a property.

Can you look at this once again and comment or groom?

#17 Updated by jortel@redhat.com about 2 years ago

Please remove the iterator and feeder from batch.__init__() in the description. The plugin writer has no reason to initialize those attributes.

#18 Updated by bmbouter about 2 years ago

@jortel can you please edit with whatever changes are necessary. I think it will reduce the back and forth.

#19 Updated by jortel@redhat.com about 2 years ago

  • Description updated (diff)
  • Groomed changed from No to Yes

@bmbouter, done.

One question though, I'm not sure what the last check list item means. "update docstrings so that init arts only contain all kwargs and nothing extra". Can you clarify?

#20 Updated by jortel@redhat.com about 2 years ago

  • Description updated (diff)

#21 Updated by bmbouter about 2 years ago

I just mean to have these changes include updates to the napoleon docstrings. I rewrote the checklist item to clarify.

@jortel, all the changes look good, but shouldn't FTPDownload still get user just like HTTPDownload?

#22 Updated by jortel@redhat.com about 2 years ago

  • Description updated (diff)

bmbouter wrote:

I just mean to have these changes include updates to the napoleon docstrings. I rewrote the checklist item to clarify.

@jortel, all the changes look good, but shouldn't FTPDownload still get user just like HTTPDownload?

bmbouter wrote:

I just mean to have these changes include updates to the napoleon docstrings. I rewrote the checklist item to clarify.

Thanks.

@jortel, all the changes look good, but shouldn't FTPDownload still get user just like HTTPDownload?

Yes, but the change you listed was to add 'user' to the FileDownload. The FtpDownload is just a first-pass so not yet included in the plugin API. But for completeness, let's update the init() here anyway.

#23 Updated by bmbouter about 2 years ago

@jortel, this looks perfect, thanks.

#24 Updated by mhrivnak about 2 years ago

  • Sprint/Milestone set to 43

#25 Updated by jortel@redhat.com about 2 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to jortel@redhat.com

#26 Updated by jortel@redhat.com about 2 years ago

  • Status changed from ASSIGNED to POST

#27 Updated by jortel@redhat.com about 2 years ago

  • Sprint/Milestone changed from 43 to 44

#28 Updated by jortel@redhat.com about 2 years ago

  • Status changed from POST to MODIFIED

#29 Updated by bmbouter almost 2 years ago

  • Tags deleted (Pulp 3 Plugin Writer Alpha)

Cleaning up Redmine tags

#30 Updated by bmbouter over 1 year ago

  • Sprint set to Sprint 25

#31 Updated by bmbouter over 1 year ago

  • Sprint/Milestone deleted (44)

#32 Updated by daviddavis 6 months ago

  • Sprint/Milestone set to 3.0

#33 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3)

Please register to edit this issue

Also available in: Atom PDF