Issue #1448: Generating repo content applicability doesn't return call report - Pulp

Actions

Send by e-mail Copy link

Issue #1448

closed

Generating repo content applicability doesn't return call report

Added by Ichimonji10 over 8 years ago. Updated almost 4 years ago.

Status:

CLOSED - CURRENTRELEASE

Priority:

Normal

Assignee:

dkliban@redhat.com

Category:

Sprint/Milestone:

Start date:

Due date:

Estimated time:

Severity:

2. Medium

Version:

Master

Platform Release:

2.8.0

OS:

Triaged:

Yes

Groomed:

Sprint Candidate:

Tags:

Pulp 2

Sprint:

Quarter:

Description

When Pulp is asked to generate content applicability for repositories, it returns an HTTP 202 (accepted) response. This response should be a [call report](http://pulp.readthedocs.org/en/latest/dev-guide/conventions/sync-v-async.html#call-report), but it's not.

Here's an example of an API call that produces the error:

POST https://pulp.example.com/pulp/api/v2/repositories/actions/content/regenerate_applicability/
{'repo_criteria': {}}

Here's an example of a (HTTP 202) response body:

{
    '_href': '/pulp/api/v2/task_groups/568f4f73-b487-4944-b7a5-36bf00159e3c/',
    'group_id': '568f4f73-b487-4944-b7a5-36bf00159e3c'
}

Related issues

Actions

Copy link

Updated by Ichimonji10 over 8 years ago

This issue does not apply to Pulp 2.7.

Actions

Copy link

Updated by mhrivnak over 8 years ago

Status changed from NEW to ASSIGNED
Assignee set to dkliban@redhat.com
Platform Release set to 2.8.0
Triaged changed from No to Yes

Dennis, is it possible and reasonable to have the API continue to return a call report as before, and make the new behavior opt-in?

Actions

Copy link

Updated by dkliban@redhat.com over 8 years ago

I don't think it's reasonable to return a call report. A call report contains a list of spawned tasks and it does not contain a group id. Without the group id a user would need to check each task to see if it is done.

How would a user opt in to the new design? By providing an extra argument when making the API call?

Actions

Copy link

Updated by bmbouter over 8 years ago

How many spawned tasks would the call report contain?

Would it be possible to add group_id into the call report?

From a backwards compatibility standpoint this changes the HTTP method type and structure of the response.

Actions

Copy link

Updated by Ichimonji10 over 8 years ago

Without the group id a user would need to check each task to see if it is done.

What in the world is a group ID? It took me a while to figure out how to make my API handler poll task reports. The documentation doesn't mention task groups, except to state that "This API has been removed in 2.4, as there is no longer any concept of Task Groups."

Actions

Copy link

Updated by dkliban@redhat.com over 8 years ago

We are currently dispatching 10 profiles per task. Based on what I've seen in production environments this could be upwards of 1000 tasks. We could add the group id to the call report.

The documentation for 2.8 is not published yet. Here[0-1] are the places where the new behavior is described.

[0] https://github.com/pulp/pulp/blob/master/docs/dev-guide/integration/rest-api/consumer/applicability.rst
[1] https://github.com/pulp/pulp/blob/master/docs/dev-guide/integration/rest-api/tasks.rst

Actions

Copy link

Updated by Ichimonji10 over 8 years ago

The documentation for 2.8 is not published yet. Here[0-1] are the places where the new behavior is described.

Is this the new format for HTTP 202 responses across the board for Pulp 2.8?

Actions

Copy link

Updated by dkliban@redhat.com over 8 years ago

This is the only API call in 2.8 that has a different response body for 202. This is mentioned in the release notes[0].

[0] https://github.com/pulp/pulp/blob/master/docs/user-guide/release-notes/master.rst#rest-api-changes

Actions

Copy link

Updated by Ichimonji10 over 8 years ago

Ahaaa. Thanks for the info. I've updated the test suite accordingly. See this commit

I do dislike having two different kinds of response formats. It means that I, as an API user, have to design two different sets of code for dealing with long running tasks. It also means that I have to use out-of-band knowledge to create a special case in my code base for dealing with this one path. Basically, it's a PITA, even if you've read the release notes.

Actions

Copy link

#10

Updated by Ichimonji10 over 8 years ago

Status changed from ASSIGNED to CLOSED - NOTABUG

Actions

Copy link

#11

Updated by rbarlow over 8 years ago

I am a little concerned, as it sounds like we are not maintaining backwards compatibility from 2.7 to 2.8. Is my understanding correct? If so, I would make a case that we should make the new functionality be either a new REST endpoint, or make it opt-in through a user flag so that integrators from 2.7 don't have to do anything to upgrade to 2.8 unless they want to take advantage of the new awesomeness.

Actions

Copy link

#12

Updated by dkliban@redhat.com over 8 years ago

I have no problem with creating a new endpoint, but I am concerned that it may be too late for this in the release cycle.

Actions

Copy link

#13

Updated by Ichimonji10 over 8 years ago

How about HTTP link headers? We can keep backward compatibility and add group task polling to as many endpoints as we want.

Actions

Copy link

#14

Updated by Ichimonji10 over 8 years ago

The biggest danger in using HTTP link headers is that we might do a bad job of polluting Pulp's link registry.

Actions

Copy link

#15

Updated by rbarlow over 8 years ago

Status changed from CLOSED - NOTABUG to NEW

I don't think we should consider this issue closed. I raised concerns about this endpoint being backwards incompatible and was assured that it was done in a new endpoint, rather than removing support for the old endpoint. I think it's important that we stick to semantic versioning, especially since we document that we follow it. There are community users who do use this endpoint, as I know that I have personally guided two of them through writing code to interact with it.

Actions

Copy link

#16

Updated by Ichimonji10 over 8 years ago

+1 for ensuring backward compatibility in the 2.8 release.

By the way, if "HTTP link headers" or "link registry" is greek to anyone, check out sections 5.3 (When and How to Use Link Headers) and 5.4 (How to Assign Link Relation Types) from RESTful Web Services Cookbook

Actions

Copy link

#17

Updated by dkliban@redhat.com over 8 years ago

The change in behavior was needed for 2 reasons: 1) Parallelize the work and 2) prevent a possible mongo cursor time out.

The solution for 1 required spawning many tasks to handle the calculations in parallel. The number of tasks that are generated depends on the number of unique consumer profiles that are bound to a particular repository. When managing 10000 unique systems, 1000 tasks are spawned. If we return 1000 tasks in a call report, the client will then try to poll Pulp for 1000 tasks statuses. Depending on the polling interval, this would require Pulp to respond to thousands more extra requests. That is a huge overhead for Pulp. As a result it is necessary to return a resource that summarizes the status of all the tasks.

The solution for 1 also provides the solution for 2. Since each task has to only load profiles from the database once per task, we don't have to worry about a consequent call to the database getting a cursor timeout error.

Actions

Copy link

#18

Updated by Ichimonji10 over 8 years ago

I certainly understand that polling 1,000+ tasks can be extremely slow and fragile. With that in mind, a good question to ask is "is poor performance a good reason to ignore semver and break compatibility?" Either answer will produce angry users. We'll either end up with users who are frustrated with poor performance when polling 1,000+ tasks, or we'll end up with users who are frustrated that they have to add hacky fixes into their API clients to deal with breaking changes on a minor release. I'd opine that we should pick a solution that obeys semver and that can be applied to this entire category of issues.

Actions

Copy link

#19

Updated by bmbouter over 8 years ago

One option to consider is that the total count of group tasks and the completed count of group tasks could be added to an existing call report. That would be an additive change which would also provide the usability enhancement.

Actions

Copy link

#20

Updated by dkliban@redhat.com over 8 years ago

Would we still include the list of all the spawned tasks though?

Actions

Copy link

#21

Updated by Ichimonji10 over 8 years ago

Would we still include the list of all the spawned tasks though?

Yes. I don't see any other way of maintaining backward compatibility.

Actions

Copy link

#22

Updated by Ichimonji10 over 8 years ago

Would we still include the list of all the spawned tasks though?

Yes. I don't see any other way of maintaining backward compatibility.

I guess you could do something clever like make the list of spawned tasks include only one task, and have that one task reflect the state of the entire "regenerate content applicability" operation. I'm not sure if that would cause bugs of its own, though.

Actions

Copy link

#23

Updated by rbarlow over 8 years ago

I think we most likely need the API to work exactly like it did before and provide a way for the user to explicitly request the new behavior, rather than trying to make an additive change to the old behavior. This could either be a new API endpoint, or by having the user provide an optional parameter with their request that allows them to opt in. My preference is for the latter.

Actions

Copy link

#24

Updated by bmbouter over 8 years ago

+1 to an optional GET parameter to opt-in to the new functionality

Actions

Copy link

#25

Updated by rbarlow over 8 years ago

bmbouter wrote:

+1 to an optional GET parameter to opt-in to the new functionality

I think it's actually a POST, but thanks for the +1 ☺

Actions

Copy link

#26

Updated by dkliban@redhat.com about 8 years ago

Status changed from NEW to ASSIGNED

Actions

Copy link

#27

Updated by ipanova@redhat.com about 8 years ago

Status changed from ASSIGNED to POST

https://github.com/pulp/pulp/pull/2367

Actions

Copy link

#28

Updated by semyers about 8 years ago

Blocks Issue #1552: applicable errata regeneration tasks not getting spawned added

Actions

Copy link

#29

Updated by semyers about 8 years ago

Blocks deleted (Issue #1552: applicable errata regeneration tasks not getting spawned)

Actions

Copy link

#30

Updated by semyers about 8 years ago

Related to Issue #1552: applicable errata regeneration tasks not getting spawned added

Added by dkliban@redhat.com about 8 years ago

Revision ac2c17a7 | View on GitHub

Adds back the API for generating applicability for a repo

This adds a parameter called 'parallel' to the API call for regenerating applicability for a repository. When specified and True, the work is done in parallel. When not specified or False, the work is done as one long running task.

https://pulp.plan.io/issues/1448 closes #1448