Issue #3358 (closed)

Very occasionally, a task will be processed and have a 'worker_name' of None

Added by dalley almost 7 years ago. Updated over 5 years ago.

Status:            CLOSED - WONTFIX
Priority:          Normal
Assignee:          -
Category:          -
Sprint/Milestone:  -
Start date:        -
Due date:          -
Estimated time:    -
Severity:          3. High
Version:           -
Platform Release:  -
OS:                -
Triaged:           Yes
Groomed:           No
Sprint Candidate:  No
Tags:              Pulp 2
Sprint:            -
Quarter:           -

Description

Operations:  deferred_download
Resources:
State:       Successful
Start Time:  2018-02-13T00:51:16Z
Finish Time: 2018-02-13T00:51:17Z
Task Id:     b7b031da-8c77-4d32-a661-934d02bdf0d3
Worker Name: None

Operations:  deferred_download
Resources:
State:       Successful
Start Time:  2018-02-13T00:53:16Z
Finish Time: 2018-02-13T00:53:16Z
Task Id:     1647f306-4933-482e-be84-9cdd53e824ca
Worker Name: reserved_resource_worker-0@pulp2.dev

Operations:  deferred_download
Resources:
State:       Successful
Start Time:  2018-02-13T00:55:16Z
Finish Time: 2018-02-13T00:55:16Z
Task Id:     174ba037-1e6a-40cb-953f-83f3db768988
Worker Name: reserved_resource_worker-1@pulp2.dev

Operations:  deferred_download
Resources:
State:       Successful
Start Time:  2018-02-13T00:57:16Z
Finish Time: 2018-02-13T00:57:17Z
Task Id:     fc6400ef-913b-4f04-ada9-58a8cb78b4bc
Worker Name: None

[vagrant@pulp2 ~]$ pulp-admin tasks details --task-id fc6400ef-913b-4f04-ada9-58a8cb78b4bc
+----------------------------------------------------------------------+
                              Task Details
+----------------------------------------------------------------------+

Operations:       deferred_download
Resources:        
State:            Successful
Start Time:       2018-02-13T00:57:16Z
Finish Time:      2018-02-13T00:57:17Z
Result:           N/A
Task Id:          fc6400ef-913b-4f04-ada9-58a8cb78b4bc
Worker Name:      None
Progress Report:  
  On Demand Download: 
    Description:   Download Cached On-Demand Content
    Details:       
    Error Details: 
    Items Total:   0
    Num Failures:  0
    Num Processed: 0
    Num Success:   0
    State:         FINISHED
    Step Id:       22b2e07d-c6cc-473f-8dba-12fc6a699dcb
    Step Type:     on_demand_download

[vagrant@pulp2 ~]$ mongo
> use pulp_database
> db.task_status.find({"task_id": "fc6400ef-913b-4f04-ada9-58a8cb78b4bc"}).pretty()
{
        "_id" : ObjectId("5a8237ec9664af7dc48b31e5"),
        "task_id" : "fc6400ef-913b-4f04-ada9-58a8cb78b4bc",
        "start_time" : "2018-02-13T00:57:16Z",
        "worker_name" : null,
        "state" : "finished",
        "exception" : null,
        "task_type" : "pulp.server.controllers.repository.download_deferred",
        "tags" : [
                "pulp:action:deferred_download"
        ],
        "_ns" : "task_status",
        "progress_report" : {
                "on_demand_download" : [
                        {
                                "num_success" : 0,
                                "description" : "Download Cached On-Demand Content",
                                "step_type" : "on_demand_download",
                                "items_total" : 0,
                                "state" : "FINISHED",
                                "error_details" : [ ],
                                "details" : "",
                                "num_failures" : 0,
                                "step_id" : "22b2e07d-c6cc-473f-8dba-12fc6a699dcb",
                                "num_processed" : 0
                        }
                ]
        },
        "group_id" : null,
        "id" : null,
        "finish_time" : "2018-02-13T00:57:17Z"
}
>
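
To enumerate affected tasks rather than spot-checking individual IDs, one can query for finished tasks whose worker_name is null. A minimal sketch using pymongo (same database and collection as above; the connection details are assumed):

from pymongo import MongoClient

db = MongoClient().pulp_database

# Finished tasks that were nonetheless recorded with no worker name.
query = {"worker_name": None, "state": "finished"}
count = 0
for doc in db.task_status.find(query):
    count += 1
    print(doc["task_id"], doc.get("start_time"), doc.get("finish_time"))
print("total affected:", count)

Note that in pymongo, {"worker_name": None} matches documents where the field is explicitly null as well as documents missing the field entirely.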

So far I've only seen this happen with deferred_download tasks, but it might be a general problem; the scope is unclear at the moment.

First noticed after Pulp had been left idle for 2 days.

Reproduced after leaving Pulp idle for 6 hours with the rate of deferred_download tasks increased from 1 per 30 minutes to 1 per 2 minutes.

The occurrences are sparse overall but tend to appear in clusters. I suspect the cause is an improperly handled race condition in this code [0], but I don't know for sure yet.

[0] https://github.com/pulp/pulp/blob/master/server/pulp/server/async/tasks.py#L419
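
To illustrate the kind of interleaving I mean (this is plain pymongo against a throwaway database, not Pulp's actual code path, and all names are illustrative): if the worker-side upsert that records task state runs before the dispatch-time save that records worker_name, the upsert creates the document without a worker name, and the dispatch-time insert then loses.

import threading
from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

db = MongoClient().pulp_database_demo          # throwaway demo database
db.task_status.drop()
db.task_status.create_index("task_id", unique=True)

TASK_ID = "00000000-0000-0000-0000-000000000000"  # illustrative

def dispatch_time_save():
    # Dispatch-time save: inserts the TaskStatus, including worker_name.
    try:
        db.task_status.insert_one({
            "task_id": TASK_ID,
            "worker_name": "reserved_resource_worker-0@pulp2.dev",
            "state": "waiting",
        })
    except DuplicateKeyError:
        # The worker's upsert got there first; worker_name is lost.
        pass

def worker_side_upsert():
    # Worker-side update at task start: upserts state, never worker_name.
    db.task_status.update_one(
        {"task_id": TASK_ID},
        {"$set": {"state": "running"}},
        upsert=True,
    )

t1 = threading.Thread(target=worker_side_upsert)
t2 = threading.Thread(target=dispatch_time_save)
t1.start(); t2.start(); t1.join(); t2.join()

# When the upsert wins the race, worker_name is absent/null:
print(db.task_status.find_one({"task_id": TASK_ID}))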


Related issues

Related to Pulp - Issue #4428: Upsert query raises NotUniqueError (CLOSED - CURRENTRELEASE, assignee: daviddavis)
#1

Updated by dalley almost 7 years ago

  • Description updated (diff)
#2

Updated by dalley almost 7 years ago

  • Description updated (diff)
#3

Updated by dalley almost 7 years ago

After letting it run for an additional 12 hours, I found 3 more instances where this happened, for a total of 5.

One of those 5 tasks failed with this error:

[vagrant@pulp2 ~]$ pulp-admin tasks details --task-id afa5d8c7-9bef-4d48-bdff-e60cd9ec56fd
+----------------------------------------------------------------------+
                              Task Details
+----------------------------------------------------------------------+

Operations:       deferred_download
Resources:        
State:            Failed
Start Time:       Unstarted
Finish Time:      2018-02-13T11:43:22Z
Result:           N/A
Task Id:          afa5d8c7-9bef-4d48-bdff-e60cd9ec56fd
Worker Name:      None
Progress Report:  
Traceback:        Traceback (most recent call last):   File
                  "/usr/lib/python2.7/site-packages/celery/app/trace.py", line
                  367, in trace_task     R = retval = fun(*args, **kwargs)  
                  File
                  "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py",
                  line 521, in __call__     upsert=True)   File
                  "/usr/lib/python2.7/site-packages/mongoengine/queryset/base.py
                  ", line 564, in update_one     upsert=upsert, multi=False,
                  write_concern=write_concern, **update)   File
                  "/home/vagrant/devel/pulp/server/pulp/server/db/querysets.py",
                  line 100, in update     super(CriteriaQuerySet,
                  self).update(*args, **kwargs)   File
                  "/usr/lib/python2.7/site-packages/mongoengine/queryset/base.py
                  ", line 515, in update     raise NotUniqueError(u'Update
                  failed (%s)' % six.text_type(err)) NotUniqueError: Update
                  failed (E11000 duplicate key error collection:
                  pulp_database.task_status index: task_id_-1 dup key: { :
                  "afa5d8c7-9bef-4d48-bdff-e60cd9ec56fd" }) 

This is more evidence that the root cause is an improperly handled race condition.
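
As a side note on that traceback: a MongoDB upsert against a unique index is not an atomic find-or-insert from the client's perspective. Two racing upserts on the same task_id can both take the insert path, and the loser fails with exactly this E11000, which mongoengine re-raises as NotUniqueError. A minimal standalone sketch (again plain pymongo against a throwaway database, not Pulp code):

import threading
from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

db = MongoClient().pulp_database_demo
db.task_status.drop()
db.task_status.create_index("task_id", unique=True)

def racing_upsert(worker_name):
    # Both threads can miss the lookup and both try to insert;
    # whichever loses gets the duplicate key error.
    try:
        db.task_status.update_one(
            {"task_id": "afa5d8c7-9bef-4d48-bdff-e60cd9ec56fd"},
            {"$set": {"state": "running", "worker_name": worker_name}},
            upsert=True,
        )
    except DuplicateKeyError as err:
        print("lost the upsert race:", err)

threads = [threading.Thread(target=racing_upsert, args=(name,))
           for name in ("worker-0", "worker-1")]
for t in threads:
    t.start()
for t in threads:
    t.join()

Whether the error actually fires depends entirely on timing; the window is narrow, which would be consistent with how rarely this reproduces.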

#4

Updated by dalley almost 7 years ago

  • Triaged changed from No to Yes
  • Sprint Candidate changed from No to Yes

20 hours and 600 tasks in, this has occurred 7 times so far, i.e. roughly 1.2% of tasks (7/600).

#5

Updated by dalley almost 7 years ago

After 24 hours / 954 tasks, the current count is 9 tasks without a worker name.

#6

Updated by dalley almost 7 years ago

For the past 24 hours I have had the rate of deferred_download tasks turned back down to normal, and have instead been running sync + publish tasks every 2 minutes. There have been no new occurrences of the problem so far.

#7

Updated by amacdona@redhat.com over 6 years ago

  • Sprint Candidate changed from Yes to No
#8

Updated by dalley almost 6 years ago

  • Related to Issue #4428: Upsert query raises NotUniqueError added
#9

Updated by bmbouter over 5 years ago

  • Status changed from NEW to CLOSED - WONTFIX
#10

Updated by bmbouter over 5 years ago

  • Tags Pulp 2 added
