Issue #3358 (closed)
Very occasionally, a task will be processed and have a 'worker_name' of None
Description
Operations: deferred_download
Resources:
State: Successful
Start Time: 2018-02-13T00:51:16Z
Finish Time: 2018-02-13T00:51:17Z
Task Id: b7b031da-8c77-4d32-a661-934d02bdf0d3
Worker Name: None
Operations: deferred_download
Resources:
State: Successful
Start Time: 2018-02-13T00:53:16Z
Finish Time: 2018-02-13T00:53:16Z
Task Id: 1647f306-4933-482e-be84-9cdd53e824ca
Worker Name: reserved_resource_worker-0@pulp2.dev
Operations: deferred_download
Resources:
State: Successful
Start Time: 2018-02-13T00:55:16Z
Finish Time: 2018-02-13T00:55:16Z
Task Id: 174ba037-1e6a-40cb-953f-83f3db768988
Worker Name: reserved_resource_worker-1@pulp2.dev
Operations: deferred_download
Resources:
State: Successful
Start Time: 2018-02-13T00:57:16Z
Finish Time: 2018-02-13T00:57:17Z
Task Id: fc6400ef-913b-4f04-ada9-58a8cb78b4bc
Worker Name: None
[vagrant@pulp2 ~]$ pulp-admin tasks details --task-id fc6400ef-913b-4f04-ada9-58a8cb78b4bc
+----------------------------------------------------------------------+
Task Details
+----------------------------------------------------------------------+
Operations: deferred_download
Resources:
State: Successful
Start Time: 2018-02-13T00:57:16Z
Finish Time: 2018-02-13T00:57:17Z
Result: N/A
Task Id: fc6400ef-913b-4f04-ada9-58a8cb78b4bc
Worker Name: None
Progress Report:
On Demand Download:
Description: Download Cached On-Demand Content
Details:
Error Details:
Items Total: 0
Num Failures: 0
Num Processed: 0
Num Success: 0
State: FINISHED
Step Id: 22b2e07d-c6cc-473f-8dba-12fc6a699dcb
Step Type: on_demand_download
[vagrant@pulp2 ~]$ mongo
> use pulp_database
> db.task_status.find({"task_id": "fc6400ef-913b-4f04-ada9-58a8cb78b4bc"}).pretty()
{
"_id" : ObjectId("5a8237ec9664af7dc48b31e5"),
"task_id" : "fc6400ef-913b-4f04-ada9-58a8cb78b4bc",
"start_time" : "2018-02-13T00:57:16Z",
"worker_name" : null,
"state" : "finished",
"exception" : null,
"task_type" : "pulp.server.controllers.repository.download_deferred",
"tags" : [
"pulp:action:deferred_download"
],
"_ns" : "task_status",
"progress_report" : {
"on_demand_download" : [
{
"num_success" : 0,
"description" : "Download Cached On-Demand Content",
"step_type" : "on_demand_download",
"items_total" : 0,
"state" : "FINISHED",
"error_details" : [ ],
"details" : "",
"num_failures" : 0,
"step_id" : "22b2e07d-c6cc-473f-8dba-12fc6a699dcb",
"num_processed" : 0
}
]
},
"group_id" : null,
"id" : null,
"finish_time" : "2018-02-13T00:57:17Z"
}
>
So far I've only seen this happen with deferred_download tasks; however, it might be a general problem. The scope is unclear at the moment.
First noticed after Pulp had been left idle for 2 days.
Reproduced after leaving Pulp idle for 6 hours with the rate of deferred_download tasks increased from 1 per 30 minutes to 1 per 2 minutes.
The affected tasks seem to be sparsely distributed overall, but clustered together when they do occur. I suspect this could be caused by an improperly handled race condition in this code [0], but I don't know for sure yet.
[0] https://github.com/pulp/pulp/blob/master/server/pulp/server/async/tasks.py#L419
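To make the suspicion concrete: one interleaving that would produce a finished task with "worker_name": null is the worker marking the task running (via an upsert on task_status) before the dispatch side's write that records the worker name has happened. The snippet below is an illustrative sketch only, not Pulp's code; it assumes a local mongod and a $setOnInsert-style write on the dispatch side:

# Illustrative sketch only -- not Pulp's actual dispatch code. It assumes the
# dispatch side records worker_name with a $setOnInsert-style upsert while the
# worker's __call__ upserts the same task_id to mark the task running. If the
# worker's write lands first, the dispatch side's $setOnInsert is a no-op and
# the worker name is never recorded, which matches the documents shown above.
from pymongo import MongoClient

client = MongoClient()                  # assumes a local mongod
col = client.race_demo.task_status      # throwaway demo collection
col.drop()
col.create_index("task_id", unique=True)

task_id = "fc6400ef-913b-4f04-ada9-58a8cb78b4bc"

# 1) The worker starts the task first and upserts the status document.
col.update_one({"task_id": task_id},
               {"$set": {"state": "running",
                         "start_time": "2018-02-13T00:57:16Z"}},
               upsert=True)

# 2) The dispatch side's write arrives second; the document already exists, so
#    $setOnInsert changes nothing and worker_name stays unset (shown as None).
col.update_one({"task_id": task_id},
               {"$setOnInsert": {"worker_name": "reserved_resource_worker-1@pulp2.dev",
                                 "state": "waiting"}},
               upsert=True)

print(col.find_one({"task_id": task_id}))   # no worker_name in the document

Whether Pulp's dispatch path actually writes worker_name this way is exactly what needs to be confirmed in tasks.py.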
Related issues
Related to Issue #4428: Upsert query raises NotUniqueError
Updated by dalley almost 7 years ago
After letting it run for an additional 12 hours, I found 3 more instances where this happened, for a total of 5.
One of those 5 tasks failed with this error:
[vagrant@pulp2 ~]$ pulp-admin tasks details --task-id afa5d8c7-9bef-4d48-bdff-e60cd9ec56fd
+----------------------------------------------------------------------+
Task Details
+----------------------------------------------------------------------+
Operations: deferred_download
Resources:
State: Failed
Start Time: Unstarted
Finish Time: 2018-02-13T11:43:22Z
Result: N/A
Task Id: afa5d8c7-9bef-4d48-bdff-e60cd9ec56fd
Worker Name: None
Progress Report:
Traceback:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 521, in __call__
    upsert=True)
  File "/usr/lib/python2.7/site-packages/mongoengine/queryset/base.py", line 564, in update_one
    upsert=upsert, multi=False, write_concern=write_concern, **update)
  File "/home/vagrant/devel/pulp/server/pulp/server/db/querysets.py", line 100, in update
    super(CriteriaQuerySet, self).update(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mongoengine/queryset/base.py", line 515, in update
    raise NotUniqueError(u'Update failed (%s)' % six.text_type(err))
NotUniqueError: Update failed (E11000 duplicate key error collection: pulp_database.task_status index: task_id_-1 dup key: { : "afa5d8c7-9bef-4d48-bdff-e60cd9ec56fd" })
This is more evidence that the root cause is an improperly handled race condition.
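For reference, the E11000 above is the well-known MongoDB concurrent-upsert race: two writers whose filter matches nothing can both decide to insert, and the loser violates the unique index on task_id; mongoengine surfaces that as NotUniqueError. The standalone sketch below (illustrative only, not Pulp code; it assumes a local mongod and uses pymongo directly) reproduces that failure mode outside of Pulp:

# Illustrative reproduction of the failure mode in the traceback, independent
# of Pulp. Two concurrent update_one(..., upsert=True) calls against a unique
# index can both miss the filter, both attempt the insert, and the loser gets
# E11000 (which mongoengine surfaces as NotUniqueError). Assumes a local mongod.
import threading
import uuid

from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

client = MongoClient()
col = client.race_demo.task_status
col.drop()
col.create_index("task_id", unique=True)

dup_errors = []

def upsert(task_id, worker):
    try:
        # Each writer sees no matching document, so both try to insert; the
        # second insert violates the unique index on task_id.
        col.update_one({"task_id": task_id},
                       {"$set": {"worker_name": worker}},
                       upsert=True)
    except DuplicateKeyError:           # E11000 duplicate key error
        dup_errors.append(task_id)

for _ in range(1000):
    tid = str(uuid.uuid4())
    threads = [threading.Thread(target=upsert, args=(tid, w))
               for w in (None, "reserved_resource_worker-0@pulp2.dev")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Timing-dependent; newer MongoDB servers retry this specific case internally,
# so the count may be zero there, but any non-zero count shows the collision.
print("duplicate key errors in 1000 races:", len(dup_errors))

How often the error actually fires depends on timing and on the MongoDB server version, but any non-zero count confirms that two upserts on the same task_id can collide.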
Updated by dalley almost 7 years ago
- Triaged changed from No to Yes
- Sprint Candidate changed from No to Yes
20 hours and 600 tasks in, this has occurred 7 times so far, or roughly 1.1% of tasks.
Updated by dalley almost 7 years ago
After 24 hours / 954 tasks, the current count is 9 tasks without a worker name.
Updated by dalley almost 7 years ago
For the past 24 hours I have had the rate of deferred_download tasks turned back down to normal and have instead been running sync + publish tasks every 2 minutes. There have been no new occurrences of this problem thus far.
Updated by amacdona@redhat.com about 6 years ago
- Sprint Candidate changed from Yes to No
Updated by dalley almost 6 years ago
- Related to Issue #4428: Upsert query raises NotUniqueError added
Updated by bmbouter over 5 years ago
- Status changed from NEW to CLOSED - WONTFIX