Issue #1838
Tasks being stuck
Status: CLOSED - NOTABUG
Priority: Normal
Assignee: -
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version: 2.7.1
Platform Release:
OS: RHEL 7
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:
Description
I now have 3 tasks that are stuck in "Waiting".
We have 2 hosts that run as an HA cluster, with corosync as the heartbeat. Celery runs on both, so both should process tasks. The resource manager runs only on one, and gets moved to the other if corosync determines the primary is dead.
Here is some debug output:
2016-04-12 09:22:45,763 - DEBUG - sending GET request to /pulp/api/v2/tasks/622041ac-e9e4-4a15-bd7c-7c98a17782e0/
2016-04-12 09:22:46,023 - INFO - GET request to /pulp/api/v2/tasks/622041ac-e9e4-4a15-bd7c-7c98a17782e0/ with parameters None
2016-04-12 09:22:46,023 - INFO - Response status : 200
2016-04-12 09:22:46,023 - INFO - Response body :
{
  "exception": null,
  "task_type": "pulp.server.managers.repo.publish.publish",
  "_href": "/pulp/api/v2/tasks/622041ac-e9e4-4a15-bd7c-7c98a17782e0/",
  "task_id": "622041ac-e9e4-4a15-bd7c-7c98a17782e0",
  "tags": [
    "pulp:repository:thirdparty-snapshot-rpm-latest",
    "pulp:action:publish"
  ],
  "finish_time": null,
  "_ns": "task_status",
  "start_time": null,
  "traceback": null,
  "spawned_tasks": [],
  "progress_report": {},
  "queue": "None.dq",
  "state": "waiting",
  "worker_name": null,
  "result": null,
  "error": null,
  "_id": {
    "$oid": "5705bd46cbdef6e14906bf98"
  },
  "id": "5705bd46cbdef6e14906bf98"
}
Operations: publish
Resources: thirdparty-snapshot-rpm-latest (repository)
State: Waiting
Start Time: Unstarted
Finish Time: Incomplete
Result: Incomplete
Task Id: 622041ac-e9e4-4a15-bd7c-7c98a17782e0
Progress Report:
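To make the "stuck" condition concrete, here is a minimal sketch that checks a task status body like the one above. The helper name `looks_stuck` and the choice of fields are assumptions made for illustration, not part of Pulp. It treats a task as stuck when it is still "waiting" and has neither a start time nor a worker assigned.

```python
import json


def looks_stuck(task):
    """Return True when a task is waiting and was never picked up by a worker.

    Hypothetical helper: the field names come from the response body above,
    the rule itself is an assumption.
    """
    return (
        task.get("state") == "waiting"
        and task.get("start_time") is None
        and task.get("worker_name") is None
    )


# Trimmed copy of the response body shown above.
response_body = """
{
  "task_id": "622041ac-e9e4-4a15-bd7c-7c98a17782e0",
  "queue": "None.dq",
  "state": "waiting",
  "start_time": null,
  "finish_time": null,
  "worker_name": null
}
"""

task = json.loads(response_body)
stuck = looks_stuck(task)
print(task["task_id"], "stuck" if stuck else "ok")
```

The same check could be run against each of the 3 waiting tasks to confirm they are all in the same state.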
Output of ps afuxw | grep celery:
On host1:
root 2921 0.0 0.0 112640 960 pts/2 S+ 09:31 0:00 | \_ grep --color=auto celery
apache 21996 0.1 0.0 519060 62080 ? Ssl Apr06 10:43 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
apache 22119 2.6 0.1 654736 193452 ? Rl Apr06 220:36 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
apache 21998 0.1 0.0 518364 61656 ? Ssl Apr06 10:12 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
apache 22124 0.3 0.0 544160 80196 ? Sl Apr06 25:32 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
apache 22000 0.1 0.0 519052 61984 ? Ssl Apr06 10:56 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
apache 22129 2.3 0.2 669752 208464 ? Dl Apr06 198:42 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
apache 22002 0.1 0.0 518980 62028 ? Ssl Apr06 10:50 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30
apache 22126 2.5 0.4 867344 405440 ? Dl Apr06 217:02 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30
apache 22004 0.1 0.0 518972 62176 ? Ssl Apr06 10:41 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-4.pid --heartbeat-interval=30
apache 22128 2.3 0.2 681192 219840 ? Dl Apr06 196:41 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-4.pid --heartbeat-interval=30
apache 22006 0.1 0.0 518500 61580 ? Ssl Apr06 10:17 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-5.pid --heartbeat-interval=30
apache 22132 0.0 0.0 518960 54696 ? Sl Apr06 7:16 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-5.pid --heartbeat-interval=30
apache 22008 0.1 0.0 518364 61624 ? Ssl Apr06 10:20 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-6.pid --heartbeat-interval=30
apache 22120 0.3 0.0 519700 57868 ? Dl Apr06 31:11 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-6.pid --heartbeat-interval=30
apache 22010 0.1 0.0 518700 61616 ? Ssl Apr06 10:24 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-7.pid --heartbeat-interval=30
apache 22121 1.6 0.2 671912 210604 ? Rl Apr06 138:42 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-7.pid --heartbeat-interval=30
apache 21270 0.3 0.0 487004 27936 ? Ssl Apr11 2:41 /usr/bin/python /usr/bin/celery beat --app=pulp.server.async.celery_instance.celery --scheduler=pulp.server.async.scheduler.Scheduler
apache 17185 0.5 0.0 522104 65144 ? Ssl 08:59 0:10 /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
apache 17289 5.9 0.0 518356 54268 ? Sl 08:59 1:55 \_ /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
On host2:
root 4431 0.0 0.0 112640 960 pts/0 S+ 09:32 0:00 | \_ grep --color=auto celery
apache 14669 0.1 0.0 520664 63784 ? Ssl Apr06 12:17 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
apache 15042 1.9 0.1 652572 190552 ? Dl Apr06 166:59 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
apache 14671 0.1 0.0 520672 63668 ? Ssl Apr06 12:24 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
apache 15046 2.4 0.1 618272 153048 ? Sl Apr06 205:57 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
apache 14674 0.1 0.0 520168 63324 ? Ssl Apr06 12:07 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
apache 15044 2.7 0.1 645860 184516 ? Rl Apr06 234:59 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
apache 14676 0.1 0.0 520672 63816 ? Ssl Apr06 12:12 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30
apache 15048 2.7 0.2 665080 203128 ? Dl Apr06 230:19 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30
apache 14678 0.1 0.0 520664 63724 ? Ssl Apr06 12:18 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-4.pid --heartbeat-interval=30
apache 15045 2.3 0.2 680920 219648 ? Rl Apr06 201:53 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-4.pid --heartbeat-interval=30
apache 14681 0.1 0.0 520680 63792 ? Ssl Apr06 12:07 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-5.pid --heartbeat-interval=30
apache 15041 2.6 0.2 666260 204232 ? Dl Apr06 223:23 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-5.pid --heartbeat-interval=30
apache 14684 0.1 0.0 520168 63304 ? Ssl Apr06 11:44 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-6.pid --heartbeat-interval=30
apache 15043 0.1 0.0 534632 71388 ? Sl Apr06 13:16 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-6.pid --heartbeat-interval=30
apache 14693 0.1 0.0 520940 64036 ? Ssl Apr06 13:41 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-7.pid --heartbeat-interval=30
apache 15047 2.8 0.2 667648 205668 ? Rl Apr06 240:37 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-7.pid --heartbeat-interval=30
apache 1909 0.4 0.0 521980 64864 ? Ssl 08:57 0:09 /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
apache 2020 5.4 0.0 518348 54256 ? Sl 08:57 1:51 \_ /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
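Since the resource manager is expected to run on only one host, the two listings can also be checked mechanically. The following is a rough sketch, assuming the ps output has been collected per host. The function name `hosts_running` and the trimmed sample lines are made up for illustration; only the `-n` worker names are taken from the listings above.

```python
import re

# Captures the worker name passed to "celery worker ... -n <name>".
WORKER_NAME = re.compile(r"celery worker .*?-n (\S+)")


def hosts_running(ps_by_host, prefix="resource_manager"):
    """Return the sorted hosts that have a parent celery worker whose name starts with prefix."""
    hosts = []
    for host, lines in ps_by_host.items():
        for line in lines:
            if "\\_" in line:  # skip forked child processes in the ps tree
                continue
            match = WORKER_NAME.search(line)
            if match and match.group(1).startswith(prefix):
                hosts.append(host)
                break
    return sorted(hosts)


# Command column only, trimmed from the listings above.
ps_by_host = {
    "host1": [
        r"/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1",
        r"/usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1",
        r"\_ /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1",
    ],
    "host2": [
        r"/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1",
        r"/usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1",
    ],
}

manager_hosts = hosts_running(ps_by_host)
print(manager_hosts)
```

With the sample above, both host1 and host2 are reported, matching the listings, where a resource_manager process appears on each host.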