Issue #1838

closed

Tasks being stuck

Added by mihai.ibanescu@gmail.com about 6 years ago. Updated about 2 years ago.

Status: CLOSED - NOTABUG
Priority: Normal
Assignee: -
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version: 2.7.1
Platform Release:
OS: RHEL 7
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

I now have three tasks stuck in the "waiting" state.

We run two hosts as an HA cluster, with corosync providing the heartbeat. Celery workers run on both hosts, so both should process tasks. The resource manager runs on only one host and is moved to the other if corosync determines that the primary is dead.

Here is some debug output:

2016-04-12 09:22:45,763 - DEBUG - sending GET request to /pulp/api/v2/tasks/622041ac-e9e4-4a15-bd7c-7c98a17782e0/
2016-04-12 09:22:46,023 - INFO - GET request to /pulp/api/v2/tasks/622041ac-e9e4-4a15-bd7c-7c98a17782e0/ with parameters None
2016-04-12 09:22:46,023 - INFO - Response status : 200 

2016-04-12 09:22:46,023 - INFO - Response body :
 {
  "exception": null, 
  "task_type": "pulp.server.managers.repo.publish.publish", 
  "_href": "/pulp/api/v2/tasks/622041ac-e9e4-4a15-bd7c-7c98a17782e0/", 
  "task_id": "622041ac-e9e4-4a15-bd7c-7c98a17782e0", 
  "tags": [
    "pulp:repository:thirdparty-snapshot-rpm-latest", 
    "pulp:action:publish"
  ], 
  "finish_time": null, 
  "_ns": "task_status", 
  "start_time": null, 
  "traceback": null, 
  "spawned_tasks": [], 
  "progress_report": {}, 
  "queue": "None.dq", 
  "state": "waiting", 
  "worker_name": null, 
  "result": null, 
  "error": null, 
  "_id": {
    "$oid": "5705bd46cbdef6e14906bf98"
  }, 
  "id": "5705bd46cbdef6e14906bf98"
}

Operations:       publish
Resources:        thirdparty-snapshot-rpm-latest (repository)
State:            Waiting
Start Time:       Unstarted
Finish Time:      Incomplete
Result:           Incomplete
Task Id:          622041ac-e9e4-4a15-bd7c-7c98a17782e0
Progress Report:  
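
For anyone triaging similar reports, here is a minimal Python sketch that flags a task as possibly stuck, using only fields from the response body above. The helper name `looks_stuck` and the rule it applies (state is "waiting", no start time, no worker assigned) are assumptions made for illustration, not Pulp's own definition of a stuck task.

```python
import json


def looks_stuck(task):
    """Heuristic (an assumption, not Pulp's definition): the task is still
    waiting, has never started, and has no worker assigned."""
    return (
        task.get("state") == "waiting"
        and task.get("start_time") is None
        and task.get("worker_name") is None
    )


# Abbreviated copy of the response body shown above.
response_body = """
{
  "task_id": "622041ac-e9e4-4a15-bd7c-7c98a17782e0",
  "task_type": "pulp.server.managers.repo.publish.publish",
  "queue": "None.dq",
  "state": "waiting",
  "start_time": null,
  "finish_time": null,
  "worker_name": null
}
"""

task = json.loads(response_body)
if looks_stuck(task):
    print("{0}: waiting, no worker assigned (queue={1})".format(
        task["task_id"], task["queue"]))
```

A task that was only just submitted also matches this rule, so the check means something only if the task has been waiting longer than expected.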

Output of ps afuxw | grep celery:

On host1:

root      2921  0.0  0.0 112640   960 pts/2    S+   09:31   0:00  |                       \_ grep --color=auto celery
apache   21996  0.1  0.0 519060 62080 ?        Ssl  Apr06  10:43 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
apache   22119  2.6  0.1 654736 193452 ?       Rl   Apr06 220:36  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
apache   21998  0.1  0.0 518364 61656 ?        Ssl  Apr06  10:12 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
apache   22124  0.3  0.0 544160 80196 ?        Sl   Apr06  25:32  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
apache   22000  0.1  0.0 519052 61984 ?        Ssl  Apr06  10:56 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
apache   22129  2.3  0.2 669752 208464 ?       Dl   Apr06 198:42  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
apache   22002  0.1  0.0 518980 62028 ?        Ssl  Apr06  10:50 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30
apache   22126  2.5  0.4 867344 405440 ?       Dl   Apr06 217:02  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30
apache   22004  0.1  0.0 518972 62176 ?        Ssl  Apr06  10:41 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-4.pid --heartbeat-interval=30
apache   22128  2.3  0.2 681192 219840 ?       Dl   Apr06 196:41  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-4.pid --heartbeat-interval=30
apache   22006  0.1  0.0 518500 61580 ?        Ssl  Apr06  10:17 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-5.pid --heartbeat-interval=30
apache   22132  0.0  0.0 518960 54696 ?        Sl   Apr06   7:16  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-5.pid --heartbeat-interval=30
apache   22008  0.1  0.0 518364 61624 ?        Ssl  Apr06  10:20 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-6.pid --heartbeat-interval=30
apache   22120  0.3  0.0 519700 57868 ?        Dl   Apr06  31:11  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-6.pid --heartbeat-interval=30
apache   22010  0.1  0.0 518700 61616 ?        Ssl  Apr06  10:24 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-7.pid --heartbeat-interval=30
apache   22121  1.6  0.2 671912 210604 ?       Rl   Apr06 138:42  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-7.pid --heartbeat-interval=30
apache   21270  0.3  0.0 487004 27936 ?        Ssl  Apr11   2:41 /usr/bin/python /usr/bin/celery beat --app=pulp.server.async.celery_instance.celery --scheduler=pulp.server.async.scheduler.Scheduler
apache   17185  0.5  0.0 522104 65144 ?        Ssl  08:59   0:10 /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
apache   17289  5.9  0.0 518356 54268 ?        Sl   08:59   1:55  \_ /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30

On host2:

root      4431  0.0  0.0 112640   960 pts/0    S+   09:32   0:00  |                       \_ grep --color=auto celery
apache   14669  0.1  0.0 520664 63784 ?        Ssl  Apr06  12:17 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
apache   15042  1.9  0.1 652572 190552 ?       Dl   Apr06 166:59  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
apache   14671  0.1  0.0 520672 63668 ?        Ssl  Apr06  12:24 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
apache   15046  2.4  0.1 618272 153048 ?       Sl   Apr06 205:57  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
apache   14674  0.1  0.0 520168 63324 ?        Ssl  Apr06  12:07 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
apache   15044  2.7  0.1 645860 184516 ?       Rl   Apr06 234:59  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
apache   14676  0.1  0.0 520672 63816 ?        Ssl  Apr06  12:12 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30
apache   15048  2.7  0.2 665080 203128 ?       Dl   Apr06 230:19  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30
apache   14678  0.1  0.0 520664 63724 ?        Ssl  Apr06  12:18 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-4.pid --heartbeat-interval=30
apache   15045  2.3  0.2 680920 219648 ?       Rl   Apr06 201:53  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-4.pid --heartbeat-interval=30
apache   14681  0.1  0.0 520680 63792 ?        Ssl  Apr06  12:07 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-5.pid --heartbeat-interval=30
apache   15041  2.6  0.2 666260 204232 ?       Dl   Apr06 223:23  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-5.pid --heartbeat-interval=30
apache   14684  0.1  0.0 520168 63304 ?        Ssl  Apr06  11:44 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-6.pid --heartbeat-interval=30
apache   15043  0.1  0.0 534632 71388 ?        Sl   Apr06  13:16  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-6.pid --heartbeat-interval=30
apache   14693  0.1  0.0 520940 64036 ?        Ssl  Apr06  13:41 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-7.pid --heartbeat-interval=30
apache   15047  2.8  0.2 667648 205668 ?       Rl   Apr06 240:37  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-7.pid --heartbeat-interval=30
apache    1909  0.4  0.0 521980 64864 ?        Ssl  08:57   0:09 /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
apache    2020  5.4  0.0 518348 54256 ?        Sl   08:57   1:51  \_ /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
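
The description says the resource manager runs on only one host, yet both listings above include a `resource_manager@%h` worker (started at 08:59 on host1 and 08:57 on host2). Whether that is related to the waiting tasks is not established here. The sketch below shows one way to check this from `ps` output. It is a minimal sketch: the helper `parent_worker_names` and the sample variables are hypothetical, and the sample lines are copied from the listings above.

```python
import re
from collections import Counter

# Captures the value of celery's -n option, e.g. "resource_manager@%h".
NODE_NAME = re.compile(r"celery worker .*?-n (\S+)")


def parent_worker_names(ps_output):
    """Count top-level celery worker processes by node name.

    Lines that ps marks as children in its forest view (containing "\\_"),
    which includes the grep process itself, are skipped.
    """
    names = Counter()
    for line in ps_output.splitlines():
        if "\\_" in line:
            continue
        match = NODE_NAME.search(line)
        if match:
            names[match.group(1)] += 1
    return names


# Sample lines copied from the host1 and host2 listings above.
HOST1_PS = r"""
apache   21996  0.1  0.0 519060 62080 ?        Ssl  Apr06  10:43 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
apache   17185  0.5  0.0 522104 65144 ?        Ssl  08:59   0:10 /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
apache   17289  5.9  0.0 518356 54268 ?        Sl   08:59   1:55  \_ /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
"""

HOST2_PS = r"""
apache   14669  0.1  0.0 520664 63784 ?        Ssl  Apr06  12:17 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
apache    1909  0.4  0.0 521980 64864 ?        Ssl  08:57   0:09 /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
apache    2020  5.4  0.0 518348 54256 ?        Sl   08:57   1:51  \_ /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
"""

hosts = {"host1": HOST1_PS, "host2": HOST2_PS}
running_on = [
    host for host, output in hosts.items()
    if parent_worker_names(output)["resource_manager@%h"] > 0
]
print("resource_manager running on: " + ", ".join(running_on))
```

If the list contains more than one host, the setup differs from the single-resource-manager arrangement described at the top of this report.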
