Issue #1838
Tasks being stuck
Description
I now have 3 tasks that are stuck in "Waiting".
We have 2 hosts that run as an HA cluster, with corosync as the heartbeat. Celery runs on both, so both should process tasks. The resource manager runs only on one, and gets moved to the other if corosync determines the primary is dead.
Here is some debug output:
2016-04-12 09:22:45,763 - DEBUG - sending GET request to /pulp/api/v2/tasks/622041ac-e9e4-4a15-bd7c-7c98a17782e0/
2016-04-12 09:22:46,023 - INFO - GET request to /pulp/api/v2/tasks/622041ac-e9e4-4a15-bd7c-7c98a17782e0/ with parameters None
2016-04-12 09:22:46,023 - INFO - Response status : 200
2016-04-12 09:22:46,023 - INFO - Response body :
{
  "exception": null,
  "task_type": "pulp.server.managers.repo.publish.publish",
  "_href": "/pulp/api/v2/tasks/622041ac-e9e4-4a15-bd7c-7c98a17782e0/",
  "task_id": "622041ac-e9e4-4a15-bd7c-7c98a17782e0",
  "tags": [
    "pulp:repository:thirdparty-snapshot-rpm-latest",
    "pulp:action:publish"
  ],
  "finish_time": null,
  "_ns": "task_status",
  "start_time": null,
  "traceback": null,
  "spawned_tasks": [],
  "progress_report": {},
  "queue": "None.dq",
  "state": "waiting",
  "worker_name": null,
  "result": null,
  "error": null,
  "_id": {
    "$oid": "5705bd46cbdef6e14906bf98"
  },
  "id": "5705bd46cbdef6e14906bf98"
}
Operations: publish
Resources: thirdparty-snapshot-rpm-latest (repository)
State: Waiting
Start Time: Unstarted
Finish Time: Incomplete
Result: Incomplete
Task Id: 622041ac-e9e4-4a15-bd7c-7c98a17782e0
Progress Report:
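The same status check can be reproduced directly against the REST API; a minimal sketch, assuming default admin credentials and that the API is served on the local host:

# Query the stuck task directly (credentials and host are placeholders for your environment)
curl -k -u admin:admin https://localhost/pulp/api/v2/tasks/622041ac-e9e4-4a15-bd7c-7c98a17782e0/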
Output of ps afuxw | grep celery:
On host1:
root 2921 0.0 0.0 112640 960 pts/2 S+ 09:31 0:00 | \_ grep --color=auto celery
apache 21996 0.1 0.0 519060 62080 ? Ssl Apr06 10:43 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
apache 22119 2.6 0.1 654736 193452 ? Rl Apr06 220:36 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
apache 21998 0.1 0.0 518364 61656 ? Ssl Apr06 10:12 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
apache 22124 0.3 0.0 544160 80196 ? Sl Apr06 25:32 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
apache 22000 0.1 0.0 519052 61984 ? Ssl Apr06 10:56 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
apache 22129 2.3 0.2 669752 208464 ? Dl Apr06 198:42 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
apache 22002 0.1 0.0 518980 62028 ? Ssl Apr06 10:50 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30
apache 22126 2.5 0.4 867344 405440 ? Dl Apr06 217:02 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30
apache 22004 0.1 0.0 518972 62176 ? Ssl Apr06 10:41 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-4.pid --heartbeat-interval=30
apache 22128 2.3 0.2 681192 219840 ? Dl Apr06 196:41 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-4.pid --heartbeat-interval=30
apache 22006 0.1 0.0 518500 61580 ? Ssl Apr06 10:17 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-5.pid --heartbeat-interval=30
apache 22132 0.0 0.0 518960 54696 ? Sl Apr06 7:16 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-5.pid --heartbeat-interval=30
apache 22008 0.1 0.0 518364 61624 ? Ssl Apr06 10:20 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-6.pid --heartbeat-interval=30
apache 22120 0.3 0.0 519700 57868 ? Dl Apr06 31:11 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-6.pid --heartbeat-interval=30
apache 22010 0.1 0.0 518700 61616 ? Ssl Apr06 10:24 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-7.pid --heartbeat-interval=30
apache 22121 1.6 0.2 671912 210604 ? Rl Apr06 138:42 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-7.pid --heartbeat-interval=30
apache 21270 0.3 0.0 487004 27936 ? Ssl Apr11 2:41 /usr/bin/python /usr/bin/celery beat --app=pulp.server.async.celery_instance.celery --scheduler=pulp.server.async.scheduler.Scheduler
apache 17185 0.5 0.0 522104 65144 ? Ssl 08:59 0:10 /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
apache 17289 5.9 0.0 518356 54268 ? Sl 08:59 1:55 \_ /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
On host2:
root 4431 0.0 0.0 112640 960 pts/0 S+ 09:32 0:00 | \_ grep --color=auto celery
apache 14669 0.1 0.0 520664 63784 ? Ssl Apr06 12:17 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
apache 15042 1.9 0.1 652572 190552 ? Dl Apr06 166:59 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
apache 14671 0.1 0.0 520672 63668 ? Ssl Apr06 12:24 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
apache 15046 2.4 0.1 618272 153048 ? Sl Apr06 205:57 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
apache 14674 0.1 0.0 520168 63324 ? Ssl Apr06 12:07 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
apache 15044 2.7 0.1 645860 184516 ? Rl Apr06 234:59 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
apache 14676 0.1 0.0 520672 63816 ? Ssl Apr06 12:12 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30
apache 15048 2.7 0.2 665080 203128 ? Dl Apr06 230:19 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30
apache 14678 0.1 0.0 520664 63724 ? Ssl Apr06 12:18 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-4.pid --heartbeat-interval=30
apache 15045 2.3 0.2 680920 219648 ? Rl Apr06 201:53 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-4.pid --heartbeat-interval=30
apache 14681 0.1 0.0 520680 63792 ? Ssl Apr06 12:07 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-5.pid --heartbeat-interval=30
apache 15041 2.6 0.2 666260 204232 ? Dl Apr06 223:23 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-5.pid --heartbeat-interval=30
apache 14684 0.1 0.0 520168 63304 ? Ssl Apr06 11:44 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-6.pid --heartbeat-interval=30
apache 15043 0.1 0.0 534632 71388 ? Sl Apr06 13:16 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-6.pid --heartbeat-interval=30
apache 14693 0.1 0.0 520940 64036 ? Ssl Apr06 13:41 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-7.pid --heartbeat-interval=30
apache 15047 2.8 0.2 667648 205668 ? Rl Apr06 240:37 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-7.pid --heartbeat-interval=30
apache 1909 0.4 0.0 521980 64864 ? Ssl 08:57 0:09 /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
apache 2020 5.4 0.0 518348 54256 ? Sl 08:57 1:51 \_ /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
Updated by mihai.ibanescu@gmail.com over 8 years ago
[root@repulpmst01r ~]# rabbitmqctl list_queues
Listing queues ...
resource_manager@repulpmst01r.unx.sas.com.dq 0
reserved_resource_worker-1@repulpmst01r.unx.sas.com.dq 0
celeryev.df343c1f-2803-4a44-990b-14d6d3c13801 0
reserved_resource_worker-0@repulpmst02r.unx.sas.com.dq 0
reserved_resource_worker-4@repulpmst01r.unx.sas.com.dq 0
reserved_resource_worker-6@repulpmst01r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-7@repulpmst02r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-6@repulpmst02r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-1@repulpmst02r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-3@repulpmst02r.unx.sas.com.dq 0
celeryev.5dcb0d61-11cf-4ab4-aa4e-adbc018d130a 0
celeryev.c9c6b8fb-f95c-4532-a6bf-5bb91e3a1865 0
celeryev.9e2f0f1e-5eb8-4e7e-a488-0d234ac91f01 0
reserved_resource_worker-3@repulpmst01r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-7@repulpmst02r.unx.sas.com.dq 0
resource_manager@repulpmst02r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-6@repulpmst01r.unx.sas.com.dq 0
celeryev.55205282-58b1-43fc-9647-bc201bedca65 0
reserved_resource_worker-5@repulpmst01r.unx.sas.com.celery.pidbox 0
celery 0
celeryev.54e4bdb1-aa71-423c-8589-ece93ed05c41 0
reserved_resource_worker-5@repulpmst02r.unx.sas.com.dq 0
reserved_resource_worker-1@repulpmst02r.unx.sas.com.dq 0
celeryev.f8361df4-1a2b-4e49-861a-0da77d373c88 0
reserved_resource_worker-4@repulpmst02r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-5@repulpmst02r.unx.sas.com.celery.pidbox 0
celeryev.ef25e2df-0c1b-4f92-9452-18765cb8b0ab 0
celeryev.1606c43b-42c2-4eb9-80be-82da99a5230d 0
reserved_resource_worker-0@repulpmst01r.unx.sas.com.dq 0
celeryev.10341687-d167-4033-9e39-6d8057160153 0
celeryev.4842ebb5-f26b-48ff-af59-f30a735a16ab 0
reserved_resource_worker-5@repulpmst01r.unx.sas.com.dq 0
reserved_resource_worker-2@repulpmst02r.unx.sas.com.dq 0
celeryev.853c0d3d-f8b0-43d8-a73e-5d0024f83886 0
celeryev.80c223e1-1192-41d1-9007-94366e25f502 0
celeryev.9cf0c549-650d-453f-8ed4-43dd2c2058d6 0
celeryev.78f2aa99-0226-4998-a446-10d0ea348b82 0
reserved_resource_worker-2@repulpmst01r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-3@repulpmst01r.unx.sas.com.dq 0
reserved_resource_worker-7@repulpmst01r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-6@repulpmst02r.unx.sas.com.dq 0
resource_manager@repulpmst02r.unx.sas.com.dq 0
reserved_resource_worker-4@repulpmst02r.unx.sas.com.dq 0
reserved_resource_worker-2@repulpmst02r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-3@repulpmst02r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-0@repulpmst01r.unx.sas.com.celery.pidbox 0
celeryev.2aac488f-f8b6-4b50-aea7-9014dac2693e 0
reserved_resource_worker-0@repulpmst02r.unx.sas.com.celery.pidbox 0
resource_manager 0
reserved_resource_worker-1@repulpmst01r.unx.sas.com.celery.pidbox 0
resource_manager@repulpmst01r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-2@repulpmst01r.unx.sas.com.dq 0
celeryev.2ac48cd5-dfcc-455f-ba95-0a76b8dd6acc 0
pulp.task 0
reserved_resource_worker-4@repulpmst01r.unx.sas.com.celery.pidbox 0
celeryev.fefb61fc-2f85-4fde-8eea-2d4a3757d83a 0
celeryev.f8588021-36b9-4b9e-8795-b53711acd0eb 0
reserved_resource_worker-7@repulpmst01r.unx.sas.com.dq 0
[root@repulpmst02r ~]# rabbitmqctl list_queues
Listing queues ...
resource_manager@repulpmst01r.unx.sas.com.dq 0
reserved_resource_worker-1@repulpmst01r.unx.sas.com.dq 0
celeryev.df343c1f-2803-4a44-990b-14d6d3c13801 0
reserved_resource_worker-0@repulpmst02r.unx.sas.com.dq 0
reserved_resource_worker-4@repulpmst01r.unx.sas.com.dq 0
reserved_resource_worker-6@repulpmst01r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-7@repulpmst02r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-6@repulpmst02r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-1@repulpmst02r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-3@repulpmst02r.unx.sas.com.dq 0
celeryev.5dcb0d61-11cf-4ab4-aa4e-adbc018d130a 0
celeryev.c9c6b8fb-f95c-4532-a6bf-5bb91e3a1865 0
celeryev.9e2f0f1e-5eb8-4e7e-a488-0d234ac91f01 0
reserved_resource_worker-3@repulpmst01r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-7@repulpmst02r.unx.sas.com.dq 0
resource_manager@repulpmst02r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-6@repulpmst01r.unx.sas.com.dq 0
celeryev.55205282-58b1-43fc-9647-bc201bedca65 0
reserved_resource_worker-5@repulpmst01r.unx.sas.com.celery.pidbox 0
celery 0
celeryev.54e4bdb1-aa71-423c-8589-ece93ed05c41 0
reserved_resource_worker-5@repulpmst02r.unx.sas.com.dq 0
reserved_resource_worker-1@repulpmst02r.unx.sas.com.dq 0
celeryev.f8361df4-1a2b-4e49-861a-0da77d373c88 0
reserved_resource_worker-4@repulpmst02r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-5@repulpmst02r.unx.sas.com.celery.pidbox 0
celeryev.ef25e2df-0c1b-4f92-9452-18765cb8b0ab 0
celeryev.1606c43b-42c2-4eb9-80be-82da99a5230d 0
reserved_resource_worker-0@repulpmst01r.unx.sas.com.dq 0
celeryev.10341687-d167-4033-9e39-6d8057160153 0
celeryev.4842ebb5-f26b-48ff-af59-f30a735a16ab 0
reserved_resource_worker-5@repulpmst01r.unx.sas.com.dq 0
reserved_resource_worker-2@repulpmst02r.unx.sas.com.dq 0
celeryev.853c0d3d-f8b0-43d8-a73e-5d0024f83886 0
celeryev.80c223e1-1192-41d1-9007-94366e25f502 0
celeryev.9cf0c549-650d-453f-8ed4-43dd2c2058d6 0
celeryev.78f2aa99-0226-4998-a446-10d0ea348b82 0
reserved_resource_worker-2@repulpmst01r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-3@repulpmst01r.unx.sas.com.dq 0
reserved_resource_worker-7@repulpmst01r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-6@repulpmst02r.unx.sas.com.dq 0
resource_manager@repulpmst02r.unx.sas.com.dq 0
reserved_resource_worker-4@repulpmst02r.unx.sas.com.dq 0
reserved_resource_worker-2@repulpmst02r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-3@repulpmst02r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-0@repulpmst01r.unx.sas.com.celery.pidbox 0
celeryev.2aac488f-f8b6-4b50-aea7-9014dac2693e 0
reserved_resource_worker-0@repulpmst02r.unx.sas.com.celery.pidbox 0
resource_manager 0
reserved_resource_worker-1@repulpmst01r.unx.sas.com.celery.pidbox 0
resource_manager@repulpmst01r.unx.sas.com.celery.pidbox 0
reserved_resource_worker-2@repulpmst01r.unx.sas.com.dq 0
celeryev.2ac48cd5-dfcc-455f-ba95-0a76b8dd6acc 0
pulp.task 0
reserved_resource_worker-4@repulpmst01r.unx.sas.com.celery.pidbox 0
celeryev.fefb61fc-2f85-4fde-8eea-2d4a3757d83a 0
celeryev.f8588021-36b9-4b9e-8795-b53711acd0eb 0
reserved_resource_worker-7@repulpmst01r.unx.sas.com.dq 0
Updated by bmbouter over 8 years ago
You only have one RabbitMQ broker, right? All Pulp services need to use the same broker for Celery communication.
The "worker_name": null tells me that the task was never processed by the resource_manager and assigned to a specific worker. The resource_manager reads from the resource_manager queue, which shows a depth of 0, so as far as Pulp is concerned the work is effectively gone.
One thing about your deployment: you have two resource_managers in the cluster. Pulp is currently designed to work with only one resource manager per cluster, and having two could lead to strange issues. Could you do the following (a command-line sketch of these checks follows the list):
0) verify all of your workers are configured to connect to the same broker (see server.conf settings on all nodes).
1) ensure there is only 1 resource_manager running in the entire cluster.
2) cancel all tasks so you have 0 in the waiting or running states.
3) try to reproduce the issue and post back.
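A rough sketch of those checks from the command line; paths and option names reflect a default Pulp 2 install, so adjust to your layout:

# 0) confirm every node points at the same broker (the broker setting normally lives in the [tasks] section)
grep -A5 '^\[tasks\]' /etc/pulp/server.conf

# 1) confirm only one resource_manager process exists across the whole cluster
ps aux | grep '[r]esource_manager'

# 2) list tasks and cancel anything still waiting or running, e.g. the task from this issue
pulp-admin tasks list
pulp-admin tasks cancel --task-id 622041ac-e9e4-4a15-bd7c-7c98a17782e0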
Updated by mihai.ibanescu@gmail.com over 8 years ago
I will try to answer this to the best of my understanding, given that I am not the one who set up the cluster.
Each node connects to RabbitMQ at localhost.
The queues on RabbitMQ are configured to be HA, so they should be shared between the two RabbitMQ nodes.
There is only one resource manager in the cluster. Or at least there should only be one.
If you have different recommendations for connecting to a highly available RabbitMQ message bus, please let us know. We want an active/active setup as much as possible, with the understanding that there should be only one resource manager, which Pacemaker moves between nodes.
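For reference, queue mirroring on a two-node RabbitMQ cluster is typically enabled with a policy along these lines; the policy name here is only illustrative:

# Mirror all non-amq.* queues across every node in the RabbitMQ cluster
rabbitmqctl set_policy ha-all '^(?!amq\.).*' '{"ha-mode":"all"}'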
Updated by bmbouter over 8 years ago
mihai.ibanescu@gmail.com wrote:
There is only one resource manager in the cluster. Or at least there should only be one.
If you have different recommendations on connecting to a highly-available rabbitmq message bus, please let us know. We want to have an active/active setup as much as possible, with the understanding that there should be only one resource manager which gets moved around by Pacemaker.
Based on your ps output in the issue description, you have 2 resource managers running. Please re-read the steps from comment 4 and try to reproduce the issue when there is only one resource manager running in your environment. This issue was skipped at triage today because it's unclear whether this is a legitimate bug. We'll need more info to move forward on this.
Also joining in #pulp would be a good way for us to synchronously resolve the issue. Feel free to ping my nick in there, I'm 'bmbouter'.
Updated by bmbouter over 8 years ago
- Status changed from NEW to CLOSED - NOTABUG
- Triaged changed from No to Yes
After IRC discussion it is believed that this issue was due to an environmental problem: two resource managers were running. I'm closing it as NOTABUG. If you experience an issue in the future, please reopen, e-mail pulp-list, or discuss on IRC.
Updated by bmbouter over 4 years ago
- Category deleted (14)
We are removing the 'API' category per open floor discussion June 16, 2020.