Issue #686
Pulp task hangs when Pulp services are restarted (Upstart)
Status:
CLOSED - CURRENTRELEASE
Description
Description of problem:
The orphan remove task hangs on RHEL 6 if the Pulp services have been restarted.
Version-Release number of selected component (if applicable):
2.5-release
How reproducible:
Always
Steps to Reproduce:
I created a script to make this easy:
$ cat all.sh
sudo service qpidd $1
sudo service pulp_celerybeat $1
sudo service pulp_resource_manager $1
sudo service pulp_workers $1
sudo service httpd $1
1. $ ./all.sh restart
2. $ pulp-admin orphan remove --all
Actual results:
Waiting to begin... (hangs here)
Expected results:
Task Succeeded
Additional info:
The workaround is to stop and then start the services instead of restarting them. From that state:
1. $ ./all.sh stop
2. $ ./all.sh start
3. $ pulp-admin orphan remove --all
Task Succeeded
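The all.sh helper and the stop/start workaround above could be folded into one script. A minimal sketch (service names taken from the reproducer; run as root, so sudo is omitted):

```shell
#!/bin/bash
# Sketch of the workaround: fully stop every Pulp service, then start
# them all, instead of issuing "service ... restart" to each one.
# The service list matches the all.sh reproducer above.

PULP_SERVICES="qpidd pulp_celerybeat pulp_resource_manager pulp_workers httpd"

stop_all() {
    for svc in $PULP_SERVICES; do
        service "$svc" stop
    done
}

start_all() {
    for svc in $PULP_SERVICES; do
        service "$svc" start
    done
}
```

Calling stop_all followed by start_all reproduces steps 1 and 2 of the workaround in one place.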
+ This bug was cloned from Bugzilla Bug #1188755 +
- Status changed from ASSIGNED to POST
- Tags Easy Fix added
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
- Severity changed from High to 3. High
- Status changed from MODIFIED to 5
- Status changed from 5 to 6
Verified:
[root@cloud-qe-12 ~]# rpm -qa pulp-server
pulp-server-2.6.1-0.2.beta.el6.noarch
[root@cloud-qe-12 ~]# ./all.sh restart
Stopping Qpid AMQP daemon: [ OK ]
Starting Qpid AMQP daemon: [ OK ]
celery init v10.0.
Using configuration: /etc/default/pulp_workers, /etc/default/pulp_celerybeat
Restarting celery periodic task scheduler
Stopping pulp_celerybeat... OK
Starting pulp_celerybeat...
celery init v10.0.
Using config script: /etc/default/pulp_resource_manager
celery multi v3.1.11 (Cipater)
> Stopping nodes...
> resource_manager@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: QUIT -> 18130
> Waiting for 1 node -> 18130.....
> resource_manager@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
celery multi v3.1.11 (Cipater)
> Starting nodes...
> resource_manager@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
celery init v10.0.
Using config script: /etc/default/pulp_workers
celery multi v3.1.11 (Cipater)
> Stopping nodes...
> reserved_resource_worker-2@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: QUIT -> 18344
> reserved_resource_worker-1@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: QUIT -> 18313
> reserved_resource_worker-0@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: QUIT -> 18284
> reserved_resource_worker-3@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: QUIT -> 18377
> Waiting for 4 nodes -> 18344, 18313, 18284, 18377........
> reserved_resource_worker-2@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
> Waiting for 3 nodes -> 18313, 18284, 18377....
> reserved_resource_worker-1@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
> Waiting for 2 nodes -> 18284, 18377....
> reserved_resource_worker-0@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
> Waiting for 1 node -> 18377....
> reserved_resource_worker-3@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
celery multi v3.1.11 (Cipater)
> Starting nodes...
> reserved_resource_worker-0@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
> reserved_resource_worker-1@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
> reserved_resource_worker-2@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
> reserved_resource_worker-3@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
Stopping httpd: [ OK ]
Starting httpd: [ OK ]
[root@cloud-qe-12 ~]# pulp-admin orphan remove --all
This command may be exited via ctrl+c without affecting the request.
[-]
Running...
Task Succeeded
- Status changed from 6 to CLOSED - CURRENTRELEASE
Upstart: use stop/start instead of restart_workers.
This commit updates our Upstart init scripts to call stop_workers and then start_workers instead of restart_workers. There appears to be an issue with celery multi, restarting, and Pulp; this commit works around it by fully stopping the workers and then starting them fresh, rather than issuing a restart.
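The shape of the init-script change can be sketched roughly as follows. The function bodies here are placeholder echoes; the real stop_workers/start_workers drive "celery multi stop" and "celery multi start":

```shell
# Rough sketch of the init-script change. stop_workers/start_workers are
# stubbed with echoes; in the actual scripts they invoke celery multi.

stop_workers() {
    echo "stopping workers"    # real script: celery multi stop, then wait
}

start_workers() {
    echo "starting workers"    # real script: celery multi start
}

# After the fix: restart is a full stop followed by a fresh start, so the
# workers reconnect cleanly after the broker (qpidd) comes back, instead
# of delegating to celery multi's restart.
restart_workers() {
    stop_workers
    start_workers
}
```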
https://pulp.plan.io/issues/686
closes #686