Issue #686

Pulp task hangs when pulp services were restarted (upstart)

Added by amacdona@redhat.com over 6 years ago. Updated over 2 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
High
Assignee:
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
3. High
Version:
2.5
Platform Release:
2.6.1
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Easy Fix, Pulp 2
Sprint:
Quarter:

Description

Description of problem:

The orphan remove task hangs on RHEL 6 if the Pulp services were restarted.

Version-Release number of selected component (if applicable):

2.5-release

How reproducible:
Always

Steps to Reproduce:

I created a script to make this easy:

$ cat all.sh

#!/bin/sh
# Forward a single action (start|stop|restart) to every Pulp-related service.
sudo service qpidd "$1"
sudo service pulp_celerybeat "$1"
sudo service pulp_resource_manager "$1"
sudo service pulp_workers "$1"
sudo service httpd "$1"

1. $ ./all.sh restart
2. $ pulp-admin orphan remove --all

Actual results:

Waiting to begin... (hangs here)

Expected results:

Task Succeeded

Additional info:

Workaround: the services need to be fully stopped and then started, rather than restarted. From this state:

1. $ ./all.sh stop
2. $ ./all.sh start
3. $ pulp-admin orphan remove --all

Task Succeeded

+ This bug was cloned from Bugzilla Bug #1188755 +

Associated revisions

Revision 1fb36484 View on GitHub
Added by rbarlow over 6 years ago

Upstart use stop/start instead of restart_workers.

This commit updates our Upstart init scripts to call stop_workers and then start_workers, instead of restart_workers(). There seems to be an issue with Celery multi, restarting, and Pulp. This commit works around that issue by fully stopping the workers instead of issuing a restart to them.

https://pulp.plan.io/issues/686

closes #686
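
The change described in the commit can be sketched as follows. This is a minimal illustration, not the actual init script: `stop_workers` and `start_workers` here are hypothetical stand-ins for the helpers in Pulp's real Upstart init scripts, which wrap `celery multi stop` and `celery multi start` respectively.

```shell
#!/bin/sh
# Hypothetical stand-ins for the init-script helpers; the real versions
# invoke "celery multi" and wait for the worker PIDs to exit.
stop_workers() {
    echo "stopping workers"
}

start_workers() {
    echo "starting workers"
}

# Before the fix, restart called a single restart_workers() helper
# ("celery multi restart"), which could leave the next task hung.
# After the fix, restart is a full stop followed by a fresh start:
restart() {
    stop_workers
    start_workers
}
```

Calling `restart` then performs the stop and start in sequence, mirroring the manual `./all.sh stop` / `./all.sh start` workaround from the report.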

History

#1 Updated by amacdona@redhat.com over 6 years ago

I was able to reproduce this on 2.6-testing also.

+ This comment was cloned from Bugzilla #1188755 comment 1 +

#2 Updated by cduryee over 6 years ago

This bz caused me a lot of confusion. Can this be put on 2.6.0?

+ This comment was cloned from Bugzilla #1188755 comment 2 +

#3 Updated by rbarlow over 6 years ago

  • Status changed from ASSIGNED to POST
  • Tags Easy Fix added

#4 Updated by rbarlow over 6 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

#5 Updated by bmbouter over 6 years ago

  • Severity changed from High to 3. High

#6 Updated by bcourt over 6 years ago

  • Status changed from MODIFIED to 5

#7 Updated by pthomas@redhat.com over 6 years ago

  • Status changed from 5 to 6

verified
[root@cloud-qe-12 ~]# rpm -qa pulp-server
pulp-server-2.6.1-0.2.beta.el6.noarch
[root@cloud-qe-12 ~]#

[root@cloud-qe-12 ~]# 
[root@cloud-qe-12 ~]# ./all.sh restart
Stopping Qpid AMQP daemon: [  OK  ]
Starting Qpid AMQP daemon: [  OK  ]
celery init v10.0.
Using configuration: /etc/default/pulp_workers, /etc/default/pulp_celerybeat
Restarting celery periodic task scheduler
Stopping pulp_celerybeat... OK
Starting pulp_celerybeat...
celery init v10.0.
Using config script: /etc/default/pulp_resource_manager
celery multi v3.1.11 (Cipater)
> Stopping nodes...
    > resource_manager@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: QUIT -> 18130
> Waiting for 1 node -> 18130.....
    > resource_manager@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK

celery multi v3.1.11 (Cipater)
> Starting nodes...
    > resource_manager@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
celery init v10.0.
Using config script: /etc/default/pulp_workers
celery multi v3.1.11 (Cipater)
> Stopping nodes...
    > reserved_resource_worker-2@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: QUIT -> 18344
    > reserved_resource_worker-1@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: QUIT -> 18313
    > reserved_resource_worker-0@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: QUIT -> 18284
    > reserved_resource_worker-3@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: QUIT -> 18377
> Waiting for 4 nodes -> 18344, 18313, 18284, 18377........
    > reserved_resource_worker-2@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
> Waiting for 3 nodes -> 18313, 18284, 18377....
    > reserved_resource_worker-1@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
> Waiting for 2 nodes -> 18284, 18377....
    > reserved_resource_worker-0@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
> Waiting for 1 node -> 18377....
    > reserved_resource_worker-3@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK

celery multi v3.1.11 (Cipater)
> Starting nodes...
    > reserved_resource_worker-0@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
    > reserved_resource_worker-1@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
    > reserved_resource_worker-2@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
    > reserved_resource_worker-3@cloud-qe-12.idmqe.lab.eng.bos.redhat.com: OK
Stopping httpd: [  OK  ]
Starting httpd: [  OK  ]
[root@cloud-qe-12 ~]#  pulp-admin orphan remove --all
This command may be exited via ctrl+c without affecting the request.

[-]
Running...

Task Succeeded

#9 Updated by dkliban@redhat.com over 6 years ago

  • Status changed from 6 to CLOSED - CURRENTRELEASE

#13 Updated by bmbouter over 2 years ago

  • Tags Pulp 2 added
