Issue #1290 (closed)

missing DB during boot causes resource manager to hang

Added by mhrivnak over 8 years ago. Updated over 3 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
pcreech
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
2.6.0
Platform Release:
2.8.0
OS:
RHEL 6
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Quarter:

Description

Copied from bugzilla:

Steps to Reproduce:
1. Ensure mongod cannot start properly. For instance, shut down mongod, then create a bogus /var/lib/mongod/mongod.lock pid file.
2. $ katello-service restart
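
A minimal shell sketch of the two steps above, assuming the lock-file path quoted in step 1 (it may vary between installs) and an arbitrary bogus PID value:

$ service mongod stop
$ echo 12345 > /var/lib/mongod/mongod.lock   # bogus pid file so mongod cannot start properly (step 1)
$ katello-service restart                    # step 2; this is where the hang shown below occurs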

Actual results:

$ katello-service restart
Stopping Qpid AMQP daemon:                                 [  OK  ]
Starting Qpid AMQP daemon:                                 [  OK  ]
Shutting down qdrouterd services:                          [  OK  ]
Starting qdrouterd services:                               [  OK  ]
celery multi v3.1.11 (Cipater)
> Stopping nodes...
        > resource_manager@hostname: QUIT -> 9704
> Waiting for 1 node -> 9704.....
        > resource_manager@hostname: OK

celery multi v3.1.11 (Cipater)
> Starting nodes...
        > resource_manager@hostname: OK
celery init v10.0.
Using config script: /etc/default/pulp_resource_manager
Stopping mongod:                                           [  OK  ]
Starting mongod:                                           [  OK  ]
Waiting for mongod to become available:                    [FAILED]
Stopping elasticsearch:                                    [  OK  ]
Starting elasticsearch:                                    [  OK  ]
Stopping tomcat6:                                          [  OK  ]
Starting tomcat6:                                          [  OK  ]
Stopping foreman-proxy:                                    [  OK  ]
Starting foreman-proxy:                                    [  OK  ]
celery init v10.0.
Using configuration: /etc/default/pulp_workers, /etc/default/pulp_celerybeat
Restarting celery periodic task scheduler
Stopping pulp_celerybeat... OK
Starting pulp_celerybeat...
celery multi v3.1.11 (Cipater)
> Stopping nodes...
        > reserved_resource_worker-0@hostname: QUIT -> 9936
        > reserved_resource_worker-1@hostname: QUIT -> 9962
> Waiting for 2 nodes -> 9936, 9962......
        > reserved_resource_worker-0@hostname: OK
> Waiting for 1 node -> 9962....
        > reserved_resource_worker-1@hostname: OK

celery multi v3.1.11 (Cipater)
> Starting nodes...
        > reserved_resource_worker-0@hostname: No handlers could be found for logger "pulp.server.db.connection"

*** gets stuck here indefinitely until user CTRL-c's ***

Expected results:
Even if a database connection cannot be established, the startup should still return.
#1

Updated by mhrivnak over 8 years ago

  • Triaged changed from No to Yes
#2

Updated by bmbouter over 8 years ago

I suspect this is only a problem for the init scripts (EL6), not for EL7 (systemd). The init script daemonizes the pulp_resource_manager process, but it cannot tell whether the spawned process has "connected correctly to the db", because pulp_resource_manager has wait-and-continue behavior when connecting to the database. As a result, the spawned process will always be in the running state as long as it daemonized correctly and didn't experience a fatal exception.

Given that, I propose the init script return exit code 0 immediately if the spawned process daemonizes correctly and is running, and 1 otherwise. Either way, the init script should return at some point instead of hanging.
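
As a rough sketch of that proposal (illustrative only, not the actual pulp_resource_manager init script; the pid-file path and the daemonize step are stand-ins), the start() function would report success or failure based only on whether the daemonized process is running:

PIDFILE=/var/run/pulp/resource_manager.pid        # hypothetical pid-file location

start() {
    daemonize_resource_manager                    # stand-in for the real "celery multi start" invocation
    sleep 2                                       # give the daemon a moment to write its pid file
    pid=$(cat "$PIDFILE" 2>/dev/null)
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 0                                  # daemonized and running: success, even if mongod is down
    fi
    return 1                                      # not running: failure
}

With this shape the service command returns promptly whether or not the database is reachable, which matches the expectation in the description.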

#3

Updated by mhrivnak over 8 years ago

  • OS set to RHEL 6
#4

Updated by mhrivnak over 8 years ago

To reproduce with Pulp, you should be able to run:

service mongod stop
service pulp_resource_manager start
#5

Updated by pcreech over 8 years ago

  • Status changed from NEW to MODIFIED

[root@localhost pulp_packaging]# service mongod stop
Stopping mongod: [ OK ]
[root@localhost pulp_packaging]# service pulp_resource_manager start
celery init v10.0.
Using config script: /etc/default/pulp_resource_manager
celery multi v3.1.11 (Cipater)

Starting nodes...

> : OK
[root@localhost pulp_packaging]#

Was not able to reproduce with the latest build of Pulp 2.8. It appears the work done in https://pulp.plan.io/issues/988 fixes the issue.

#6

Updated by amacdona@redhat.com over 8 years ago

  • Assignee set to pcreech
#7

Updated by dkliban@redhat.com about 8 years ago

  • Status changed from MODIFIED to 5
#8

Updated by dkliban@redhat.com about 8 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
#10

Updated by bmbouter almost 5 years ago

  • Tags Pulp 2 added
