Issue #1290
Closed
missing DB during boot causes resource manager to hang
Description
Copied from bugzilla:
Steps to Reproduce:
1. Ensure mongod cannot start properly. For instance, shut down mongod, then create a bogus /var/lib/mongod/mongod.lock pid file.
2. $ katello-service restart
Actual results:
$ katello-service restart
Stopping Qpid AMQP daemon: [ OK ]
Starting Qpid AMQP daemon: [ OK ]
Shutting down qdrouterd services: [ OK ]
Starting qdrouterd services: [ OK ]
celery multi v3.1.11 (Cipater)
> Stopping nodes...
> resource_manager@hostname: QUIT -> 9704
> Waiting for 1 node -> 9704.....
> resource_manager@hostname: OK
celery multi v3.1.11 (Cipater)
> Starting nodes...
> resource_manager@hostname: OK
celery init v10.0.
Using config script: /etc/default/pulp_resource_manager
Stopping mongod: [ OK ]
Starting mongod: [ OK ]
Waiting for mongod to become available: [FAILED]
Stopping elasticsearch: [ OK ]
Starting elasticsearch: [ OK ]
Stopping tomcat6: [ OK ]
Starting tomcat6: [ OK ]
Stopping foreman-proxy: [ OK ]
Starting foreman-proxy: [ OK ]
celery init v10.0.
Using configuration: /etc/default/pulp_workers, /etc/default/pulp_celerybeat
Restarting celery periodic task scheduler
Stopping pulp_celerybeat... OK
Starting pulp_celerybeat...
celery multi v3.1.11 (Cipater)
> Stopping nodes...
> reserved_resource_worker-0@hostname: QUIT -> 9936
> reserved_resource_worker-1@hostname: QUIT -> 9962
> Waiting for 2 nodes -> 9936, 9962......
> reserved_resource_worker-0@hostname: OK
> Waiting for 1 node -> 9962....
> reserved_resource_worker-1@hostname: OK
celery multi v3.1.11 (Cipater)
> Starting nodes...
> reserved_resource_worker-0@hostname: No handlers could be found for logger "pulp.server.db.connection"
*** gets stuck here indefinitely until user CTRL-c's ***
Expected results:
Even if a database connection cannot be established, the startup command should still return.
Updated by bmbouter over 8 years ago
I suspect this is only a problem for the init scripts (EL6), not for EL7 (systemd). The init script daemonizes the pulp_resource_manager process, and it has no way to know whether the spawned process has "connected correctly to the db", because pulp_resource_manager has a wait-and-continue behavior when connecting to the database. As a result, the spawned process will always be in the running state as long as it daemonized correctly and didn't hit a fatal exception.
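A minimal sketch of the "wait-and-continue" behavior described above (the function name, exception type, and retry interval are illustrative assumptions, not Pulp's actual pulp.server.db.connection code):

```python
import time

def wait_for_db(connect, interval=1.0, max_attempts=None):
    """Illustrative sketch: keep retrying the connection instead of failing,
    so the daemonized process stays 'running' even while the database is
    unreachable. Not Pulp's real implementation."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect()
        except ConnectionError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            time.sleep(interval)  # wait, then continue retrying
```

Because the process sits in this loop rather than exiting, the init script sees a live daemon and cannot distinguish "connected" from "still waiting on mongod".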
Given that, I propose the init script returns exit code 0 immediately if the spawned process daemonizes correctly and is running, and 1 otherwise. The init script should return at some point in both cases.
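The proposal above can be sketched as a SysV-style start function (the pidfile path and the backgrounded command are stand-ins, not Pulp's real init script):

```shell
#!/bin/sh
# Sketch of the proposed init-script behavior: decide the exit code from
# whether the daemonized process is alive, not from whether it reached the DB.
# PIDFILE and the backgrounded command are stand-ins, not Pulp's real ones.
PIDFILE=/tmp/resource_manager.pid

start() {
    # Stand-in for daemonizing pulp_resource_manager: background a process
    # and record its PID, as a SysV init script typically does.
    sleep 30 &
    echo $! > "$PIDFILE"
    # Return immediately: 0 if the spawned process is running (even if it is
    # still waiting on mongod), 1 if it never came up.
    if kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo "pulp_resource_manager: running"
        return 0
    else
        echo "pulp_resource_manager: failed to start"
        return 1
    fi
}
```

Calling start returns as soon as the stand-in daemon is up, mirroring the proposal that the init script always return rather than block on the database.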
Updated by mhrivnak over 8 years ago
To reproduce with pulp, you should be able to
service mongod stop
service pulp_resource_manager start
Updated by pcreech over 8 years ago
- Status changed from NEW to MODIFIED
[root@localhost pulp_packaging]# service mongod stop
Stopping mongod: [ OK ]
[root@localhost pulp_packaging]# service pulp_resource_manager start
celery init v10.0.
Using config script: /etc/default/pulp_resource_manager
celery multi v3.1.11 (Cipater)
Starting nodes...
> resource_manager@localhost.localdomain: OK
[root@localhost pulp_packaging]#
Was not able to reproduce with the latest build of pulp 2.8. It appears the work done in https://pulp.plan.io/issues/988 fixes the issue.
Updated by dkliban@redhat.com about 8 years ago
- Status changed from MODIFIED to 5
Updated by dkliban@redhat.com about 8 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE