Refactor #131
Updated by bmbouter almost 10 years ago
**The Problem** Throughout the production and test code for Pulp and its plugins there are many calls to pulp.server.db.connection.initialize(). Starting with Pulp 2.6.0, mongoengine caches the last initialized connection as the connection all database interaction will use. At test run-time can cause the testing database to mistakenly attach to the production database through liberal calls to connection.initialize() that occur without the server.conf being mocked. This causes horrible things to occur to Pulp development environments. Also, its sloppy to not have a structured way to connect to the database. This also leads to a second problem which is that many of these calls to pulp.server.db.connection.initialize() are at the module level which means that simply by importing that code you are connecting to the database! A very unexpected side-effect that makes the above test-environment database switching occur depending on what you import. Ouch! **The goal** Move all of the pulp.server.db.connection.initialize() calls to the required entry point for all Pulp components to run. Each component that starts shall make exactly one call to pulp.server.db.connection.initialize() as part of the entry startup. **The Entry Points** We need to consider entry points for both the development and production uses where Pulp code will be run that need the database connection to already be initialized. * The pulp-server WSGI app. "/srv/pulp/webservices.wsgi":https://github.com/pulp/pulp/blob/master/server/srv/pulp/webservices.wsgi * The pulp_celerybeat entry point "pulp.server.async.scheduler.Scheduler":https://github.com/pulp/pulp/blob/master/server/pulp/server/async/scheduler.py#L241 * The pulp_workers entry point. This is shared by both pulp_workers and pulp_resource_manager. "pulp.server.async.app":https://github.com/pulp/pulp/blob/master/server/pulp/server/async/app.py * The repo protection WSGI entry point. "/srv/pulp/repo_auth.wsgi":https://github.com/pulp/pulp_rpm/blob/master/plugins/srv/pulp/repo_auth.wsgi * The pulp_puppet forge pre 33 API WSGI app. "/srv/pulp/puppet_forge_pre33_api.wsgi":https://github.com/pulp/pulp_puppet/blob/master/pulp_puppet_plugins/srv/pulp/puppet_forge_pre33_api.wsgi * The pulp_puppet forge post 33 API WSGI app. "/srv/pulp/puppet_forge_post33_api.wsgi":https://github.com/pulp/pulp_puppet/blob/master/pulp_puppet_plugins/srv/pulp/puppet_forge_post33_api.wsgi * The test runner used by platform and all plugins "pulp.devel.test_runner.run_tests":https://github.com/pulp/pulp/blob/master/devel/pulp/devel/test_runner.py#L13 **Steps** 1. Each one of the entry point components above need to have exactly one call to pulp.server.db.connection.initialize() 2. All other calls to pulp.server.db.connection.initialize() needs to be removed. This includes all platform and all plugins. Search and destroy! 3. Rework pulp.server.db.connection.PulpCollectionFailure to be PulpDatabaseFailure instead. 4. "pulp.server.db.connection.initialize()":https://github.com/pulp/pulp/blob/master/server/pulp/server/db/connection.py#L30 needs to have a behavior added that raises an Exception if it has already been called. This should enforce that a second call to initialize() will fail hard. Maybe have the exception live in the db.connection similar to PulpDatabaseFailure from step (3). 5. 3. Rework pulp.server.db.connection.PulpCollectionFailure to be PulpDatabaseFailure instead. 4. Add a behavior to "get_collection()":https://github.com/pulp/pulp/blob/master/server/pulp/server/db/connection.py#L229 "get_database()":https://github.com/pulp/pulp/blob/master/server/pulp/server/db/connection.py#L242 and "get_connection()":https://github.com/pulp/pulp/blob/master/server/pulp/server/db/connection.py#L250 whereby if the database has not been initialized successfully a PulpDatabaseFailure (step 3) is raised. This will ensure that when the database is to be used initialize has been called at least 1 time. 6. 5. Add test coverage for changes 2-5. 2-4. **Testing Concerns** Any testing code should rely on the testing entry point to provide the call to connection.initialize(). If the test runner provides the call to initialize() then an exception will be raised on the second call to connection.initialize(). * The WSGI scripts are so simple they likely don't need to be tested at all. * The celery worker entry point is also very simple and can likely not be tested. * The celerybeat entry point can be tested because it is guarded by "scheduler.setup_schedules":https://github.com/pulp/pulp/blob/master/server/pulp/server/async/scheduler.py#L333 method. We can mock the connection.initialize() before we call setup_schedules().