Project

Profile

Help

Issue #2545

closed

pulp-manage-db wait time calculation can end up with many hours of wait time

Added by elyezer almost 8 years ago. Updated over 5 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Urgent
Assignee:
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
4. Urgent
Version:
2.12.0
Platform Release:
2.12.0
OS:
Triaged:
No
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Quarter:

Description

Running the automation jobs showed that pulp-manage-db can wait for hours instead of to wait for up to 92 seconds (checked constants.MIGRATION_WAIT_TIME for this information).

I've ssh into the machine where pulp-manage-db is running for 3 hours and counting, the running time was calculated by checking ps and date as demonstrated below:

$ ps aux | grep pulp-manage-db

root      8505  0.0  0.0 193332  2788 ?        S    09:35   0:00 sudo -u apache pulp-manage-db

$ date
Wed Jan 25 12:41:43 EST 2017

I did some investigation and, for example, the pulp_resource_manager is inactive for the same amount of time:

$ systemctl status pulp_resource_manager
● pulp_resource_manager.service - Pulp Resource Manager
   Loaded: loaded (/usr/lib/systemd/system/pulp_resource_manager.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2017-01-25 09:33:30 EST; 3h 10min ago
 Main PID: 5252 (code=exited, status=0/SUCCESS)

Running pulp-manage-db again shows that it will wait for about more 2 hours:

$ sudo -u apache pulp-manage-db
Attempting to connect to localhost:27017
Attempting to connect to localhost:27017
Write concern for Mongo connection: {}

The following processes might still be running:
    scheduler@host-172-16-46-204.openstacklocal
    reserved_resource_worker-0@host-172-16-46-204.openstacklocal
    reserved_resource_worker-1@host-172-16-46-204.openstacklocal
    resource_manager@host-172-16-46-204.openstacklocal
Please wait 6570 seconds while Pulp confirms this.^C
Traceback (most recent call last):
  File "/bin/pulp-manage-db", line 9, in <module>
    load_entry_point('pulp-server==2.12c2', 'console_scripts', 'pulp-manage-db')()
  File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 214, in main
    time.sleep(1)
KeyboardInterrupt

I decide to check the source code and run a python shell session to check what the code is doing and the math there is making the wait time be that long:

$ sudo -u apache python

>>> from pulp.server.managers import status
>>> from pulp.server.db import connection
>>> from pulp.server.db.fields import UTCDateTimeField
>>> from datetime import datetime, timedelta
>>> from pulp.common import constants

>>> connection.initialize()
>>> active_workers = status.get_workers()

>>> last_worker_time = max([worker['last_heartbeat'] for worker in active_workers])
>>> last_worker_time
datetime.datetime(2017, 1, 25, 14, 33, 22, 289000, tzinfo=<isodate.tzinfo.Utc object at 0x1f83510>)

>>> time_from_last_worker = UTCDateTimeField().to_python(datetime.now()) - last_worker_time
>>> time_from_last_worker
datetime.timedelta(-1, 79217, 994870)
>>> time_from_last_worker.seconds
79217

>>> wait_time = timedelta(seconds=constants.MIGRATION_WAIT_TIME) - time_from_last_worker
>>> wait_time
datetime.timedelta(0, 7241, 855873)
>>> wait_time.seconds
7241
>>> str(wait_time)
'2:00:41.855873'

>>> constants.MIGRATION_WAIT_TIME
92

Also available in: Atom PDF