Actions
Issue #2545
closedpulp-manage-db wait time calculation can end up with many hours of wait time
Start date:
Due date:
Estimated time:
Severity:
4. Urgent
Version:
2.12.0
Platform Release:
2.12.0
OS:
Triaged:
No
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Quarter:
Description
Running the automation jobs showed that pulp-manage-db can wait for hours instead of to wait for up to 92 seconds (checked constants.MIGRATION_WAIT_TIME
for this information).
I've ssh into the machine where pulp-manage-db is running for 3 hours and counting, the running time was calculated by checking ps and date as demonstrated below:
$ ps aux | grep pulp-manage-db
root 8505 0.0 0.0 193332 2788 ? S 09:35 0:00 sudo -u apache pulp-manage-db
$ date
Wed Jan 25 12:41:43 EST 2017
I did some investigation and, for example, the pulp_resource_manager is inactive for the same amount of time:
$ systemctl status pulp_resource_manager
● pulp_resource_manager.service - Pulp Resource Manager
Loaded: loaded (/usr/lib/systemd/system/pulp_resource_manager.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Wed 2017-01-25 09:33:30 EST; 3h 10min ago
Main PID: 5252 (code=exited, status=0/SUCCESS)
Running pulp-manage-db again shows that it will wait for about more 2 hours:
$ sudo -u apache pulp-manage-db
Attempting to connect to localhost:27017
Attempting to connect to localhost:27017
Write concern for Mongo connection: {}
The following processes might still be running:
scheduler@host-172-16-46-204.openstacklocal
reserved_resource_worker-0@host-172-16-46-204.openstacklocal
reserved_resource_worker-1@host-172-16-46-204.openstacklocal
resource_manager@host-172-16-46-204.openstacklocal
Please wait 6570 seconds while Pulp confirms this.^C
Traceback (most recent call last):
File "/bin/pulp-manage-db", line 9, in <module>
load_entry_point('pulp-server==2.12c2', 'console_scripts', 'pulp-manage-db')()
File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 214, in main
time.sleep(1)
KeyboardInterrupt
I decide to check the source code and run a python shell session to check what the code is doing and the math there is making the wait time be that long:
$ sudo -u apache python
>>> from pulp.server.managers import status
>>> from pulp.server.db import connection
>>> from pulp.server.db.fields import UTCDateTimeField
>>> from datetime import datetime, timedelta
>>> from pulp.common import constants
>>> connection.initialize()
>>> active_workers = status.get_workers()
>>> last_worker_time = max([worker['last_heartbeat'] for worker in active_workers])
>>> last_worker_time
datetime.datetime(2017, 1, 25, 14, 33, 22, 289000, tzinfo=<isodate.tzinfo.Utc object at 0x1f83510>)
>>> time_from_last_worker = UTCDateTimeField().to_python(datetime.now()) - last_worker_time
>>> time_from_last_worker
datetime.timedelta(-1, 79217, 994870)
>>> time_from_last_worker.seconds
79217
>>> wait_time = timedelta(seconds=constants.MIGRATION_WAIT_TIME) - time_from_last_worker
>>> wait_time
datetime.timedelta(0, 7241, 855873)
>>> wait_time.seconds
7241
>>> str(wait_time)
'2:00:41.855873'
>>> constants.MIGRATION_WAIT_TIME
92
Actions
Fix time_from_last_worker calculation
Call
datetime.utcnow()
to get the current time because thelast_heartbeat
is stored on database on UTC timezone.Fix #2545