Issue #2472
Closed: pulp-manage-db does not clear the running worker flag even after the 300-second limit when upgrading
Description
When upgrading Pulp to 2.11, pulp-manage-db does not recognize that the workers have been stopped for more than 300 seconds.
The following output is from a RHEL 7 system on which Pulp was upgraded from 2.10:
# systemctl status httpd pulp_celerybeat pulp_resource_manager pulp_workers
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Mon 2016-12-05 09:08:45 EST; 4h 48min ago
Docs: man:httpd(8)
man:apachectl(8)
Main PID: 17163 (code=exited, status=0/SUCCESS)
Status: "Total requests: 0; Current requests/sec: 0; Current traffic: 0 B/sec"
● pulp_celerybeat.service - Pulp's Celerybeat
Loaded: loaded (/usr/lib/systemd/system/pulp_celerybeat.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Mon 2016-12-05 09:08:55 EST; 4h 48min ago
Main PID: 17658 (code=exited, status=0/SUCCESS)
● pulp_resource_manager.service - Pulp Resource Manager
Loaded: loaded (/usr/lib/systemd/system/pulp_resource_manager.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Mon 2016-12-05 09:09:00 EST; 4h 48min ago
Main PID: 17701 (code=exited, status=0/SUCCESS)
● pulp_workers.service - Pulp Celery Workers
Loaded: loaded (/usr/lib/systemd/system/pulp_workers.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Mon 2016-12-05 09:08:51 EST; 4h 48min ago
Main PID: 17436 (code=exited, status=0/SUCCESS)
# sudo -u apache pulp-manage-db
Attempting to connect to localhost:27017
Attempting to connect to localhost:27017
Write concern for Mongo connection: {}
There are still running workers, continuing could corrupt your Pulp installation. Are you sure you wish to continue? (y/N): n
Even though the processes had been stopped for more than 4 hours, pulp-manage-db still asks whether the user wants to continue.
This is related to #2468.
Updated by elyezer almost 8 years ago
I manually ran the same commands pulp-manage-db runs to check when the workers last sent a heartbeat, and after doing that pulp-manage-db worked fine:
# sudo -u apache python
Python 2.7.5 (default, Aug 2 2016, 04:20:16)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pulp.server.managers import status
>>> from pulp.server.db import connection
>>> connection.initialize(max_timeout=1)
>>> workers = status.get_workers()
>>> from pulp.server.db.fields import UTCDateTimeField
>>> from datetime import datetime
>>> now = UTCDateTimeField().to_python(datetime.now())
>>> from datetime import timedelta
>>> [now - worker['last_heartbeat'] < timedelta(seconds=300) for worker in workers]
[False, False]
>>> exit()
# sudo -u apache pulp-manage-db
Attempting to connect to localhost:27017
Attempting to connect to localhost:27017
Write concern for Mongo connection: {}
Loading content types.
Loading type descriptors []
Parsing type descriptors
Validating type descriptor syntactic integrity
Validating type descriptor semantic integrity
Loading unit model: python_package = pulp_python.plugins.models:Package
Loading unit model: docker_blob = pulp_docker.plugins.models:Blob
Loading unit model: docker_manifest = pulp_docker.plugins.models:Manifest
Loading unit model: docker_image = pulp_docker.plugins.models:Image
Loading unit model: docker_tag = pulp_docker.plugins.models:Tag
Loading unit model: ostree = pulp_ostree.plugins.db.model:Branch
Loading unit model: erratum = pulp_rpm.plugins.db.models:Errata
Loading unit model: distribution = pulp_rpm.plugins.db.models:Distribution
Loading unit model: srpm = pulp_rpm.plugins.db.models:SRPM
Loading unit model: package_group = pulp_rpm.plugins.db.models:PackageGroup
Loading unit model: package_category = pulp_rpm.plugins.db.models:PackageCategory
Loading unit model: iso = pulp_rpm.plugins.db.models:ISO
Loading unit model: package_environment = pulp_rpm.plugins.db.models:PackageEnvironment
Loading unit model: drpm = pulp_rpm.plugins.db.models:DRPM
Loading unit model: package_langpacks = pulp_rpm.plugins.db.models:PackageLangpacks
Loading unit model: rpm = pulp_rpm.plugins.db.models:RPM
Loading unit model: yum_repo_metadata_file = pulp_rpm.plugins.db.models:YumMetadataFile
Loading unit model: puppet_module = pulp_puppet.plugins.db.models:Module
Updating the database with types []
Found the following type definitions that were not present in the update collection [puppet_module, drpm, ostree, package_langpacks, erratum, docker_blob, docker_manifest, yum_repo_metadata_file, package_group, package_category, iso, package_environment, docker_tag, python_package, srpm, rpm, distribution, docker_image]
Updating the database with types [puppet_module, drpm, ostree, package_langpacks, docker_manifest, python_package, erratum, yum_repo_metadata_file, package_group, docker_blob, package_category, iso, package_environment, docker_tag, distribution, rpm, srpm, docker_image]
Content types loaded.
Ensuring the admin role and user are in place.
Admin role and user are in place.
Beginning database migrations.
Applying pulp.server.db.migrations version 27
/usr/lib/python2.7/site-packages/mongoengine/document.py:367: DeprecationWarning: update is deprecated. Use replace_one, update_one or update_many instead.
upsert=upsert, **write_concern)
Migration to pulp.server.db.migrations version 27 complete.
Migration package pulp_docker.plugins.migrations is up to date at version 2
Migration package pulp_puppet.plugins.migrations is up to date at version 5
Migration package pulp_python.plugins.migrations is up to date at version 1
Migration package pulp_rpm.plugins.migrations is up to date at version 36
Loading unit model: python_package = pulp_python.plugins.models:Package
Loading unit model: docker_blob = pulp_docker.plugins.models:Blob
Loading unit model: docker_manifest = pulp_docker.plugins.models:Manifest
Loading unit model: docker_image = pulp_docker.plugins.models:Image
Loading unit model: docker_tag = pulp_docker.plugins.models:Tag
Loading unit model: ostree = pulp_ostree.plugins.db.model:Branch
Loading unit model: erratum = pulp_rpm.plugins.db.models:Errata
Loading unit model: distribution = pulp_rpm.plugins.db.models:Distribution
Loading unit model: srpm = pulp_rpm.plugins.db.models:SRPM
Loading unit model: package_group = pulp_rpm.plugins.db.models:PackageGroup
Loading unit model: package_category = pulp_rpm.plugins.db.models:PackageCategory
Loading unit model: iso = pulp_rpm.plugins.db.models:ISO
Loading unit model: package_environment = pulp_rpm.plugins.db.models:PackageEnvironment
Loading unit model: drpm = pulp_rpm.plugins.db.models:DRPM
Loading unit model: package_langpacks = pulp_rpm.plugins.db.models:PackageLangpacks
Loading unit model: rpm = pulp_rpm.plugins.db.models:RPM
Loading unit model: yum_repo_metadata_file = pulp_rpm.plugins.db.models:YumMetadataFile
Loading unit model: puppet_module = pulp_puppet.plugins.db.models:Module
Database migrations complete.
I am starting fresh to make sure the issue is reproducible.
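The heartbeat check run by hand in the Python session above can be sketched without Pulp's database layer. This is a minimal, hypothetical helper (`running_workers` and `should_prompt` are not part of Pulp) that assumes each worker record is a dict with an `_id` and a timezone-aware UTC `last_heartbeat`, and treats a worker as running when its last heartbeat is less than 300 seconds old, mirroring the expression `now - worker['last_heartbeat'] < timedelta(seconds=300)` from the session.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 300-second window described in this issue, hard-coded here
# rather than read from Pulp's configuration.
WORKER_TIMEOUT = timedelta(seconds=300)


def running_workers(workers, now=None, timeout=WORKER_TIMEOUT):
    """Return the _id of every worker whose last heartbeat is newer than timeout.

    Hypothetical record shape: {"_id": str, "last_heartbeat": aware datetime},
    modeled loosely on the records returned by status.get_workers().
    """
    if now is None:
        # Aware UTC "now", so it can be subtracted from aware UTC heartbeats.
        now = datetime.now(timezone.utc)
    return [w["_id"] for w in workers if now - w["last_heartbeat"] < timeout]


def should_prompt(workers, now=None):
    """True if any worker still looks alive, i.e. a warning would be shown."""
    return bool(running_workers(workers, now))


# Example: heartbeats from when the services stopped (09:08 EST = 14:08 UTC),
# checked about 4h48min later. Hostnames are placeholders.
now = datetime(2016, 12, 5, 18, 57, tzinfo=timezone.utc)
workers = [
    {"_id": "reserved_resource_worker-0@example.com",
     "last_heartbeat": datetime(2016, 12, 5, 14, 8, 51, tzinfo=timezone.utc)},
    {"_id": "resource_manager@example.com",
     "last_heartbeat": datetime(2016, 12, 5, 14, 9, 0, tzinfo=timezone.utc)},
]
print(running_workers(workers, now))  # []
print(should_prompt(workers, now))    # False
```

With stale heartbeats like these, the check should report no running workers, which is what the manual session returned (`[False, False]`) even though pulp-manage-db itself still prompted on the upgraded system.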
Updated by bmbouter almost 8 years ago
I would like to see a simple reproducer posted on a fresh 2.11 install (not an upgrade). This would let us distinguish an upgrade issue from an issue with the feature itself.
Updated by bizhang almost 8 years ago
- File 2472_reproducer.sh added
Here is a very simple reproducer we can use to test this.
Run it while all Pulp services are running: it kills all Celery processes, sleeps for about 5 minutes, and then calls pulp-manage-db.
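The three steps described above can be sketched in Python as follows. This is not the attached 2472_reproducer.sh, whose contents are not shown here; the `pkill -f celery` command, its default signal, the `sudo -u apache` invocation, and the 310-second wait (slightly over the 300-second limit) are all assumptions. `run` and `sleep` are injectable so the sequence can be dry-run without touching a real system.

```python
import subprocess
import time

# Hypothetical commands; the attached script may use different ones.
KILL_CELERY = ["pkill", "-f", "celery"]  # matches Celery worker command lines
MANAGE_DB = ["sudo", "-u", "apache", "pulp-manage-db"]
# Assumption: wait a little longer than 300 s so every heartbeat is stale.
WAIT_SECONDS = 310


def reproduce(run=subprocess.run, sleep=time.sleep, wait=WAIT_SECONDS):
    """Kill Celery, wait out the heartbeat window, then run pulp-manage-db.

    Returns pulp-manage-db's exit code. Only call this on a disposable
    test machine, since it really stops the Pulp Celery processes.
    """
    # pkill exits with status 1 when nothing matched; not an error here.
    run(KILL_CELERY, check=False)
    sleep(wait)
    # pulp-manage-db is interactive and may prompt if it still sees
    # running workers, which is the behavior this issue is about.
    return run(MANAGE_DB, check=False).returncode
```

On a test machine, `reproduce()` runs the real sequence; passing fake `run` and `sleep` callables lets the step order be checked without killing anything.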
Updated by elyezer almost 8 years ago
This is an upgrade issue; running the reproducer script on a fresh 2.11 install works fine:
# ./2472_reproducer.sh
+----------------------------------------------------------------------+
Status of the server
+----------------------------------------------------------------------+
Api Version: 2
Database Connection:
Connected: True
Known Workers:
_id: reserved_resource_worker-0@sat-qe-4.rhq.lab.eng.bos.redhat.com
_ns: workers
Last Heartbeat: 2016-12-05T22:39:52Z
_id: reserved_resource_worker-1@sat-qe-4.rhq.lab.eng.bos.redhat.com
_ns: workers
Last Heartbeat: 2016-12-05T22:39:52Z
_id: reserved_resource_worker-4@sat-qe-4.rhq.lab.eng.bos.redhat.com
_ns: workers
Last Heartbeat: 2016-12-05T22:39:52Z
_id: reserved_resource_worker-3@sat-qe-4.rhq.lab.eng.bos.redhat.com
_ns: workers
Last Heartbeat: 2016-12-05T22:39:52Z
_id: reserved_resource_worker-8@sat-qe-4.rhq.lab.eng.bos.redhat.com
_ns: workers
Last Heartbeat: 2016-12-05T22:39:52Z
_id: reserved_resource_worker-7@sat-qe-4.rhq.lab.eng.bos.redhat.com
_ns: workers
Last Heartbeat: 2016-12-05T22:39:52Z
_id: scheduler@sat-qe-4.rhq.lab.eng.bos.redhat.com
_ns: workers
Last Heartbeat: 2016-12-05T22:39:54Z
_id: reserved_resource_worker-5@sat-qe-4.rhq.lab.eng.bos.redhat.com
_ns: workers
Last Heartbeat: 2016-12-05T22:39:52Z
_id: reserved_resource_worker-9@sat-qe-4.rhq.lab.eng.bos.redhat.com
_ns: workers
Last Heartbeat: 2016-12-05T22:39:52Z
_id: reserved_resource_worker-6@sat-qe-4.rhq.lab.eng.bos.redhat.com
_ns: workers
Last Heartbeat: 2016-12-05T22:39:52Z
_id: reserved_resource_worker-10@sat-qe-4.rhq.lab.eng.bos.redhat.com
_ns: workers
Last Heartbeat: 2016-12-05T22:39:52Z
_id: reserved_resource_worker-11@sat-qe-4.rhq.lab.eng.bos.redhat.com
_ns: workers
Last Heartbeat: 2016-12-05T22:39:52Z
_id: reserved_resource_worker-2@sat-qe-4.rhq.lab.eng.bos.redhat.com
_ns: workers
Last Heartbeat: 2016-12-05T22:39:52Z
_id: resource_manager@sat-qe-4.rhq.lab.eng.bos.redhat.com
_ns: workers
Last Heartbeat: 2016-12-05T22:39:57Z
Messaging Connection:
Connected: True
Versions:
Platform Version: 2.11b4
+----------------------------------------------------------------------+
Status of the server
+----------------------------------------------------------------------+
Api Version: 2
Database Connection:
Connected: True
Known Workers:
Messaging Connection:
Connected: True
Versions:
Platform Version: 2.11b4
Attempting to connect to localhost:27017
Attempting to connect to localhost:27017
Write concern for Mongo connection: {}
Loading content types.
Loading type descriptors []
Parsing type descriptors
Validating type descriptor syntactic integrity
Validating type descriptor semantic integrity
Loading unit model: docker_blob = pulp_docker.plugins.models:Blob
Loading unit model: docker_manifest = pulp_docker.plugins.models:Manifest
Loading unit model: docker_image = pulp_docker.plugins.models:Image
Loading unit model: docker_tag = pulp_docker.plugins.models:Tag
Loading unit model: erratum = pulp_rpm.plugins.db.models:Errata
Loading unit model: distribution = pulp_rpm.plugins.db.models:Distribution
Loading unit model: srpm = pulp_rpm.plugins.db.models:SRPM
Loading unit model: package_group = pulp_rpm.plugins.db.models:PackageGroup
Loading unit model: package_category = pulp_rpm.plugins.db.models:PackageCategory
Loading unit model: iso = pulp_rpm.plugins.db.models:ISO
Loading unit model: package_environment = pulp_rpm.plugins.db.models:PackageEnvironment
Loading unit model: drpm = pulp_rpm.plugins.db.models:DRPM
Loading unit model: package_langpacks = pulp_rpm.plugins.db.models:PackageLangpacks
Loading unit model: rpm = pulp_rpm.plugins.db.models:RPM
Loading unit model: yum_repo_metadata_file = pulp_rpm.plugins.db.models:YumMetadataFile
Loading unit model: puppet_module = pulp_puppet.plugins.db.models:Module
Loading unit model: ostree = pulp_ostree.plugins.db.model:Branch
Loading unit model: python_package = pulp_python.plugins.models:Package
Updating the database with types []
Found the following type definitions that were not present in the update collection [puppet_module, drpm, ostree, package_langpacks, erratum, docker_blob, docker_manifest, yum_repo_metadata_file, package_group, package_category, iso, package_environment, docker_tag, python_package, srpm, rpm, distribution, docker_image]
Updating the database with types [puppet_module, drpm, ostree, package_langpacks, erratum, docker_blob, docker_manifest, yum_repo_metadata_file, package_group, package_category, iso, package_environment, docker_tag, python_package, distribution, rpm, srpm, docker_image]
Content types loaded.
Ensuring the admin role and user are in place.
Admin role and user are in place.
Beginning database migrations.
Migration package pulp.server.db.migrations is up to date at version 27
Migration package pulp_docker.plugins.migrations is up to date at version 2
Migration package pulp_puppet.plugins.migrations is up to date at version 5
Migration package pulp_python.plugins.migrations is up to date at version 1
Migration package pulp_rpm.plugins.migrations is up to date at version 36
Loading unit model: docker_blob = pulp_docker.plugins.models:Blob
Loading unit model: docker_manifest = pulp_docker.plugins.models:Manifest
Loading unit model: docker_image = pulp_docker.plugins.models:Image
Loading unit model: docker_tag = pulp_docker.plugins.models:Tag
Loading unit model: erratum = pulp_rpm.plugins.db.models:Errata
Loading unit model: distribution = pulp_rpm.plugins.db.models:Distribution
Loading unit model: srpm = pulp_rpm.plugins.db.models:SRPM
Loading unit model: package_group = pulp_rpm.plugins.db.models:PackageGroup
Loading unit model: package_category = pulp_rpm.plugins.db.models:PackageCategory
Loading unit model: iso = pulp_rpm.plugins.db.models:ISO
Loading unit model: package_environment = pulp_rpm.plugins.db.models:PackageEnvironment
Loading unit model: drpm = pulp_rpm.plugins.db.models:DRPM
Loading unit model: package_langpacks = pulp_rpm.plugins.db.models:PackageLangpacks
Loading unit model: rpm = pulp_rpm.plugins.db.models:RPM
Loading unit model: yum_repo_metadata_file = pulp_rpm.plugins.db.models:YumMetadataFile
Loading unit model: puppet_module = pulp_puppet.plugins.db.models:Module
Loading unit model: ostree = pulp_ostree.plugins.db.model:Branch
Loading unit model: python_package = pulp_python.plugins.models:Package
Database migrations complete.
Updated by elyezer almost 8 years ago
I think this should be a release blocker. #2469 asks for a new flag, which is good, but upgrades should just work and should not require any additional flag if the services were shut down properly.
Updated by elyezer almost 8 years ago
After some conversation we decided that the blocker was #2469.
Updated by semyers almost 8 years ago
Have we confirmed that this behavior is caused by the problem bmbouter explained in https://pulp.plan.io/issues/2468#note-2, or is this still a separate issue that we'll need to address in a future release of Pulp? I'd still expect the 300-second maximum to work, whether upgrading to 2.11 or installing it directly, so the 4-hour gap in the description worries me.
Updated by bmbouter almost 8 years ago
Since this was confirmed to work with a new installation and upgrade issues are being tracked by other issues, I propose this be closed as NOTABUG or WORKSFORME.
Updated by bizhang almost 8 years ago
- Status changed from NEW to CLOSED - NOTABUG
Closed since the offending feature (#2186) was removed.