Issue #7847

closed

Worker has gone missing during migration

Added by iballou over 3 years ago. Updated about 3 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee:
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Platform Release:
OS:
Triaged: No
Groomed: No
Sprint Candidate: No
Tags: Katello
Sprint: Sprint 91
Quarter:

Description

I'm trying to migrate Pulp 2 -> Pulp 3. The migration worked consistently until I tried it with 180K RPMs and 31 repos in Katello. Now it fails every time, after about 15 minutes, with the following errors:

...
Nov 17 14:52:49 centos7-katello-nightly-2 pulpcore-api: - - [17/Nov/2020:14:52:49 +0000] "GET /pulp/api/v3/tasks/dfa912b7-6b4c-4d29-b546-4926e8d00a93/ HTTP/1.1" 200 3678 "-" "OpenAPI-Generator/3.7.1/ruby"
Nov 17 14:52:50 centos7-katello-nightly-2 pulpcore-api: - - [17/Nov/2020:14:52:50 +0000] "GET /pulp/api/v3/task-groups/5a30cf02-4354-4b72-a8c5-a3d0d4ce6fcb/ HTTP/1.1" 200 440 "-" "OpenAPI-Generator/3.7.1/ruby"
Nov 17 14:53:06 centos7-katello-nightly-2 pulpcore-api: - - [17/Nov/2020:14:53:06 +0000] "GET /pulp/api/v3/tasks/dfa912b7-6b4c-4d29-b546-4926e8d00a93/ HTTP/1.1" 200 3678 "-" "OpenAPI-Generator/3.7.1/ruby"
Nov 17 14:53:06 centos7-katello-nightly-2 pulpcore-api: - - [17/Nov/2020:14:53:06 +0000] "GET /pulp/api/v3/task-groups/5a30cf02-4354-4b72-a8c5-a3d0d4ce6fcb/ HTTP/1.1" 200 440 "-" "OpenAPI-Generator/3.7.1/ruby"
Nov 17 14:53:52 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22312@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:52 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22312@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-7: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22311@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-5: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22311@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-5: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22311@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-7: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22311@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-resource-manager: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22311@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-resource-manager: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22311@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-2: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22311@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-2: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22311@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22319@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22319@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-6: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22311@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-6: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22311@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-8: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22311@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-8: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22311@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-1: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22311@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-1: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22311@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-3: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22311@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-3: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22311@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22323@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22323@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22315@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22315@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker 'resource-manager' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named resource-manager is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22307@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22307@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22314@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22314@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22310@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22310@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: Worker '22311@centos7-katello-nightly-2.cannolo.example.com' has gone missing, removing from list of workers
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22311@centos7-katello-nightly-2.cannolo.example.com is missing. Canceling the tasks in its queue.
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-api: - - [17/Nov/2020:14:53:54 +0000] "GET /pulp/api/v3/tasks/dfa912b7-6b4c-4d29-b546-4926e8d00a93/ HTTP/1.1" 200 7283 "-" "OpenAPI-Generator/3.7.1/ruby"
Nov 17 14:53:54 centos7-katello-nightly-2 pulpcore-api: - - [17/Nov/2020:14:53:54 +0000] "GET /pulp/api/v3/task-groups/5a30cf02-4354-4b72-a8c5-a3d0d4ce6fcb/ HTTP/1.1" 200 440 "-" "OpenAPI-Generator/3.7.1/ruby"
Nov 17 14:54:23 centos7-katello-nightly-2 pulpcore-worker-7: pulp: pulpcore.tasking.util:INFO: Task canceled: dfa912b7-6b4c-4d29-b546-4926e8d00a93.
Nov 17 14:54:23 centos7-katello-nightly-2 pulpcore-worker-6: pulp: pulpcore.tasking.util:INFO: Task canceled: dfa912b7-6b4c-4d29-b546-4926e8d00a93.
Nov 17 14:54:23 centos7-katello-nightly-2 pulpcore-resource-manager: pulp: pulpcore.tasking.util:INFO: Task canceled: dfa912b7-6b4c-4d29-b546-4926e8d00a93.
Nov 17 14:54:24 centos7-katello-nightly-2 pulpcore-worker-1: pulp: pulpcore.tasking.util:INFO: Task canceled: dfa912b7-6b4c-4d29-b546-4926e8d00a93.
Nov 17 14:54:24 centos7-katello-nightly-2 pulpcore-worker-8: pulp: pulpcore.tasking.util:INFO: Task canceled: dfa912b7-6b4c-4d29-b546-4926e8d00a93.
Nov 17 14:54:24 centos7-katello-nightly-2 pulpcore-worker-5: pulp: pulpcore.tasking.util:INFO: Task canceled: dfa912b7-6b4c-4d29-b546-4926e8d00a93.
Nov 17 14:54:24 centos7-katello-nightly-2 pulpcore-api: - - [17/Nov/2020:14:54:24 +0000] "GET /pulp/api/v3/tasks/dfa912b7-6b4c-4d29-b546-4926e8d00a93/ HTTP/1.1" 200 7284 "-" "OpenAPI-Generator/3.7.1/ruby"
Nov 17 14:54:24 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.util:INFO: Task canceled: dfa912b7-6b4c-4d29-b546-4926e8d00a93.
Nov 17 14:54:24 centos7-katello-nightly-2 pulpcore-worker-2: pulp: pulpcore.tasking.util:INFO: Task canceled: dfa912b7-6b4c-4d29-b546-4926e8d00a93.
Nov 17 14:54:24 centos7-katello-nightly-2 pulpcore-worker-4: pulp: rq.worker:INFO: Cleaning registries for queue: 22311@centos7-katello-nightly-2.cannolo.example.com
Nov 17 14:54:24 centos7-katello-nightly-2 pulpcore-worker-3: pulp: pulpcore.tasking.util:INFO: Task canceled: dfa912b7-6b4c-4d29-b546-4926e8d00a93.
Nov 17 14:54:24 centos7-katello-nightly-2 pulpcore-worker-4: pulp: rq.worker:INFO: 22311@centos7-katello-nightly-2.cannolo.example.com: e698be4e-6943-4f7c-9614-07e7f81c2265
Nov 17 14:54:24 centos7-katello-nightly-2 pulpcore-api: - - [17/Nov/2020:14:54:24 +0000] "GET /pulp/api/v3/task-groups/5a30cf02-4354-4b72-a8c5-a3d0d4ce6fcb/ HTTP/1.1" 200 440 "-" "OpenAPI-Generator/3.7.1/ruby"
Nov 17 14:54:24 centos7-katello-nightly-2 pulpcore-worker-4: pulp: rq.worker:INFO: 22311@centos7-katello-nightly-2.cannolo.example.com: Job OK (e698be4e-6943-4f7c-9614-07e7f81c2265)
Nov 17 14:54:39 centos7-katello-nightly-2 pulpcore-worker-6: pulp: pulpcore.tasking.services.worker_watcher:INFO: Worker '22312@centos7-katello-nightly-2.cannolo.example.com' is back online.
Nov 17 14:54:39 centos7-katello-nightly-2 pulpcore-worker-6: pulp: pulpcore.tasking.services.worker_watcher:ERROR: There are 0 pulpcore-resource-manager processes running. Pulp will not operate correctly without at least one pulpcore-resource-mananger process running.
Nov 17 14:54:39 centos7-katello-nightly-2 pulpcore-worker-8: pulp: pulpcore.tasking.services.worker_watcher:INFO: Worker '22319@centos7-katello-nightly-2.cannolo.example.com' is back online.
Nov 17 14:54:39 centos7-katello-nightly-2 pulpcore-worker-8: pulp: pulpcore.tasking.services.worker_watcher:ERROR: There are 0 pulpcore-resource-manager processes running. Pulp will not operate correctly without at least one pulpcore-resource-mananger process running.
Nov 17 14:54:39 centos7-katello-nightly-2 pulpcore-worker-7: pulp: pulpcore.tasking.services.worker_watcher:INFO: Worker '22315@centos7-katello-nightly-2.cannolo.example.com' is back online.
Nov 17 14:54:39 centos7-katello-nightly-2 pulpcore-worker-7: pulp: pulpcore.tasking.services.worker_watcher:ERROR: There are 0 pulpcore-resource-manager processes running. Pulp will not operate correctly without at least one pulpcore-resource-mananger process running.
Nov 17 14:54:39 centos7-katello-nightly-2 pulpcore-worker-4: pulp: pulpcore.tasking.services.worker_watcher:ERROR: There are 0 pulpcore-resource-manager processes running. Pulp will not operate correctly without at least one pulpcore-resource-mananger process running.
Nov 17 14:54:39 centos7-katello-nightly-2 pulpcore-worker-2: pulp: pulpcore.tasking.services.worker_watcher:INFO: Worker '22314@centos7-katello-nightly-2.cannolo.example.com' is back online.
Nov 17 14:54:39 centos7-katello-nightly-2 pulpcore-resource-manager: pulp: pulpcore.tasking.services.worker_watcher:INFO: Worker 'resource-manager' is back online.
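Because every worker reports every missing peer, the excerpt above repeats the same event many times. A minimal sketch for condensing such a journal excerpt follows. It assumes the line layout shown above (syslog timestamp, host, reporting process, then the worker_watcher message); the regular expression and the function name `summarize_missing` are illustrative and not part of Pulp.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the excerpt above:
# "<Mon DD HH:MM:SS> <host> <reporting-process>: ... Worker '<name>' has gone missing ..."
MISSING_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ (?P<reporter>[\w-]+): "
    r".*Worker '(?P<worker>[^']+)' has gone missing"
)


def summarize_missing(lines):
    """Map each missing worker to (first timestamp seen, sorted reporting processes)."""
    first_seen = {}
    reporters = defaultdict(set)
    for line in lines:
        match = MISSING_RE.match(line)
        if match is None:
            # API access lines, "is back online" lines, etc. are skipped.
            continue
        worker = match.group("worker")
        first_seen.setdefault(worker, match.group("ts"))
        reporters[worker].add(match.group("reporter"))
    return {w: (first_seen[w], sorted(reporters[w])) for w in first_seen}
```

Feeding it the saved journal output (for example `summarize_missing(open("journal.txt"))`, where `journal.txt` is a hypothetical file name) gives one entry per missing worker, which makes it easier to see that all workers and the resource manager were declared missing within a couple of seconds of each other.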
Actions #1

Updated by iballou over 3 years ago

My machine has 32 GB RAM, and I haven't seen the memory get close to that, so I don't think it's a memory problem.

Actions #2

Updated by iballou over 3 years ago

pulp-2to3-migration (0.5.1)
pulp-certguard (1.0.3)
pulp-container (2.1.0)
pulp-file (1.3.0)
pulp-rpm (3.7.0)
pulpcore (3.7.3)

python3-createrepo_c-0.16.2-1.el7.x86_64
createrepo_c-libs-0.16.2-1.el7.x86_64
python2-createrepo_c-0.16.2-1.el7.x86_64
createrepo-0.9.9-28.el7.noarch
createrepo_c-0.16.2-1.el7.x86_64
createrepo_c-debuginfo-0.16.2-1.el7.x86_64
createrepo_c-devel-0.16.2-1.el7.x86_64
Actions #3

Updated by iballou over 3 years ago

/var/lib/pulp is on a separate virtual disk and symlinked

Actions #4

Updated by ggainey about 3 years ago

I (inadvertently) recreated this; let me list the steps to help whoever is looking into it.

I was on a pulp-developer vagrant box, testing pulp-2to3-migration, with a LARGE pulp2 dataset.

Steps to get to where I hit the problem:

  • Clone pulpcore, pulp_rpm, pulp_installer, and pulp-2to3-migration from github.
  • get The Big Dataset - this is 7.5GB of mongo, ~260K RPMs sync'd from the repo-list in issue #7537
    • download from grgainey's GDrive
    • put mongo_dumps directory into your pulpcore checkout
  • cd pulp_installer
    • cp example.dev-config.yml local.dev-config.yml
    • make changes in local.dev:
      • uncomment pulp_rpm and pulp-2to3-migration
      • add "memory: 16384"
    • vagrant up pulp2-nightly-pulp3-source-centos7
    • vagrant ssh pulp2-nightly-pulp3-source-centos7

You are now on the 2to3 vagrant box. Pulp2 and Pulp3 are both installed and running.

  • cd /home/vagrant/devel/pulpcore/mongo_dumps
  • mongorestore --gzip . # it's 7.5GB - go get breakfast
  • edit /etc/pulp/settings.py according to the 2to3 migration doc
  • create a migration-plan for "all rpms"
    • http POST :24817/pulp/api/v3/migration-plans/ plan='{"plugins": [{"type": "rpm"}]}'
  • run the plan
    • http POST :/pulp/api/v3/migration-plans/<UUID-returned-above>/run/ skip_corrupted=true
    • NOTE: this will take A LONG TIME (i.e., 5+ hours)
  • monitor progress of the task
    • http :/pulp/api/v3/tasks/<UUID-returned-above>/
  • watch journalctl for errors like the following:
Jan 27 17:27:35 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com gunicorn[21974]: [2021-01-27 17:27:35 +0000] [21974] [CRITICAL] WORKER TIMEOUT (pid:5365)
...
Jan 27 18:46:32 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[21780]: pulp [None]: pulpcore.tasking.worker_watcher:ERROR: The worker named resource-manager is missing. Canceling the tasks in its queue.
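For anyone scripting the reproduction, the plan-creation, run, and monitoring requests from the steps above can be sketched as request builders. This is only a sketch: `BASE_URL` is an assumption (port 24817 is taken from the plan-creation command; the run and task commands above omit the port), the function names are hypothetical, and the bodies simply mirror what the httpie commands send, where `key=value` arguments become JSON string fields.

```python
import json
from urllib.parse import urljoin

# Assumption: API address; the port comes from the plan-creation command above.
BASE_URL = "http://localhost:24817"


def plan_request(plugin_type="rpm"):
    """Return (url, body) for creating a migration plan covering one plugin type."""
    plan = {"plugins": [{"type": plugin_type}]}
    # httpie's plan='...' sends the plan as a JSON-encoded string field.
    body = {"plan": json.dumps(plan)}
    return urljoin(BASE_URL, "/pulp/api/v3/migration-plans/"), body


def run_request(plan_uuid, skip_corrupted=True):
    """Return (url, body) for running the plan identified by plan_uuid."""
    url = urljoin(BASE_URL, f"/pulp/api/v3/migration-plans/{plan_uuid}/run/")
    # httpie's skip_corrupted=true sends the string "true", mirrored here.
    return url, {"skip_corrupted": "true" if skip_corrupted else "false"}


def task_url(task_uuid):
    """Return the URL to poll for the state of the migration task."""
    return urljoin(BASE_URL, f"/pulp/api/v3/tasks/{task_uuid}/")
```

The builders return plain data so they can be passed to any HTTP client; authentication and the actual POST/GET calls are left out because they depend on the local setup.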
Actions #5

Updated by ttereshc about 3 years ago

  • Priority changed from Normal to High
Actions #6

Updated by ttereshc about 3 years ago

  • Sprint/Milestone set to 0.9.0
Actions #8

Updated by bmbouter about 3 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to bmbouter
  • Sprint set to Sprint 90
Actions #9

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 90 to Sprint 91
Actions #10

Updated by bmbouter about 3 years ago

  • Status changed from ASSIGNED to POST

Added by bmbouter about 3 years ago

Revision d05de133 | View on GitHub

Adds notes about worker timeouts in config docs

closes #7847

Actions #11

Updated by bmbouter about 3 years ago

  • Status changed from POST to MODIFIED
Actions #12

Updated by pulpbot about 3 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
