Issue #2849
closedWorker terminates abnormally while processing task
Description
When i am syncing manifests that have foreign layers, i eventually get worker get killed.
$ pulp-admin docker repo sync run --repo-id bz
Warning: path should have mode 0700 because it may contain sensitive information: /home/ipanova/.pulp/
+----------------------------------------------------------------------+
Synchronizing Repository [bz]
+----------------------------------------------------------------------+
This command may be exited via ctrl+c without affecting the request.
Downloading manifests
[\]
... completed
Copying units already in pulp
[-]
... completed
Copying units already in pulp
[-]
... completed
Downloading remote files
[= ] 3%
8 of 221 items
Task Failed
Worker terminated abnormally while processing task
42fabc46-17b3-419f-a79c-ccd0bcba1474. Check the logs for details
Pulp starts to panic when it hits 404.
We need to check if the registry closes the connection prematurely and investigate what causes the worker termination. It is not OOM kill.
Jun 28 12:34:14 ina pulp: nectar.downloaders.threaded:INFO: Download failed: Download of https://registry-1.docker.io/v2/portainer/portainer/blobs/sha256:6330793656b1565cd8b5d5c1e2a736a351dec39f7250daf31eb91e3a76cc872b failed with code 404: Not Found
Jun 28 12:34:14 ina pulp: celery.worker.job:ERROR: (11813-08608) Task pulp.server.managers.repo.sync.sync[b28a4117-b647-484f-a9da-45a15e06f09e] raised unexpected: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Related issues
Updated by ipanova@redhat.com over 7 years ago
- Subject changed from | to Worker terminates abnormally while processing task
Updated by bmbouter over 7 years ago
If it's not being killed by the OOM, is it segfaulting? I'm not 100% sure how to do this, but you can configure linux to create a coredump for any process that is segfaulting.
Updated by ttereshc over 7 years ago
- Priority changed from Normal to High
- Severity changed from 2. Medium to 3. High
- Triaged changed from No to Yes
Updated by kdelee@redhat.com over 7 years ago
- Related to Issue #2966: Unable to sync docker repo because worker dies added
Updated by daviddavis about 6 years ago
- Sprint set to Sprint 44
There's another BZ opened against this bug downstream and I just worked with a customer that reported this issue today. Seems like this is happening regularly. I'm adding this to the current sprint and hoping we can get a fix into pulp 2.18.
Updated by amacdona@redhat.com about 6 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to amacdona@redhat.com
Updated by amacdona@redhat.com about 6 years ago
- Status changed from ASSIGNED to POST
Updated by daviddavis about 6 years ago
- Sprint/Milestone set to 2.18.0
- Platform Release deleted (
2.18.0)
Updated by amacdona@redhat.com about 6 years ago
The patch from https://github.com/pulp/pulp_docker/pull/275 is against latest 2-master.
To reproduce this issue, I used pulp 2.13.4 and pulp_docker 2.4.1, syncing portainer from docker hub. The problem was not related to OOM, but was just a mishandled 404.
The cherry-pick did not apply cleanly, but it the conflict was easily handled. After the cherry-pick was applied, the worker was no longer killed and the new coded exception was raised and the sync failed. This confirms that the patch will correctly handle 404s.
Updated by amacdona@redhat.com about 6 years ago
- Related to Test #4128: Test sync of a repository that is missing blobs added
Updated by amacdona@redhat.com about 6 years ago
I was able to reproduce this more rigorously against latest 2-master and prove the correctness of the patch. I set up a user installation of pulp on a vm:
User Pulp (to sync from)
[puser@omgwut ~]$ rpm -qa | grep pulp
pulp-admin-client-2.17.1-1.fc27.noarch
python-pulp-common-2.17.1-1.fc27.noarch
pulp-selinux-2.17.1-1.fc27.noarch
pulp-docker-admin-extensions-3.2.1-1.fc27.noarch
python-pulp-docker-common-3.2.1-1.fc27.noarch
python-pulp-repoauth-2.17.1-1.fc27.noarch
pulp-server-2.17.1-1.fc27.noarch
python-pulp-bindings-2.17.1-1.fc27.noarch
python-pulp-client-lib-2.17.1-1.fc27.noarch
pulp-docker-plugins-3.2.1-1.fc27.noarch
[puser@omgwut ~]$ rpm -qa | grep crane
python-crane-3.2.0-1.fc27.noarch
On the user installation box, I synced the busybox repository, published it, and served with crane.
From a developer installation running latest 2-master pulp and 2-master pulp_docker, I was able to sync from the user installation (sanity check).
Back on the user installation box, I moved one of the symlinks for a blob, stored in
/var/lib/pulp/published/docker/v2/web/synctest2/blobs
After wiping the db, I did a --force-full sync from the 2-master installation, which failed and killed the worker.
Nov 09 15:34:39 pulp2.dev pulp[3228]: pulp.server.async.tasks:INFO: [5f1bd81c] Task failed : [67a41d34-faed-4680-9f57-0a77b4a3178e] : Worker terminated abnormally while processing task 67a41d34-faed-4680-9f57-0a77b4a3178e. Check the logs for details
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) Task pulp.server.async.tasks._release_resource[5f1bd81c-1644-4743-a093-6a119d6f1be7] raised unexpected: AttributeError("'NoneType' object has no attribute 'top'",)
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) Traceback (most recent call last):
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) R = retval = fun(*args, **kwargs)
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 107, in __call__
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) return super(PulpTask, self).__call__(*args, **kwargs)
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 622, in __protected_call__
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) return self.run(*args, **kwargs)
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 296, in _release_resource
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) new_task.on_failure(exception, task_id, (), {}, MyEinfo)
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 602, in on_failure
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) if not self.request.called_directly:
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) File "/usr/lib/python2.7/site-packages/celery/app/task.py", line 978, in _get_request
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) req = self.request_stack.top
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) AttributeError: 'NoneType' object has no attribute 'top'
Nov 09 15:34:49 pulp2.dev pulp[1608]: celery.worker.request:ERROR: (1608-81120) Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Nov 09 15:34:49 pulp2.dev pulp[1608]: celery.worker.request:ERROR: (1608-81120) Traceback (most recent call last):
Nov 09 15:34:49 pulp2.dev pulp[1608]: celery.worker.request:ERROR: (1608-81120) File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1223, in mark_as_worker_lost
Nov 09 15:34:49 pulp2.dev pulp[1608]: celery.worker.request:ERROR: (1608-81120) human_status(exitcode)),
Nov 09 15:34:49 pulp2.dev pulp[1608]: celery.worker.request:ERROR: (1608-81120) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
On the devel box, I changed branches to include this patch, cleaned the db, and again triggered a --force-full sync, which failed but did not kill the worker.
[vagrant@pulp2 pulp_docker]$ pulp-admin docker repo create --repo-id fixture --upstream-name synctest2 --feed http://omgwut:5000
Repository [fixture] successfully created
[vagrant@pulp2 pulp_docker]$ pulp-admin docker repo sync run --repo-id fixture --force-full
+----------------------------------------------------------------------+
Synchronizing Repository [fixture]
+----------------------------------------------------------------------+
The following sync configuration options will be used:
Force Full: True
This command may be exited via ctrl+c without affecting the request.
Downloading manifests
[\]
... completed
Copying units already in pulp
[-]
... completed
Copying units already in pulp
[-]
... completed
Downloading remote files
[================================================= ] 99%
373 of 375 items
... failed
Task Failed
Image download(s) from
http://omgwut:5000/v2/synctest2/blobs/sha256:edc53726e077686a24352d99aa150c01d47
24637541162238195e72b86739adf failed. Sync task has failed to prevent a
corrupted repository.
The worker did not die, and the logs were more reasonable.
Nov 09 15:46:17 pulp2.dev pulp[8658]: nectar.downloaders.threaded:INFO: Download of http://omgwut:5000/v2/synctest2/blobs/sha256:2de56e1a41335b794e4c830d4da72e440f4551d0ab50174b6d634abd9f4502d6 was cancelled
Nov 09 15:46:17 pulp2.dev pulp[8658]: pulp.server.async.tasks:INFO: [d16f4e65] Task failed : [d16f4e65-7876-4a4e-9385-566f34ad12c7] : Image download(s) from http://omgwut:5000/v2/synctest2/blobs/sha256:edc53726e077686a24352d99aa150c01d4724637541162238195e72b86739adf failed. Sync task has failed to prevent a corrupted repository.
Nov 09 15:46:17 pulp2.dev pulp[8658]: celery.app.trace:INFO: [d16f4e65] Task pulp.server.managers.repo.sync.sync[d16f4e65-7876-4a4e-9385-566f34ad12c7] raised expected: PulpCodedException()
Added by amacdona@redhat.com about 6 years ago
Added by amacdona@redhat.com about 6 years ago
Revision b09801b1 | View on GitHub
Fail gracefully when image download fails.
Updated by amacdona@redhat.com about 6 years ago
- Status changed from POST to MODIFIED
Applied in changeset b09801b1de94ea7a542b565e6c9caa456aae8619.
Added by amacdona@redhat.com about 6 years ago
Revision c81bc353 | View on GitHub
Fail gracefully when image download fails.
https://pulp.plan.io/issues/2849 fixes #2849
https://pulp.plan.io/issues/2966 fixes #2966
(cherry picked from commit b09801b1de94ea7a542b565e6c9caa456aae8619)
Added by amacdona@redhat.com about 6 years ago
Revision c81bc353 | View on GitHub
Fail gracefully when image download fails.
https://pulp.plan.io/issues/2849 fixes #2849
https://pulp.plan.io/issues/2966 fixes #2966
(cherry picked from commit b09801b1de94ea7a542b565e6c9caa456aae8619)
Updated by amacdona@redhat.com about 6 years ago
Applied in changeset c81bc353aa9427e9545827c6cd7ac4a5f946379f.
Updated by ttereshc about 6 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE
Fail gracefully when image download fails.
https://pulp.plan.io/issues/2849 fixes #2849
https://pulp.plan.io/issues/2966 fixes #2966