Issue #2849

Worker terminates abnormally while processing task

Added by ipanova@redhat.com over 2 years ago. Updated 7 months ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
High
Category:
-
Sprint/Milestone:
Start date:
Due date:
Severity:
3. High
Version - Docker:
Platform Release:
2.18.0
Blocks Release:
Target Release - Docker:
OS:
Backwards Incompatible:
No
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
QA Contact:
Complexity:
Smash Test:
Verified:
Yes
Verification Required:
No
Sprint:
Sprint 45

Description

When I sync manifests that have foreign layers, the worker eventually gets killed.


$ pulp-admin docker repo  sync run --repo-id bz
Warning: path should have mode 0700 because it may contain sensitive information: /home/ipanova/.pulp/

+----------------------------------------------------------------------+
                     Synchronizing Repository [bz]
+----------------------------------------------------------------------+

This command may be exited via ctrl+c without affecting the request.

Downloading manifests
[\]
... completed

Copying units already in pulp
[-]
... completed

Copying units already in pulp
[-]
... completed

Downloading remote files
[=                                                 ] 3%
8 of 221 items

Task Failed

Worker terminated abnormally while processing task
42fabc46-17b3-419f-a79c-ccd0bcba1474.  Check the logs for details

Pulp starts to panic when it hits a 404.
We need to check whether the registry closes the connection prematurely and investigate what causes the worker termination. It is not an OOM kill.

Jun 28 12:34:14 ina pulp: nectar.downloaders.threaded:INFO: Download failed: Download of https://registry-1.docker.io/v2/portainer/portainer/blobs/sha256:6330793656b1565cd8b5d5c1e2a736a351dec39f7250daf31eb91e3a76cc872b failed with code 404: Not Found
Jun 28 12:34:14 ina pulp: celery.worker.job:ERROR: (11813-08608) Task pulp.server.managers.repo.sync.sync[b28a4117-b647-484f-a9da-45a15e06f09e] raised unexpected: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
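The eventual fix (see the associated revisions below) makes the sync fail gracefully instead of letting the worker die. A minimal sketch of the idea, with illustrative names that are not the actual pulp_docker code: collect failed downloads in an event listener, then raise an ordinary exception after downloading finishes, so celery marks the task as failed while the worker process survives.

```python
# Illustrative sketch only; class and function names are hypothetical,
# not the real pulp_docker/nectar implementation.

class DownloadListener(object):
    """Records download failures (e.g. a 404 for a blob) instead of crashing."""

    def __init__(self):
        self.failed_urls = []

    def download_failed(self, report):
        # Called once per failed download; report.url identifies the blob.
        self.failed_urls.append(report.url)


def sync_blobs(downloader, requests, listener):
    """Run the downloads, then fail the task cleanly if anything failed."""
    downloader.download(requests)
    if listener.failed_urls:
        # A regular exception fails the sync task without killing the
        # worker process, and leaves the repository unmodified.
        raise RuntimeError(
            'Image download(s) from %s failed. Sync task has failed to '
            'prevent a corrupted repository.' % ', '.join(listener.failed_urls))
```

The key design point is that a missing blob becomes a task-level error with a readable message, rather than an unhandled condition that takes down the worker with SIGKILL.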

Related issues

Related to Docker Support - Issue #2966: Unable to sync docker repo because worker dies CLOSED - CURRENTRELEASE Actions
Related to Docker Support - Test #4128: Test sync of a repository that is missing blobs CLOSED - COMPLETE Actions

Associated revisions

Revision c81bc353 View on GitHub
Added by amacdona@redhat.com 12 months ago

Fail gracefully when image download fails.

https://pulp.plan.io/issues/2849
fixes #2849

https://pulp.plan.io/issues/2966
fixes #2966

(cherry picked from commit b09801b1de94ea7a542b565e6c9caa456aae8619)

History

#1 Updated by ipanova@redhat.com over 2 years ago

  • Subject changed from | to Worker terminates abnormally while processing task

#2 Updated by bmbouter over 2 years ago

If it's not being killed by the OOM killer, is it segfaulting? I'm not 100% sure how to do this, but you can configure Linux to create a core dump for any process that segfaults.
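One way to approximate this suggestion from within a process: the stdlib `resource` module can raise the core-file size limit so a later segfault actually writes a core dump (where the core file lands is controlled system-wide by /proc/sys/kernel/core_pattern). The function name below is illustrative; this is a sketch, not part of Pulp.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    A soft limit of 0 (common default) suppresses core dumps entirely;
    raising it up to the hard limit is always permitted for a process.
    Returns the (soft, hard) limits after the change.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Child processes inherit the limit, so calling this early in a worker's startup would cover the forked task processes as well.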

#3 Updated by ttereshc over 2 years ago

  • Priority changed from Normal to High
  • Severity changed from 2. Medium to 3. High
  • Triaged changed from No to Yes

#4 Updated by kdelee@redhat.com over 2 years ago

  • Related to Issue #2966: Unable to sync docker repo because worker dies added

#6 Updated by daviddavis about 1 year ago

  • Sprint set to Sprint 44

There's another BZ opened against this bug downstream, and I just worked with a customer who reported this issue today. This seems to be happening regularly. I'm adding it to the current sprint in the hope that we can get a fix into pulp 2.18.

#7 Updated by amacdona@redhat.com about 1 year ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to amacdona@redhat.com

#8 Updated by amacdona@redhat.com about 1 year ago

  • Status changed from ASSIGNED to POST

#9 Updated by daviddavis about 1 year ago

  • Platform Release set to 2.18.0

#10 Updated by daviddavis about 1 year ago

  • Sprint/Milestone set to 2.18.0
  • Platform Release deleted (2.18.0)

#11 Updated by rchan about 1 year ago

  • Sprint changed from Sprint 44 to Sprint 45

#12 Updated by amacdona@redhat.com about 1 year ago

The patch from https://github.com/pulp/pulp_docker/pull/275 is against latest 2-master.

To reproduce this issue, I used pulp 2.13.4 and pulp_docker 2.4.1, syncing portainer from docker hub. The problem was not related to OOM, but was just a mishandled 404.

The cherry-pick did not apply cleanly, but the conflict was easily resolved. After the cherry-pick was applied, the worker was no longer killed; the newly coded exception was raised and the sync failed. This confirms that the patch correctly handles 404s.

#13 Updated by amacdona@redhat.com about 1 year ago

  • Related to Test #4128: Test sync of a repository that is missing blobs added

#14 Updated by amacdona@redhat.com about 1 year ago

I was able to reproduce this more rigorously against the latest 2-master and prove the correctness of the patch. I set up a user installation of pulp on a VM:

User Pulp (to sync from)

[puser@omgwut ~]$ rpm -qa | grep pulp
pulp-admin-client-2.17.1-1.fc27.noarch
python-pulp-common-2.17.1-1.fc27.noarch
pulp-selinux-2.17.1-1.fc27.noarch
pulp-docker-admin-extensions-3.2.1-1.fc27.noarch
python-pulp-docker-common-3.2.1-1.fc27.noarch
python-pulp-repoauth-2.17.1-1.fc27.noarch
pulp-server-2.17.1-1.fc27.noarch
python-pulp-bindings-2.17.1-1.fc27.noarch
python-pulp-client-lib-2.17.1-1.fc27.noarch
pulp-docker-plugins-3.2.1-1.fc27.noarch
[puser@omgwut ~]$ rpm -qa | grep crane
python-crane-3.2.0-1.fc27.noarch

On the user installation box, I synced the busybox repository, published it, and served it with crane.

From a developer installation running latest 2-master pulp and 2-master pulp_docker, I was able to sync from the user installation (sanity check).

Back on the user installation box, I moved one of the symlinks for a blob, stored in

/var/lib/pulp/published/docker/v2/web/synctest2/blobs

After wiping the db, I did a --force-full sync from the 2-master installation, which failed and killed the worker.

Nov 09 15:34:39 pulp2.dev pulp[3228]: pulp.server.async.tasks:INFO: [5f1bd81c] Task failed : [67a41d34-faed-4680-9f57-0a77b4a3178e] : Worker terminated abnormally while processing task 67a41d34-faed-4680-9f57-0a77b4a3178e.  Check the logs for details
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) Task pulp.server.async.tasks._release_resource[5f1bd81c-1644-4743-a093-6a119d6f1be7] raised unexpected: AttributeError("'NoneType' object has no attribute 'top'",)
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) Traceback (most recent call last):
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120)     R = retval = fun(*args, **kwargs)
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120)   File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 107, in __call__
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120)     return super(PulpTask, self).__call__(*args, **kwargs)
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 622, in __protected_call__
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120)     return self.run(*args, **kwargs)
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120)   File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 296, in _release_resource
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120)     new_task.on_failure(exception, task_id, (), {}, MyEinfo)
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120)   File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 602, in on_failure
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120)     if not self.request.called_directly:
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120)   File "/usr/lib/python2.7/site-packages/celery/app/task.py", line 978, in _get_request
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120)     req = self.request_stack.top
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) AttributeError: 'NoneType' object has no attribute 'top'
Nov 09 15:34:49 pulp2.dev pulp[1608]: celery.worker.request:ERROR: (1608-81120) Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Nov 09 15:34:49 pulp2.dev pulp[1608]: celery.worker.request:ERROR: (1608-81120) Traceback (most recent call last):
Nov 09 15:34:49 pulp2.dev pulp[1608]: celery.worker.request:ERROR: (1608-81120)   File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1223, in mark_as_worker_lost
Nov 09 15:34:49 pulp2.dev pulp[1608]: celery.worker.request:ERROR: (1608-81120)     human_status(exitcode)),
Nov 09 15:34:49 pulp2.dev pulp[1608]: celery.worker.request:ERROR: (1608-81120) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).

On the devel box, I changed branches to include this patch, cleaned the db, and again triggered a --force-full sync, which failed but did not kill the worker.

[vagrant@pulp2 pulp_docker]$ pulp-admin docker repo create --repo-id fixture --upstream-name synctest2 --feed http://omgwut:5000
Repository [fixture] successfully created

[vagrant@pulp2 pulp_docker]$ pulp-admin docker repo sync run --repo-id fixture --force-full
+----------------------------------------------------------------------+
                   Synchronizing Repository [fixture]
+----------------------------------------------------------------------+

The following sync configuration options will be used:

Force Full:  True

This command may be exited via ctrl+c without affecting the request.

Downloading manifests
[\]
... completed

Copying units already in pulp
[-]
... completed

Copying units already in pulp
[-]
... completed

Downloading remote files
[================================================= ] 99%
373 of 375 items
... failed

Task Failed

Image download(s) from
http://omgwut:5000/v2/synctest2/blobs/sha256:edc53726e077686a24352d99aa150c01d47
24637541162238195e72b86739adf failed. Sync task has failed to prevent a
corrupted repository.

The worker did not die, and the logs were more reasonable.

Nov 09 15:46:17 pulp2.dev pulp[8658]: nectar.downloaders.threaded:INFO: Download of http://omgwut:5000/v2/synctest2/blobs/sha256:2de56e1a41335b794e4c830d4da72e440f4551d0ab50174b6d634abd9f4502d6 was cancelled
Nov 09 15:46:17 pulp2.dev pulp[8658]: pulp.server.async.tasks:INFO: [d16f4e65] Task failed : [d16f4e65-7876-4a4e-9385-566f34ad12c7] : Image download(s) from http://omgwut:5000/v2/synctest2/blobs/sha256:edc53726e077686a24352d99aa150c01d4724637541162238195e72b86739adf failed. Sync task has failed to prevent a corrupted repository.
Nov 09 15:46:17 pulp2.dev pulp[8658]: celery.app.trace:INFO: [d16f4e65] Task pulp.server.managers.repo.sync.sync[d16f4e65-7876-4a4e-9385-566f34ad12c7] raised expected: PulpCodedException()

#15 Updated by amacdona@redhat.com 12 months ago

  • Status changed from POST to MODIFIED

#16 Updated by ttereshc 12 months ago

  • Platform Release set to 2.18.0

#18 Updated by ttereshc 12 months ago

  • Status changed from MODIFIED to ON_QA

#19 Updated by bherring 11 months ago

  • Verified changed from No to Yes

See #4128 for more information about verification.

#20 Updated by ttereshc 11 months ago

  • Status changed from ON_QA to CLOSED - CURRENTRELEASE

#21 Updated by bmbouter 7 months ago

  • Tags Pulp 2 added
