I was able to reproduce this more rigorously against latest 2-master and prove the correctness of the patch. I set up a user installation of pulp on a vm:
User Pulp (to sync from)
[puser@omgwut ~]$ rpm -qa | grep pulp
pulp-admin-client-2.17.1-1.fc27.noarch
python-pulp-common-2.17.1-1.fc27.noarch
pulp-selinux-2.17.1-1.fc27.noarch
pulp-docker-admin-extensions-3.2.1-1.fc27.noarch
python-pulp-docker-common-3.2.1-1.fc27.noarch
python-pulp-repoauth-2.17.1-1.fc27.noarch
pulp-server-2.17.1-1.fc27.noarch
python-pulp-bindings-2.17.1-1.fc27.noarch
python-pulp-client-lib-2.17.1-1.fc27.noarch
pulp-docker-plugins-3.2.1-1.fc27.noarch
[puser@omgwut ~]$ rpm -qa | grep crane
python-crane-3.2.0-1.fc27.noarch
On the user installation box, I synced the busybox repository, published it, and served with crane.
From a developer installation running latest 2-master pulp and 2-master pulp_docker, I was able to sync from the user installation (sanity check).
Back on the user installation box, I moved one of the symlinks for a blob, stored in
/var/lib/pulp/published/docker/v2/web/synctest2/blobs
After wiping the db, I did a --force-full sync from the 2-master installation, which failed and killed the worker.
Nov 09 15:34:39 pulp2.dev pulp[3228]: pulp.server.async.tasks:INFO: [5f1bd81c] Task failed : [67a41d34-faed-4680-9f57-0a77b4a3178e] : Worker terminated abnormally while processing task 67a41d34-faed-4680-9f57-0a77b4a3178e. Check the logs for details
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) Task pulp.server.async.tasks._release_resource[5f1bd81c-1644-4743-a093-6a119d6f1be7] raised unexpected: AttributeError("'NoneType' object has no attribute 'top'",)
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) Traceback (most recent call last):
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) R = retval = fun(*args, **kwargs)
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 107, in __call__
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) return super(PulpTask, self).__call__(*args, **kwargs)
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 622, in __protected_call__
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) return self.run(*args, **kwargs)
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 296, in _release_resource
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) new_task.on_failure(exception, task_id, (), {}, MyEinfo)
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 602, in on_failure
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) if not self.request.called_directly:
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) File "/usr/lib/python2.7/site-packages/celery/app/task.py", line 978, in _get_request
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) req = self.request_stack.top
Nov 09 15:34:39 pulp2.dev pulp[3228]: celery.app.trace:ERROR: [5f1bd81c] (3228-81120) AttributeError: 'NoneType' object has no attribute 'top'
Nov 09 15:34:49 pulp2.dev pulp[1608]: celery.worker.request:ERROR: (1608-81120) Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Nov 09 15:34:49 pulp2.dev pulp[1608]: celery.worker.request:ERROR: (1608-81120) Traceback (most recent call last):
Nov 09 15:34:49 pulp2.dev pulp[1608]: celery.worker.request:ERROR: (1608-81120) File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1223, in mark_as_worker_lost
Nov 09 15:34:49 pulp2.dev pulp[1608]: celery.worker.request:ERROR: (1608-81120) human_status(exitcode)),
Nov 09 15:34:49 pulp2.dev pulp[1608]: celery.worker.request:ERROR: (1608-81120) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
On the devel box, I changed branches to include this patch, cleaned the db, and again triggered a --force-full sync, which failed but did not kill the worker.
[vagrant@pulp2 pulp_docker]$ pulp-admin docker repo create --repo-id fixture --upstream-name synctest2 --feed http://omgwut:5000
Repository [fixture] successfully created
[vagrant@pulp2 pulp_docker]$ pulp-admin docker repo sync run --repo-id fixture --force-full
+----------------------------------------------------------------------+
Synchronizing Repository [fixture]
+----------------------------------------------------------------------+
The following sync configuration options will be used:
Force Full: True
This command may be exited via ctrl+c without affecting the request.
Downloading manifests
[\]
... completed
Copying units already in pulp
[-]
... completed
Copying units already in pulp
[-]
... completed
Downloading remote files
[================================================= ] 99%
373 of 375 items
... failed
Task Failed
Image download(s) from
http://omgwut:5000/v2/synctest2/blobs/sha256:edc53726e077686a24352d99aa150c01d47
24637541162238195e72b86739adf failed. Sync task has failed to prevent a
corrupted repository.
The worker did not die, and the logs were more reasonable.
Nov 09 15:46:17 pulp2.dev pulp[8658]: nectar.downloaders.threaded:INFO: Download of http://omgwut:5000/v2/synctest2/blobs/sha256:2de56e1a41335b794e4c830d4da72e440f4551d0ab50174b6d634abd9f4502d6 was cancelled
Nov 09 15:46:17 pulp2.dev pulp[8658]: pulp.server.async.tasks:INFO: [d16f4e65] Task failed : [d16f4e65-7876-4a4e-9385-566f34ad12c7] : Image download(s) from http://omgwut:5000/v2/synctest2/blobs/sha256:edc53726e077686a24352d99aa150c01d4724637541162238195e72b86739adf failed. Sync task has failed to prevent a corrupted repository.
Nov 09 15:46:17 pulp2.dev pulp[8658]: celery.app.trace:INFO: [d16f4e65] Task pulp.server.managers.repo.sync.sync[d16f4e65-7876-4a4e-9385-566f34ad12c7] raised expected: PulpCodedException()
Fail gracefully when image download fails.
https://pulp.plan.io/issues/2849 fixes #2849
https://pulp.plan.io/issues/2966 fixes #2966