Issue #2966 (closed)
Unable to sync docker repo because worker dies
Description
When attempting to sync a docker repo on Pulp 2.14 beta (see below for detailed version info),
a worker dies while syncing and the sync is abandoned.
Versions of pulp packages:
# rpm -qa | grep pulp
python-pulp-bindings-2.14.0-0.1.beta.fc25.noarch
pulp-ostree-admin-extensions-1.3.0-0.3.rc.git.1.3a770cd.fc25.noarch
python-pulp-docker-common-3.0.0-0.1.beta.fc25.noarch
pulp-deb-plugins-1.5.2-0.2.beta.fc25.noarch
python-pulp-deb-common-1.5.2-0.2.beta.fc25.noarch
pulp-puppet-admin-extensions-2.14.0-0.3.rc.git.1.12bf98c.fc25.noarch
python-pulp-rpm-common-2.14.0-0.1.beta.fc25.noarch
python-pulp-python-common-2.0.1-0.1.beta.fc25.noarch
pulp-selinux-2.14.0-0.3.rc.git.1.b8a44dd.fc25.noarch
pulp-deb-admin-extensions-1.5.2-0.2.beta.fc25.noarch
python-pulp-repoauth-2.14.0-0.1.beta.fc25.noarch
pulp-puppet-plugins-2.14.0-0.3.rc.git.1.12bf98c.fc25.noarch
pulp-rpm-admin-extensions-2.14.0-0.3.rc.git.1.5746bbf.fc25.noarch
python-pulp-oid_validation-2.14.0-0.1.beta.fc25.noarch
pulp-admin-client-2.14.0-0.3.rc.git.1.b8a44dd.fc25.noarch
pulp-python-plugins-2.0.1-0.3.rc.git.1.1d5b1c8.fc25.noarch
python-pulp-client-lib-2.14.0-0.1.beta.fc25.noarch
python-pulp-streamer-2.14.0-0.1.beta.fc25.noarch
pulp-docker-plugins-3.0.0-0.3.rc.git.1.5149268.fc25.noarch
python-pulp-common-2.14.0-0.1.beta.fc25.noarch
pulp-ostree-plugins-1.3.0-0.3.rc.git.1.3a770cd.fc25.noarch
python-pulp-puppet-common-2.14.0-0.1.beta.fc25.noarch
python-pulp-ostree-common-1.3.0-0.1.beta.fc25.noarch
pulp-server-2.14.0-0.3.rc.git.1.b8a44dd.fc25.noarch
pulp-docker-admin-extensions-3.0.0-0.3.rc.git.1.5149268.fc25.noarch
pulp-python-admin-extensions-2.0.1-0.3.rc.git.1.1d5b1c8.fc25.noarch
pulp-rpm-plugins-2.14.0-0.3.rc.git.1.5746bbf.fc25.noarch
# cat /etc/redhat-release
Fedora release 25 (Twenty Five)
Steps to recreate:
[root@fedora25base ~]# pulp-admin docker repo create --repo-id containers --feed https://index.docker.io --upstream-name library/busybox
Repository [containers] successfully created
[root@fedora25base ~]# pulp-admin docker repo sync run --repo-id containers
+----------------------------------------------------------------------+
Synchronizing Repository [containers]
+----------------------------------------------------------------------+
This command may be exited via ctrl+c without affecting the request.
Downloading manifests
[|]
... completed
Copying units already in pulp
[-]
... completed
Copying units already in pulp
[-]
... completed
Downloading remote files
[==================================================] 100%
57 of 57 items
... failed
Task Failed
Worker terminated abnormally while processing task
f1fcebda-3f45-44f9-8047-876ddb6e5962. Check the logs for details
Logs:
// REMOVED: the wrong logs were originally attached. See below for logs from the same steps as above, for the correct repo.
Updated by kdelee@redhat.com over 7 years ago
- Related to Issue #2849: Worker terminates abnormally while processing task added
Updated by ipanova@redhat.com over 7 years ago
this looks like a dup of #2849, please close it.
Updated by kdelee@redhat.com over 7 years ago
- Status changed from NEW to CLOSED - DUPLICATE
Closed as duplicate of #2849
Updated by kdelee@redhat.com over 7 years ago
- File docker.sync.logs added
- Description updated (diff)
Aug 10 10:51:54 fedora25base pulp[3638]: requests.packages.urllib3.connectionpool:INFO: Starting new HTTPS connection (5): index.docker.io
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download failed: Download of https://index.docker.io/v2/library/busybox/blobs/sha256:3aa1364bbd971e57a41bc0fe67dc188ad81e0090b8f3648e1a84ffabaf134ae1 failed with code 401: Unauthorized
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download succeeded: https://auth.docker.io/token?scope=repository%3Alibrary%2Fbusybox%3Apull&service=registry.docker.io.
Aug 10 10:51:54 fedora25base pulp[3638]: requests.packages.urllib3.connectionpool:INFO: Starting new HTTPS connection (6): index.docker.io
Aug 10 10:51:54 fedora25base pulp[3638]: requests.packages.urllib3.connectionpool:INFO: Starting new HTTPS connection (1): dseasb33srnrn.cloudfront.net
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download succeeded: https://index.docker.io/v2/library/busybox/blobs/sha256:3aa1364bbd971e57a41bc0fe67dc188ad81e0090b8f3648e1a84ffabaf134ae1.
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download failed: Download of https://index.docker.io/v2/library/busybox/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 failed with code 401: Unauthorized
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download failed: Download of https://index.docker.io/v2/library/busybox/blobs/sha256:f5c4e6c495776bf839ba15ab1a68dc7ef2f491ec4c258431a107445812ecd65d failed with code 401: Unauthorized
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download failed: Download of https://index.docker.io/v2/library/busybox/blobs/sha256:9495713bb019088f95e1d6e007340baab2acceb41cfbdbaac0dbae695de06044 failed with code 401: Unauthorized
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download failed: Download of https://index.docker.io/v2/library/busybox/blobs/sha256:e8bcc4b83f9c57e3ef7c831803d30ccc9ca2017915b309451ce5437e4f6a5247 failed with code 401: Unauthorized
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download succeeded: https://auth.docker.io/token?scope=repository%3Alibrary%2Fbusybox%3Apull&service=registry.docker.io.
Aug 10 10:51:55 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download succeeded: https://index.docker.io/v2/library/busybox/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4.
Aug 10 10:51:55 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download succeeded: https://auth.docker.io/token?scope=repository%3Alibrary%2Fbusybox%3Apull&service=registry.docker.io.
Aug 10 10:52:24 fedora25base pulp[3638]: nectar.downloaders.threaded:ERROR: Skipping requests to index.docker.io due to repeated connection failures: HTTPSConnectionPool(host='dseasb33srnrn.cloudfront.net', port=443): Read timed out.
Aug 10 10:52:28 fedora25base pulp[2272]: celery.worker.job:ERROR: (2272-29216) Task pulp.server.managers.repo.sync.sync[6fc440c7-052d-481a-a0b2-529b45acd84d] raised unexpected: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Aug 10 10:52:28 fedora25base pulp[2272]: celery.worker.job:ERROR: (2272-29216) Traceback (most recent call last):
Aug 10 10:52:28 fedora25base pulp[2272]: celery.worker.job:ERROR: (2272-29216) File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1175, in mark_as_worker_lost
Aug 10 10:52:28 fedora25base pulp[2272]: celery.worker.job:ERROR: (2272-29216) human_status(exitcode)),
Aug 10 10:52:28 fedora25base pulp[2272]: celery.worker.job:ERROR: (2272-29216) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
Aug 10 10:52:28 fedora25base pulp[5835]: py.warnings:WARNING: [58286580] (5835-29216) /usr/lib64/python2.7/site-packages/pymongo/topology.py:143: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#using-pymongo-with-multiprocessing>
Aug 10 10:52:28 fedora25base pulp[5835]: py.warnings:WARNING: [58286580] (5835-29216) "MongoClient opened before fork. Create MongoClient "
Aug 10 10:52:28 fedora25base pulp[5835]: py.warnings:WARNING: [58286580] (5835-29216)
Aug 10 10:52:28 fedora25base pulp[5835]: pulp.server.async.tasks:INFO: [58286580] Task failed : [6fc440c7-052d-481a-a0b2-529b45acd84d] : Worker terminated abnormally while processing task 6fc440c7-052d-481a-a0b2-529b45acd84d. Check the logs for details
Aug 10 10:52:28 fedora25base pulp[2272]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[58286580-7a24-4ae9-a295-f6a8c34b586e] succeeded in 0.10011015198s: None
Note that this failure was replicated on the same Fedora 25 install as above.
Also, other installs on RHEL 7 have successfully synced from this URL.
See the attached file for full logs from the most recent recreation (it starts at creation of the repo and ends after the failure).
Steps to recreate:
$ journalctl -f > docker.sync.logs &
$ pulp-admin docker repo create --repo-id containers --feed https://index.docker.io --upstream-name library/busybox
$ pulp-admin docker repo sync run --repo-id containers
Then I killed the journalctl command, so these should be the correct logs.
Updated by ipanova@redhat.com over 7 years ago
- Status changed from CLOSED - DUPLICATE to NEW
I am re-opening this so the team can take a look at it during triage. I think the root cause is the same as in issue #2849; it just manifests in different cases.
Updated by ttereshc over 7 years ago
More investigation needed to figure out if it's a duplicate of #2849 or not. Can't be triaged without that info.
Updated by mhrivnak over 7 years ago
The logs appear to omit anything from the kernel. Can you run journalctl as root and see what else you find? Hopefully there is an explanation of why SIGKILL was sent, or at least who/what initiated it.
Updated by kdelee@redhat.com over 7 years ago
NOTE:
I did this on a fresh F25 VM provisioned through Beaker and it did not have any problems.
I'm trying again on a VM on my own machine.
It may just be something with the VM's network connection, but failing this way just because of a slow connection is still undesirable behaviour.
Updated by kdelee@redhat.com over 7 years ago
OK, on a fresh install of Fedora 25 on a VM on my machine, I was able to re-create the failure again.
I ran all commands as root. See attached for complete logs.
[root@fedora25vm1 ansible]# journalctl -f > docker.sync.logs &
[1] 2895
[root@fedora25vm1 ansible]# pulp-admin login -u admin -p admin
Successfully logged in. Session certificate will expire at Aug 24 13:14:46 2017
GMT.
[root@fedora25vm1 ansible]# pulp-admin docker repo create --repo-id containers --feed https://index.docker.io --upstream-name library/busybox
Repository [containers] successfully created
[root@fedora25vm1 ansible]# pulp-admin docker repo sync run --repo-id containers
+----------------------------------------------------------------------+
Synchronizing Repository [containers]
+----------------------------------------------------------------------+
This command may be exited via ctrl+c without affecting the request.
Downloading manifests
[-]
... completed
Copying units already in pulp
[-]
... completed
Copying units already in pulp
[-]
... completed
Downloading remote files
[=============================================== ] 94%
54 of 57 items
... failed
Task Failed
Worker terminated abnormally while processing task
3653b86a-78ab-44e4-a746-12dc66373674. Check the logs for details
[root@fedora25vm1 ansible]# pkill -s 15 2895
Excerpt from logs:
Aug 17 09:16:06 fedora25vm1 pulp[2331]: nectar.downloaders.threaded:INFO: Download succeeded: https://index.docker.io/v2/library/busybox/blobs/sha256:557a0c95bfcd88b3630d35f69d765f319138df9010ddea6ab24f6ea1e3d19092.
Aug 17 09:16:09 fedora25vm1 pulp[2331]: nectar.downloaders.threaded:ERROR: Skipping requests to index.docker.io due to repeated connection failures: HTTPSConnectionPool(host='dseasb33srnrn.cloudfront.net', port=443): Read timed out.
Aug 17 09:16:09 fedora25vm1 pulp[2331]: nectar.downloaders.threaded:ERROR: Skipping requests to index.docker.io due to repeated connection failures: HTTPSConnectionPool(host='dseasb33srnrn.cloudfront.net', port=443): Read timed out.
Aug 17 09:16:10 fedora25vm1 chronyd[740]: Selected source 172.104.8.139
Aug 17 09:16:12 fedora25vm1 pulp[2254]: celery.worker.job:ERROR: (2254-59392) Task pulp.server.managers.repo.sync.sync[3653b86a-78ab-44e4-a746-12dc66373674] raised unexpected: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Aug 17 09:16:12 fedora25vm1 pulp[2254]: celery.worker.job:ERROR: (2254-59392) Traceback (most recent call last):
Aug 17 09:16:12 fedora25vm1 pulp[2254]: celery.worker.job:ERROR: (2254-59392) File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1175, in mark_as_worker_lost
Aug 17 09:16:12 fedora25vm1 pulp[2254]: celery.worker.job:ERROR: (2254-59392) human_status(exitcode)),
Aug 17 09:16:12 fedora25vm1 pulp[2254]: celery.worker.job:ERROR: (2254-59392) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
Aug 17 09:16:12 fedora25vm1 celery[2254]: reserved_resource_worker-0@fedora25vm1 ready.
Aug 17 09:16:12 fedora25vm1 pulp[2992]: py.warnings:WARNING: [92de69ac] (2992-59392) /usr/lib64/python2.7/site-packages/pymongo/topology.py:143: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#using-pymongo-with-multiprocessing>
Aug 17 09:16:12 fedora25vm1 pulp[2992]: py.warnings:WARNING: [92de69ac] (2992-59392) "MongoClient opened before fork. Create MongoClient "
Aug 17 09:16:12 fedora25vm1 pulp[2992]: py.warnings:WARNING: [92de69ac] (2992-59392)
Aug 17 09:16:12 fedora25vm1 pulp[2992]: pulp.server.async.tasks:INFO: [92de69ac] Task failed : [3653b86a-78ab-44e4-a746-12dc66373674] : Worker terminated abnormally while processing task 3653b86a-78ab-44e4-a746-12dc66373674. Check the logs for details
Aug 17 09:16:12 fedora25vm1 pulp[2254]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[92de69ac-1d9c-44ef-804f-51340515fb3f] succeeded in 0.0250126969986s: None
Updated by mhrivnak over 7 years ago
Using the audit system, I managed to observe the signal being sent to the child worker process. It is being sent by the child worker process itself! So it is committing suicide. I do not yet know why.
I am using the repo "portainer/portainer" as was used in #2849
Aug 18 14:49:42 fedora25vm1 pulp[2556]: nectar.downloaders.threaded:INFO: Download failed: Download of https://registry-1.docker.io/v2/portainer/portainer/blobs/sha256:6330793656b1565cd8b5d5c1e2a736a351dec39f7250daf31eb91e3a76cc872b failed with code 404: Not Found
Aug 18 14:49:42 fedora25vm1 audit[2556]: SYSCALL arch=c000003e syscall=62 success=yes exit=0 a0=9fc a1=9 a2=1 a3=7fda62410e18 items=0 ppid=2489 pid=2556 auid=4294967295 uid=48 gid=48 euid=48 suid=48 fsuid=48 egid=48 sgid=48 fsgid=48 tty=(none) ses=4294967295 comm="celery" exe="/usr/bin/python2.7" subj=system_u:system_r:celery_t:s0 key=(null)
Aug 18 14:49:42 fedora25vm1 audit: OBJ_PID opid=2556 oauid=-1 ouid=48 oses=-1 obj=system_u:system_r:celery_t:s0 ocomm="celery"
Aug 18 14:49:42 fedora25vm1 audit: PROCTITLE proctitle="(null)"
Aug 18 14:49:42 fedora25vm1 pulp[2489]: celery.worker.job:ERROR: (2489-13920) Task pulp.server.managers.repo.sync.sync[7c224c2c-bf15-481e-80c4-448daa30df1c] raised unexpected: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Aug 18 14:49:42 fedora25vm1 pulp[2489]: celery.worker.job:ERROR: (2489-13920) Traceback (most recent call last):
Aug 18 14:49:42 fedora25vm1 pulp[2489]: celery.worker.job:ERROR: (2489-13920) File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1175, in mark_as_worker_lost
Aug 18 14:49:42 fedora25vm1 pulp[2489]: celery.worker.job:ERROR: (2489-13920) human_status(exitcode)),
Aug 18 14:49:42 fedora25vm1 pulp[2489]: celery.worker.job:ERROR: (2489-13920) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
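The audit SYSCALL record above can be decoded by hand: on x86_64, syscall 62 is kill(2), and a0..a3 are its arguments printed in hexadecimal. A small sketch using the field values quoted above:

```python
# Decode the key fields of the audit SYSCALL record quoted above.
# a0..a3 are the syscall's arguments, rendered in hex by auditd.
record = "syscall=62 success=yes a0=9fc a1=9 ppid=2489 pid=2556"
fields = dict(part.split("=") for part in record.split())

target_pid = int(fields["a0"], 16)  # kill()'s first argument: target pid
signal_no = int(fields["a1"], 16)   # kill()'s second argument: signal number

# The calling pid equals the target pid, so the worker signalled itself.
print(target_pid, signal_no, target_pid == int(fields["pid"]))  # 2556 9 True
```

That is the basis for the observation above: 0x9fc is 2556, the worker's own pid, and a1=9 is SIGKILL.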
Updated by Ichimonji10 over 7 years ago
I attempted to reproduce this issue on a matrix of six VMs, where one axis is F24, F25 and RHEL 7, and the other axis is nightly builds of Pulp 2.13 and 2.14. To reproduce, I logged in with pulp-admin login ..., then ran each of these scripts in turn:
#!/usr/bin/env bash
set -euo pipefail
for i in {1..20}; do
pulp-admin docker repo create --repo-id containers --feed https://index.docker.io --upstream-name library/busybox
pulp-admin docker repo sync run --repo-id containers
pulp-admin docker repo delete --repo-id containers
done
#!/usr/bin/env bash
set -euo pipefail
for i in {1..10}; do
pulp-admin docker repo create --repo-id containers --feed https://index.docker.io --upstream-name library/busybox
pulp-admin docker repo sync run --repo-id containers
pulp-admin docker repo delete --repo-id containers
pulp-admin docker orphan remove --all
done
In total, that's (20 sync/publish cycles with cache + 10 sync/publish cycles without cache) * 6 VMs == 180 sync/publish cycles, and I was unable to reproduce the issue. I made sure to visually scan the output of pulp-admin, as I know it doesn't handle things like return codes correctly. Here are the packages installed on the F25 Pulp 2.14 system:
# rpm -qa | grep -i pulp | sort
pulp-admin-client-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
pulp-docker-admin-extensions-3.0.1-0.1.alpha.git.1.b5164f7.fc25.noarch
pulp-docker-plugins-3.0.1-0.1.alpha.git.1.b5164f7.fc25.noarch
pulp-ostree-admin-extensions-1.3.1-0.1.alpha.git.7.f1dbbc3.fc25.noarch
pulp-ostree-plugins-1.3.1-0.1.alpha.git.7.f1dbbc3.fc25.noarch
pulp-puppet-admin-extensions-2.14.1-0.1.alpha.git.7.78fba43.fc25.noarch
pulp-puppet-plugins-2.14.1-0.1.alpha.git.7.78fba43.fc25.noarch
pulp-python-admin-extensions-2.0.2-0.1.alpha.git.8.79b0442.fc25.noarch
pulp-python-plugins-2.0.2-0.1.alpha.git.8.79b0442.fc25.noarch
pulp-rpm-admin-extensions-2.14.1-0.1.alpha.git.7.231e2a8.fc25.noarch
pulp-rpm-plugins-2.14.1-0.1.alpha.git.7.231e2a8.fc25.noarch
pulp-selinux-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
pulp-server-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
python-pulp-bindings-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
python-pulp-client-lib-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
python-pulp-common-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
python-pulp-docker-common-3.0.1-0.1.alpha.git.1.b5164f7.fc25.noarch
python-pulp-oid_validation-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
python-pulp-ostree-common-1.3.1-0.1.alpha.git.7.f1dbbc3.fc25.noarch
python-pulp-puppet-common-2.14.1-0.1.alpha.git.7.78fba43.fc25.noarch
python-pulp-python-common-2.0.2-0.1.alpha.git.8.79b0442.fc25.noarch
python-pulp-repoauth-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
python-pulp-rpm-common-2.14.1-0.1.alpha.git.7.231e2a8.fc25.noarch
python-pulp-streamer-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
Updated by mhrivnak about 7 years ago
This was seen again during an upgrade test job, running on RHEL 7 with this set of tests: https://github.com/PulpQE/pulp-migrate
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18128]: nectar.downloaders.threaded:INFO: Download succeeded: https://registry-1.docker.io/v2/library/swarm/blobs/sha256:97bbaf5315541c39a89aa1ab46cd17edbc2be434147349352c1feaf83664c089.
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18128]: nectar.downloaders.threaded:INFO: Download succeeded: https://registry-1.docker.io/v2/library/swarm/blobs/sha256:26fd50703284d344a2892cbc30a24c8a2b336d965b07033fe0de45d32b8b4926.
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18128]: nectar.downloaders.threaded:INFO: Download succeeded: https://registry-1.docker.io/v2/library/swarm/blobs/sha256:ccde00925b2ff6517e1acb30c0ef0106280c56af97480a485641e8bba925066e.
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18128]: nectar.downloaders.threaded:INFO: Download failed: Download of https://registry-1.docker.io/v2/library/swarm/blobs/sha256:fa34e1c94103c5e9b9c4d9685b23a3df772400387a304b6fb39846e8c4a294ba failed with code 503: Service Temporarily Unavailable
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18043]: celery.worker.job:ERROR: (18043-22976) Task pulp.server.managers.repo.sync.sync[c00bba68-1d0a-42cb-a896-0819bb1238c7] raised unexpected: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Oct 02 12:52:39 host-172-16-46-20.openstacklocal celery[18043]: reserved_resource_worker-0@host-172-16-46-20.openstacklocal ready.
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18043]: celery.worker.job:ERROR: (18043-22976) Traceback (most recent call last):
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18043]: celery.worker.job:ERROR: (18043-22976) File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1171, in mark_as_worker_lost
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18043]: celery.worker.job:ERROR: (18043-22976) human_status(exitcode)),
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18043]: celery.worker.job:ERROR: (18043-22976) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
Updated by amacdona@redhat.com over 6 years ago
- Sprint Candidate changed from Yes to No
Updated by amacdona@redhat.com about 6 years ago
- Related to Test #4129: Test sync of a repository that returns a 403 response for a download request added
Updated by amacdona@redhat.com about 6 years ago
- Status changed from NEW to POST
- Assignee set to amacdona@redhat.com
https://github.com/pulp/pulp_docker/pull/275
I was able to reproduce this with the same boxes I used for https://pulp.plan.io/issues/2849#note-14.
In this case, rather than moving or deleting the symlink, I moved what the symlink pointed to. This resulted in a 403.
From the user pulp httpd log:
==> ssl_error_log <==
[Fri Nov 09 11:28:32.644359 2018] [core:error] [pid 2779:tid 140209110451968] [client 192.168.122.1:47674] AH00037: Symbolic link not allowed or link target not accessible: /var/www/pub/docker/v2/web/synctest2/blobs/sha256:edc53726e077686a24352d99aa150c01d4724637541162238195e72b86739adf
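The AH00037 error above comes from Apache refusing to serve a symlink whose target is no longer accessible, which the client then sees as a 403. The broken-symlink state is easy to model (a generic sketch, not tied to Pulp's published-content layout):

```python
import os
import tempfile

# Build a symlink, then move its target away, as described above.
d = tempfile.mkdtemp()
target = os.path.join(d, "blob")
link = os.path.join(d, "sha256-link")
with open(target, "w") as f:
    f.write("layer data")
os.symlink(target, link)
os.rename(target, target + ".moved")  # move what the symlink points to

# The link itself still exists, but following it now fails; this is the
# state httpd rejects with AH00037 / a 403 response.
print(os.path.islink(link), os.path.exists(link))  # True False
```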
On 2-master:
[vagrant@pulp2 pulp_docker]$ pulp-admin docker repo create --repo-id wut --upstream-name synctest2 --feed http://omgwut:5000
Repository [wut] successfully created
[vagrant@pulp2 pulp_docker]$ pulp-admin docker repo sync run --repo-id wut --force-full
+----------------------------------------------------------------------+
Synchronizing Repository [wut]
+----------------------------------------------------------------------+
The following sync configuration options will be used:
Force Full: True
This command may be exited via ctrl+c without affecting the request.
Downloading manifests
[-]
... completed
Copying units already in pulp
[-]
... completed
Copying units already in pulp
[-]
... completed
Downloading remote files
[================================================= ] 99%
372 of 375 items
... failed
pulp-admin did not return.
Log:
Nov 09 16:28:33 pulp2.dev pulp[20050]: pulp.server.async.tasks:INFO: [20b6ec07] Task failed : [2d1f5278-e84d-41ba-a912-cfb186d7fe5f] : Worker terminated abnormally while processing task 2d1f5278-e84d-41ba-a912-cfb186d7fe5f. Check the logs for details
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) Task pulp.server.async.tasks._release_resource[20b6ec07-1ae1-486d-a6fc-59945b0016fe] raised unexpected: AttributeError("'NoneType' object has no attribute 'top'",)
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) Traceback (most recent call last):
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) R = retval = fun(*args, **kwargs)
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 107, in __call__
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) return super(PulpTask, self).__call__(*args, **kwargs)
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 622, in __protected_call__
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) return self.run(*args, **kwargs)
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 296, in _release_resource
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) new_task.on_failure(exception, task_id, (), {}, MyEinfo)
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 602, in on_failure
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) if not self.request.called_directly:
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) File "/usr/lib/python2.7/site-packages/celery/app/task.py", line 978, in _get_request
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) req = self.request_stack.top
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) AttributeError: 'NoneType' object has no attribute 'top'
Nov 09 16:28:43 pulp2.dev pulp[19004]: celery.worker.request:ERROR: (19004-03808) Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Nov 09 16:28:43 pulp2.dev pulp[19004]: celery.worker.request:ERROR: (19004-03808) Traceback (most recent call last):
Nov 09 16:28:43 pulp2.dev pulp[19004]: celery.worker.request:ERROR: (19004-03808) File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1223, in mark_as_worker_lost
Nov 09 16:28:43 pulp2.dev pulp[19004]: celery.worker.request:ERROR: (19004-03808) human_status(exitcode)),
Nov 09 16:28:43 pulp2.dev pulp[19004]: celery.worker.request:ERROR: (19004-03808) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
With the patch:
[vagrant@pulp2 pulp]$ pulp-admin docker repo sync run --repo-id wut --force-full
+----------------------------------------------------------------------+
Synchronizing Repository [wut]
+----------------------------------------------------------------------+
The following sync configuration options will be used:
Force Full: True
This command may be exited via ctrl+c without affecting the request.
Downloading manifests
[-]
... completed
Copying units already in pulp
[-]
... completed
Copying units already in pulp
[-]
... completed
Downloading remote files
[================================================= ] 99%
374 of 375 items
... failed
Task Failed
Image download(s) from
http://omgwut:5000/v2/synctest2/blobs/sha256:edc53726e077686a24352d99aa150c01d47
24637541162238195e72b86739adf failed. Sync task has failed to prevent a
corrupted repository.
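The fixed behaviour shown above (a clean "Task Failed" with an error message instead of a dead worker) amounts to collecting download failures and raising an ordinary exception that celery can record. A hedged sketch of that pattern; the names here are illustrative, not the actual pulp_docker implementation:

```python
class DownloadFailed(Exception):
    """Raised so the task fails cleanly instead of the worker dying."""

def sync_blobs(urls, download):
    # Attempt every download, remember the failures, and raise a normal
    # exception at the end rather than aborting the whole repository sync
    # with a half-downloaded (corrupted) state.
    failed = [u for u in urls if not download(u)]
    if failed:
        raise DownloadFailed(
            "Image download(s) from %s failed. Sync task has failed to "
            "prevent a corrupted repository." % ", ".join(failed))
    return len(urls)

# Usage: one of three hypothetical blob URLs fails.
def fake_download(url):
    return "broken" not in url

try:
    sync_blobs(["blob-a", "broken-blob", "blob-c"], fake_download)
except DownloadFailed as e:
    msg = str(e)
print(msg)
```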
Added by amacdona@redhat.com about 6 years ago
Revision b09801b1 | View on GitHub
Fail gracefully when image download fails.
Updated by amacdona@redhat.com about 6 years ago
- Status changed from POST to MODIFIED
Applied in changeset b09801b1de94ea7a542b565e6c9caa456aae8619.
Added by amacdona@redhat.com about 6 years ago
Revision c81bc353 | View on GitHub
Fail gracefully when image download fails.
https://pulp.plan.io/issues/2849 fixes #2849
https://pulp.plan.io/issues/2966 fixes #2966
(cherry picked from commit b09801b1de94ea7a542b565e6c9caa456aae8619)
Updated by amacdona@redhat.com about 6 years ago
Applied in changeset c81bc353aa9427e9545827c6cd7ac4a5f946379f.
Updated by amacdona@redhat.com about 6 years ago
- Sprint changed from Sprint 46 to Sprint 45
Updated by ttereshc about 6 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE