Issue #2966

closed

Unable to sync docker repo because worker dies

Added by kdelee@redhat.com over 6 years ago. Updated almost 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 3. High
Version - Docker:
Platform Release: 2.18.0
Target Release - Docker:
OS: Fedora 25
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 45
Quarter:

Description

When attempting to sync a docker repo on Pulp 2.14 beta (see below for detailed version info),
a worker dies while syncing and the sync is abandoned.

Versions of pulp packages:

# rpm -qa | grep pulp

python-pulp-bindings-2.14.0-0.1.beta.fc25.noarch
pulp-ostree-admin-extensions-1.3.0-0.3.rc.git.1.3a770cd.fc25.noarch
python-pulp-docker-common-3.0.0-0.1.beta.fc25.noarch
pulp-deb-plugins-1.5.2-0.2.beta.fc25.noarch
python-pulp-deb-common-1.5.2-0.2.beta.fc25.noarch
pulp-puppet-admin-extensions-2.14.0-0.3.rc.git.1.12bf98c.fc25.noarch
python-pulp-rpm-common-2.14.0-0.1.beta.fc25.noarch
python-pulp-python-common-2.0.1-0.1.beta.fc25.noarch
pulp-selinux-2.14.0-0.3.rc.git.1.b8a44dd.fc25.noarch
pulp-deb-admin-extensions-1.5.2-0.2.beta.fc25.noarch
python-pulp-repoauth-2.14.0-0.1.beta.fc25.noarch
pulp-puppet-plugins-2.14.0-0.3.rc.git.1.12bf98c.fc25.noarch
pulp-rpm-admin-extensions-2.14.0-0.3.rc.git.1.5746bbf.fc25.noarch
python-pulp-oid_validation-2.14.0-0.1.beta.fc25.noarch
pulp-admin-client-2.14.0-0.3.rc.git.1.b8a44dd.fc25.noarch
pulp-python-plugins-2.0.1-0.3.rc.git.1.1d5b1c8.fc25.noarch
python-pulp-client-lib-2.14.0-0.1.beta.fc25.noarch
python-pulp-streamer-2.14.0-0.1.beta.fc25.noarch
pulp-docker-plugins-3.0.0-0.3.rc.git.1.5149268.fc25.noarch
python-pulp-common-2.14.0-0.1.beta.fc25.noarch
pulp-ostree-plugins-1.3.0-0.3.rc.git.1.3a770cd.fc25.noarch
python-pulp-puppet-common-2.14.0-0.1.beta.fc25.noarch
python-pulp-ostree-common-1.3.0-0.1.beta.fc25.noarch
pulp-server-2.14.0-0.3.rc.git.1.b8a44dd.fc25.noarch
pulp-docker-admin-extensions-3.0.0-0.3.rc.git.1.5149268.fc25.noarch
pulp-python-admin-extensions-2.0.1-0.3.rc.git.1.1d5b1c8.fc25.noarch
pulp-rpm-plugins-2.14.0-0.3.rc.git.1.5746bbf.fc25.noarch

# cat /etc/redhat-release 
Fedora release 25 (Twenty Five)

Steps to recreate:

[root@fedora25base ~]# pulp-admin docker repo create --repo-id containers --feed https://index.docker.io --upstream-name library/busybox
Repository [containers] successfully created

[root@fedora25base ~]# pulp-admin docker repo sync run --repo-id containers
+----------------------------------------------------------------------+
                 Synchronizing Repository [containers]
+----------------------------------------------------------------------+

This command may be exited via ctrl+c without affecting the request.

Downloading manifests
[|]
... completed

Copying units already in pulp
[-]
... completed

Copying units already in pulp
[-]
... completed

Downloading remote files
[==================================================] 100%
57 of 57 items
... failed

Task Failed

Worker terminated abnormally while processing task
f1fcebda-3f45-44f9-8047-876ddb6e5962.  Check the logs for details

Logs:

// REMOVED: the wrong logs were attached. See below for logs from the same steps as above, for the correct repo.

Files

docker.sync.logs (29.3 KB) docker.sync.logs kdelee@redhat.com, 08/10/2017 04:58 PM
fedora25vm.docker.sync.failed.logs (42.4 KB) fedora25vm.docker.sync.failed.logs kdelee@redhat.com, 08/17/2017 03:20 PM

Related issues

Related to Docker Support - Issue #2849: Worker terminates abnormally while processing task (CLOSED - CURRENTRELEASE, amacdona@redhat.com)
Related to Docker Support - Test #4129: Test sync of a repository that returns a 403 response for a download request (CLOSED - COMPLETE, bherring)
Actions #1

Updated by kdelee@redhat.com over 6 years ago

  • Related to Issue #2849: Worker terminates abnormally while processing task added
Actions #2

Updated by kdelee@redhat.com over 6 years ago

  • Description updated (diff)
Actions #3

Updated by ipanova@redhat.com over 6 years ago

This looks like a duplicate of #2849; please close it.

Actions #4

Updated by kdelee@redhat.com over 6 years ago

  • Status changed from NEW to CLOSED - DUPLICATE

Closed as duplicate of #2849

Actions #5

Updated by kdelee@redhat.com over 6 years ago

Aug 10 10:51:54 fedora25base pulp[3638]: requests.packages.urllib3.connectionpool:INFO: Starting new HTTPS connection (5): index.docker.io
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download failed: Download of https://index.docker.io/v2/library/busybox/blobs/sha256:3aa1364bbd971e57a41bc0fe67dc188ad81e0090b8f3648e1a84ffabaf134ae1 failed with code 401: Unauthorized
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download succeeded: https://auth.docker.io/token?scope=repository%3Alibrary%2Fbusybox%3Apull&service=registry.docker.io.
Aug 10 10:51:54 fedora25base pulp[3638]: requests.packages.urllib3.connectionpool:INFO: Starting new HTTPS connection (6): index.docker.io
Aug 10 10:51:54 fedora25base pulp[3638]: requests.packages.urllib3.connectionpool:INFO: Starting new HTTPS connection (1): dseasb33srnrn.cloudfront.net
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download succeeded: https://index.docker.io/v2/library/busybox/blobs/sha256:3aa1364bbd971e57a41bc0fe67dc188ad81e0090b8f3648e1a84ffabaf134ae1.
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download failed: Download of https://index.docker.io/v2/library/busybox/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 failed with code 401: Unauthorized
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download failed: Download of https://index.docker.io/v2/library/busybox/blobs/sha256:f5c4e6c495776bf839ba15ab1a68dc7ef2f491ec4c258431a107445812ecd65d failed with code 401: Unauthorized
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download failed: Download of https://index.docker.io/v2/library/busybox/blobs/sha256:9495713bb019088f95e1d6e007340baab2acceb41cfbdbaac0dbae695de06044 failed with code 401: Unauthorized
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download failed: Download of https://index.docker.io/v2/library/busybox/blobs/sha256:e8bcc4b83f9c57e3ef7c831803d30ccc9ca2017915b309451ce5437e4f6a5247 failed with code 401: Unauthorized
Aug 10 10:51:54 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download succeeded: https://auth.docker.io/token?scope=repository%3Alibrary%2Fbusybox%3Apull&service=registry.docker.io.
Aug 10 10:51:55 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download succeeded: https://index.docker.io/v2/library/busybox/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4.
Aug 10 10:51:55 fedora25base pulp[3638]: nectar.downloaders.threaded:INFO: Download succeeded: https://auth.docker.io/token?scope=repository%3Alibrary%2Fbusybox%3Apull&service=registry.docker.io.
Aug 10 10:52:24 fedora25base pulp[3638]: nectar.downloaders.threaded:ERROR: Skipping requests to index.docker.io due to repeated connection failures: HTTPSConnectionPool(host='dseasb33srnrn.cloudfront.net', port=443): Read timed out.
Aug 10 10:52:28 fedora25base pulp[2272]: celery.worker.job:ERROR: (2272-29216) Task pulp.server.managers.repo.sync.sync[6fc440c7-052d-481a-a0b2-529b45acd84d] raised unexpected: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Aug 10 10:52:28 fedora25base pulp[2272]: celery.worker.job:ERROR: (2272-29216) Traceback (most recent call last):
Aug 10 10:52:28 fedora25base pulp[2272]: celery.worker.job:ERROR: (2272-29216)   File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1175, in mark_as_worker_lost
Aug 10 10:52:28 fedora25base pulp[2272]: celery.worker.job:ERROR: (2272-29216)     human_status(exitcode)),
Aug 10 10:52:28 fedora25base pulp[2272]: celery.worker.job:ERROR: (2272-29216) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
Aug 10 10:52:28 fedora25base pulp[5835]: py.warnings:WARNING: [58286580] (5835-29216) /usr/lib64/python2.7/site-packages/pymongo/topology.py:143: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#using-pymongo-with-multiprocessing>
Aug 10 10:52:28 fedora25base pulp[5835]: py.warnings:WARNING: [58286580] (5835-29216)   "MongoClient opened before fork. Create MongoClient "
Aug 10 10:52:28 fedora25base pulp[5835]: py.warnings:WARNING: [58286580] (5835-29216)
Aug 10 10:52:28 fedora25base pulp[5835]: pulp.server.async.tasks:INFO: [58286580] Task failed : [6fc440c7-052d-481a-a0b2-529b45acd84d] : Worker terminated abnormally while processing task 6fc440c7-052d-481a-a0b2-529b45acd84d.  Check the logs for details
Aug 10 10:52:28 fedora25base pulp[2272]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[58286580-7a24-4ae9-a295-f6a8c34b586e] succeeded in 0.10011015198s: None

Note that this failure was replicated on same Fedora 25 install as above.

Also, other installs on RHEL7 have successfully synced from this URL.

See attached for full logs from most recent recreation (starts at creation of repo, ends after failure)

Steps to recreate:

  $ journalctl -f > docker.sync.logs &
  $ pulp-admin docker repo create --repo-id containers --feed https://index.docker.io --upstream-name library/busybox
  $ pulp-admin docker repo sync run --repo-id containers

Then I killed the journalctl command, so these should be the correct logs.

Actions #6

Updated by ipanova@redhat.com over 6 years ago

  • Status changed from CLOSED - DUPLICATE to NEW

I am re-opening this so the team can take a look at it during triage. I think the root cause is the same as in issue #2849; it just manifests in different cases.

Actions #7

Updated by ttereshc over 6 years ago

More investigation is needed to figure out whether or not it's a duplicate of #2849. It can't be triaged without that info.

Actions #8

Updated by mhrivnak over 6 years ago

  • OS set to Fedora 25
Actions #9

Updated by mhrivnak over 6 years ago

The logs appear to omit anything from the kernel. Can you run journalctl as root and see what else you find? Hopefully there is an explanation of why SIGKILL was sent, or at least who/what initiated it.

Actions #10

Updated by kdelee@redhat.com over 6 years ago

NOTE:

I did this on a fresh F25 VM provisioned through Beaker and it did not have any problems.

I'm trying again on a VM on my own machine. It may just be something with the VM's network connection, but a dying worker is still undesirable behaviour to get from a slow connection alone.

Actions #11

Updated by kdelee@redhat.com over 6 years ago

OK, on a fresh install of Fedora 25 on a VM on my machine, I was able to recreate it again.

I ran all commands as root. See attached for the complete logs.

[root@fedora25vm1 ansible]# journalctl -f > docker.sync.logs &
[1] 2895
[root@fedora25vm1 ansible]# pulp-admin login -u admin -p admin
Successfully logged in. Session certificate will expire at Aug 24 13:14:46 2017
GMT.

[root@fedora25vm1 ansible]# pulp-admin docker repo create --repo-id containers --feed https://index.docker.io --upstream-name library/busybox
Repository [containers] successfully created

[root@fedora25vm1 ansible]# pulp-admin docker repo sync run --repo-id containers
+----------------------------------------------------------------------+
                 Synchronizing Repository [containers]
+----------------------------------------------------------------------+

This command may be exited via ctrl+c without affecting the request.

Downloading manifests
[-]
... completed

Copying units already in pulp
[-]
... completed

Copying units already in pulp
[-]
... completed

Downloading remote files
[===============================================   ] 94%
54 of 57 items
... failed

Task Failed

Worker terminated abnormally while processing task
3653b86a-78ab-44e4-a746-12dc66373674.  Check the logs for details

[root@fedora25vm1 ansible]# pkill -s 15  2895

Excerpt from logs:

Aug 17 09:16:06 fedora25vm1 pulp[2331]: nectar.downloaders.threaded:INFO: Download succeeded: https://index.docker.io/v2/library/busybox/blobs/sha256:557a0c95bfcd88b3630d35f69d765f319138df9010ddea6ab24f6ea1e3d19092.
Aug 17 09:16:09 fedora25vm1 pulp[2331]: nectar.downloaders.threaded:ERROR: Skipping requests to index.docker.io due to repeated connection failures: HTTPSConnectionPool(host='dseasb33srnrn.cloudfront.net', port=443): Read timed out.
Aug 17 09:16:09 fedora25vm1 pulp[2331]: nectar.downloaders.threaded:ERROR: Skipping requests to index.docker.io due to repeated connection failures: HTTPSConnectionPool(host='dseasb33srnrn.cloudfront.net', port=443): Read timed out.
Aug 17 09:16:10 fedora25vm1 chronyd[740]: Selected source 172.104.8.139
Aug 17 09:16:12 fedora25vm1 pulp[2254]: celery.worker.job:ERROR: (2254-59392) Task pulp.server.managers.repo.sync.sync[3653b86a-78ab-44e4-a746-12dc66373674] raised unexpected: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Aug 17 09:16:12 fedora25vm1 pulp[2254]: celery.worker.job:ERROR: (2254-59392) Traceback (most recent call last):
Aug 17 09:16:12 fedora25vm1 pulp[2254]: celery.worker.job:ERROR: (2254-59392)   File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1175, in mark_as_worker_lost
Aug 17 09:16:12 fedora25vm1 pulp[2254]: celery.worker.job:ERROR: (2254-59392)     human_status(exitcode)),
Aug 17 09:16:12 fedora25vm1 pulp[2254]: celery.worker.job:ERROR: (2254-59392) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
Aug 17 09:16:12 fedora25vm1 celery[2254]: reserved_resource_worker-0@fedora25vm1 ready.
Aug 17 09:16:12 fedora25vm1 pulp[2992]: py.warnings:WARNING: [92de69ac] (2992-59392) /usr/lib64/python2.7/site-packages/pymongo/topology.py:143: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#using-pymongo-with-multiprocessing>
Aug 17 09:16:12 fedora25vm1 pulp[2992]: py.warnings:WARNING: [92de69ac] (2992-59392)   "MongoClient opened before fork. Create MongoClient "
Aug 17 09:16:12 fedora25vm1 pulp[2992]: py.warnings:WARNING: [92de69ac] (2992-59392)
Aug 17 09:16:12 fedora25vm1 pulp[2992]: pulp.server.async.tasks:INFO: [92de69ac] Task failed : [3653b86a-78ab-44e4-a746-12dc66373674] : Worker terminated abnormally while processing task 3653b86a-78ab-44e4-a746-12dc66373674.  Check the logs for details
Aug 17 09:16:12 fedora25vm1 pulp[2254]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[92de69ac-1d9c-44ef-804f-51340515fb3f] succeeded in 0.0250126969986s: None
Actions #12

Updated by mhrivnak over 6 years ago

Using the audit system, I managed to observe the signal being sent to the child worker process. It is being sent by the child worker process itself! So it is committing suicide. I do not yet know why.

I am using the repo "portainer/portainer", as was used in #2849.

Aug 18 14:49:42 fedora25vm1 pulp[2556]: nectar.downloaders.threaded:INFO: Download failed: Download of https://registry-1.docker.io/v2/portainer/portainer/blobs/sha256:6330793656b1565cd8b5d5c1e2a736a351dec39f7250daf31eb91e3a76cc872b failed with code 404: Not Found
Aug 18 14:49:42 fedora25vm1 audit[2556]: SYSCALL arch=c000003e syscall=62 success=yes exit=0 a0=9fc a1=9 a2=1 a3=7fda62410e18 items=0 ppid=2489 pid=2556 auid=4294967295 uid=48 gid=48 euid=48 suid=48 fsuid=48 egid=48 sgid=48 fsgid=48 tty=(none) ses=4294967295 comm="celery" exe="/usr/bin/python2.7" subj=system_u:system_r:celery_t:s0 key=(null)
Aug 18 14:49:42 fedora25vm1 audit: OBJ_PID opid=2556 oauid=-1 ouid=48 oses=-1 obj=system_u:system_r:celery_t:s0 ocomm="celery"
Aug 18 14:49:42 fedora25vm1 audit: PROCTITLE proctitle="(null)"
Aug 18 14:49:42 fedora25vm1 pulp[2489]: celery.worker.job:ERROR: (2489-13920) Task pulp.server.managers.repo.sync.sync[7c224c2c-bf15-481e-80c4-448daa30df1c] raised unexpected: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Aug 18 14:49:42 fedora25vm1 pulp[2489]: celery.worker.job:ERROR: (2489-13920) Traceback (most recent call last):
Aug 18 14:49:42 fedora25vm1 pulp[2489]: celery.worker.job:ERROR: (2489-13920)   File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1175, in mark_as_worker_lost
Aug 18 14:49:42 fedora25vm1 pulp[2489]: celery.worker.job:ERROR: (2489-13920)     human_status(exitcode)),
Aug 18 14:49:42 fedora25vm1 pulp[2489]: celery.worker.job:ERROR: (2489-13920) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
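The audit lines above show the kill syscall (syscall=62, a1=9) issued by pid 2556 against its own pid. A minimal illustration (plain Python, not Pulp code; assumes a POSIX system) of why a self-inflicted SIGKILL in the child surfaces in the parent exactly as billiard's "signal 9" WorkerLostError:

```python
import os
import signal

# Fork a child that SIGKILLs itself, as the audit log shows the worker doing.
pid = os.fork()
if pid == 0:
    os.kill(os.getpid(), signal.SIGKILL)  # the self-kill observed via audit
    os._exit(0)  # never reached

# The parent sees the child terminated by signal 9 -- the same exit status
# billiard inspects and reports as WorkerLostError.
_, status = os.waitpid(pid, 0)
assert os.WIFSIGNALED(status)
print("child terminated by signal", os.WTERMSIG(status))
```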
Actions #13

Updated by ttereshc over 6 years ago

  • Triaged changed from No to Yes
Actions #14

Updated by ttereshc over 6 years ago

  • Sprint Candidate changed from No to Yes
Actions #15

Updated by Ichimonji10 over 6 years ago

I attempted to reproduce this issue on a matrix of six VMs, where one axis is F24, F25 and RHEL 7, and the other axis is nightly builds of Pulp 2.13 and 2.14. To reproduce, I logged in with pulp-admin login ..., then ran each of these scripts in turn:

#!/usr/bin/env bash
set -euo pipefail

for i in {1..20}; do
        pulp-admin docker repo create --repo-id containers --feed https://index.docker.io --upstream-name library/busybox
        pulp-admin docker repo sync run --repo-id containers
        pulp-admin docker repo delete --repo-id containers
done

#!/usr/bin/env bash
set -euo pipefail

for i in {1..10}; do
        pulp-admin docker repo create --repo-id containers --feed https://index.docker.io --upstream-name library/busybox
        pulp-admin docker repo sync run --repo-id containers
        pulp-admin docker repo delete --repo-id containers
        pulp-admin docker orphan remove --all
done

In total, that's (20 sync/publish cycles with cache + 10 sync/publish cycles without cache) * 6 VMs == 180 sync/publish cycles, and I was unable to reproduce the issue. I made sure to visually scan the output of pulp-admin, as I know it doesn't handle things like return codes correctly. Here are the packages installed on the F25 Pulp 2.14 system:

# rpm -qa | grep -i pulp | sort
pulp-admin-client-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
pulp-docker-admin-extensions-3.0.1-0.1.alpha.git.1.b5164f7.fc25.noarch
pulp-docker-plugins-3.0.1-0.1.alpha.git.1.b5164f7.fc25.noarch
pulp-ostree-admin-extensions-1.3.1-0.1.alpha.git.7.f1dbbc3.fc25.noarch
pulp-ostree-plugins-1.3.1-0.1.alpha.git.7.f1dbbc3.fc25.noarch
pulp-puppet-admin-extensions-2.14.1-0.1.alpha.git.7.78fba43.fc25.noarch
pulp-puppet-plugins-2.14.1-0.1.alpha.git.7.78fba43.fc25.noarch
pulp-python-admin-extensions-2.0.2-0.1.alpha.git.8.79b0442.fc25.noarch
pulp-python-plugins-2.0.2-0.1.alpha.git.8.79b0442.fc25.noarch
pulp-rpm-admin-extensions-2.14.1-0.1.alpha.git.7.231e2a8.fc25.noarch
pulp-rpm-plugins-2.14.1-0.1.alpha.git.7.231e2a8.fc25.noarch
pulp-selinux-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
pulp-server-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
python-pulp-bindings-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
python-pulp-client-lib-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
python-pulp-common-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
python-pulp-docker-common-3.0.1-0.1.alpha.git.1.b5164f7.fc25.noarch
python-pulp-oid_validation-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
python-pulp-ostree-common-1.3.1-0.1.alpha.git.7.f1dbbc3.fc25.noarch
python-pulp-puppet-common-2.14.1-0.1.alpha.git.7.78fba43.fc25.noarch
python-pulp-python-common-2.0.2-0.1.alpha.git.8.79b0442.fc25.noarch
python-pulp-repoauth-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
python-pulp-rpm-common-2.14.1-0.1.alpha.git.7.231e2a8.fc25.noarch
python-pulp-streamer-2.14.1-0.1.alpha.git.10.f84edbd.fc25.noarch
Actions #16

Updated by mhrivnak over 6 years ago

This was seen again during an upgrade test job, running on RHEL 7 with this set of tests: https://github.com/PulpQE/pulp-migrate

Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18128]: nectar.downloaders.threaded:INFO: Download succeeded: https://registry-1.docker.io/v2/library/swarm/blobs/sha256:97bbaf5315541c39a89aa1ab46cd17edbc2be434147349352c1feaf83664c089.
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18128]: nectar.downloaders.threaded:INFO: Download succeeded: https://registry-1.docker.io/v2/library/swarm/blobs/sha256:26fd50703284d344a2892cbc30a24c8a2b336d965b07033fe0de45d32b8b4926.
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18128]: nectar.downloaders.threaded:INFO: Download succeeded: https://registry-1.docker.io/v2/library/swarm/blobs/sha256:ccde00925b2ff6517e1acb30c0ef0106280c56af97480a485641e8bba925066e.
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18128]: nectar.downloaders.threaded:INFO: Download failed: Download of https://registry-1.docker.io/v2/library/swarm/blobs/sha256:fa34e1c94103c5e9b9c4d9685b23a3df772400387a304b6fb39846e8c4a294ba failed with code 503: Service Temporarily Unavailable
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18043]: celery.worker.job:ERROR: (18043-22976) Task pulp.server.managers.repo.sync.sync[c00bba68-1d0a-42cb-a896-0819bb1238c7] raised unexpected: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Oct 02 12:52:39 host-172-16-46-20.openstacklocal celery[18043]: reserved_resource_worker-0@host-172-16-46-20.openstacklocal ready.
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18043]: celery.worker.job:ERROR: (18043-22976) Traceback (most recent call last):
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18043]: celery.worker.job:ERROR: (18043-22976)   File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1171, in mark_as_worker_lost
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18043]: celery.worker.job:ERROR: (18043-22976)     human_status(exitcode)),
Oct 02 12:52:39 host-172-16-46-20.openstacklocal pulp[18043]: celery.worker.job:ERROR: (18043-22976) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
Actions #18

Updated by amacdona@redhat.com over 5 years ago

  • Sprint Candidate changed from Yes to No
Actions #19

Updated by amacdona@redhat.com over 5 years ago

  • Related to Test #4129: Test sync of a repository that returns a 403 response for a download request added
Actions #20

Updated by amacdona@redhat.com over 5 years ago

  • Status changed from NEW to POST
  • Assignee set to amacdona@redhat.com

https://github.com/pulp/pulp_docker/pull/275

I was able to reproduce this with the same boxes I used for https://pulp.plan.io/issues/2849#note-14.

In this case, rather than moving or deleting the symlink, I moved what the symlink pointed to. This resulted in a 403.

From the user pulp httpd log:


==> ssl_error_log <==
[Fri Nov 09 11:28:32.644359 2018] [core:error] [pid 2779:tid 140209110451968] [client 192.168.122.1:47674] AH00037: Symbolic link not allowed or link target not accessible: /var/www/pub/docker/v2/web/synctest2/blobs/sha256:edc53726e077686a24352d99aa150c01d4724637541162238195e72b86739adf
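The dangling-symlink condition behind that AH00037 error can be illustrated in a few lines (hypothetical paths, not the actual Pulp publish layout): once the link's target is moved, the symlink still exists but resolves to nothing, which httpd reports as a 403 rather than a 404.

```python
import os
import tempfile

d = tempfile.mkdtemp()
blob = os.path.join(d, "blob")          # stands in for the published blob file
link = os.path.join(d, "sha256_edc5")   # stands in for the symlink httpd serves
with open(blob, "w") as f:
    f.write("layer data")
os.symlink(blob, link)

os.rename(blob, blob + ".moved")  # simulate moving what the symlink points to

assert os.path.lexists(link)      # the symlink itself is still present...
assert not os.path.exists(link)   # ...but it now resolves to nothing
```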

2-master


[vagrant@pulp2 pulp_docker]$ pulp-admin docker repo create --repo-id wut --upstream-name synctest2 --feed http://omgwut:5000
Repository [wut] successfully created

[vagrant@pulp2 pulp_docker]$ pulp-admin docker repo sync run --repo-id wut --force-full
+----------------------------------------------------------------------+
                     Synchronizing Repository [wut]
+----------------------------------------------------------------------+

The following sync configuration options will be used:

Force Full:  True

This command may be exited via ctrl+c without affecting the request.

Downloading manifests
[-]
... completed

Copying units already in pulp
[-]
... completed

Copying units already in pulp
[-]
... completed

Downloading remote files
[================================================= ] 99%
372 of 375 items
... failed

pulp-admin did not return; it hung after the failure.

Log:

Nov 09 16:28:33 pulp2.dev pulp[20050]: pulp.server.async.tasks:INFO: [20b6ec07] Task failed : [2d1f5278-e84d-41ba-a912-cfb186d7fe5f] : Worker terminated abnormally while processing task 2d1f5278-e84d-41ba-a912-cfb186d7fe5f.  Check the logs for details
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) Task pulp.server.async.tasks._release_resource[20b6ec07-1ae1-486d-a6fc-59945b0016fe] raised unexpected: AttributeError("'NoneType' object has no attribute 'top'",)
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) Traceback (most recent call last):
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808)     R = retval = fun(*args, **kwargs)
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808)   File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 107, in __call__
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808)     return super(PulpTask, self).__call__(*args, **kwargs)
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 622, in __protected_call__
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808)     return self.run(*args, **kwargs)
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808)   File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 296, in _release_resource
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808)     new_task.on_failure(exception, task_id, (), {}, MyEinfo)
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808)   File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 602, in on_failure
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808)     if not self.request.called_directly:
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808)   File "/usr/lib/python2.7/site-packages/celery/app/task.py", line 978, in _get_request
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808)     req = self.request_stack.top
Nov 09 16:28:33 pulp2.dev pulp[20050]: celery.app.trace:ERROR: [20b6ec07] (20050-03808) AttributeError: 'NoneType' object has no attribute 'top'
Nov 09 16:28:43 pulp2.dev pulp[19004]: celery.worker.request:ERROR: (19004-03808) Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Nov 09 16:28:43 pulp2.dev pulp[19004]: celery.worker.request:ERROR: (19004-03808) Traceback (most recent call last):
Nov 09 16:28:43 pulp2.dev pulp[19004]: celery.worker.request:ERROR: (19004-03808)   File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1223, in mark_as_worker_lost
Nov 09 16:28:43 pulp2.dev pulp[19004]: celery.worker.request:ERROR: (19004-03808)     human_status(exitcode)),
Nov 09 16:28:43 pulp2.dev pulp[19004]: celery.worker.request:ERROR: (19004-03808) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).

With patch

[vagrant@pulp2 pulp]$ pulp-admin docker repo sync run --repo-id wut --force-full
+----------------------------------------------------------------------+
                     Synchronizing Repository [wut]
+----------------------------------------------------------------------+

The following sync configuration options will be used:

Force Full:  True

This command may be exited via ctrl+c without affecting the request.

Downloading manifests
[-]
... completed

Copying units already in pulp
[-]
... completed

Copying units already in pulp
[-]
... completed

Downloading remote files
[================================================= ] 99%
374 of 375 items
... failed

Task Failed

Image download(s) from
http://omgwut:5000/v2/synctest2/blobs/sha256:edc53726e077686a24352d99aa150c01d47
24637541162238195e72b86739adf failed. Sync task has failed to prevent a
corrupted repository.
Actions #21

Updated by amacdona@redhat.com over 5 years ago

  • Status changed from POST to MODIFIED
Actions #22

Updated by ttereshc over 5 years ago

  • Platform Release set to 2.18.0
Actions #23

Updated by ttereshc over 5 years ago

  • Sprint/Milestone set to 2.18.0

Added by amacdona@redhat.com over 5 years ago

Revision c81bc353 | View on GitHub

Fail gracefully when image download fails.

https://pulp.plan.io/issues/2849 fixes #2849

https://pulp.plan.io/issues/2966 fixes #2966

(cherry picked from commit b09801b1de94ea7a542b565e6c9caa456aae8619)
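The shape of this fix can be sketched as follows (hypothetical names, not the actual pulp_docker code; the real change is in PR #275 linked above): failed downloads are recorded during the sync step and reported afterwards by raising an ordinary exception, so the task fails cleanly instead of the worker process dying mid-task.

```python
class ImageDownloadFailed(Exception):
    """Raised at the end of a sync when one or more image downloads failed."""

class DownloadStep:
    def __init__(self):
        self.failed_urls = []

    def download_failed(self, url):
        # Record the failure instead of aborting the whole worker process.
        self.failed_urls.append(url)

    def finish(self):
        # After all downloads have been attempted, fail the task normally.
        if self.failed_urls:
            raise ImageDownloadFailed(
                "Image download(s) from %s failed. Sync task has failed "
                "to prevent a corrupted repository." % ", ".join(self.failed_urls))

step = DownloadStep()
step.download_failed("http://omgwut:5000/v2/synctest2/blobs/sha256:edc5...")
try:
    step.finish()
    task_failed = False
except ImageDownloadFailed:
    task_failed = True
```

With this pattern the worker survives, and the user sees the "Sync task has failed to prevent a corrupted repository" message shown in the transcript above instead of a WorkerLostError.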

Actions #25

Updated by ttereshc over 5 years ago

  • Status changed from MODIFIED to 5
Actions #26

Updated by amacdona@redhat.com over 5 years ago

  • Sprint set to Sprint 46
Actions #27

Updated by amacdona@redhat.com over 5 years ago

  • Sprint changed from Sprint 46 to Sprint 45
Actions #28

Updated by bherring over 5 years ago

Verified in #4219.

Actions #29

Updated by ttereshc over 5 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #30

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added
