Issue #8867
closeddownloads from cdn.redhat fail with 'Cannot connect to host cdn.redhat.com:443 ssl:default [[SSL: SSLV3_ALERT_UNEXPECTED_MESSAGE] sslv3 alert unexpected message (_ssl.c:877)]'
Description
We've had 3 people (ehelms, mhulan, and myself) all hit this error with the red hat cdn. When downloading packages the sync will fail with:
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/rq/worker.py", line 975, in perform_job
rv = job.perform()
File "/usr/lib/python3.6/site-packages/rq/job.py", line 696, in perform
self._result = self._execute()
File "/usr/lib/python3.6/site-packages/rq/job.py", line 719, in _execute
return self.func(*self.args, **self.kwargs)
File "/usr/lib/python3.6/site-packages/pulp_rpm/app/tasks/synchronizing.py", line 269, in synchronize
dv.create()
File "/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/declarative_version.py", line 147, in create
loop.run_until_complete(pipeline)
File "/usr/lib64/python3.6/asyncio/base_events.py", line 484, in run_until_complete
return future.result()
File "/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/api.py", line 225, in create_pipeline
await asyncio.gather(*futures)
File "/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/api.py", line 43, in __call__
await self.run()
File "/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/artifact_stages.py", line 171, in run
pb.done += task.result() # download_count
File "/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/artifact_stages.py", line 197, in _handle_content_unit
await asyncio.gather(*downloaders_for_content)
File "/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/models.py", line 88, in download
download_result = await downloader.run(extra_data=self.extra_data)
File "/usr/lib/python3.6/site-packages/pulpcore/download/base.py", line 227, in run
return await self._run(extra_data=extra_data)
File "/usr/lib/python3.6/site-packages/pulp_rpm/app/downloaders.py", line 88, in _run
async with self.session.get(url, proxy=self.proxy, auth=self.auth) as response:
File "/usr/lib64/python3.6/site-packages/aiohttp/client.py", line 1117, in __aenter__
self._resp = await self._coro
File "/usr/lib64/python3.6/site-packages/aiohttp/client.py", line 521, in _request
req, traces=traces, timeout=real_timeout
File "/usr/lib64/python3.6/site-packages/aiohttp/connector.py", line 535, in connect
proto = await self._create_connection(req, traces, timeout)
File "/usr/lib64/python3.6/site-packages/aiohttp/connector.py", line 892, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
File "/usr/lib64/python3.6/site-packages/aiohttp/connector.py", line 1051, in _create_direct_connection
raise last_exc
File "/usr/lib64/python3.6/site-packages/aiohttp/connector.py", line 1032, in _create_direct_connection
client_error=client_error,
File "/usr/lib64/python3.6/site-packages/aiohttp/connector.py", line 973, in _wrap_create_connection
raise ClientConnectorSSLError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorSSLError: Cannot connect to host cdn.redhat.com:443 ssl:default [[SSL: SSLV3_ALERT_UNEXPECTED_MESSAGE] sslv3 alert unexpected message (_ssl.c:877)]
after continually retrying it seems to work.
I lowered the download concurrency to 5 and still hit this error. I will try lower.
Its possible this is a cdn issue, but if users can't reliably sync from redhat's cdn, that is a big problem.
[vagrant@devel2 foreman]$ rpm -q python3-pulpcore python3-pulpcore-3.11.0-2.el7.noarch
[vagrant@devel2 foreman]$ rpm -q python3-pulp-rpm python3-pulp-rpm-3.11.0-1.el7.noarch
Updated by jsherril@redhat.com over 3 years ago
To add a little more info:
I started a sync with a download concurrency of 1, and let it run for a while. It downloaded ~2000 rpms without error.
I started a 2nd sync also with a download concurrency of 1 while the first was running.
So at this point i have two syncs running with a download concurrency of 1. Within ~10 minutes the 2nd sync failed with this error.
So it appears that more than 1 download from cdn.redhat.com at a time will cause this to fail?
Updated by jsherril@redhat.com over 3 years ago
i take it back, the 1st sync ended up failing with the same error long after the 2nd sync errored. I don't think concurrency has any bearing on this other than the fact that you're likely to hit it quicker.
Updated by bmbouter over 3 years ago
Does this occur every time or intermittently?
Also maybe we should be thinking about how this is configured: https://docs.python.org/3/library/ssl.html#ssl.SSLContext
What TLS/SSL versions do these servers support exactly?
Updated by ggainey over 3 years ago
Both of the foreman-reports fail with the error:
Mar 31 14:06:50 foremanserver.localpulpcore-worker-4[1069]: aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host cdn.redhat.com:443 ssl:default [None]
One foreman user says "things are suddenly working" : https://community.theforeman.org/t/unable-to-install-update-packages-from-rhel-7-server-repository/23638/20?u=ggainey
One foreman user isn't complaining about sync failing - the report is "sync'd on_demand, failure happens when client requests an RPM".
"Cannot connect to host" means Pulp said "open a connection to this address please" and TCP/IP said "sorry boss, can't find it".
SSLV3_ALERT_UNEXPECTED_MESSAGE is ssl-lib telling us "the server handed me back a response that I don't understand". If it happened every time, it could be a Pulp3-side configuration/disagreement on TLS protocols. Happening "sometimes", makes it hard to believe there's anything Pulp could be doing to cause this. With high concurrent requests, and the way CDN works, some requests "at the same time" are likely to be hitting different physical machines in the CDN-cloud; it's possible some/one/several of them have/had issues?
I have been trying to get syncs to fail without success. I'm on our pulp2-nightly-pulp3-source-centos7 box, running pulpcore-3.11 and pulp_rpm-3.11. I have tried syncing rhel7 base/extras/optional all once, syncing multiple rhel7-base concurrently, and syncing multiple rhel8 base/appstream/kickstart concurrently, all with download_concurrency=10. The only failures I get are "my box doesn't have enough memory to sync RHEL7 three times concurrently", and OOM-Killer shoots a task.
I'm using this script in my latest tests:
REMOTES=(\
https://cdn.redhat.com/content/dist/rhel8/8/x86_64/appstream/os \
https://cdn.redhat.com/content/dist/rhel8/8/x86_64/baseos/os \
https://cdn.redhat.com/content/dist/rhel8/8/x86_64/baseos/os \
https://cdn.redhat.com/content/dist/rhel8/8/x86_64/baseos/os \
https://cdn.redhat.com/content/dist/rhel8/8/x86_64/baseos/kickstart \
https://cdn.redhat.com/content/dist/rhel8/8/x86_64/appstream/kickstart \
)
NAMES=(\
cdn.redhat.com_content_dist_rhel8_8_x86_64_appstream_os \
cdn.redhat.com_content_dist_rhel8_8_x86_64_baseos_os_1 \
cdn.redhat.com_content_dist_rhel8_8_x86_64_baseos_os_2 \
cdn.redhat.com_content_dist_rhel8_8_x86_64_baseos_os_3 \
cdn.redhat.com_content_dist_rhel8_8_x86_64_baseos_kickstart \
cdn.redhat.com_content_dist_rhel8_8_x86_64_appstream_kickstart \
)
for r in ${!REMOTES[@]}; do
echo ">>>>> " TESTING [${REMOTES[$r]}] INTO [${NAMES[$r]}];
# create repo
REPO_HREF=$(http POST :/pulp/api/v3/repositories/rpm/rpm/ name=${NAMES[$r]} | jq -r '.pulp_href')
echo "repo_href : " $REPO_HREF
if [ -z "$REPO_HREF" ]; then echo ">>>>> " FAILED REPO; continue; fi
# create remote
REMOTE_HREF=$(http POST :/pulp/api/v3/remotes/rpm/rpm/ name=${NAMES[$r]} url=${REMOTES[$r]} policy='on_demand' download_concurrency=10 client_cert=@/home/vagrant/devel/pulp_startup/CDN_cert/cdn.crt client_key=@/home/vagrant/devel/pulp_startup/CDN_cert/cdn.key ca_cert=@/home/vagrant/devel/pulp_startup/CDN_cert/redhat-uep.pem | jq -r '.pulp_href')
echo "remote_href : " $REMOTE_HREF
if [ -z "$REMOTE_HREF" ]; then echo "FAILED REMOTE"; continue; fi
# sync repo using that remote
TASK_URL=$(http POST :$REPO_HREF'sync/' remote=$REMOTE_HREF | jq -r '.task')
echo "Task url : " $TASK_URL
if [ -z "$TASK_URL" ]; then echo "FAILED TASK"; continue; fi
echo ""
done
Can someone who is seeing this, please give the exact steps being followed? Pertinest questions are
- which repo-urls
- what concurrency
- immediate or on_demand
- how many repos simultaneously
Updated by jsherril@redhat.com over 3 years ago
I've only seen this on 'immediate' repos. Some download will fail randomly causing the entire sync to fail. (it looks like about 1 in every 1000 or 2000 downloads will fail).
It happens on every sync (for me), when there is a large amount of rpms. If there's only a 100, i may not see it.
Updated by ggainey over 3 years ago
"immediate" is def the key to recreating.
I modified my script to set the remote to "immediate", and to only attempt rhel8-base-ks and rhel8-appstream-ks. At concurrency=10, I hit failures of "aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected". At concurrency=5, I hit "aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host cdn.redhat.com:443 ssl:default [Connection reset by peer]"
Here's the modified script:
REMOTES=(\
https://cdn.redhat.com/content/dist/rhel8/8/x86_64/baseos/kickstart \
https://cdn.redhat.com/content/dist/rhel8/8/x86_64/appstream/kickstart \
)
NAMES=(\
cdn.redhat.com_content_dist_rhel8_8_x86_64_baseos_kickstart \
cdn.redhat.com_content_dist_rhel8_8_x86_64_appstream_kickstart \
)
for r in ${!REMOTES[@]}; do
echo ">>>>> " TESTING [${REMOTES[$r]}] INTO [${NAMES[$r]}];
# create repo
REPO_HREF=$(http POST :/pulp/api/v3/repositories/rpm/rpm/ name=${NAMES[$r]} | jq -r '.pulp_href')
echo "repo_href : " $REPO_HREF
if [ -z "$REPO_HREF" ]; then echo ">>>>> " FAILED REPO; continue; fi
# create remote
REMOTE_HREF=$(http POST :/pulp/api/v3/remotes/rpm/rpm/ name=${NAMES[$r]} url=${REMOTES[$r]} policy='immediate' download_concurrency=5 client_cert=@/home/vagrant/devel/pulp_startup/CDN_cert/cdn.crt client_key=@/home/vagrant/devel/pulp_startup/CDN_cert/cdn.key ca_cert=@/home/vagrant/devel/pulp_startup/CDN_cert/redhat-uep.pem | jq -r '.pulp_href')
echo "remote_href : " $REMOTE_HREF
if [ -z "$REMOTE_HREF" ]; then echo "FAILED REMOTE"; continue; fi
# sync repo using that remote
TASK_URL=$(http POST :$REPO_HREF'sync/' remote=$REMOTE_HREF | jq -r '.task')
echo "Task url : " $TASK_URL
if [ -z "$TASK_URL" ]; then echo "FAILED TASK"; continue; fi
echo ""
done
Here's the output of a failed task at c=10. disconnected:
(pulp) [vagrant@pulp2-nightly-pulp3-source-centos7 ~]$ http :/pulp/api/v3/tasks/a7ab29cc-98a7-49b9-8d5a-104cbc3f1dcd/
{
"child_tasks": [],
"created_resources": [],
"error": {
"description": "Server disconnected",
"traceback": " File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/worker.py\", line 975, in perform_job\n rv = job.perform()\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py\", line 696, in perform\n self._result = self._execute()\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py\", line 719, in _execute\n return self.func(*self.args, **self.kwargs)\n File \"/home/vagrant/devel/pulp_rpm/pulp_rpm/app/tasks/synchronizing.py\", line 269, in synchronize\n dv.create()\n File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/declarative_version.py\", line 147, in create\n loop.run_until_complete(pipeline)\n File \"/usr/lib64/python3.6/asyncio/base_events.py\", line 484, in run_until_complete\n return future.result()\n File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/api.py\", line 225, in create_pipeline\n await asyncio.gather(*futures)\n File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/api.py\", line 43, in __call__\n await self.run()\n File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/artifact_stages.py\", line 171, in run\n pb.done += task.result() # download_count\n File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/artifact_stages.py\", line 197, in _handle_content_unit\n await asyncio.gather(*downloaders_for_content)\n File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/models.py\", line 88, in download\n download_result = await downloader.run(extra_data=self.extra_data)\n File \"/home/vagrant/devel/pulpcore/pulpcore/download/base.py\", line 227, in run\n return await self._run(extra_data=extra_data)\n File \"/home/vagrant/devel/pulp_rpm/pulp_rpm/app/downloaders.py\", line 88, in _run\n async with self.session.get(url, proxy=self.proxy, auth=self.auth) as response:\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/aiohttp/client.py\", line 1117, in __aenter__\n self._resp = await self._coro\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/aiohttp/client.py\", line 544, in _request\n await resp.start(conn)\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/aiohttp/client_reqrep.py\", line 890, in start\n message, payload = await self._protocol.read() # type: ignore\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/aiohttp/streams.py\", line 604, in read\n await self._waiter\n"
},
"finished_at": "2021-06-09T16:50:51.906010Z",
"logging_cid": "444ec57e298749e8a2fba779a1eaa438",
"name": "pulp_rpm.app.tasks.synchronizing.synchronize",
"parent_task": null,
"progress_reports": [
{
"code": "parsing.comps",
"done": 42,
"message": "Parsed Comps",
"state": "completed",
"suffix": null,
"total": 42
},
{
"code": "parsing.packages",
"done": 1709,
"message": "Parsed Packages",
"state": "completed",
"suffix": null,
"total": 1709
},
{
"code": "downloading.metadata",
"done": 4,
"message": "Downloading Metadata Files",
"state": "completed",
"suffix": null,
"total": null
},
{
"code": "downloading.artifacts",
"done": 1205,
"message": "Downloading Artifacts",
"state": "failed",
"suffix": null,
"total": null
},
{
"code": "associating.content",
"done": 1001,
"message": "Associating Content",
"state": "canceled",
"suffix": null,
"total": null
}
],
"pulp_created": "2021-06-09T16:47:30.517188Z",
"pulp_href": "/pulp/api/v3/tasks/a7ab29cc-98a7-49b9-8d5a-104cbc3f1dcd/",
"reserved_resources_record": [
"/pulp/api/v3/repositories/rpm/rpm/ec561f1a-c890-4c79-ba5a-b26b00fee277/",
"/pulp/api/v3/remotes/rpm/rpm/f54b387f-445e-4887-800e-7889ffafe5ca/"
],
"started_at": "2021-06-09T16:47:30.667896Z",
"state": "failed",
"task_group": null,
"worker": "/pulp/api/v3/workers/6c9c1d0e-ad00-4045-9f5b-3b1ecbb067c9/"
}
And here's the output of a failed task with c=5 and "Cannot connect":
(pulp) [vagrant@pulp2-nightly-pulp3-source-centos7 ~]$ http :/pulp/api/v3/tasks/c81eb5d3-a77a-4454-a559-644812017a11/
{
"child_tasks": [],
"created_resources": [],
"error": {
"description": "Cannot connect to host cdn.redhat.com:443 ssl:default [Connection reset by peer]",
"traceback": " File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/worker.py\", line 975, in perform_job\n rv = job.perform()\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py\", line 696, in perform\n self._result = self._execute()\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py\", line 719, in _execute\n return self.func(*self.args, **self.kwargs)\n File \"/home/vagrant/devel/pulp_rpm/pulp_rpm/app/tasks/synchronizing.py\", line 269, in synchronize\n dv.create()\n File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/declarative_version.py\", line 147, in create\n loop.run_until_complete(pipeline)\n File \"/usr/lib64/python3.6/asyncio/base_events.py\", line 484, in run_until_complete\n return future.result()\n File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/api.py\", line 225, in create_pipeline\n await asyncio.gather(*futures)\n File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/api.py\", line 43, in __call__\n await self.run()\n File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/artifact_stages.py\", line 171, in run\n pb.done += task.result() # download_count\n File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/artifact_stages.py\", line 197, in _handle_content_unit\n await asyncio.gather(*downloaders_for_content)\n File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/models.py\", line 88, in download\n download_result = await downloader.run(extra_data=self.extra_data)\n File \"/home/vagrant/devel/pulpcore/pulpcore/download/base.py\", line 227, in run\n return await self._run(extra_data=extra_data)\n File \"/home/vagrant/devel/pulp_rpm/pulp_rpm/app/downloaders.py\", line 88, in _run\n async with self.session.get(url, proxy=self.proxy, auth=self.auth) as response:\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/aiohttp/client.py\", line 1117, in __aenter__\n self._resp = await self._coro\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/aiohttp/client.py\", line 521, in _request\n req, traces=traces, timeout=real_timeout\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/aiohttp/connector.py\", line 535, in connect\n proto = await self._create_connection(req, traces, timeout)\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/aiohttp/connector.py\", line 892, in _create_connection\n _, proto = await self._create_direct_connection(req, traces, timeout)\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/aiohttp/connector.py\", line 1051, in _create_direct_connection\n raise last_exc\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/aiohttp/connector.py\", line 1032, in _create_direct_connection\n client_error=client_error,\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/aiohttp/connector.py\", line 975, in _wrap_create_connection\n raise client_error(req.connection_key, exc) from exc\n"
},
"finished_at": "2021-06-09T17:03:17.200029Z",
"logging_cid": "bedcbea08fec4d9da7313a908ad2a974",
"name": "pulp_rpm.app.tasks.synchronizing.synchronize",
"parent_task": null,
"progress_reports": [
{
"code": "parsing.modulemds",
"done": 138,
"message": "Parsed Modulemd",
"state": "completed",
"suffix": null,
"total": 138
},
{
"code": "parsing.modulemd_defaults",
"done": 45,
"message": "Parsed Modulemd-defaults",
"state": "completed",
"suffix": null,
"total": 45
},
{
"code": "parsing.comps",
"done": 67,
"message": "Parsed Comps",
"state": "completed",
"suffix": null,
"total": 67
},
{
"code": "downloading.artifacts",
"done": 1669,
"message": "Downloading Artifacts",
"state": "failed",
"suffix": null,
"total": null
},
{
"code": "parsing.packages",
"done": 4005,
"message": "Parsed Packages",
"state": "canceled",
"suffix": null,
"total": 6036
},
{
"code": "downloading.metadata",
"done": 4,
"message": "Downloading Metadata Files",
"state": "canceled",
"suffix": null,
"total": null
},
{
"code": "associating.content",
"done": 1501,
"message": "Associating Content",
"state": "canceled",
"suffix": null,
"total": null
}
],
"pulp_created": "2021-06-09T16:55:13.701688Z",
"pulp_href": "/pulp/api/v3/tasks/c81eb5d3-a77a-4454-a559-644812017a11/",
"reserved_resources_record": [
"/pulp/api/v3/remotes/rpm/rpm/c748d893-74dc-4798-90c9-7e527f63e2f3/",
"/pulp/api/v3/repositories/rpm/rpm/96667248-fec9-4b09-8e5d-bacc663b0fe6/"
],
"started_at": "2021-06-09T16:55:13.912093Z",
"state": "failed",
"task_group": null,
"worker": "/pulp/api/v3/workers/44ce86b2-f61a-4ff9-8ca8-9a1ba4ed634a/"
}
Updated by ggainey over 3 years ago
Previous testing was done connected to internal-VPN. Turning off VPN and hitting CDN via external-internet, c=5 generates "aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected".
Bumped c=10, and...everything Just Worked. Sigh. Investigation continues.
Updated by dalley over 3 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to dalley
Added by dalley over 3 years ago
Updated by dalley over 3 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulpcore|39b44a4a53f3ee496e89263b66f772fdf551961c.
Updated by pulpbot over 3 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Retry file downloads on more types of errors
closes: #8881 https://pulp.plan.io/issues/8881 closes: #8867 https://pulp.plan.io/issues/8867