Issue #1629
closedFrequent package download failures: nectar.downloaders.threaded:WARNING: Connection Error - http://10.19.11.110/pulp/repos//content/dist/rhel/server/6/6Server/x86_64/os/cups-1.4.2-67.el6_6.1.x86_64.rpm could not be reached.
Description
We had 2 hackathon events, ~20+ participants of each using Sat 6.1 to sync ~100GB of content spread over ~8 repos.
The majority of the participants ran into issues of many package failures. Majority as in 75% of people had problems.
Some saw ~16k failures of a repo with ~28k packages, some saw less errors.
The syncs were going against a blade that was in same rack and dedicated to serving content.
This wasn't against the CDN, but a pulp server in same rack.
Env:
Pulp server is running in a rack exporting content.
~20+ other blades are each running an instance of Satellite in a VM (16GB of RAM in the VM, 16 virtual CPUs)
Each Satellite was syncing content from the pulp server exporting
Alex Krzos from the performance team looked at the environment to gather some info.
Below is info from him on what he saw:
Issue: Synced multiple Satellite 6 servers to a single pulp repo server and observed system performance to see if system resources were a bottleneck. The repo server did not appear to be bottlenecked at all. The number of current apache processes serving content was observed on a 1s interval and did not exceed ~130 processes during this time frame.
One of the Satellite 6 servers was observed concurrently and it was noted in the log when a package could not be synced (nectar.downloaders.threaded:WARNING: Connection Error - http://10.19.11.110/pulp/repos//content/dist/rhel/server/6/6Server/x86_64/os/cups-1.4.2-67.el6_6.1.x86_64.rpm could not be reached.) A wget on the same url responsible to serve the package shows that the rpm is indeed there and able to be served by apache
The resource usage of the satellite server also suggested that while there was high disk io (Noted through ~20% cpu io wait time on a per second interval) that no other bottlenecks should be preventing the rpm from downloading
Looking through the httpd config files for the pulp repo server it was determined a few experiments should be conducted to determine if "requests.packages.urllib3.connectionpool" python library may not be handling apache's configuration on when a connection is closed and when an apache process is recycled.
Versions Used:
- Server sharing content was Pulp 2.7.0 (Pulp only, not Satellite)
- Clients syncing content was Sat 6.1, so Pulp 2.6.0
some info from Alex:
---Procs--- --------------Memory-(MB)--------------------- Swp(MB) --BlockIO-(MB)-----BlockIOaccum-(MB)-- -----Networks-(MB)--------- -----System------ --------CPU %---- Feb 03
r b swpd free buff cache dirty writeb si so bi bo bi+o biA boA bioA LOio LOioA ETHio ETHioA in cs us sy id wa st
4 4 158 238 0 9137 610 96 0 0 1 152 154 2548 21153 23701 0 558 52 20933 29846 29448 4 2 87 8 0 13:28:52
1 2 158 187 0 9171 337 142 0 0 0 344 344 2548 21497 24045 0 558 42 20976 21850 20103 4 1 87 7 0 13:28:53
1 4 158 147 0 9215 381 142 0 0 0 0 0 2548 21497 24045 0 558 54 21030 24588 19900 3 1 87 9 0 13:28:54
0 8 158 124 0 9241 373 34 0 0 0 0 0 2548 21497 24045 0 558 22 21053 9040 7986 0 0 79 20 0 13:28:55
3 4 158 148 0 9227 353 77 0 0 0 107 108 2548 21604 24152 0 558 59 21112 22888 16322 2 1 81 16 0 13:28:56
(DROP happens here)
2 13 159 135 0 9210 373 38 0 1 5 70 75 2553 21674 24227 0 559 81 21194 53555 59050 7 3 70 20 0 13:28:57
15 2 162 159 0 9154 451 0 0 3 34 14 49 2587 21688 24275 4 563 83 21277 37692 23763 6 3 78 14 0 13:28:58
8 2 164 215 0 9098 602 0 0 1 85 5 90 2672 21693 24365 14 578 126 21404 116107 156688 18 6 73 3 0 13:28:59
2 2 173 193 0 9147 675 0 0 8 62 11 73 2734 21704 24438 4 582 70 21475 86398 105883 13 4 78 5 0 13:29:00
6 1 173 256 0 9091 765 0 0 0 47 22 69 2781 21726 24507 6 589 110 21586 125251 173754 16 6 77 2 0 13:29:01
3 1 173 248 0 9139 837 31 0 0 1 33 34 2782 21759 24541 5 595 100 21686 107694 144871 17 5 77 1 0 13:29:02
3 6 174 208 0 9147 764 104 0 1 1 107 108 2783 21866 24649 1 596 57 21744 42238 44077 8 2 82 8 0 13:29:03
4 2 175 183 0 9214 838 52 0 0 0 24 24 2783 21890 24673 0 596 74 21819 41504 43540 8 2 77 13 0 13:29:04
3 2 175 141 0 9244 798 103 0 0 1 157 159 2784 22047 24831 5 601 115 21934 83234 99274 10 4 84 1 0 13:29:05
14 3 176 184 0 9226 780 78 0 0 0 94 94 2784 22141 24925 2 604 78 22012 58275 70642 9 3 76 12 0 13:29:06
1 11 177 120 0 9279 816 0 0 0 0 40 40 2784 22181 24965 0 605 88 22101 48101 49623 8 2 83 6 0 13:29:07
2 2 177 149 0 9252 754 101 0 0 2 144 146 2786 22325 25111 2 607 99 22201 77630 92513 11 4 77 8 0 13:29:08
2 8 178 127 0 9276 780 66 0 0 1 31 33 2787 22356 25143 0 607 40 22242 17634 12000 6 1 73 20 0 13:29:09
6 1 178 153 0 9225 618 65 0 0 0 195 196 2787 22551 25338 1 608 33 22276 28080 33172 7 2 75 16 0 13:29:10
Files