Project

Profile

Help

Issue #1629

closed

Frequent package download failures: nectar.downloaders.threaded:WARNING: Connection Error - http://10.19.11.110/pulp/repos//content/dist/rhel/server/6/6Server/x86_64/os/cups-1.4.2-67.el6_6.1.x86_64.rpm could not be reached.

Added by jwmatthews about 8 years ago. Updated over 3 years ago.

Status:
CLOSED - WORKSFORME
Priority:
Normal
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
3. High
Version:
2.6.0
Platform Release:
2.8.0
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Quarter:

Description

We had 2 hackathon events, ~20+ participants of each using Sat 6.1 to sync ~100GB of content spread over ~8 repos.
The majority of the participants ran into issues of many package failures. Majority as in 75% of people had problems.

Some saw ~16k failures of a repo with ~28k packages, some saw less errors.

The syncs were going against a blade that was in same rack and dedicated to serving content.
This wasn't against the CDN, but a pulp server in same rack.

Env:
Pulp server is running in a rack exporting content.
~20+ other blades are each running an instance of Satellite in a VM (16GB of RAM in the VM, 16 virtual CPUs)
Each Satellite was syncing content from the pulp server exporting

Alex Krzos from the performance team looked at the environment to gather some info.
Below is info from him on what he saw:

Issue:  Synced multiple Satellite 6 servers to a single pulp repo server and observed system performance to see if system resources were a bottleneck.  The repo server did not appear to be bottlenecked at all.  The number of current apache processes serving content was observed on a 1s interval and did not exceed ~130 processes during this time frame.  

One of the Satellite 6 servers was observed concurrently and it was noted in the log when a package could not be synced (nectar.downloaders.threaded:WARNING: Connection Error - http://10.19.11.110/pulp/repos//content/dist/rhel/server/6/6Server/x86_64/os/cups-1.4.2-67.el6_6.1.x86_64.rpm could not be reached.) A wget on the same url responsible to serve the package shows that the rpm is indeed there and able to be served by apache

The resource usage of the satellite server also suggested that while there was high disk io (Noted through ~20% cpu io wait time on a per second interval) that no other bottlenecks should be preventing the rpm from downloading

Looking through the httpd config files for the pulp repo server it was determined a few experiments should be conducted to determine if "requests.packages.urllib3.connectionpool" python library may not be handling apache's configuration on when a connection is closed and when an apache process is recycled.

Versions Used:
- Server sharing content was Pulp 2.7.0 (Pulp only, not Satellite)
- Clients syncing content was Sat 6.1, so Pulp 2.6.0

some info from Alex:

---Procs--- --------------Memory-(MB)--------------------- Swp(MB) --BlockIO-(MB)-----BlockIOaccum-(MB)-- -----Networks-(MB)--------- -----System------ --------CPU %---- Feb 03
r b swpd free buff cache dirty writeb si so bi bo bi+o biA boA bioA LOio LOioA ETHio ETHioA in cs us sy id wa st
4 4 158 238 0 9137 610 96 0 0 1 152 154 2548 21153 23701 0 558 52 20933 29846 29448 4 2 87 8 0 13:28:52
1 2 158 187 0 9171 337 142 0 0 0 344 344 2548 21497 24045 0 558 42 20976 21850 20103 4 1 87 7 0 13:28:53
1 4 158 147 0 9215 381 142 0 0 0 0 0 2548 21497 24045 0 558 54 21030 24588 19900 3 1 87 9 0 13:28:54
0 8 158 124 0 9241 373 34 0 0 0 0 0 2548 21497 24045 0 558 22 21053 9040 7986 0 0 79 20 0 13:28:55
3 4 158 148 0 9227 353 77 0 0 0 107 108 2548 21604 24152 0 558 59 21112 22888 16322 2 1 81 16 0 13:28:56

(DROP happens here) 
2   13  159       135         0      9210    373     38    0    1    5   70   75    2553   21674   24227     0     559    81   21194    53555    59050   7   3  70  20   0  13:28:57
15    2  162       159         0      9154    451      0    0    3   34   14   49    2587   21688   24275     4     563    83   21277    37692    23763   6   3  78  14   0  13:28:58
 8    2  164       215         0      9098    602      0    0    1   85    5   90    2672   21693   24365    14     578   126   21404   116107   156688  18   6  73   3   0  13:28:59
 2    2  173       193         0      9147    675      0    0    8   62   11   73    2734   21704   24438     4     582    70   21475    86398   105883  13   4  78   5   0  13:29:00
 6    1  173       256         0      9091    765      0    0    0   47   22   69    2781   21726   24507     6     589   110   21586   125251   173754  16   6  77   2   0  13:29:01
 3    1  173       248         0      9139    837     31    0    0    1   33   34    2782   21759   24541     5     595   100   21686   107694   144871  17   5  77   1   0  13:29:02
 3    6  174       208         0      9147    764    104    0    1    1  107  108    2783   21866   24649     1     596    57   21744    42238    44077   8   2  82   8   0  13:29:03
 4    2  175       183         0      9214    838     52    0    0    0   24   24    2783   21890   24673     0     596    74   21819    41504    43540   8   2  77  13   0  13:29:04
 3    2  175       141         0      9244    798    103    0    0    1  157  159    2784   22047   24831     5     601   115   21934    83234    99274  10   4  84   1   0  13:29:05
14    3  176       184         0      9226    780     78    0    0    0   94   94    2784   22141   24925     2     604    78   22012    58275    70642   9   3  76  12   0  13:29:06
 1   11  177       120         0      9279    816      0    0    0    0   40   40    2784   22181   24965     0     605    88   22101    48101    49623   8   2  83   6   0  13:29:07
 2    2  177       149         0      9252    754    101    0    0    2  144  146    2786   22325   25111     2     607    99   22201    77630    92513  11   4  77   8   0  13:29:08
 2    8  178       127         0      9276    780     66    0    0    1   31   33    2787   22356   25143     0     607    40   22242    17634    12000   6   1  73  20   0  13:29:09
 6    1  178       153         0      9225    618     65    0    0    0  195  196    2787   22551   25338     1     608    33   22276    28080    33172   7   2  75  16   0  13:29:10

Files

logs-pulp-box-1.tgz (35.6 MB) logs-pulp-box-1.tgz jmrodri, 02/05/2016 03:24 AM
logs-139-busted.tgz (482 KB) logs-139-busted.tgz jmrodri, 02/05/2016 03:25 AM
logs-187-working-1.tgz (346 KB) logs-187-working-1.tgz jmrodri, 02/05/2016 03:27 AM
0001-Log-the-actual-connection-error.patch (901 Bytes) 0001-Log-the-actual-connection-error.patch Patch Nectar to log the actual connection error. jcline@redhat.com, 02/05/2016 09:27 PM

Also available in: Atom PDF