Issue #8180
closed
Content serving performance tuning
Description
Ticket moved to GitHub: "pulp/pulpcore/1963":https://github.com/pulp/pulpcore/issues/1963
We have been using a pulp2 installation serving yum repos to ~1200 clients. We have now moved to a pulp3 installation, and when we first made the move, client yum runs were very slow. I mean timeout-slow, grinding our puppet runs to a halt and forcing me to revert to our pulp2 instance.
Pulp3 is behind an Apache reverse proxy, and I'm seeing errors like:
AH01102: error reading status line from remote server 127.0.0.1:24816
There is no shortage of cpu/ram on the server. Pulp is connected to a remote postgresql server, which was my first suspect but after quite a lot of troubleshooting and testing we concluded that the problem seemed to be the pulp server.
After further debugging I found additional error logs from Apache:
AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.
and
AH00484: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
I'm using mpm_event in Apache, where the default value for ServerLimit is 16; I increased this to 30. I increased MaxRequestWorkers from 400 (default) to 750.
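The two numbers are linked: with mpm_event, MaxRequestWorkers cannot exceed ServerLimit multiplied by ThreadsPerChild. A minimal sketch of that arithmetic, assuming ThreadsPerChild was left at the mpm_event default of 25 (the thread does not say whether it was changed); the function name is made up for illustration:

```python
# Ceiling on MaxRequestWorkers for Apache mpm_event:
#   MaxRequestWorkers <= ServerLimit * ThreadsPerChild
THREADS_PER_CHILD = 25  # assumption: mpm_event default, not stated in the thread


def max_request_workers_ceiling(server_limit, threads_per_child=THREADS_PER_CHILD):
    """Return the largest MaxRequestWorkers value these settings allow."""
    return server_limit * threads_per_child


for server_limit in (16, 30):
    print(server_limit, max_request_workers_ceiling(server_limit))
# 16 400
# 30 750
```

Under that assumption, the default ServerLimit of 16 gives exactly the default 400 workers, and ServerLimit 30 is the smallest value that permits 750.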
I also increased the number of aiohttp workers for the pulpcore-content service from 2 to 6. This measure was more of a wild guess based on the aiohttp documentation, and I don't know if it is relevant for content serving in Pulp.
After these measures the performance was a lot better. Not quite as good as pulp2, and I still get an occasional failed Puppet run because of dnf, but it's pretty good, good enough. I was under some time pressure to get our pulp3 environment up and running, so I introduced both measures at the same time; unfortunately I can't say if the increase in content workers had any effect. The Apache errors are gone though, which I guess makes quite a difference.
I was looking for any information about Pulp3 content serving tuning or any existing issues regarding this, but found nothing. But some pointers in the documentation about how to tune the content serving would be really nice.
Using rpm based installation on RHEL8 with
httpd-2.4.37-30.module+el8.3.0+7001+0766b9e7.x86_64
python3-pulp-rpm-3.7.0-1.el8.noarch
python3-pulpcore-3.7.3-1.el8.noarch
Updated by adam.winberg@smhi.se almost 4 years ago
I got rid of the Apache errors regarding ServerLimit and MaxRequestWorkers, but I still get occasional errors:
AH01102: error reading status line from remote server 127.0.0.1:24816
I tested increasing my gunicorn/aiohttp workers even more, but with no difference.
I then found this aiohttp issue via Google: https://github.com/aio-libs/aiohttp/issues/2687
where one user with similar problems solved this by adding 'disablereuse=on' to the ProxyPass directive in Apache. I tried it and it seems to work for me as well. Since I proxypass to localhost I don't think this parameter has any significant performance impact, none that I have noticed anyway.
All in all the performance is quite acceptable for me now.
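To judge whether a change such as 'disablereuse=on' really reduces the proxy errors, it can help to tally the Apache error codes before and after. A minimal sketch, not part of the original report: the helper name is made up, and the sample lines are the messages quoted in this issue without their timestamp prefixes.

```python
import re
from collections import Counter

# Apache tags each log message with a code of the form AH + five digits.
AH_CODE = re.compile(r"\bAH\d{5}\b")


def count_ah_codes(lines):
    """Count how often each AHnnnnn code appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        counts.update(AH_CODE.findall(line))
    return counts


sample = [
    "AH01102: error reading status line from remote server 127.0.0.1:24816",
    "AH00484: server reached MaxRequestWorkers setting, consider raising the "
    "MaxRequestWorkers setting",
    "AH01102: error reading status line from remote server 127.0.0.1:24816",
]
print(count_ah_codes(sample).most_common())
# [('AH01102', 2), ('AH00484', 1)]
```

Against a real log, the same function can be fed an open file object, for example `count_ah_codes(open(path))`, where `path` is wherever the Apache error log lives on the host.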
Updated by dalley almost 4 years ago
I did a little bit of load testing using Locust.io. Here was my setup.
Install the "locust" package and run the following script with "locust -f $script_name". Connect to the WebUI from your browser with the address "127.0.0.1:8089", using the appropriate address if you're running it from the VM. I did not use the VM.
import random
from urllib.parse import urljoin
from collections import namedtuple
from gettext import gettext as _

from locust import HttpUser, task, between

# Not read by the script; the URL to test is entered in the Locust WebUI.
REPO_LOCATION = "http://pulp2.dev/pulp/isos/file20k/"
REPO_LOCATION = "http://pulp2-nightly-pulp3-source-centos7/pulp/content/file20k/"
# LOCAL_REPO_PATH = "/pulp/isos/file20k/"  # Pulp 2
# LOCAL_REPO_PATH = "/pulp/content/file20k/"  # Pulp 3

Line = namedtuple("Line", ("number", "content"))


class Entry:
    """
    Manifest entry.

    Format: <relative_path>,<digest>,<size>.
    Lines beginning with `#` are ignored.

    Attributes:
        relative_path (str): A relative path.
        digest (str): The file sha256 hex digest.
        size (int): The file size in bytes.
    """

    def __init__(self, relative_path, digest, size):
        """
        Create a new Entry.

        Args:
            relative_path (str): A relative path.
            digest (str): The file sha256 hex digest.
            size (int): The file size in bytes.
        """
        self.relative_path = relative_path
        self.digest = digest
        self.size = size

    @staticmethod
    def parse(line):
        """
        Parse the specified line from the manifest into an Entry.

        Args:
            line (Line): A line from the manifest.

        Returns:
            Entry: An entry.

        Raises:
            ValueError: on parsing error.
        """
        part = [s.strip() for s in line.content.split(",")]
        if len(part) != 3:
            raise ValueError(
                _("Error: manifest line:{n}: " "must be: <relative_path>,<digest>,<size>").format(
                    n=line.number
                )
            )
        return Entry(relative_path=part[0], digest=part[1], size=int(part[2]))

    def __str__(self):
        """
        Returns a string representation of the Manifest Entry.

        Returns:
            str: format: "<relative_path>,<digest>,<size>"
        """
        fields = [self.relative_path, self.digest]
        if isinstance(self.size, int):
            fields.append(str(self.size))
        return ",".join(fields)


class Manifest:
    """
    A file manifest.

    Describes files contained within the directory.

    Attributes:
        relative_path (str): A relative path to the manifest.
    """

    def __init__(self, relative_path):
        """
        Create a new Manifest.

        Args:
            relative_path (str): A relative path to the manifest.
        """
        self.relative_path = relative_path

    @staticmethod
    def parse(manifest_str):
        """
        Parse a manifest string and yield entries.

        Yields:
            Entry: for each line.
        """
        for n, line in enumerate(manifest_str.splitlines(), 1):
            line = line.strip()
            if not line:
                continue
            if line.startswith("#"):
                continue
            yield Entry.parse(Line(number=n, content=line))


class PulpLoadTestClient(HttpUser):
    # wait_time = between(0.5, 3.0)

    def on_start(self):
        """on_start is called when a Locust user starts, before any task is scheduled"""
        pass

    def on_stop(self):
        """on_stop is called when the Locust user is stopping"""
        pass

    @task(1)
    def clone_file20k_repository(self):
        # Fetch the manifest, then download every file it lists in random order.
        response = self.client.get("PULP_MANIFEST")
        entries = list(Manifest.parse(response.text))
        random.shuffle(entries)
        for entry in entries:
            self.client.get(entry.relative_path)
The WebUI will ask you which URL to test, how many users to simulate, and how fast to spawn them. Use the URL of the repos hosted in Pulp 2 or 3, e.g. "http://pulp2.dev/pulp/isos/file20k/" or "http://pulp3-source-fedora32.localhost.example.com/pulp/content/file20k/".
I used this repo specifically: https://fixtures.pulpproject.org/file-perf/
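For anyone who wants to build a similar test repo: the script above expects a PULP_MANIFEST made of `<relative_path>,<digest>,<size>` lines, with `#` lines ignored. A minimal sketch of producing and reading that format with only the standard library; the file names and contents here are made up, and this is not how the fixture repo itself was generated.

```python
import hashlib


def manifest_line(relative_path, data):
    """Build one '<relative_path>,<digest>,<size>' line for the given bytes."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{relative_path},{digest},{len(data)}"


def parse_manifest(text):
    """Return (relative_path, digest, size) tuples, skipping blanks and comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        relative_path, digest, size = (part.strip() for part in line.split(","))
        entries.append((relative_path, digest, int(size)))
    return entries


# Hypothetical tiny files, standing in for the many small files of a perf fixture.
files = {"1.iso": b"hello", "2.iso": b"world!"}
manifest = "\n".join(manifest_line(path, data) for path, data in files.items())
entries = parse_manifest("# generated for a load test\n" + manifest)
print([(path, size) for path, _digest, size in entries])
# [('1.iso', 5), ('2.iso', 6)]
```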
I don't want to overgeneralize from my hardware or from this one specific benchmark (which is likely a little pathological due to the tiny files), but I can definitely see that there is a very large gap between Pulp 2 and Pulp 3 content serving performance at the moment, in terms of throughput, latency, and CPU consumption.
I also confirmed that increasing the worker count helped, but there are probably other things we can do in the code to improve the situation.
Updated by daviddavis almost 4 years ago
- Triaged changed from No to Yes
- Sprint set to Sprint 90
Updated by daviddavis almost 4 years ago
- Priority changed from Normal to High
- Severity changed from 2. Medium to 3. High
Updated by dalley almost 4 years ago
New version
import random
from urllib.parse import urljoin
from collections import namedtuple
from gettext import gettext as _

from locust import HttpUser, task, between

# Not read by the script; the URL to test is entered in the Locust WebUI.
REPO_LOCATION = "http://pulp2.dev/pulp/isos/file20k/"
REPO_LOCATION = "http://pulp2-nightly-pulp3-source-centos7/pulp/content/file20k/"
# LOCAL_REPO_PATH = "/pulp/isos/file20k/"  # Pulp 2
# LOCAL_REPO_PATH = "/pulp/content/file20k/"  # Pulp 3

Line = namedtuple("Line", ("number", "content"))


class Entry:
    """
    Manifest entry.

    Format: <relative_path>,<digest>,<size>.
    Lines beginning with `#` are ignored.

    Attributes:
        relative_path (str): A relative path.
        digest (str): The file sha256 hex digest.
        size (int): The file size in bytes.
    """

    def __init__(self, relative_path, digest, size):
        """
        Create a new Entry.

        Args:
            relative_path (str): A relative path.
            digest (str): The file sha256 hex digest.
            size (int): The file size in bytes.
        """
        self.relative_path = relative_path
        self.digest = digest
        self.size = size

    @staticmethod
    def parse(line):
        """
        Parse the specified line from the manifest into an Entry.

        Args:
            line (Line): A line from the manifest.

        Returns:
            Entry: An entry.

        Raises:
            ValueError: on parsing error.
        """
        part = [s.strip() for s in line.content.split(",")]
        if len(part) != 3:
            raise ValueError(
                _("Error: manifest line:{n}: " "must be: <relative_path>,<digest>,<size>").format(
                    n=line.number
                )
            )
        return Entry(relative_path=part[0], digest=part[1], size=int(part[2]))

    def __str__(self):
        """
        Returns a string representation of the Manifest Entry.

        Returns:
            str: format: "<relative_path>,<digest>,<size>"
        """
        fields = [self.relative_path, self.digest]
        if isinstance(self.size, int):
            fields.append(str(self.size))
        return ",".join(fields)


class Manifest:
    """
    A file manifest.

    Describes files contained within the directory.

    Attributes:
        relative_path (str): A relative path to the manifest.
    """

    def __init__(self, relative_path):
        """
        Create a new Manifest.

        Args:
            relative_path (str): A relative path to the manifest.
        """
        self.relative_path = relative_path

    @staticmethod
    def parse(manifest_str):
        """
        Parse a manifest string and yield entries.

        Yields:
            Entry: for each line.
        """
        for n, line in enumerate(manifest_str.splitlines(), 1):
            line = line.strip()
            if not line:
                continue
            if line.startswith("#"):
                continue
            yield Entry.parse(Line(number=n, content=line))


class PulpLoadTestClient(HttpUser):
    # wait_time = between(0.5, 3.0)

    def on_start(self):
        """on_start is called when a Locust user starts, before any task is scheduled"""
        pass

    def on_stop(self):
        """on_stop is called when the Locust user is stopping"""
        pass

    @task(1)
    def emulate_rpm_repository(self, num_pkgs=None):
        response = self.client.get("PULP_MANIFEST")
        entries = list(Manifest.parse(response.text))
        # Both lists hold relative path strings, not Entry objects.
        metadata_files = []
        packages = []
        for entry in entries:
            if entry.relative_path.startswith("repodata/"):
                metadata_files.append(entry.relative_path)
            elif entry.relative_path.startswith("packages/"):
                packages.append(entry.relative_path)
        # A client reads all repo metadata first, then downloads packages.
        for metadata_file in metadata_files:
            self.client.get(metadata_file)
        # is a random sample actually what we want? many clients will likely be requesting the same set of packages
        # With num_pkgs unset, fetch every package; otherwise sample without exceeding the list size.
        if num_pkgs:
            packages = random.sample(packages, min(num_pkgs, len(packages)))
        for pkg in packages:
            self.client.get(pkg)
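On the open question in the script comment (whether a uniform random sample is realistic): one alternative is to skew the selection so that a few packages are requested far more often than the rest. A minimal sketch using rank-based weights of 1/rank**skew; the function name, the `skew` parameter, and the choice of weighting are assumptions for illustration, not something measured against real client traffic.

```python
import random


def pick_packages(packages, num_pkgs, skew=1.0, rng=random):
    """
    Pick num_pkgs packages with replacement, favoring entries early in the list.

    The package at rank r (1-based) gets weight 1 / r**skew, so skew=0 is a
    uniform pick and larger values concentrate requests on the first few.
    """
    weights = [1.0 / (rank ** skew) for rank in range(1, len(packages) + 1)]
    return rng.choices(packages, weights=weights, k=num_pkgs)


# Hypothetical package paths, only to exercise the function.
packages = [f"packages/pkg-{i}.rpm" for i in range(100)]
picked = pick_packages(packages, 10, rng=random.Random(0))
print(len(picked))  # 10
```

Because the pick is with replacement, one simulated client may fetch the same package twice; whether that is acceptable depends on what the test is meant to measure.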
Updated by dalley almost 4 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to dalley
Updated by dalley almost 4 years ago
- Related to Task #6928: Measure Pulp's ability to scale to high #s of client requests added
Updated by gerrod over 3 years ago
- Related to Task #8804: [EPIC] Use Redis to add caching abilities to Pulp added
Updated by dalley over 3 years ago
- Assignee changed from dalley to gerrod
Gerrod's caching PR will likely solve this problem for good. His PR gets some fantastic results in my initial testing.
Updated by dalley over 3 years ago
- Related to Task #8805: Cache the responses of the content app added
Updated by dalley over 3 years ago
A summary of the various changes we've made in the process of improving the content app
- Increased the default number of workers from 2 to 8 and made it configurable in the installer: https://github.com/pulp/pulp_installer/pull/530/
- Query optimization https://github.com/pulp/pulpcore/pull/1124
- Using async properly (+ more query optimization) https://github.com/pulp/pulpcore/pull/1116
All released presently
- Caching responses, to avoid making repeated database queries for commonly-requested files
Coming in 3.14
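To illustrate why response caching helps with commonly-requested files, here is a minimal in-memory sketch of the idea. It is not Pulp's implementation (the related task describes using Redis); the class, the `find_artifact` stand-in, and the 60-second TTL are all made up for the example.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() only on a miss."""
        now = self.clock()
        cached = self._store.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh: skip the expensive call
        value = compute()
        self._store[key] = (now + self.ttl_seconds, value)
        return value


db_lookups = []  # records every simulated database query


def find_artifact(path):
    """Hypothetical stand-in for the database lookup behind a content request."""
    db_lookups.append(path)
    return f"artifact-for-{path}"


cache = TTLCache(ttl_seconds=60)
for _ in range(1000):
    cache.get_or_compute("repodata/repomd.xml", lambda: find_artifact("repodata/repomd.xml"))
print(len(db_lookups))  # 1
```

A thousand requests for the same path cost one lookup; the trade-off is that a change to the underlying content is not visible until the entry expires or is invalidated.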
Updated by dalley over 3 years ago
- Status changed from ASSIGNED to NEW
- Assignee deleted (gerrod)
- Priority changed from High to Normal
- Sprint/Milestone deleted (3.14.0)
- Sprint deleted (Sprint 99)
This issue is being repurposed to refer specifically to "fine-tuning" of various parameters, to determine the best defaults. The caching changes and other performance improvements are tracked in other issues, and they're moving forward soon.
Since the caching will improve performance significantly, the fine-tuning will be a little less important, so I'm dropping the priority.
Updated by pulpbot about 3 years ago
- Description updated (diff)
- Status changed from NEW to CLOSED - DUPLICATE