Issue #8180

Content serving performance tuning

Added by adam.winberg@smhi.se 3 months ago. Updated 6 days ago.

Status:
ASSIGNED
Priority:
High
Assignee:
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
3. High
Version:
Platform Release:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Sprint:
Sprint 96
Quarter:

Description

We have been using a pulp2 installation serving yum repos to ~1200 clients. We have now moved to a pulp3 installation, and when we first made the move client yum runs were very slow. I mean timeout-slow, grinding our puppet runs to a halt and forcing me to revert to our pulp2 instance.

Pulp3 is behind an Apache reverse proxy, and I'm seeing errors like:

 AH01102: error reading status line from remote server 127.0.0.1:24816

There is no shortage of CPU/RAM on the server. Pulp is connected to a remote PostgreSQL server, which was my first suspect, but after quite a lot of troubleshooting and testing we concluded that the problem was the Pulp server itself.

After further debugging I found additional error logs from Apache:

AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.

and

AH00484: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting

I'm using mpm_event in Apache, where the default value for ServerLimit is 16; I increased this to 30. I also increased MaxRequestWorkers from the default of 400 to 750.
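For reference, an mpm_event override reflecting these values might look like the following (the values are taken from this report; the file path is only an example and defaults depend on the distribution):

```apache
# e.g. /etc/httpd/conf.modules.d/mpm_event_tuning.conf (example path)
<IfModule mpm_event_module>
    ServerLimit           30
    MaxRequestWorkers    750
</IfModule>
```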

I also increased the number of aiohttp workers for the pulpcore-content service from 2 to 6. This was more of a wild guess based on the aiohttp documentation, and I don't know if it is relevant for content serving in Pulp.
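On an RPM-based install the worker count is part of the pulpcore-content service's command line; a systemd drop-in along these lines is one way to change it (the exact ExecStart, binary path, and unit name here are assumptions to verify against the installed unit file):

```ini
# /etc/systemd/system/pulpcore-content.service.d/override.conf (hypothetical)
[Service]
ExecStart=
ExecStart=/usr/bin/gunicorn pulpcore.content:server \
          --worker-class 'aiohttp.GunicornWebWorker' --workers 6
```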

After these measures the performance was a lot better. Not quite as good as pulp2, and I still get an occasional failed puppet run because of dnf, but it's pretty good; good enough. I was under some time pressure to get our pulp3 environment up and running and therefore introduced both measures at the same time, so unfortunately I can't say whether the increase in content workers had any effect. The Apache errors are gone though, which I guess makes quite a difference.

I looked for information about Pulp3 content serving tuning, or any existing issues about it, but found nothing. Some pointers in the documentation on how to tune content serving would be really nice.

Using an RPM-based installation on RHEL8 with:

httpd-2.4.37-30.module+el8.3.0+7001+0766b9e7.x86_64
python3-pulp-rpm-3.7.0-1.el8.noarch
python3-pulpcore-3.7.3-1.el8.noarch

Related issues

Related to Pulp - Task #6928: Measure Pulp's ability to scale to high #s of client requests (NEW)


History

#1 Updated by adam.winberg@smhi.se 3 months ago

I got rid of the Apache errors regarding ServerLimit and MaxRequestWorkers, but I still get occasional errors:

AH01102: error reading status line from remote server 127.0.0.1:24816

I tested increasing my gunicorn/aiohttp workers even more, but with no difference.

I then googled into this aiohttp issue: https://github.com/aio-libs/aiohttp/issues/2687

where one user with a similar problem solved it by adding 'disablereuse=on' to the ProxyPass directive in Apache. I tried it and it seems to work for me as well. Since I proxypass to localhost, I don't think this parameter has any significant performance impact; none that I have noticed, anyway.
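For anyone hitting the same AH01102 errors, the change amounts to something like this in the Apache vhost (the paths are illustrative; 24816 is the pulpcore-content port from the errors above):

```apache
ProxyPass        /pulp/content http://127.0.0.1:24816/pulp/content disablereuse=on
ProxyPassReverse /pulp/content http://127.0.0.1:24816/pulp/content
```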

All in all the performance is quite acceptable for me now.

#2 Updated by dalley 3 months ago

  • Project changed from RPM Support to Pulp

#3 Updated by dalley 3 months ago

I did a little bit of load testing using Locust.io. Here was my setup.

Install the "locust" package and run the script below with "locust -f $script_name". Connect to the web UI from your browser at "127.0.0.1:8089", using the appropriate address if you're running it from the VM. I did not use the VM.

import random
from urllib.parse import urljoin
from collections import namedtuple
from gettext import gettext as _


from locust import HttpUser, task, between

# REPO_LOCATION = "http://pulp2.dev/pulp/isos/file20k/"  # Pulp 2
REPO_LOCATION = "http://pulp2-nightly-pulp3-source-centos7/pulp/content/file20k/"  # Pulp 3

# LOCAL_REPO_PATH = "/pulp/isos/file20k/"    # Pulp 2
# LOCAL_REPO_PATH = "/pulp/content/file20k/"   # Pulp 3


Line = namedtuple("Line", ("number", "content"))


class Entry:
    """
    Manifest entry.

    Format: <relative_path>,<digest>,<size>.
    Lines beginning with `#` are ignored.

    Attributes:
        relative_path (str): A relative path.
        digest (str): The file sha256 hex digest.
        size (int): The file size in bytes.

    """

    def __init__(self, relative_path, digest, size):
        """
        Create a new Entry.

        Args:
            relative_path (str): A relative path.
            digest (str): The file sha256 hex digest.
            size (int): The file size in bytes.

        """
        self.relative_path = relative_path
        self.digest = digest
        self.size = size

    @staticmethod
    def parse(line):
        """
        Parse the specified line from the manifest into an Entry.

        Args:
            line (Line): A line from the manifest.

        Returns:
            Entry: An entry.

        Raises:
            ValueError: on parsing error.

        """
        part = [s.strip() for s in line.content.split(",")]
        if len(part) != 3:
            raise ValueError(
                _("Error: manifest line:{n}: " "must be: <relative_path>,<digest>,<size>").format(
                    n=line.number
                )
            )
        return Entry(relative_path=part[0], digest=part[1], size=int(part[2]))

    def __str__(self):
        """
        Returns a string representation of the Manifest Entry.

        Returns:
            str: format: "<relative_path>,<digest>,<size>"

        """
        fields = [self.relative_path, self.digest]
        if isinstance(self.size, int):
            fields.append(str(self.size))
        return ",".join(fields)


class Manifest:
    """
    A file manifest.

    Describes files contained within the directory.

    Attributes:
        relative_path (str): A relative path to the manifest.

    """

    def __init__(self, relative_path):
        """
        Create a new Manifest.

        Args:
            relative_path (str): A relative path to the manifest.

        """
        self.relative_path = relative_path

    @staticmethod
    def parse(manifest_str):
        """
        Parse a manifest string and yield entries.

        Yields:
            Entry: for each line.

        """
        for n, line in enumerate(manifest_str.splitlines(), 1):
            line = line.strip()
            if not line:
                continue
            if line.startswith("#"):
                continue
            yield Entry.parse(Line(number=n, content=line))


class PulpLoadTestClient(HttpUser):
    # wait_time = between(0.5, 3.0)

    def on_start(self):
        """ on_start is called when a Locust start before any task is scheduled """
        pass

    def on_stop(self):
        """ on_stop is called when the TaskSet is stopping """
        pass

    @task(1)
    def clone_file20k_repository(self):
        response = self.client.get("PULP_MANIFEST")

        entries = list(Manifest.parse(response.text))

        random.shuffle(entries)

        for entry in entries:
            self.client.get(entry.relative_path)

The web UI will ask you what URL to test, as well as how many users to simulate and how fast to spawn them. Use the URL of the repo hosted in Pulp 2 or Pulp 3, e.g. "http://pulp2.dev/pulp/isos/file20k/" or "http://pulp3-source-fedora32.localhost.example.com/pulp/content/file20k/"

I used this repo specifically: https://fixtures.pulpproject.org/file-perf/
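If you don't have that fixture handy, a tiny PULP_MANIFEST in the same `<relative_path>,<digest>,<size>` format can be generated locally (a sketch; the file names, count, and sizes are arbitrary):

```python
import hashlib
import os
import tempfile

# Write a few random files and build a matching PULP_MANIFEST string in the
# <relative_path>,<digest>,<size> format that the locustfile above parses.
tmpdir = tempfile.mkdtemp()
lines = []
for i in range(3):
    name = f"file{i}.bin"
    data = os.urandom(1024)
    with open(os.path.join(tmpdir, name), "wb") as f:
        f.write(data)
    lines.append(f"{name},{hashlib.sha256(data).hexdigest()},{len(data)}")

manifest = "\n".join(lines)
print(manifest.splitlines()[0].split(",")[2])  # prints 1024
```

Serve the directory with any static file server and point the script at it.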

I don't want to overgeneralize from my hardware or from this one specific benchmark (which is likely a little pathological due to the tiny files), but I can definitely see that there is a very large gap between Pulp 2 and Pulp 3 content serving performance at the moment, in terms of throughput, latency, and CPU consumption.

I also confirmed that increasing the worker count helped, but there are probably other things we can do in the code to improve the situation.

#4 Updated by daviddavis 3 months ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 90

#5 Updated by daviddavis 3 months ago

  • Priority changed from Normal to High
  • Severity changed from 2. Medium to 3. High

#6 Updated by dalley 3 months ago

New version

import random
from urllib.parse import urljoin
from collections import namedtuple
from gettext import gettext as _


from locust import HttpUser, task, between

# REPO_LOCATION = "http://pulp2.dev/pulp/isos/file20k/"  # Pulp 2
REPO_LOCATION = "http://pulp2-nightly-pulp3-source-centos7/pulp/content/file20k/"  # Pulp 3

# LOCAL_REPO_PATH = "/pulp/isos/file20k/"    # Pulp 2
# LOCAL_REPO_PATH = "/pulp/content/file20k/"   # Pulp 3


Line = namedtuple("Line", ("number", "content"))


class Entry:
    """
    Manifest entry.

    Format: <relative_path>,<digest>,<size>.
    Lines beginning with `#` are ignored.

    Attributes:
        relative_path (str): A relative path.
        digest (str): The file sha256 hex digest.
        size (int): The file size in bytes.

    """

    def __init__(self, relative_path, digest, size):
        """
        Create a new Entry.

        Args:
            relative_path (str): A relative path.
            digest (str): The file sha256 hex digest.
            size (int): The file size in bytes.

        """
        self.relative_path = relative_path
        self.digest = digest
        self.size = size

    @staticmethod
    def parse(line):
        """
        Parse the specified line from the manifest into an Entry.

        Args:
            line (Line): A line from the manifest.

        Returns:
            Entry: An entry.

        Raises:
            ValueError: on parsing error.

        """
        part = [s.strip() for s in line.content.split(",")]
        if len(part) != 3:
            raise ValueError(
                _("Error: manifest line:{n}: " "must be: <relative_path>,<digest>,<size>").format(
                    n=line.number
                )
            )
        return Entry(relative_path=part[0], digest=part[1], size=int(part[2]))

    def __str__(self):
        """
        Returns a string representation of the Manifest Entry.

        Returns:
            str: format: "<relative_path>,<digest>,<size>"

        """
        fields = [self.relative_path, self.digest]
        if isinstance(self.size, int):
            fields.append(str(self.size))
        return ",".join(fields)


class Manifest:
    """
    A file manifest.

    Describes files contained within the directory.

    Attributes:
        relative_path (str): A relative path to the manifest.

    """

    def __init__(self, relative_path):
        """
        Create a new Manifest.

        Args:
            relative_path (str): A relative path to the manifest.

        """
        self.relative_path = relative_path

    @staticmethod
    def parse(manifest_str):
        """
        Parse a manifest string and yield entries.

        Yields:
            Entry: for each line.

        """
        for n, line in enumerate(manifest_str.splitlines(), 1):
            line = line.strip()
            if not line:
                continue
            if line.startswith("#"):
                continue
            yield Entry.parse(Line(number=n, content=line))


class PulpLoadTestClient(HttpUser):
    # wait_time = between(0.5, 3.0)

    def on_start(self):
        """ on_start is called when a Locust start before any task is scheduled """
        pass

    def on_stop(self):
        """ on_stop is called when the TaskSet is stopping """
        pass

    @task(1)
    def emulate_rpm_repository(self, num_pkgs=None):
        response = self.client.get("PULP_MANIFEST")
        entries = list(Manifest.parse(response.text))

        metadata_files = []
        packages = []

        for entry in entries:
            if entry.relative_path.startswith("repodata/"):
                metadata_files.append(entry.relative_path)
            elif entry.relative_path.startswith("packages/"):
                packages.append(entry.relative_path)

        # metadata_files holds relative-path strings, so fetch them directly
        for metadata_file in metadata_files:
            self.client.get(metadata_file)

        # is a random sample actually what we want? many clients will likely be requesting the same set of packages
        if num_pkgs:
            packages = random.sample(packages, num_pkgs)

        for pkg in packages:
            self.client.get(pkg)
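On the open question in the script's comment about uniform sampling: one cheap alternative is popularity-skewed sampling, so that a small "hot set" of packages dominates requests across simulated clients. A standalone sketch (the package names here are made up; in the locustfile this would operate on the `packages` list parsed from PULP_MANIFEST):

```python
import random

# Stand-in package list; replace with the paths parsed from PULP_MANIFEST.
packages = [f"packages/pkg-{i}.rpm" for i in range(1000)]

# Zipf-like weights: low-rank packages are requested far more often than the
# long tail, roughly mimicking many clients wanting the same popular packages.
weights = [1.0 / (rank + 1) for rank in range(len(packages))]
requests = random.choices(packages, weights=weights, k=50)
print(len(requests))  # prints 50
```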

#7 Updated by dalley 3 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dalley

#8 Updated by rchan 3 months ago

  • Sprint changed from Sprint 90 to Sprint 91

#9 Updated by rchan 2 months ago

  • Sprint changed from Sprint 91 to Sprint 92

#10 Updated by rchan about 2 months ago

  • Sprint changed from Sprint 92 to Sprint 93

#11 Updated by rchan 29 days ago

  • Sprint changed from Sprint 93 to Sprint 94

#12 Updated by dalley 23 days ago

  • Related to Task #6928: Measure Pulp's ability to scale to high #s of client requests added

#13 Updated by rchan 19 days ago

  • Sprint changed from Sprint 94 to Sprint 95

#14 Updated by rchan 6 days ago

  • Sprint changed from Sprint 95 to Sprint 96
