Issue #8180 (closed)

Content serving performance tuning

Added by adam.winberg@smhi.se about 3 years ago. Updated over 2 years ago.

Status: CLOSED - DUPLICATE
Priority: Normal
Assignee: -
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags:
Sprint:
Quarter:

Description

Ticket moved to GitHub: pulp/pulpcore#1963 (https://github.com/pulp/pulpcore/issues/1963)


We have been using a pulp2 installation serving yum repos to ~1200 clients. We have now moved to a pulp3 installation, and when we first made the move, client yum runs were very slow. I mean timeout-slow, grinding our Puppet runs to a halt and forcing me to revert back to our pulp2 instance.

Pulp3 is behind an Apache reverse proxy, and I'm seeing errors like:

 AH01102: error reading status line from remote server 127.0.0.1:24816

There is no shortage of CPU/RAM on the server. Pulp is connected to a remote PostgreSQL server, which was my first suspect, but after quite a lot of troubleshooting and testing we concluded that the problem seemed to be the Pulp server itself.

After further debugging I found additional error logs from Apache:

AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.

and

AH00484: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting

I'm using mpm_event in Apache, where the default value for ServerLimit is 16; I increased this to 30. I also increased MaxRequestWorkers from the default of 400 to 750.
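
Roughly, the overrides look like this (the exact config file location varies by distribution; on RHEL 8 the MPM settings typically live under /etc/httpd/conf.modules.d/):

<IfModule mpm_event_module>
    # Defaults are ServerLimit 16 and MaxRequestWorkers 400
    ServerLimit         30
    MaxRequestWorkers   750
</IfModule>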

I also increased the number of aiohttp workers for the pulpcore-content service from 2 to 6. This measure was more of a wild guess based on the aiohttp documentation, and I don't know whether it is relevant for content serving in Pulp.
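
For reference, a sketch of one way to raise that worker count: on an RPM install it usually comes from the gunicorn invocation in the pulpcore-content systemd unit, which can be overridden with a systemd drop-in. The drop-in path, gunicorn binary path, and argument spelling below are assumptions that may differ between installer versions:

# /etc/systemd/system/pulpcore-content.service.d/override.conf (hypothetical path)
[Service]
# Clear the packaged ExecStart, then restate it with more workers
ExecStart=
ExecStart=/usr/bin/gunicorn pulpcore.content:server \
    --bind 127.0.0.1:24816 \
    --worker-class aiohttp.GunicornWebWorker \
    --workers 6

Run systemctl daemon-reload and restart pulpcore-content afterwards.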

After these measures the performance was a lot better. Not quite as good as pulp2, and I still get an occasional failed Puppet run because of dnf, but it's pretty good; good enough. I was under some time pressure to get our pulp3 environment up and running and therefore introduced both measures at the same time, so unfortunately I can't say whether the increase in content workers had any effect. The Apache errors are gone, though, which I guess makes quite a difference.

I looked for information about Pulp3 content serving tuning and for any existing issues on the topic, but found nothing. Some pointers in the documentation about how to tune content serving would be really nice.

Using an RPM-based installation on RHEL8 with:

httpd-2.4.37-30.module+el8.3.0+7001+0766b9e7.x86_64
python3-pulp-rpm-3.7.0-1.el8.noarch
python3-pulpcore-3.7.3-1.el8.noarch

Related issues

Related to Pulp - Task #6928: Measure Pulp's ability to scale to high #s of client requests (CLOSED - DUPLICATE)
Related to Pulp - Task #8804: [EPIC] Use Redis to add caching abilities to Pulp (CLOSED - DUPLICATE)
Related to Pulp - Task #8805: Cache the responses of the content app (CLOSED - CURRENTRELEASE, assignee: gerrod)
Actions #1

Updated by adam.winberg@smhi.se about 3 years ago

I got rid of the Apache errors regarding ServerLimit and MaxRequestWorkers, but I still get occasional errors:

AH01102: error reading status line from remote server 127.0.0.1:24816

I tested increasing my gunicorn/aiohttp workers even more, but with no difference.

I then came across this aiohttp issue: https://github.com/aio-libs/aiohttp/issues/2687

where one user with similar problems solved it by adding 'disablereuse=on' to the ProxyPass directive in Apache. I tried it, and it seems to work for me as well. Since I proxypass to localhost, I don't think this parameter has any significant performance impact; none that I have noticed, anyway.
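
The change amounts to something like the following in the Apache config (the backend port matches the errors above; your proxied path may differ):

# disablereuse=on makes Apache open a fresh backend connection per request,
# which avoids reusing connections that aiohttp has already closed
ProxyPass        /pulp/content http://127.0.0.1:24816/pulp/content disablereuse=on
ProxyPassReverse /pulp/content http://127.0.0.1:24816/pulp/content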

All in all the performance is quite acceptable for me now.

Actions #2

Updated by dalley about 3 years ago

  • Project changed from RPM Support to Pulp
Actions #3

Updated by dalley about 3 years ago

I did a little bit of load testing using Locust.io. Here was my setup.

Install the locust package and run the following script with locust -f $script_name. Connect to the web UI from your browser at 127.0.0.1:8089, using the appropriate address if you're running it from the VM. I did not use the VM.

import random
from collections import namedtuple
from gettext import gettext as _

from locust import HttpUser, task, between

# REPO_LOCATION = "http://pulp2.dev/pulp/isos/file20k/"                             # Pulp 2
REPO_LOCATION = "http://pulp2-nightly-pulp3-source-centos7/pulp/content/file20k/"  # Pulp 3

# LOCAL_REPO_PATH = "/pulp/isos/file20k/"    # Pulp 2
# LOCAL_REPO_PATH = "/pulp/content/file20k/"   # Pulp 3


Line = namedtuple("Line", ("number", "content"))


class Entry:
    """
    Manifest entry.

    Format: <relative_path>,<digest>,<size>.
    Lines beginning with `#` are ignored.

    Attributes:
        relative_path (str): A relative path.
        digest (str): The file sha256 hex digest.
        size (int): The file size in bytes.

    """

    def __init__(self, relative_path, digest, size):
        """
        Create a new Entry.

        Args:
            relative_path (str): A relative path.
            digest (str): The file sha256 hex digest.
            size (int): The file size in bytes.

        """
        self.relative_path = relative_path
        self.digest = digest
        self.size = size

    @staticmethod
    def parse(line):
        """
        Parse the specified line from the manifest into an Entry.

        Args:
            line (Line): A line from the manifest.

        Returns:
            Entry: An entry.

        Raises:
            ValueError: on parsing error.

        """
        part = [s.strip() for s in line.content.split(",")]
        if len(part) != 3:
            raise ValueError(
                _("Error: manifest line:{n}: " "must be: <relative_path>,<digest>,<size>").format(
                    n=line.number
                )
            )
        return Entry(relative_path=part[0], digest=part[1], size=int(part[2]))

    def __str__(self):
        """
        Returns a string representation of the Manifest Entry.

        Returns:
            str: format: "<relative_path>,<digest>,<size>"

        """
        fields = [self.relative_path, self.digest]
        if isinstance(self.size, int):
            fields.append(str(self.size))
        return ",".join(fields)


class Manifest:
    """
    A file manifest.

    Describes files contained within the directory.

    Attributes:
        relative_path (str): An relative path to the manifest.

    """

    def __init__(self, relative_path):
        """
        Create a new Manifest.

        Args:
            relative_path (str): An relative path to the manifest.

        """
        self.relative_path = relative_path

    @staticmethod
    def parse(manifest_str):
        """
        Parse a manifest string and yield entries.

        Yields:
            Entry: for each line.

        """
        for n, line in enumerate(manifest_str.splitlines(), 1):
            line = line.strip()
            if not line:
                continue
            if line.startswith("#"):
                continue
            yield Entry.parse(Line(number=n, content=line))


class PulpLoadTestClient(HttpUser):
    # wait_time = between(0.5, 3.0)

    def on_start(self):
        """ on_start is called when a Locust start before any task is scheduled """
        pass

    def on_stop(self):
        """ on_stop is called when the TaskSet is stopping """
        pass

    @task(1)
    def clone_file20k_repository(self):
        response = self.client.get("PULP_MANIFEST")

        entries = list(Manifest.parse(response.text))

        random.shuffle(entries)

        for entry in entries:
            self.client.get(entry.relative_path)

The web UI will ask you what URL to test, as well as how many workers to use and how fast to spawn them. Use the URL of the repos hosted in Pulp 2 or Pulp 3, e.g. http://pulp2.dev/pulp/isos/file20k/ or http://pulp3-source-fedora32.localhost.example.com/pulp/content/file20k/

I used this repo specifically: https://fixtures.pulpproject.org/file-perf/

I don't want to overgeneralize from my hardware or from this one specific benchmark (which is likely a little pathological due to the tiny files), but I can definitely see that there is a very large gap between Pulp 2 and Pulp 3 content serving performance at the moment, in terms of throughput, latency, and CPU consumption.

I also confirmed that increasing the worker count helped, but there are probably other things we can do in the code to improve the situation.

Actions #4

Updated by daviddavis about 3 years ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 90
Actions #5

Updated by daviddavis about 3 years ago

  • Priority changed from Normal to High
  • Severity changed from 2. Medium to 3. High
Actions #6

Updated by dalley about 3 years ago

New version of the load-testing script, with a task that emulates an RPM repository client:

import random
from collections import namedtuple
from gettext import gettext as _

from locust import HttpUser, task, between

# REPO_LOCATION = "http://pulp2.dev/pulp/isos/file20k/"                             # Pulp 2
REPO_LOCATION = "http://pulp2-nightly-pulp3-source-centos7/pulp/content/file20k/"  # Pulp 3

# LOCAL_REPO_PATH = "/pulp/isos/file20k/"    # Pulp 2
# LOCAL_REPO_PATH = "/pulp/content/file20k/"   # Pulp 3


Line = namedtuple("Line", ("number", "content"))


class Entry:
    """
    Manifest entry.

    Format: <relative_path>,<digest>,<size>.
    Lines beginning with `#` are ignored.

    Attributes:
        relative_path (str): A relative path.
        digest (str): The file sha256 hex digest.
        size (int): The file size in bytes.

    """

    def __init__(self, relative_path, digest, size):
        """
        Create a new Entry.

        Args:
            relative_path (str): A relative path.
            digest (str): The file sha256 hex digest.
            size (int): The file size in bytes.

        """
        self.relative_path = relative_path
        self.digest = digest
        self.size = size

    @staticmethod
    def parse(line):
        """
        Parse the specified line from the manifest into an Entry.

        Args:
            line (Line): A line from the manifest.

        Returns:
            Entry: An entry.

        Raises:
            ValueError: on parsing error.

        """
        part = [s.strip() for s in line.content.split(",")]
        if len(part) != 3:
            raise ValueError(
                _("Error: manifest line:{n}: " "must be: <relative_path>,<digest>,<size>").format(
                    n=line.number
                )
            )
        return Entry(relative_path=part[0], digest=part[1], size=int(part[2]))

    def __str__(self):
        """
        Returns a string representation of the Manifest Entry.

        Returns:
            str: format: "<relative_path>,<digest>,<size>"

        """
        fields = [self.relative_path, self.digest]
        if isinstance(self.size, int):
            fields.append(str(self.size))
        return ",".join(fields)


class Manifest:
    """
    A file manifest.

    Describes files contained within the directory.

    Attributes:
        relative_path (str): An relative path to the manifest.

    """

    def __init__(self, relative_path):
        """
        Create a new Manifest.

        Args:
            relative_path (str): An relative path to the manifest.

        """
        self.relative_path = relative_path

    @staticmethod
    def parse(manifest_str):
        """
        Parse a manifest string and yield entries.

        Yields:
            Entry: for each line.

        """
        for n, line in enumerate(manifest_str.splitlines(), 1):
            line = line.strip()
            if not line:
                continue
            if line.startswith("#"):
                continue
            yield Entry.parse(Line(number=n, content=line))


class PulpLoadTestClient(HttpUser):
    # wait_time = between(0.5, 3.0)

    def on_start(self):
        """ on_start is called when a Locust start before any task is scheduled """
        pass

    def on_stop(self):
        """ on_stop is called when the TaskSet is stopping """
        pass

    @task(1)
    def emulate_rpm_repository(self, num_pkgs=None):
        response = self.client.get("PULP_MANIFEST")
        entries = list(Manifest.parse(response.text))

        metadata_files = []
        packages = []

        for entry in entries:
            if entry.relative_path.startswith("repodata/"):
                metadata_files.append(entry.relative_path)
            elif entry.relative_path.startswith("packages/"):
                packages.append(entry.relative_path)

        for metadata_file in metadata_files:
            # metadata_files holds relative-path strings, not Entry objects
            self.client.get(metadata_file)

        # is a random sample actually what we want? many clients will likely be requesting the same set of packages
        packages = random.sample(packages, num_pkgs) if num_pkgs else packages

        for pkg in packages:
            self.client.get(pkg)

Actions #7

Updated by dalley about 3 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dalley
Actions #8

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 90 to Sprint 91
Actions #9

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 91 to Sprint 92
Actions #10

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 92 to Sprint 93
Actions #11

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 93 to Sprint 94
Actions #12

Updated by dalley about 3 years ago

  • Related to Task #6928: Measure Pulp's ability to scale to high #s of client requests added
Actions #13

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 94 to Sprint 95
Actions #14

Updated by rchan almost 3 years ago

  • Sprint changed from Sprint 95 to Sprint 96
Actions #15

Updated by rchan almost 3 years ago

  • Sprint changed from Sprint 96 to Sprint 97
Actions #16

Updated by gerrod almost 3 years ago

  • Related to Task #8804: [EPIC] Use Redis to add caching abilities to Pulp added
Actions #17

Updated by rchan almost 3 years ago

  • Sprint changed from Sprint 97 to Sprint 98
Actions #18

Updated by dalley almost 3 years ago

  • Assignee changed from dalley to gerrod

Gerrod's caching PR will likely solve this problem for good. His PR shows some fantastic results in my initial testing.
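
For anyone who wants to try the caching once it lands: it is configured in /etc/pulp/settings.py (which is a Python file). A hedged sketch; the setting names below are taken from the pulpcore documentation for the caching work and should be checked against the release notes for your version:

# /etc/pulp/settings.py
CACHE_ENABLED = True      # enable response caching in the content app
REDIS_HOST = "localhost"  # Redis instance backing the cache
REDIS_PORT = 6379
CACHE_SETTINGS = {
    "EXPIRES_TTL": 600,   # cache entries expire after 600 seconds
}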

Actions #19

Updated by dalley almost 3 years ago

  • Related to Task #8805: Cache the responses of the content app added
Actions #20

Updated by rchan almost 3 years ago

  • Sprint changed from Sprint 98 to Sprint 99
Actions #21

Updated by dalley almost 3 years ago

A summary of the various changes we've made in the process of improving the content app:

Already released:

  • Caching responses, to avoid making repeated database queries for commonly-requested files

Coming in 3.14

Actions #22

Updated by dalley almost 3 years ago

  • Sprint/Milestone set to 3.14.0
Actions #23

Updated by dalley almost 3 years ago

  • Status changed from ASSIGNED to NEW
  • Assignee deleted (gerrod)
  • Priority changed from High to Normal
  • Sprint/Milestone deleted (3.14.0)
  • Sprint deleted (Sprint 99)

This issue is being repurposed to refer specifically to "fine-tuning" of various parameters, to determine the optimal defaults. The caching changes and other performance improvements are tracked in other issues, and they're moving forward soon.

Since the caching will improve performance significantly, the fine-tuning will be a little less important, so I'm dropping the priority.

Actions #24

Updated by dalley almost 3 years ago

  • Severity changed from 3. High to 2. Medium
Actions #25

Updated by pulpbot over 2 years ago

  • Description updated (diff)
  • Status changed from NEW to CLOSED - DUPLICATE
