Issue #8864

closed

Workers go OOM while trying to sync RHEL 7

Added by ehelms@redhat.com over 3 years ago. Updated over 3 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee:
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 3. High
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags:
Sprint: Sprint 98
Quarter:

Description

Setup:

  • CentOS 7 Vagrant box
  • Memory: 8GB
  • Running Katello w/ Pulp 3.11
  • 2 workers present

When I attempt to sync RHEL 7 Server x86_64, it fails every time because the Pulp workers go OOM. When I initiate the sync, 3.5GB of memory is available on the VM.

#1

Updated by dkliban@redhat.com over 3 years ago

  • Project changed from Pulp to RPM Support

#2

Updated by dalley over 3 years ago

@Eric, 3.5GB isn't enough for syncing large repositories. The RHEL 7 metadata is 800-900MB compressed, but reading it requires decompressing it, and it inflates to somewhat more than 4GB. There's some overhead on top of that, so Pulp's memory consumption when syncing RHEL 7 is somewhere around 4.3GB.

We definitely recommend having more than 8GB available if large repositories are going to be synced. I think the Satellite requirement is >20GB? Unfortunately, there's really no way to fix this, due to the way createrepo_c works.
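To illustrate the decompression point: inflating a compressed metadata file does not by itself require holding the whole inflated file in memory, because the bytes can be streamed in fixed-size chunks. What drives memory use is keeping the entire inflated document, or everything parsed from it, resident at once. The sketch below assumes gzip compression and uses only the Python standard library; `decompressed_size` and the sample payload are hypothetical and are not Pulp or createrepo_c code.

```python
import gzip
import io
from typing import BinaryIO


def decompressed_size(compressed: BinaryIO, chunk_size: int = 1 << 20) -> int:
    """Count decompressed bytes while holding at most ~chunk_size of them at a time."""
    total = 0
    with gzip.GzipFile(fileobj=compressed) as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)  # the chunk is dropped on the next iteration
    return total


# Hypothetical payload: repetitive XML compresses well, so the inflated
# size is much larger than the compressed size.
payload = b"<package><name>bash</name></package>\n" * 100_000
blob = gzip.compress(payload)
print(len(blob), decompressed_size(io.BytesIO(blob)))
```

Memory here is bounded by `chunk_size`, not by the inflated size; a parser that builds one in-memory object per package loses that property regardless of how the bytes were decompressed.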

#3

Updated by dalley over 3 years ago

I guess I can mention this: as a hobby project, I've been working on a Rust library for parsing and writing RPM metadata, and one of my design decisions is to avoid this problem by allowing packages to be streamed from the metadata one-by-one without having everything in memory at the same time.

Technically speaking, Pulp 2 rolled its own metadata manipulation code, so it's not totally unprecedented, but it's not a great idea for Pulp 3 at the present time IMO. createrepo_c is "official", and we get a ton of benefit from piggybacking off of their work and not needing to implement every new feature ourselves. The bus factor of a big, complex, self-maintained library written by one person in a completely different language (it would have Python bindings) is quite bad.

It's something we could consider using only if the team / product as a whole thought the benefits (reducing memory consumption to near-zero) were worth the associated long-term maintenance burden. I'm not convinced that they would be, but I will throw the idea out there for completeness.

#4

Updated by dalley over 3 years ago

  • Status changed from NEW to CLOSED - NOTABUG

#5

Updated by dalley over 3 years ago

  • Status changed from CLOSED - NOTABUG to NEW

So this is going to be a problem, not necessarily for a single sync, but for many syncs running at once. I'm reopening this because we may need to make urgent changes in this direction.

The best short-term option would not be what was described above in note 3, but porting over the other.xml and potentially filelists.xml parsing code from Pulp 2, which apparently does do iterative parsing. We would still use createrepo_c for parsing primary.xml, because it is by far the most complex file and it is not as large.
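As a rough illustration of what iterative parsing means for filelists.xml, here is a minimal sketch assuming Python's standard-library `xml.etree.ElementTree.iterparse`; it is not the Pulp 2 code that was ported. `iter_filelists` and `PackageFiles` are hypothetical names, the namespace URI is the one used by repodata filelists.xml files, and the sample document is illustrative only.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, List, NamedTuple

# Namespace used by repodata filelists.xml documents.
NS = "{http://linux.duke.edu/metadata/filelists}"


class PackageFiles(NamedTuple):
    pkgid: str
    name: str
    files: List[str]


def iter_filelists(stream: BinaryIO) -> Iterator[PackageFiles]:
    """Yield packages one at a time instead of building the whole document tree."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event: start of the <filelists> root
    for event, elem in context:
        if event == "end" and elem.tag == NS + "package":
            files = [f.text or "" for f in elem.iter(NS + "file")]
            yield PackageFiles(elem.get("pkgid", ""), elem.get("name", ""), files)
            # Discard finished <package> subtrees so memory stays bounded.
            root.clear()


# Illustrative sample in filelists.xml shape (not real RHEL 7 data).
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="2">
<package pkgid="aaa" name="bash" arch="x86_64">
  <version epoch="0" ver="1.0" rel="1"/>
  <file>/usr/bin/bash</file>
  <file type="dir">/etc/skel</file>
</package>
<package pkgid="bbb" name="zsh" arch="x86_64">
  <version epoch="0" ver="1.0" rel="1"/>
  <file>/usr/bin/zsh</file>
</package>
</filelists>
"""

for pkg in iter_filelists(io.BytesIO(sample)):
    print(pkg.name, pkg.files)
```

The `root.clear()` call is what keeps memory bounded: without it, `iterparse` still builds the full tree incrementally, so peak memory would end up close to that of a full parse.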

#6

Updated by dalley over 3 years ago

  • Assignee set to dalley
  • Priority changed from Normal to High
  • Severity changed from 2. Medium to 3. High
  • Triaged changed from No to Yes
  • Sprint set to Sprint 98

This seems workable.

Long-term, we need to fix this upstream in createrepo_c.

#7

Updated by ehelms@redhat.com over 3 years ago

After bumping the memory by 4GB on the same setup, I still encounter the same issue. I could try going higher to find an upper limit (if one exists). Are there memory requirement guidelines available?

On Fri, Jun 11, 2021, 4:33 PM Pulp wrote:

#8

Updated by dalley over 3 years ago

@ehelms, it has been impressed upon us that we need to fix this :)

We're working on it: https://github.com/pulp/pulp_rpm/pull/2016

I haven't run any tests within Pulp yet, but outside of Pulp it used about 25x less RAM.

I'm not moving to POST yet because we need to have some discussions between Pulp / Katello / createrepo_c first. But ultimately I think this will end up being merged as a stopgap until we can get createrepo_c into a more usable state, because the current state is not going to be acceptable for 6.10, and I don't think we can get createrepo_c ready in time either.
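One rough way to check a memory claim like this outside of Pulp is to compare the peak Python heap usage of a full-tree parse against an iterative parse, using `tracemalloc`. The sketch below uses a synthetic, namespace-free document; `make_metadata`, `count_files_full`, `count_files_iterative`, and `peak_bytes` are hypothetical helpers, and whatever ratio it prints says nothing about the measurement reported above. `tracemalloc` only sees allocations made through Python's allocators, so it is not a full process-RSS measurement.

```python
import io
import tracemalloc
import xml.etree.ElementTree as ET
from typing import Callable


def make_metadata(n_packages: int, files_per_package: int) -> bytes:
    """Build a synthetic filelists-like document (illustrative, not real data)."""
    parts = [b"<filelists>"]
    for i in range(n_packages):
        files = b"".join(
            b"<file>/usr/share/pkg%d/f%d</file>" % (i, j) for j in range(files_per_package)
        )
        parts.append(b'<package name="pkg%d">%s</package>' % (i, files))
    parts.append(b"</filelists>")
    return b"".join(parts)


def count_files_full(data: bytes) -> int:
    root = ET.fromstring(data)  # the whole tree is resident at once
    return sum(1 for _ in root.iter("file"))


def count_files_iterative(data: bytes) -> int:
    count = 0
    context = ET.iterparse(io.BytesIO(data), events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == "package":
            count += sum(1 for _ in elem.iter("file"))
            root.clear()  # drop finished packages
    return count


def peak_bytes(fn: Callable[[bytes], int], data: bytes) -> int:
    """Peak traced allocation while fn(data) runs; data itself is allocated beforehand."""
    tracemalloc.start()
    try:
        fn(data)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


data = make_metadata(1000, 20)
print(peak_bytes(count_files_full, data), peak_bytes(count_files_iterative, data))
```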

Added by dalley over 3 years ago

Revision ca7a599f | View on GitHub

Port Pulp 2 code to iteratively parse other.xml and filelists.xml

closes: #8864 https://pulp.plan.io/issues/8864

#9

Updated by dalley over 3 years ago

  • Status changed from NEW to MODIFIED

#10

Updated by pulpbot over 3 years ago

  • Sprint/Milestone set to 3.13.0

#11

Updated by pulpbot over 3 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
