Issue #8864
[closed] Workers go OOM while trying to sync RHEL 7
Description
Setup:
- CentOS 7 Vagrant box
- Memory: 8GB
- Running Katello w/ Pulp 3.11
- 2 workers present
When I attempt to sync RHEL 7 Server x86_64, it fails every time with Pulp workers going OOM. At the time when I initiate the sync there is 3.5GB of available memory on the VM.
Updated by dkliban@redhat.com over 3 years ago
- Project changed from Pulp to RPM Support
Updated by dalley over 3 years ago
@Eric 3.5 GB isn't enough for syncing large repositories. The RHEL 7 metadata is 800-900 MB when compressed, but reading it requires decompressing it, and it inflates to more than 4 GB. There's some overhead on top of that, so Pulp's memory consumption when syncing RHEL 7 is somewhere around 4.3 GB.
We definitely recommend having more than 8 GB available if large repositories are going to be synced. I think the Satellite requirement is >20 GB? Unfortunately there's really no way to fix this due to the way createrepo_c works.
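For anyone who wants to check the compressed vs. decompressed figures on their own metadata, here is a minimal sketch. It assumes the metadata file is gzip-compressed, and `inflated_size` is a hypothetical helper name, not part of Pulp or createrepo_c. It reads the file in chunks, so the measurement itself does not need much memory.

```python
import gzip


def inflated_size(path, chunk_size=1024 * 1024):
    """Return the decompressed size in bytes of a gzip file.

    The file is read chunk by chunk, so only `chunk_size` bytes of
    decompressed data are held in memory at any time.
    """
    total = 0
    with gzip.open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            total += len(chunk)
    return total
```

Dividing the result by `os.path.getsize(path)` gives the inflation ratio for that file. Note that this only measures the raw decompressed bytes; a parser that builds objects for every package will need additional memory on top of that.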
Updated by dalley over 3 years ago
I guess I can mention this: as a hobby project, I've been working on a Rust library for parsing and writing RPM metadata, and one of my design decisions is to avoid this problem by allowing packages to be streamed from the metadata one-by-one without having everything in memory at the same time.
Technically speaking, Pulp 2 rolled its own metadata manipulation code, so it's not totally unprecedented, but it's not a great idea for Pulp 3 at the present time IMO. createrepo_c is "official" and we get a ton of benefit from piggybacking off of their work and not needing to implement every new feature ourselves. The bus factor of a big complex self-maintained library written by one person in a completely different language (it would have Python bindings) is quite bad.
It's something we could only consider using if the team / product as a whole thought the benefits (reducing memory consumption to near-zero) were worth the associated long-term maintenance burden. I'm not convinced that they would be, but I will throw the idea out there for completeness.
Updated by dalley over 3 years ago
- Status changed from CLOSED - NOTABUG to NEW
So this is going to be a problem, not necessarily regarding one sync, but with many at once. I'm re-opening this because we may need to make urgent changes in this direction.
The best short-term option would not be what was described above in note 3, but porting over the other.xml and potentially filelists.xml parsing code from Pulp 2, which apparently does do iterative parsing. We would still use createrepo_c for parsing primary.xml because it is the most complex by far, and not so large.
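To illustrate what iterative parsing means here, the following is a minimal sketch using the standard library's `xml.etree.ElementTree.iterparse`. It is not the Pulp 2 code and not what was eventually merged; `iter_filelists` and the sample data are hypothetical, and the element names and namespace are assumed to follow the usual filelists.xml layout.

```python
import io
import xml.etree.ElementTree as ET

# Namespace assumed for filelists.xml; adjust if your metadata differs.
NS = "{http://linux.duke.edu/metadata/filelists}"

# Tiny made-up sample in the shape of a filelists.xml document.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="2">
  <package pkgid="aaa111" name="foo" arch="x86_64">
    <version epoch="0" ver="1.0" rel="1"/>
    <file>/usr/bin/foo</file>
    <file type="dir">/usr/share/foo</file>
  </package>
  <package pkgid="bbb222" name="bar" arch="noarch">
    <version epoch="0" ver="2.0" rel="3"/>
    <file>/usr/bin/bar</file>
  </package>
</filelists>
"""


def iter_filelists(source):
    """Yield (pkgid, [file paths]) one package at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == NS + "package":
            files = [f.text for f in elem.iter(NS + "file")]
            yield elem.get("pkgid"), files
            # Drop the finished package from the tree so that memory use
            # does not grow with the number of packages.
            root.clear()


if __name__ == "__main__":
    for pkgid, files in iter_filelists(io.BytesIO(SAMPLE)):
        print(pkgid, files)
```

The important part is the `root.clear()` call: without it, `iterparse` still builds the whole tree in memory, just incrementally.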
Updated by dalley over 3 years ago
- Assignee set to dalley
- Priority changed from Normal to High
- Severity changed from 2. Medium to 3. High
- Triaged changed from No to Yes
- Sprint set to Sprint 98
This seems workable.
Long-term, we need to fix this upstream in createrepo_c.
Updated by ehelms@redhat.com over 3 years ago
After bumping the memory by 4GB on the same setup I encounter the same issue. I could try going higher to find an upper limit (if one exists). Are there memory requirement guidelines available?
Updated by dalley over 3 years ago
ehelms it has been impressed upon us that we need to fix this :)
We're working on it https://github.com/pulp/pulp_rpm/pull/2016
I haven't run any tests with Pulp but outside of Pulp it used about 25x less RAM.
I'm not moving to POST yet because we need to have some discussions between Pulp / Katello / createrepo_c first. But ultimately I think this will end up being merged as a stopgap until we can get createrepo_c into a more usable state, because the current state is not going to be acceptable for 6.10, and I don't think we can get createrepo_c ready in time either.
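For reference, one way to compare the memory use of two approaches outside of Pulp is Python's `tracemalloc` module. This is only a sketch and not how the 25x figure above was obtained; `peak_bytes`, `load_everything` and `stream_records` are hypothetical names, and the two workloads are simple stand-ins for "parse everything at once" versus "handle one record at a time".

```python
import tracemalloc


def peak_bytes(func):
    """Run func() and return the peak bytes allocated by Python while it ran."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def load_everything(n):
    # Stand-in for parsing a whole metadata file into memory at once:
    # every record is kept alive in a list before being processed.
    records = [f"/usr/share/doc/pkg-{i}" for i in range(n)]
    return sum(len(r) for r in records)


def stream_records(n):
    # Stand-in for iterative parsing: each record is processed and then
    # discarded, so only one is alive at a time.
    return sum(len(f"/usr/share/doc/pkg-{i}") for i in range(n))


if __name__ == "__main__":
    print("all at once:", peak_bytes(lambda: load_everything(50000)))
    print("streaming:  ", peak_bytes(lambda: stream_records(50000)))
```

`tracemalloc` only sees allocations made through Python's allocator, so memory allocated internally by a C library may not show up; for that, the process RSS would be a better measure.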
Added by dalley over 3 years ago
Updated by dalley over 3 years ago
- Status changed from NEW to MODIFIED
Applied in changeset ca7a599fb1bfa3efd5ef4ff34e80626cf813aadc.
Updated by pulpbot over 3 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Port Pulp 2 code to iteratively parse other.xml and filelists.xml
closes: #8864 https://pulp.plan.io/issues/8864