Issue #2621

Syncing an immediate repo with 'on_demand' overridden no longer populates the catalog

Added by jsherril@redhat.com 10 months ago. Updated 8 months ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Category: -
Severity: 3. High
Platform Release: 2.12.2
Backwards Incompatible: No
Triaged: Yes
Groomed: No
Sprint Candidate: No
Verified: No
Verification Required: No

Description

Prior to https://github.com/pulp/pulp_rpm/pull/1000, the following steps:

pulp-admin rpm repo create --repo-id=justin3 --feed=https://copr-be.cloud.fedoraproject.org/results/jmracek/dnf-search/fedora-25-x86_64/ --download-policy=immediate
pulp-admin rpm repo sync run --repo-id=justin3

# locate the package dnf-2.1.0-1.git.16.54a97bb.fc25
find /var/lib/pulp/content/units/rpm | grep 54a97bb

# See if it's in the catalog (it shouldn't be)
mongo pulp_database
db.lazy_content_catalog.find({"path": "/var/lib/pulp/content/units/rpm/b2/6de2a5eaec4d2637d6e022e6ca9b36a8a6ab9063036033a613017f7e979478/dnf-2.1.0-1.git.16.54a97bb.fc25.noarch.rpm"})

# reset the stored repomd revision number to force a full sync
mongo pulp_database --eval 'db.repo_importers.update({"scratchpad": {$ne: null}}, {$set: {"scratchpad.repomd_revision": null}}, {"multi":true})'

# set the download policy to on_demand and resync
pulp-admin rpm repo update --download-policy=on_demand --repo-id=justin3
pulp-admin rpm repo sync run --repo-id=justin3

# check the content catalog again
mongo pulp_database
db.lazy_content_catalog.find({"path": "/var/lib/pulp/content/units/rpm/b2/6de2a5eaec4d2637d6e022e6ca9b36a8a6ab9063036033a613017f7e979478/dnf-2.1.0-1.git.16.54a97bb.fc25.noarch.rpm"})

would result in the catalog being populated. Now they do not, which prevents us from verifying the checksums of all RPMs in an immediate repo.

Associated revisions

Revision 8058ec8e
Added by jortel@redhat.com 9 months ago

Fix catalog management with unit already associated.
closes #2621

History

#2 Updated by jortel@redhat.com 10 months ago

This is the PR and change that broke this: https://github.com/pulp/pulp_rpm/pull/1000/files#diff-68fa85b69e45aa1c5b5a032166c5ad0dL134

This call needs to be made unconditionally again, as it was before that PR:

catalog.add(unit, wanted[unit.unit_key_as_named_tuple].download_path)
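The effect of making the catalog call conditional can be sketched with a small, hypothetical Python simulation of the reproduction steps in the description. Only catalog.add(unit, download_path) and the idea of a "wanted" mapping come from the comment above; Catalog, sync, reproduce, the associated set and the fixed flag are invented for illustration and are not Pulp's importer code. The sketch assumes, based on the fix commit title ("Fix catalog management with unit already associated"), that the broken code skipped the catalog for units already associated with the repository.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical stand-in for lazy_content_catalog: download path -> unit key."""
    entries: dict = field(default_factory=dict)

    def add(self, unit_key, download_path):
        self.entries[download_path] = unit_key


def sync(wanted, associated, catalog, policy, fixed):
    """Simulate one sync pass.

    wanted:     unit key -> download path, as read from upstream metadata
    associated: unit keys already associated with the repository (mutated)
    policy:     "immediate" or "on_demand"
    fixed:      False = catalog.add() only for newly associated units,
                True  = catalog.add() for every wanted unit
    """
    for unit_key, download_path in wanted.items():
        newly_associated = unit_key not in associated
        associated.add(unit_key)
        if policy == "immediate":
            # Content is downloaded during the sync; no catalog entry is written.
            continue
        # Lazy policy: the broken variant skips units that were already associated.
        if fixed or newly_associated:
            catalog.add(unit_key, download_path)


def reproduce(fixed):
    """Mirror the description: immediate sync, then on_demand resync."""
    wanted = {
        "dnf-2.1.0-1.git.16.54a97bb.fc25":
            "dnf-2.1.0-1.git.16.54a97bb.fc25.noarch.rpm",
    }
    associated = set()
    catalog = Catalog()
    sync(wanted, associated, catalog, "immediate", fixed)
    sync(wanted, associated, catalog, "on_demand", fixed)
    return catalog.entries


print(reproduce(fixed=False))  # {} : the already-associated unit is never cataloged
print(reproduce(fixed=True))   # one entry for the dnf package
```

In the broken variant the on_demand resync sees the unit as already associated and never writes a catalog entry, which matches the symptom reported above.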

#3 Updated by jortel@redhat.com 10 months ago

The suggested fix will undo about half of the performance gains provided by #2457 (which has two related Bugzillas).

#5 Updated by jortel@redhat.com 10 months ago

Long term, we should have the importer(s) managing catalog entries regardless of download policy.

The advantages are:

  • overhead is proportional to the number of units being added/removed.
  • users can more freely change policies.

The disadvantages:

  • Adds a small amount of overhead to the initial sync.
  • Customers not using lazy downloading store catalog data they don't need.
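A minimal sketch of that long-term design, assuming an importer that writes a catalog entry whenever it adds a unit and deletes one whenever it removes a unit, without consulting the download policy. CatalogManagingImporter, its sync method and the catalog_writes counter are hypothetical names, not Pulp APIs; the counter only makes the "overhead proportional to units added/removed" point visible.

```python
class CatalogManagingImporter:
    """Hypothetical importer that maintains catalog entries for every unit it
    adds or removes, whatever the download policy is."""

    def __init__(self, download_policy="immediate"):
        self.download_policy = download_policy  # not consulted for catalog upkeep
        self.units = set()
        self.catalog = {}          # unit key -> download path
        self.catalog_writes = 0    # counts catalog changes to show the overhead

    def sync(self, upstream):
        """Reconcile with upstream, given as a dict of unit key -> download path."""
        added = set(upstream) - self.units
        removed = self.units - set(upstream)
        for unit_key in sorted(added):
            self.units.add(unit_key)
            self.catalog[unit_key] = upstream[unit_key]
            self.catalog_writes += 1
        for unit_key in sorted(removed):
            self.units.discard(unit_key)
            del self.catalog[unit_key]
            self.catalog_writes += 1


importer = CatalogManagingImporter("immediate")
importer.sync({"a": "a.rpm", "b": "b.rpm", "c": "c.rpm"})  # initial sync: 3 writes
importer.sync({"a": "a.rpm", "b": "b.rpm", "c": "c.rpm"})  # nothing changed: 0 writes
importer.download_policy = "on_demand"                     # catalog already complete
importer.sync({"a": "a.rpm", "b": "b.rpm", "d": "d.rpm"})  # add d, remove c: 2 writes
print(importer.catalog_writes)   # 5
print(sorted(importer.catalog))  # ['a', 'b', 'd']
```

Switching the policy needs no catalog rebuild, which is the "users can more freely change policies" advantage; the cost is that an immediate-only repository still carries catalog entries it never uses.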

#6 Updated by bizhang 9 months ago

  • Sprint/Milestone set to Sprint 16
  • Triaged changed from No to Yes

#7 Updated by jortel@redhat.com 9 months ago

  • Sprint/Milestone deleted (Sprint 16)

#9 Updated by jortel@redhat.com 9 months ago

  • Sprint/Milestone set to Sprint 16

#10 Updated by ttereshc 9 months ago

I think that to test the performance degradation, the sync should be operational.
If it is a no-op, it won't reach the modified code: the suggested change has no influence on a no-op sync, only on the case when there are very few changes to the repo.

My suggestion for step 2, and for the step after the patch is applied, is to remove (aka unassociate) any unit from the repo and only then sync.

#11 Updated by jortel@redhat.com 9 months ago

ttereshc wrote:

I think that to test the performance degradation, the sync should be operational.
If it is a no-op, it won't reach the modified code: the suggested change has no influence on a no-op sync, only on the case when there are very few changes to the repo.

My suggestion for step 2, and for the step after the patch is applied, is to remove (aka unassociate) any unit from the repo and only then sync.

Agreed. I left out (or implied) the step of doing:

mongo pulp_database --eval 'db.repo_importers.update({"scratchpad": {$ne: null}}, {$set: {"scratchpad.repomd_revision": null}}, {"multi":true})'

so the importer will continue through the affected code path.
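Why resetting scratchpad.repomd_revision forces the importer through the affected code path can be sketched as a revision comparison. This is a hypothetical model: needs_metadata_processing and run_sync are invented names, and the revision string is made up; it only assumes, as the discussion implies, that a sync whose stored repomd revision matches upstream skips metadata processing entirely.

```python
def needs_metadata_processing(stored_revision, upstream_revision):
    """Hypothetical no-op check: process metadata unless the stored revision
    matches the upstream one. A stored value of None never counts as a match."""
    return stored_revision is None or stored_revision != upstream_revision


def run_sync(scratchpad, upstream_revision, process_metadata):
    """Run the expensive path only when metadata is new; report what happened."""
    if not needs_metadata_processing(scratchpad.get("repomd_revision"),
                                     upstream_revision):
        return "no-op"                      # catalog code is never reached
    process_metadata()                      # would include the catalog.add() calls
    scratchpad["repomd_revision"] = upstream_revision
    return "processed"


calls = []
scratchpad = {"repomd_revision": "1490000000"}   # hypothetical revision value
print(run_sync(scratchpad, "1490000000", lambda: calls.append("run")))  # no-op
scratchpad["repomd_revision"] = None             # mirrors the mongo $set to null
print(run_sync(scratchpad, "1490000000", lambda: calls.append("run")))  # processed
print(len(calls))  # 1
```

Unassociating a unit, as suggested above, exercises the same path in a more realistic way, because upstream then genuinely differs from the repository contents.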

#12 Updated by jortel@redhat.com 9 months ago

f25

Created an f25 repository containing 50k RPMs.

Benchmarks (three runs each, m:ss):

  Baseline:  7:45, 7:55, 7:41
  With catalog management fixed [1] (one of the two optimizations undone):  9:25, 9:27, 9:23

EL7

Created an EL7 repository containing 5k RPMs.

Benchmarks (three runs each, m:ss):

  Baseline:  1:44, 1:43, 2:08
  With catalog management fixed [1] (one of the two optimizations undone):  2:10, 1:57, 1:55

*No real impact.
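To put the benchmark runs side by side, the sketch below averages the three runs per case and computes the relative slowdown. It assumes the times are minutes:seconds; to_seconds, mean_seconds and summarize are hypothetical helpers written for this illustration.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' string such as '7:45' to seconds (assumed format)."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def mean_seconds(times):
    return sum(to_seconds(t) for t in times) / len(times)


def summarize(before, after):
    """Return (mean before, mean after, percent change), times in seconds."""
    b = mean_seconds(before)
    a = mean_seconds(after)
    return b, a, (a - b) / b * 100


BENCHMARKS = {
    "f25 (50k RPMs)": (["7:45", "7:55", "7:41"], ["9:25", "9:27", "9:23"]),
    "EL7 (5k RPMs)": (["1:44", "1:43", "2:08"], ["2:10", "1:57", "1:55"]),
}

for name, (before, after) in BENCHMARKS.items():
    b, a, pct = summarize(before, after)
    print(f"{name}: {b:.1f} s -> {a:.1f} s ({pct:+.1f}%)")
```

On these figures the forced full sync of the f25 repository averages 467 s before and 565 s after (about 21% slower), while EL7 goes from roughly 112 s to 121 s (about 8%); the 9 s EL7 difference is smaller than the 25 s spread between the baseline runs.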


Notes:

Nothing had changed upstream, but I forced the importer to process the metadata anyway by setting the stored metadata revision to NULL. In the real world, when nothing has changed upstream, the metadata is the same, the sync is a no-op, and it takes about 8 seconds, so the difference introduced by this proposed change is moot.

[1] https://github.com/pulp/pulp_rpm/pull/1038/commits/8058ec8ee34736e83243a8d5096b2caa7626dc61

#13 Updated by jortel@redhat.com 9 months ago

  • Status changed from NEW to POST
  • Assignee set to jortel@redhat.com

#14 Updated by jortel@redhat.com 9 months ago

  • Status changed from POST to MODIFIED

#15 Updated by bizhang 9 months ago

  • Platform Release set to 2.12.2

#16 Updated by bizhang 9 months ago

  • Status changed from MODIFIED to ON_QA

#17 Updated by bizhang 8 months ago

  • Status changed from ON_QA to CLOSED - CURRENTRELEASE
