Issue #2621

closed

Syncing an immediate repo with 'on_demand' overridden no longer populates the catalog

Added by jsherril@redhat.com about 7 years ago. Updated almost 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Severity: 3. High
Platform Release: 2.12.2
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 16

Description

Prior to https://github.com/pulp/pulp_rpm/pull/1000, the following steps:

pulp-admin rpm repo create --repo-id=justin3 --feed=https://copr-be.cloud.fedoraproject.org/results/jmracek/dnf-search/fedora-25-x86_64/ --download-policy=immediate
pulp-admin rpm repo sync run --repo-id=justin3

# Locate the package dnf-2.1.0-1.git.16.54a97bb.fc25 on disk
find /var/lib/pulp/content/units/rpm | grep 54a97bb

# See if it's in the catalog (it shouldn't be)
mongo pulp_database
db.lazy_content_catalog.find({"path": "/var/lib/pulp/content/units/rpm/b2/6de2a5eaec4d2637d6e022e6ca9b36a8a6ab9063036033a613017f7e979478/dnf-2.1.0-1.git.16.54a97bb.fc25.noarch.rpm"})

# Reset the stored repomd revision to force a full sync
mongo pulp_database --eval 'db.repo_importers.update({"scratchpad": {$ne: null}}, {$set: {"scratchpad.repomd_revision": null}}, {"multi":true})'

# Set the download policy to on_demand and resync
pulp-admin rpm repo update --download-policy=on_demand --repo-id=justin3
pulp-admin rpm repo sync run --repo-id=justin3

# Check the content catalog again
mongo pulp_database
db.lazy_content_catalog.find({"path": "/var/lib/pulp/content/units/rpm/b2/6de2a5eaec4d2637d6e022e6ca9b36a8a6ab9063036033a613017f7e979478/dnf-2.1.0-1.git.16.54a97bb.fc25.noarch.rpm"})

would result in the catalog being populated. Now they do not. This prevents us from verifying the checksums of all RPMs in an immediate repo.
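To illustrate why missing catalog entries block checksum verification, here is a minimal sketch. It is hypothetical throughout: the `CatalogEntry` record, its `expected_sha256` field and the `verify_files` helper are illustrative assumptions, not Pulp's actual catalog schema or API. It only shows that a file with no catalog entry ends up "uncataloged" and cannot be checked at all.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Hypothetical catalog record: a stored file and the digest it should have."""
    path: str
    expected_sha256: str


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large RPMs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(paths: List[str], catalog: Dict[str, CatalogEntry]) -> Dict[str, str]:
    """Classify each stored file as 'ok', 'mismatch' or 'uncataloged'.

    A file without a catalog entry cannot be verified, which is the
    situation this issue describes for immediate repos.
    """
    results: Dict[str, str] = {}
    for path in paths:
        entry = catalog.get(path)
        if entry is None:
            results[path] = "uncataloged"
        elif sha256_of(path) == entry.expected_sha256:
            results[path] = "ok"
        else:
            results[path] = "mismatch"
    return results
```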

Actions #2

Updated by jortel@redhat.com about 7 years ago

This is the PR and change that broke this: https://github.com/pulp/pulp_rpm/pull/1000/files#diff-68fa85b69e45aa1c5b5a032166c5ad0dL134

This call needs to be restored and made unconditional:

catalog.add(unit, wanted[unit.unit_key_as_named_tuple].download_path)
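A small model of the regression, under stated assumptions: `Catalog`, `sync_units`, `already_associated` and the `catalog_always` flag are hypothetical names, not the pulp_rpm code from PR 1000 or from the fix. It models the second (on_demand) sync from the description, where every unit is already associated. If catalog entries are only written for newly associated units, that sync adds nothing; adding the entry unconditionally covers every wanted unit.

```python
from typing import Dict, Set


class Catalog:
    """Hypothetical in-memory stand-in for the lazy content catalog."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}  # unit key -> download path

    def add(self, unit_key: str, download_path: str) -> None:
        self.entries[unit_key] = download_path


def sync_units(
    wanted: Dict[str, str],
    already_associated: Set[str],
    catalog: Catalog,
    catalog_always: bool,
) -> Set[str]:
    """Return the unit keys newly associated by this sync.

    wanted maps each unit key in the upstream metadata to its download path.
    catalog_always=False writes catalog entries only for new units (the
    regressed behaviour); catalog_always=True writes one for every wanted unit.
    """
    newly_associated: Set[str] = set()
    for key, path in wanted.items():
        is_new = key not in already_associated
        if is_new:
            newly_associated.add(key)
        if is_new or catalog_always:
            catalog.add(key, path)
    return newly_associated
```

After an immediate sync every unit is already associated, so the regressed variant leaves the catalog empty on the on_demand resync.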

Actions #3

Updated by jortel@redhat.com about 7 years ago

The suggested fix will undo about half (50%) of the performance gains provided by #2457 (which has two related Bugzilla entries).

Actions #5

Updated by jortel@redhat.com about 7 years ago

Long term, we should have the importer(s) manage catalog entries regardless of download policy.

The advantages are:

  • Overhead is proportional to the number of units being added or removed.
  • Users can change download policies more freely.

The disadvantages:

  • Adds a small amount of overhead to the initial sync.
  • Customers not using lazy downloading store catalog data they don't need.
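A minimal sketch of that long-term idea, assuming a hypothetical `PolicyIndependentCatalog` with `on_units_added`/`on_units_removed` hooks (none of these exist in Pulp). The catalog is updated only from the units added to or removed from the repo in a sync, so the work tracks that delta rather than the repo size, and the download policy is never consulted.

```python
from typing import Dict, Iterable, Tuple


class PolicyIndependentCatalog:
    """Hypothetical catalog kept in step with repo membership, whatever the download policy."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}  # unit key -> source URL
        self.operations = 0  # counts catalog writes, to show cost tracks the delta

    def on_units_added(self, units: Iterable[Tuple[str, str]]) -> None:
        for key, url in units:
            self.entries[key] = url
            self.operations += 1

    def on_units_removed(self, keys: Iterable[str]) -> None:
        for key in keys:
            self.entries.pop(key, None)
            self.operations += 1


def apply_sync(catalog: PolicyIndependentCatalog,
               current: Dict[str, str],
               upstream: Dict[str, str]) -> None:
    """Diff repo contents against upstream and touch only the changed units."""
    added = [(key, upstream[key]) for key in upstream.keys() - current.keys()]
    removed = list(current.keys() - upstream.keys())
    catalog.on_units_added(added)
    catalog.on_units_removed(removed)
```

Because the hooks fire on association changes alone, switching between immediate and on_demand needs no extra bookkeeping; the cost is that non-lazy users also carry these entries.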
Actions #6

Updated by bizhang about 7 years ago

  • Sprint/Milestone set to 34
  • Triaged changed from No to Yes
Actions #7

Updated by jortel@redhat.com about 7 years ago

  • Sprint/Milestone deleted (34)
Actions #9

Updated by jortel@redhat.com about 7 years ago

  • Sprint/Milestone set to 34
Actions #10

Updated by ttereshc about 7 years ago

I think that to measure the performance degradation, the sync must actually do work.
If it is a no-op, it won't reach the modified code: the suggested change has no effect on a no-op sync, only on the case where there are very few changes to the repo.

My suggestion, both for step 2 and for the step after the patch is applied, is to remove (i.e., unassociate) any unit from the repo and only then sync.

Actions #11

Updated by jortel@redhat.com about 7 years ago

Agreed. I left out (or only implied) the step of running:

mongo pulp_database --eval 'db.repo_importers.update({"scratchpad": {$ne: null}}, {$set: {"scratchpad.repomd_revision": null}}, {"multi":true})'

so the importer will continue through the affected code path.
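Why the reset is needed can be shown with a small model: if the importer stores the upstream repomd revision and skips metadata processing when it matches, an unchanged repo never reaches the catalog code. The names below (`Scratchpad`, `sync`, the `process_metadata` callback) are hypothetical; the sketch only mirrors the behaviour described in this thread, where setting `scratchpad.repomd_revision` to null forces a full pass.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Scratchpad:
    """Hypothetical stand-in for the importer's stored state."""
    repomd_revision: Optional[str] = None


def sync(scratchpad: Scratchpad, upstream_revision: str,
         process_metadata: Callable[[], None]) -> bool:
    """Return True if metadata was processed, False if the sync was a no-op."""
    if scratchpad.repomd_revision == upstream_revision:
        # Nothing changed upstream: skip metadata (and catalog) processing entirely.
        return False
    process_metadata()
    scratchpad.repomd_revision = upstream_revision
    return True
```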

Actions #12

Updated by jortel@redhat.com about 7 years ago

f25

Created an f25 repository containing 50k RPMs.

Sync times without the fix (mm:ss): 7:45, 7:55, 7:41

With catalog management fixed[1] (one of the two optimizations undone): 9:25, 9:27, 9:23

EL7

Created an EL7 repository containing 5k RPMs.

Sync times without the fix (mm:ss): 1:44, 1:43, 2:08

With catalog management fixed[1] (one of the two optimizations undone): 2:10, 1:57, 1:55

No real impact.
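The runs above can be summarized with a quick calculation. This snippet only converts the mm:ss timings reported in this comment to seconds and compares mean sync times; `to_seconds` and `slowdown` are illustrative helper names.

```python
from statistics import mean
from typing import List


def to_seconds(mm_ss: str) -> int:
    """Convert an 'm:ss' timing string to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def slowdown(before: List[str], after: List[str]) -> float:
    """Relative increase in mean sync time, e.g. 0.21 for 21% slower."""
    b = mean(to_seconds(t) for t in before)
    a = mean(to_seconds(t) for t in after)
    return (a - b) / b


f25 = slowdown(["7:45", "7:55", "7:41"], ["9:25", "9:27", "9:23"])
el7 = slowdown(["1:44", "1:43", "2:08"], ["2:10", "1:57", "1:55"])
print(f"f25: {f25:.1%} slower, EL7: {el7:.1%} slower")
```

These are the same figures as above, only averaged; as the notes below point out, they come from a forced full metadata pass rather than a typical sync where nothing changed upstream.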

Notes:

Nothing had changed upstream, but I forced the importer to process the metadata anyway by setting the stored metadata revision to NULL. In the real world, when nothing has changed upstream, the metadata is the same and the sync takes 8 seconds, so the difference introduced by this proposed change is moot.

[1] https://github.com/pulp/pulp_rpm/pull/1038/commits/8058ec8ee34736e83243a8d5096b2caa7626dc61

Added by jortel@redhat.com about 7 years ago

Revision 8058ec8e

Fix catalog management with unit already associated. closes #2621

Actions #13

Updated by jortel@redhat.com about 7 years ago

  • Status changed from NEW to POST
  • Assignee set to jortel@redhat.com
Actions #14

Updated by jortel@redhat.com about 7 years ago

  • Status changed from POST to MODIFIED
Actions #15

Updated by bizhang almost 7 years ago

  • Platform Release set to 2.12.2
Actions #16

Updated by bizhang almost 7 years ago

  • Status changed from MODIFIED to 5
Actions #17

Updated by bizhang almost 7 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #18

Updated by bmbouter about 6 years ago

  • Sprint set to Sprint 16
Actions #19

Updated by bmbouter about 6 years ago

  • Sprint/Milestone deleted (34)
Actions #20

Updated by bmbouter almost 5 years ago

  • Tags Pulp 2 added
