Issue #2621
closed
Syncing an immediate repo with 'on_demand' overridden no longer populates the catalog
Description
Prior to https://github.com/pulp/pulp_rpm/pull/1000
the following steps:
pulp-admin rpm repo create --repo-id=justin3 --feed=https://copr-be.cloud.fedoraproject.org/results/jmracek/dnf-search/fedora-25-x86_64/ --download-policy=immediate
pulp-admin rpm repo sync run --repo-id=justin3
# locate the package dnf-2.1.0-1.git.16.54a97bb.fc25
find /var/lib/pulp/content/units/rpm | grep 54a97bb
# See if it's in the catalog (it shouldn't be)
mongo pulp_database
db.lazy_content_catalog.find({"path": "/var/lib/pulp/content/units/rpm/b2/6de2a5eaec4d2637d6e022e6ca9b36a8a6ab9063036033a613017f7e979478/dnf-2.1.0-1.git.16.54a97bb.fc25.noarch.rpm"})
# reset revision #, to force a sync
mongo pulp_database --eval 'db.repo_importers.update({"scratchpad": {$ne: null}}, {$set: {"scratchpad.repomd_revision": null}}, {"multi":true})'
#set to on demand and resync
pulp-admin rpm repo update --download-policy=on_demand --repo-id=justin3
pulp-admin rpm repo sync run --repo-id=justin3
#check the content catalog again:
mongo pulp_database
db.lazy_content_catalog.find({"path": "/var/lib/pulp/content/units/rpm/b2/6de2a5eaec4d2637d6e022e6ca9b36a8a6ab9063036033a613017f7e979478/dnf-2.1.0-1.git.16.54a97bb.fc25.noarch.rpm"})
would result in the catalog being populated. Now it does not. This prevents us from verifying the checksums of all rpms on an immediate repo.
Updated by jortel@redhat.com about 7 years ago
This is the PR and change that broke this: https://github.com/pulp/pulp_rpm/pull/1000/files#diff-68fa85b69e45aa1c5b5a032166c5ad0dL134
This call needs to be put back so that it is done unconditionally:
catalog.add(unit, wanted[unit.unit_key_as_named_tuple].download_path)
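A toy model of the regression, for illustration only. `Unit`, `Catalog`, and `sync` are hypothetical stand-ins rather than pulp_rpm classes, and the package paths are made up. It assumes the optimization skips the catalog entry for any unit that is already associated with the repo, which is what the later commit title ("Fix catalog management with unit already associated") suggests.

```python
from collections import namedtuple

# Hypothetical stand-ins; these are NOT the real pulp_rpm classes.
Unit = namedtuple("Unit", ["name", "download_path"])


class Catalog:
    """Minimal in-memory replacement for the lazy content catalog."""

    def __init__(self):
        self.entries = {}

    def add(self, unit, path):
        self.entries[unit.name] = path


def sync(upstream_units, associated_names, catalog, unconditional):
    """Simulate one sync pass and return the names of newly associated units."""
    added = []
    for unit in upstream_units:
        already_associated = unit.name in associated_names
        if not already_associated:
            added.append(unit.name)
        # Assumed regression: the catalog entry is only written for new units.
        # Suggested fix: write it for every wanted unit.
        if unconditional or not already_associated:
            catalog.add(unit, unit.download_path)
    return added


units = [Unit("dnf", "Packages/dnf.rpm"), Unit("hawkey", "Packages/hawkey.rpm")]
associated = {"dnf", "hawkey"}  # both came in with the earlier immediate sync

skipping = Catalog()
sync(units, associated, skipping, unconditional=False)

fixed = Catalog()
sync(units, associated, fixed, unconditional=True)

print(len(skipping.entries), len(fixed.entries))
```

With the conditional add, a repo that was first synced as immediate ends up with an empty catalog after switching to on_demand; with the unconditional add, every wanted unit gets an entry.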
Updated by jortel@redhat.com about 7 years ago
The suggested fix will undo the performance gains provided by #2457 (which has 2 related bugzillas) by 50%.
Updated by jortel@redhat.com about 7 years ago
Long term, we should have the importer(s) managing catalog entries regardless of download policy.
The advantages are:
- overhead is proportional to the units being added/removed.
- users can more freely change policies.
The disadvantages:
- Adds a small amount of overhead to the initial sync.
- Customers not using lazy sync store catalog data they don't need.
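A rough sketch of what "overhead proportional to the units being added/removed" could look like. `reconcile_catalog` and the dict-based catalog are hypothetical and only illustrate the idea; this is not how the importer is actually structured.

```python
def reconcile_catalog(catalog, previous_units, current_units):
    """Update catalog entries from the difference between two syncs.

    All three arguments are dicts mapping a unit id to its download path.
    The download policy is deliberately not consulted: entries are kept
    for every unit, so the policy can be changed later without a resync.
    Returns the number of catalog writes performed.
    """
    added = current_units.keys() - previous_units.keys()
    removed = previous_units.keys() - current_units.keys()

    for unit_id in added:
        catalog[unit_id] = current_units[unit_id]
    for unit_id in removed:
        catalog.pop(unit_id, None)

    return len(added) + len(removed)


catalog = {}
first = {"a": "Packages/a.rpm", "b": "Packages/b.rpm"}
second = {"b": "Packages/b.rpm", "c": "Packages/c.rpm"}

initial_writes = reconcile_catalog(catalog, {}, first)     # initial sync pays once
delta_writes = reconcile_catalog(catalog, first, second)   # later syncs pay per change
noop_writes = reconcile_catalog(catalog, second, second)   # nothing changed upstream
```

The initial sync writes one entry per unit, which is the "small amount of overhead" listed above; later syncs only touch entries for units that came or went.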
Updated by bizhang about 7 years ago
- Sprint/Milestone set to 34
- Triaged changed from No to Yes
Updated by ttereshc about 7 years ago
I think that to test the performance degradation, the sync should be operational (not a no-op).
If the sync is a no-op, it won't reach the modified code; the suggested change has no influence on a no-op sync, only on the case when there are very few changes to the repo.
My suggestion for step 2, and for the step after the patch is applied, is to remove (aka unassociate) any unit from the repo and only then sync.
Updated by jortel@redhat.com about 7 years ago
ttereshc wrote:
I think that to test the performance degradation, the sync should be operational (not a no-op).
If the sync is a no-op, it won't reach the modified code; the suggested change has no influence on a no-op sync, only on the case when there are very few changes to the repo.
My suggestion for step 2, and for the step after the patch is applied, is to remove (aka unassociate) any unit from the repo and only then sync.
Agreed. I left out (or implied) the step of doing:
mongo pulp_database --eval 'db.repo_importers.update({"scratchpad": {$ne: null}}, {$set: {"scratchpad.repomd_revision": null}}, {"multi":true})'
so the importer will continue through the affected code path.
Updated by jortel@redhat.com about 7 years ago
f25
Created an f25 repository containing 50k RPMs.
Benchmarks:
7:45
7:55
7:41
With catalog management fixed[1] (one of the two optimizations undone):
9:25
9:27
9:23
EL7
Created an EL7 repository containing 5k RPMs.
Benchmarks:
1:44
1:43
2:08
With catalog management fixed[1] (one of the two optimizations undone):
2:10
1:57
1:55
No real impact.
Notes:
Nothing had changed upstream, but I forced the importer to process the metadata anyway by setting the stored metadata revision to NULL. In the real world, when nothing has changed upstream, the metadata would be the same and the sync takes 8 seconds. The difference introduced by this proposed change would then be moot.
[1] https://github.com/pulp/pulp_rpm/pull/1038/commits/8058ec8ee34736e83243a8d5096b2caa7626dc61
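For a quick read of the timings above, the averages and relative slowdown can be computed as follows. This assumes the figures are minutes:seconds; the helper names are just for this sketch.

```python
from statistics import mean


def to_seconds(stamp):
    """Convert an 'M:SS' timing (assumed minutes:seconds) to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


# Timings copied from the benchmark runs in this comment.
RUNS = {
    "f25": {"baseline": ["7:45", "7:55", "7:41"], "fixed": ["9:25", "9:27", "9:23"]},
    "EL7": {"baseline": ["1:44", "1:43", "2:08"], "fixed": ["2:10", "1:57", "1:55"]},
}


def summarize(runs):
    """Return (mean baseline, mean with fix, percent slowdown)."""
    base = mean(to_seconds(t) for t in runs["baseline"])
    fixed = mean(to_seconds(t) for t in runs["fixed"])
    return base, fixed, (fixed - base) / base * 100


for name, runs in RUNS.items():
    base, fixed, pct = summarize(runs)
    print(f"{name}: {base:.1f}s -> {fixed:.1f}s ({pct:+.1f}%)")
# f25: 467.0s -> 565.0s (+21.0%)
# EL7: 111.7s -> 120.7s (+8.1%)
```

For EL7 the two sets of runs overlap (1:43 to 2:08 without the fix, 1:55 to 2:10 with it), so the 8% difference in the means is within run-to-run spread.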
Added by jortel@redhat.com about 7 years ago
Updated by jortel@redhat.com about 7 years ago
- Status changed from NEW to POST
- Assignee set to jortel@redhat.com
Updated by jortel@redhat.com about 7 years ago
- Status changed from POST to MODIFIED
Applied in changeset 8058ec8ee34736e83243a8d5096b2caa7626dc61.
Updated by bizhang almost 7 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE
Fix catalog management with unit already associated. closes #2621