Issue #2621
closed
Syncing an immediate repo with 'on_demand' overridden no longer populates the catalog
Status:
CLOSED - CURRENTRELEASE
Description
Prior to https://github.com/pulp/pulp_rpm/pull/1000
the following steps:
pulp-admin rpm repo create --repo-id=justin3 --feed=https://copr-be.cloud.fedoraproject.org/results/jmracek/dnf-search/fedora-25-x86_64/ --download-policy=immediate
pulp-admin rpm repo sync run --repo-id=justin3
# locate the package dnf-2.1.0-1.git.16.54a97bb.fc25
find /var/lib/pulp/content/units/rpm | grep 54a97bb
# See if it's in the catalog (it shouldn't be)
mongo pulp_database
db.lazy_content_catalog.find({"path": "/var/lib/pulp/content/units/rpm/b2/6de2a5eaec4d2637d6e022e6ca9b36a8a6ab9063036033a613017f7e979478/dnf-2.1.0-1.git.16.54a97bb.fc25.noarch.rpm"})
# reset revision #, to force a sync
mongo pulp_database --eval 'db.repo_importers.update({"scratchpad": {$ne: null}}, {$set: {"scratchpad.repomd_revision": null}}, {"multi":true})'
#set to on demand and resync
pulp-admin rpm repo update --download-policy=on_demand --repo-id=justin3
pulp-admin rpm repo sync run --repo-id=justin3
#check the content catalog again:
mongo pulp_database
db.lazy_content_catalog.find({"path": "/var/lib/pulp/content/units/rpm/b2/6de2a5eaec4d2637d6e022e6ca9b36a8a6ab9063036033a613017f7e979478/dnf-2.1.0-1.git.16.54a97bb.fc25.noarch.rpm"})
would result in the catalog being populated. Now it does not. This prevents us from verifying the checksums of all rpms on an immediate repo.
The suggested fix will undo 50% of the performance gains provided by #2457 (which has 2 related Bugzillas).
Long term, we should have the importer(s) managing catalog entries regardless of download policy.
The advantages are:
- overhead is proportional to the units being added/removed.
- users can more freely change policies.
The disadvantages:
- Adds a small amount of overhead to the initial sync.
- Customers not using lazy are storing data they don't need.
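The long-term idea above can be sketched as follows. This is a minimal illustration only: `Catalog` and `apply_sync` are hypothetical names, the catalog is a plain in-memory dict rather than the `lazy_content_catalog` collection, and none of this is the pulp_rpm importer code. It shows catalog work that depends only on the units added or removed, independent of download policy.

```python
class Catalog:
    """Toy in-memory stand-in for the lazy content catalog (hypothetical, not Pulp's API)."""

    def __init__(self):
        # Maps a unit's storage path to the URL it can be downloaded from.
        self.entries = {}

    def add(self, path, url):
        self.entries[path] = url

    def remove(self, path):
        # Removing a missing entry is harmless.
        self.entries.pop(path, None)


def apply_sync(catalog, added, removed):
    """Update the catalog only for units that changed during a sync.

    added   -- dict of storage path -> download URL for newly associated units
    removed -- list of storage paths for unassociated units
    Returns the number of catalog operations performed. The download
    policy is deliberately not consulted.
    """
    for path, url in added.items():
        catalog.add(path, url)
    for path in removed:
        catalog.remove(path)
    return len(added) + len(removed)
```

Under this model a sync that adds or removes nothing performs zero catalog operations, and a later switch from immediate to on_demand finds the entries already in place.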
- Sprint/Milestone set to 34
- Triaged changed from No to Yes
- Sprint/Milestone deleted (34)
- Sprint/Milestone set to 34
I think that to test the performance degradation, the sync should be operational.
If it is a no-op, it won't reach the modified code. The suggested change has no influence on a no-op sync, only on the case when there are very few changes to the repo.
My suggestion for step 2, and for the step after the patch is applied, is to remove (i.e. unassociate) any unit from the repo and only then sync.
ttereshc wrote:
I think that to test the performance degradation, the sync should be operational.
If it is a no-op, it won't reach the modified code. The suggested change has no influence on a no-op sync, only on the case when there are very few changes to the repo.
My suggestion for step 2, and for the step after the patch is applied, is to remove (i.e. unassociate) any unit from the repo and only then sync.
Agreed. I left out (or implied) the step of doing:
mongo pulp_database --eval 'db.repo_importers.update({"scratchpad": {$ne: null}}, {$set: {"scratchpad.repomd_revision": null}}, {"multi":true})'
so the importer will continue through the affected code path.
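Why clearing `repomd_revision` forces the importer onward can be shown with a toy model of the skip decision. This is a sketch under the assumption that the importer compares the revision stored in its scratchpad with the one published upstream. The function name and the exact rule are hypothetical and not taken from pulp_rpm.

```python
def can_skip_metadata(stored_revision, upstream_revision):
    """Return True when a sync could be treated as a no-op (hypothetical rule).

    stored_revision   -- value kept in the importer scratchpad, or None
    upstream_revision -- revision reported by the upstream repomd.xml
    """
    if stored_revision is None:
        # Nothing recorded (for example after the reset above): process the metadata.
        return False
    return stored_revision == upstream_revision
```

With the stored revision set to null, `can_skip_metadata(None, upstream)` is false for any upstream value, so the metadata is processed even though nothing changed upstream.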
f25
Created an f25 repository containing 50k RPMs.
Benchmarks:
7:45
7:55
7:41
With catalog management fixed[1] (one of the two optimizations undone):
9:25
9:27
9:23
EL7
Created an EL7 repository containing 5k RPMs.
Benchmarks:
1:44
1:43
2:08
With catalog management fixed[1] (one of the two optimizations undone):
2:10
1:57
1:55
*No real impact.
Notes:
Nothing had changed upstream, but I forced the importer to process the metadata anyway by setting the stored metadata revision to NULL. In the real world, when nothing has changed upstream, the metadata would be the same and the sync takes 8 seconds, so the difference introduced by this proposed change would be moot.
[1] https://github.com/pulp/pulp_rpm/pull/1038/commits/8058ec8ee34736e83243a8d5096b2caa7626dc61
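To put the timings above on a common scale, the arithmetic can be done with a short script. It uses only the six f25 and six EL7 timings listed in this comment. The helper names are illustrative. On these numbers the mean f25 sync goes from 467 s to 565 s (about 21% slower), and the mean EL7 sync from roughly 112 s to 121 s (about 8% slower, with individual runs overlapping).

```python
def to_seconds(stamp):
    """Convert an 'M:SS' timing such as '7:45' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def mean_seconds(stamps):
    """Average a list of 'M:SS' timings, in seconds."""
    values = [to_seconds(s) for s in stamps]
    return sum(values) / len(values)


def slowdown_percent(before, after):
    """Relative increase of the mean timing, in percent."""
    base = mean_seconds(before)
    return (mean_seconds(after) - base) / base * 100


# Timings copied from the benchmark lists in this comment.
F25_BEFORE = ["7:45", "7:55", "7:41"]
F25_AFTER = ["9:25", "9:27", "9:23"]
EL7_BEFORE = ["1:44", "1:43", "2:08"]
EL7_AFTER = ["2:10", "1:57", "1:55"]

if __name__ == "__main__":
    for name, before, after in (
        ("f25", F25_BEFORE, F25_AFTER),
        ("EL7", EL7_BEFORE, EL7_AFTER),
    ):
        print(
            name,
            round(mean_seconds(before), 1),
            round(mean_seconds(after), 1),
            round(slowdown_percent(before, after), 1),
        )
```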
- Status changed from NEW to POST
- Assignee set to jortel@redhat.com
- Status changed from POST to MODIFIED
- Platform Release set to 2.12.2
- Status changed from MODIFIED to 5
- Status changed from 5 to CLOSED - CURRENTRELEASE
- Sprint/Milestone deleted (34)
Fix catalog management with unit already associated. closes #2621