Issue #1467
closedRepository unit counts are not always accurate when the plugin authors forget to update them
Description
It is currently the responsibility of plugin authors to update the unit counts in repositories when units are added or removed from repositories. This is precarious as not all plugin authors are aware of this responsibility. It would be much easier and result in greater correctness if the platform made the call to update unit counts after all operations that might alter them. These operations should make the call on the plugin's behalf:
- import_unit
- remove_unit
- upload_unit
- sync
We should fix this before releasing 2.8, as some of the plugins no longer use the conduit code that used to do this for them.
Related issues
Updated by mhrivnak about 9 years ago
- Triaged changed from No to Yes
Copy already does it, but the other three operations do not.
Updated by bmbouter about 9 years ago
Does it make any sense to implement this as a save() or post_save() callback on RepositoryContentUnit itself? New saves vs updates could be handled accordingly. We could also hook delete(). We may loose a little bit of performance doing it unit by unit, but the update can be done efficiently such as with the update_unit_count() method. This would take us closer to an object oriented design.
Updated by rbarlow about 9 years ago
Pulp wrote:
Does it make any sense to implement this as a save() or post_save()
callback on RepositoryContentUnit itself? New saves vs updates could be
handled accordingly. We could also hook delete(). We may loose a little
bit of performance doing it unit by unit, but the update can be done
efficiently such as with the update_unit_count() method. This would take
us closer to an object oriented design.
Unfortunately, save()/delete() are only called when individual units are
saved/deleted, not when collection-wide update queries are called. I'm
not sure what kinds of queries are performed in all cases, but if any
bulk-insert queries are used this wouldn't work. In general, it's not
great to rely on save and delete in this way for models with Mongoengine
(or Django either).
--
Randy Barlow
irc: bowlofeggs
Updated by bmbouter about 9 years ago
rbarlow wrote:
Unfortunately, save()/delete() are only called when individual units are
saved/deleted, not when collection-wide update queries are called. I'm
not sure what kinds of queries are performed in all cases, but if any
bulk-insert queries are used this wouldn't work. In general, it's not
great to rely on save and delete in this way for models with Mongoengine
(or Django either).--
Randy Barlow
irc: bowlofeggs
To clarify, I'm proposing these save()/delete() hooks be done on the RepositoryContentUnit model which effectively is the association of units to a repo. I can't think of any bulk operation that wouldn't operate on RepositoryContentUnit and cause save()/delete() operations. The benefit of this approach is reusability. Future code could work with the RepositoryContentUnit model directly and receive the correct behaviors instead of using import_unit, remove_unit, delete_unit.
I want to check in on my understanding here. Is it correct that the content counts of types for a repo are stored on the repo for performance reasons only? Otherwise the RepositoryContentUnit collection itself would provide these counts with queries, right?
Also aren't import_unit and remove_unit the only two places where counts are modified? Shouldn't sync() associate and disassociate units using import_unit or remove_unit? Similarly for upload_unit, should that just call import_unit at some point? FYI, I'm not very familiar with the plugin API.
Updated by rbarlow about 9 years ago
bmbouter wrote:
To clarify, I'm proposing these save()/delete() hooks be done on the RepositoryContentUnit model which effectively is the association of units to a repo. I can't think of any bulk operation that wouldn't operate on RepositoryContentUnit and cause save()/delete() operations. The benefit of this approach is reusability. Future code could work with the RepositoryContentUnit model directly and receive the correct behaviors instead of using import_unit, remove_unit, delete_unit.
There's a downside too - instead of one query to update the count after a sync of 10,000 units, there will be 10,000 queries. I think that will be noticeable, especially with the new lazy feature.
I want to check in on my understanding here. Is it correct that the content counts of types for a repo are stored on the repo for performance reasons only? Otherwise the RepositoryContentUnit collection itself would provide these counts with queries, right?
Yes, it's a denormalization. We used to do a query, and I believe mhrivnak made the change to store it this way for rapid access.
Also aren't import_unit and remove_unit the only two places where counts are modified? Shouldn't sync() associate and disassociate units using import_unit or remove_unit? Similarly for upload_unit, should that just call import_unit at some point? FYI, I'm not very familiar with the plugin API.
No, import_units(), upload(), sync(), and remove_units() are all methods on an Importer, and each of them can perform bulk actions.
Updated by mhrivnak about 9 years ago
For now, I think we should stick to the mechanism we have. This issue is just about moving the call to that mechanism from one place to another, which is a big improvement, and very simple.
Changing how we do it is a worthy discussion, but a separate and more complex issue.
Updated by bmbouter about 9 years ago
mhrivnak wrote:
For now, I think we should stick to the mechanism we have. This issue is just about moving the call to that mechanism from one place to another, which is a big improvement, and very simple.
Changing how we do it is a worthy discussion, but a separate and more complex issue.
Agreed. Let's keep this to fixing just those specific calls. I'm convinced that the save()/delete() design won't work because of the performance reasons rbarlow mentioned.
Updated by pcreech about 9 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to pcreech
Added by pcreech about 9 years ago
Added by pcreech about 9 years ago
Revision cfc8878d | View on GitHub
Update Repository Units from platform
Updating repository units from platform, due to them not being reliably updated by plugins. The code to do this lives in platform, so we should be able to utilize it from there
Added by pcreech about 9 years ago
Revision d99c61f7 | View on GitHub
Update repository units from platform
Updating repository units from platform, due to them not being reliably updated by plugins. The code to do this lives in platform, so we should be able to utilize it from there
Added by pcreech about 9 years ago
Revision c18f0359 | View on GitHub
Update repository units from platform
Updating repository units from platform, due to them not being reliably updated by plugins. The code to do this lives in platform, so we should be able to utilize it from there
Updated by ipanova@redhat.com almost 9 years ago
- Related to Issue #1570: Reading an RPM repo returns incorrect rpm_unit_counts added
Updated by dkliban@redhat.com almost 9 years ago
- Status changed from MODIFIED to 5
Updated by dkliban@redhat.com almost 9 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE
Update Repository Units from platform
Updating repository units from platform, due to them not being reliably updated by plugins. The code to do this lives in platform, so we should be able to utilize it from there
https://pulp.plan.io/issues/1467 re #1467