Project

Profile

Help

Issue #1467

closed

Repository unit counts are not always accurate when the plugin authors forget to update them

Added by rbarlow almost 9 years ago. Updated over 5 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
Master
Platform Release:
2.8.0
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Easy Fix, Pulp 2
Sprint:
Quarter:

Description

It is currently the responsibility of plugin authors to update the unit counts in repositories when units are added or removed from repositories. This is precarious as not all plugin authors are aware of this responsibility. It would be much easier and result in greater correctness if the platform made the call to update unit counts after all operations that might alter them. These operations should make the call on the plugin's behalf:

  • import_unit
  • remove_unit
  • upload_unit
  • sync

We should fix this before releasing 2.8, as some of the plugins no longer use the conduit code that used to do this for them.


Related issues

Related to RPM Support - Issue #1570: Reading an RPM repo returns incorrect rpm_unit_countsCLOSED - CURRENTRELEASEttereshcActions
Actions #1

Updated by mhrivnak almost 9 years ago

  • Triaged changed from No to Yes
Actions #2

Updated by mhrivnak almost 9 years ago

Actions #3

Updated by bmbouter almost 9 years ago

Does it make any sense to implement this as a save() or post_save() callback on RepositoryContentUnit itself? New saves vs updates could be handled accordingly. We could also hook delete(). We may loose a little bit of performance doing it unit by unit, but the update can be done efficiently such as with the update_unit_count() method. This would take us closer to an object oriented design.

Actions #4

Updated by rbarlow almost 9 years ago

Pulp wrote:

Does it make any sense to implement this as a save() or post_save()
callback on RepositoryContentUnit itself? New saves vs updates could be
handled accordingly. We could also hook delete(). We may loose a little
bit of performance doing it unit by unit, but the update can be done
efficiently such as with the update_unit_count() method. This would take
us closer to an object oriented design.

Unfortunately, save()/delete() are only called when individual units are
saved/deleted, not when collection-wide update queries are called. I'm
not sure what kinds of queries are performed in all cases, but if any
bulk-insert queries are used this wouldn't work. In general, it's not
great to rely on save and delete in this way for models with Mongoengine
(or Django either).

--
Randy Barlow
irc: bowlofeggs

Actions #5

Updated by bmbouter almost 9 years ago

rbarlow wrote:

Unfortunately, save()/delete() are only called when individual units are
saved/deleted, not when collection-wide update queries are called. I'm
not sure what kinds of queries are performed in all cases, but if any
bulk-insert queries are used this wouldn't work. In general, it's not
great to rely on save and delete in this way for models with Mongoengine
(or Django either).

--
Randy Barlow
irc: bowlofeggs

To clarify, I'm proposing these save()/delete() hooks be done on the RepositoryContentUnit model which effectively is the association of units to a repo. I can't think of any bulk operation that wouldn't operate on RepositoryContentUnit and cause save()/delete() operations. The benefit of this approach is reusability. Future code could work with the RepositoryContentUnit model directly and receive the correct behaviors instead of using import_unit, remove_unit, delete_unit.

I want to check in on my understanding here. Is it correct that the content counts of types for a repo are stored on the repo for performance reasons only? Otherwise the RepositoryContentUnit collection itself would provide these counts with queries, right?

Also aren't import_unit and remove_unit the only two places where counts are modified? Shouldn't sync() associate and disassociate units using import_unit or remove_unit? Similarly for upload_unit, should that just call import_unit at some point? FYI, I'm not very familiar with the plugin API.

Actions #6

Updated by rbarlow almost 9 years ago

bmbouter wrote:

To clarify, I'm proposing these save()/delete() hooks be done on the RepositoryContentUnit model which effectively is the association of units to a repo. I can't think of any bulk operation that wouldn't operate on RepositoryContentUnit and cause save()/delete() operations. The benefit of this approach is reusability. Future code could work with the RepositoryContentUnit model directly and receive the correct behaviors instead of using import_unit, remove_unit, delete_unit.

There's a downside too - instead of one query to update the count after a sync of 10,000 units, there will be 10,000 queries. I think that will be noticeable, especially with the new lazy feature.

I want to check in on my understanding here. Is it correct that the content counts of types for a repo are stored on the repo for performance reasons only? Otherwise the RepositoryContentUnit collection itself would provide these counts with queries, right?

Yes, it's a denormalization. We used to do a query, and I believe mhrivnak made the change to store it this way for rapid access.

Also aren't import_unit and remove_unit the only two places where counts are modified? Shouldn't sync() associate and disassociate units using import_unit or remove_unit? Similarly for upload_unit, should that just call import_unit at some point? FYI, I'm not very familiar with the plugin API.

No, import_units(), upload(), sync(), and remove_units() are all methods on an Importer, and each of them can perform bulk actions.

Actions #7

Updated by mhrivnak almost 9 years ago

For now, I think we should stick to the mechanism we have. This issue is just about moving the call to that mechanism from one place to another, which is a big improvement, and very simple.

Changing how we do it is a worthy discussion, but a separate and more complex issue.

Actions #8

Updated by bmbouter almost 9 years ago

mhrivnak wrote:

For now, I think we should stick to the mechanism we have. This issue is just about moving the call to that mechanism from one place to another, which is a big improvement, and very simple.

Changing how we do it is a worthy discussion, but a separate and more complex issue.

Agreed. Let's keep this to fixing just those specific calls. I'm convinced that the save()/delete() design won't work because of the performance reasons rbarlow mentioned.

Actions #9

Updated by pcreech almost 9 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to pcreech

Added by pcreech almost 9 years ago

Revision cfc8878d | View on GitHub

Update Repository Units from platform

Updating repository units from platform, due to them not being reliably updated by plugins. The code to do this lives in platform, so we should be able to utilize it from there

https://pulp.plan.io/issues/1467 re #1467

Added by pcreech almost 9 years ago

Revision cfc8878d | View on GitHub

Update Repository Units from platform

Updating repository units from platform, due to them not being reliably updated by plugins. The code to do this lives in platform, so we should be able to utilize it from there

https://pulp.plan.io/issues/1467 re #1467

Added by pcreech almost 9 years ago

Revision d99c61f7 | View on GitHub

Update repository units from platform

Updating repository units from platform, due to them not being reliably updated by plugins. The code to do this lives in platform, so we should be able to utilize it from there

https://pulp.plan.io/issues/1467 re #1467

Added by pcreech almost 9 years ago

Revision c18f0359 | View on GitHub

Update repository units from platform

Updating repository units from platform, due to them not being reliably updated by plugins. The code to do this lives in platform, so we should be able to utilize it from there

https://pulp.plan.io/issues/1467 re #1467

Actions #10

Updated by pcreech almost 9 years ago

  • Status changed from ASSIGNED to MODIFIED
Actions #11

Updated by ipanova@redhat.com almost 9 years ago

  • Related to Issue #1570: Reading an RPM repo returns incorrect rpm_unit_counts added
Actions #12

Updated by dkliban@redhat.com almost 9 years ago

  • Status changed from MODIFIED to 5
Actions #13

Updated by dkliban@redhat.com almost 9 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #14

Updated by bmbouter over 5 years ago

  • Tags Pulp 2 added

Also available in: Atom PDF