Issue #6347
closedFile content list does not return all unique content units
Description
https://pulp-file.readthedocs.io/en/latest/restapi.html#operation/content_file_files_list does not return all unique content units when used with a page size of ~2000 and there are multiple pages of results. To reproduce:
- Create and sync a large file repo (http://quartet.usersys.redhat.com/pub/fake-repos/large_file/)
- Retrieve content units for the repo using page_size 2000
- Out of 70,000 content units, roughly 69k unique records are returned.
Several units are repeated across pages leading to this discrepancy since we only query till offset 68000 with the 2000 page_size and 70k results don't consist of 70k unique units.
Files
Updated by fao89 almost 5 years ago
- Triaged changed from No to Yes
- Sprint set to Sprint 68
Updated by jsherril@redhat.com almost 5 years ago
- Tags Katello-P1 added
- Tags deleted (
Katello-P2)
Updated by dkliban@redhat.com almost 5 years ago
This may be a problem for other plugins such as pulp_container. We should see if we can add the sort to all content by default.
Updated by dkliban@redhat.com almost 5 years ago
All Content models should be sorted by the 'pulp_created' field.
Updated by lmjachky almost 5 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to lmjachky
Updated by dkliban@redhat.com almost 5 years ago
- Project changed from File Support to Pulp
- Sprint/Milestone set to 3.3.0
Updated by lmjachky almost 5 years ago
I was able to reproduce the issue. I used the attached test to do so. The output of the test looked like this:
2000: 4000
4000: 6000
6000: 8000
8000: 10000
10000: 12000
12000: 14000
14000: 16000
16000: 18000
18000: 20000
20000: 22000
22000: 24000
24000: 25295
26000: 26597
28000: 27908
30000: 29271
32000: 30531
34000: 31868
36000: 33239
38000: 34526
40000: 35833
42000: 37146
44000: 38486
46000: 39804
48000: 41137
50000: 42435
52000: 43745
54000: 45055
56000: 46362
58000: 47703
60000: 49003
62000: 50342
64000: 51626
66000: 52978
68000: 54312
70000: 54312
Expected: 70000
Actual: 54312
The content started to be non-unique after 24,000 fetched units (the number probably depends upon a running system). And it did not matter whether I was retrieving the content units using page_size=2,000 or page_size=200.
I tried to sort synced content by the field "pulp_created" (as dkliban had mentioned) using the built-in Django ordering options (https://docs.djangoproject.com/en/3.0/ref/models/options/#django.db.models.Options.ordering), but with no success. I added "ordering = ['-pulp_created']" to the meta class of the model FileContent, however, the sync failed with the following error message:
"traceback": " File \"/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py\", line 884, in perform_job\n
rv = job.perform()\n File \"/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/job.py\", line 664, in perform\n
self._result = self._execute()\n File \"/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/job.py\", line 670, in _execute\n
return self.func(*self.args, **self.kwargs)\n File \"/home/fedora/devel/pulp_file/pulp_file/app/tasks/synchronizing.py\", line 45,
in synchronize\n dv.create()\n File \"/home/fedora/devel/pulpcore/pulpcore/plugin/stages/declarative_version.py\", line 149,
in create\n loop.run_until_complete(pipeline)\n File \"/home/fedora/devel/pulpcore/pulpcore/app/models/repository.py\",
line 753, in __exit__\n repository.finalize_new_version(self)\n File \"/home/fedora/devel/pulp_file/pulp_file/app/models.py\",
line 68, in finalize_new_version\n validate_repo_version(new_version)\n File
\"/home/fedora/devel/pulpcore/pulpcore/plugin/repo_version_utils.py\", line 138, in validate_repo_version\n
validate_duplicate_content(version)\n File \"/home/fedora/devel/pulpcore/pulpcore/plugin/repo_version_utils.py\",
line 96, in validate_duplicate_content\n ).distinct(*repo_key_fields).count()\n
File \"/usr/local/lib/pulp/lib64/python3.7/site-packages/django/db/models/query.py\", line 392, in count\n
return self.query.get_count(using=self.db)\n File \"/usr/local/lib/pulp/lib64/python3.7/site-packages/django/db/models/sql/query.py\",
line 504, in get_count\n number = obj.get_aggregation(using, ['__count'])['__count']\n
File \"/usr/local/lib/pulp/lib64/python3.7/site-packages/django/db/models/sql/query.py\", line 489,
in get_aggregation\n result = compiler.execute_sql(SINGLE)\n File \"/usr/local/lib/pulp/lib64/python3.7/site-packages/django/db/models/sql/compiler.py\", line 1133, in execute_sql\n
cursor.execute(sql, params)\n
File \"/usr/local/lib/pulp/lib64/python3.7/site-packages/django/db/backends/utils.py\", line 67, in execute\n
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)\n
File \"/usr/local/lib/pulp/lib64/python3.7/site-packages/django/db/backends/utils.py\", line 76, in _execute_with_wrappers\n
return executor(sql, params, many, context)\n File \"/usr/local/lib/pulp/lib64/python3.7/site-packages/django/db/backends/utils.py\",
line 84, in _execute\n return self.cursor.execute(sql, params)\n
File \"/usr/local/lib/pulp/lib64/python3.7/site-packages/django/db/utils.py\", line 89, in __exit__\n
raise dj_exc_value.with_traceback(traceback) from exc_value\n
File \"/usr/local/lib/pulp/lib64/python3.7/site-packages/django/db/backends/utils.py\", line 84, in _execute\n
return self.cursor.execute(sql, params)\n",
"description": "SELECT DISTINCT ON expressions must match initial ORDER BY expressions\nLINE 1: SELECT COUNT(*) FROM (SELECT DISTINCT ON (\"file_filecontent\"...\n
I did not experience the above error message during the sync when I added that ordering to the model MasterModel. Still, Pulp did not return unique values. Specifying a default ordering (https://www.django-rest-framework.org/api-guide/filtering/#specifying-a-default-ordering) in FileContentViewSet did not help either way.
The only approach, that was working for me, was to create a custom class that handles the actual ordering, like shown in the commit https://github.com/lubosmj/pulp_file/commit/379ce9298307a4d8886a1cb1f4e2a5bd08906ef7. I just stole the workaround proposed here: https://github.com/encode/django-rest-framework/issues/6886#issuecomment-547120480. What is interesting here is that I did not need to order content by the field "pulp_created" and the test passed as supposed. Yet, It seems like the changes did break some functionality according to Travis (see the build https://travis-ci.org/github/lubosmj/pulp_file/builds/670764582, only one test failed, but I am not experiencing such a failure in my VM).
Also, there is an option to use a different pagination. Currently, we are using LimitOffsetPagination. There exists CursorPagination (https://www.django-rest-framework.org/api-guide/pagination/#cursorpagination) which requires that there is a unique, unchanging ordering of items in the result set.
Updated by fao89 almost 5 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to fao89
Updated by pulpbot almost 5 years ago
- Status changed from ASSIGNED to POST
Updated by pulpbot almost 5 years ago
Updated by fao89 almost 5 years ago
- Assignee changed from fao89 to lmjachky
Lubos did all the work: https://github.com/pulp/pulpcore/pull/634
Added by Fabricio Aguiar almost 5 years ago
Updated by Anonymous almost 5 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulpcore|5ef9ee839bb23d15f0754e44e53358789ce57eb0.
Added by Fabricio Aguiar almost 5 years ago
Revision 937fbddf | View on GitHub
Fixed non unique content units on content list
https://pulp.plan.io/issues/6347 closes #6347
(cherry picked from commit 5ef9ee839bb23d15f0754e44e53358789ce57eb0)
Updated by Anonymous almost 5 years ago
Applied in changeset pulpcore|937fbddf716fd8e14d0ce65e32d970b77b3955c5.
Updated by pulpbot almost 5 years ago
Updated by bmbouter almost 5 years ago
- Status changed from MODIFIED to ASSIGNED
Updated by bmbouter almost 5 years ago
- Status changed from ASSIGNED to POST
- Assignee changed from lmjachky to fao89
Added by Fabricio Aguiar almost 5 years ago
Revision 8f6fb1bd | View on GitHub
StableOrderingFilter only for NamedModelViewSet
Updated by Anonymous almost 5 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulpcore|8f6fb1bd5aaa7f9708d9576efb4805426756ffce.
Updated by ttereshc almost 5 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Updated by ggainey over 4 years ago
- Tags Katello added
- Tags deleted (
Katello-P1)
Fixed non unique content units on content list
https://pulp.plan.io/issues/6347 closes #6347