Issue #6103
closed
Migrating large file repository leads to unmigrated units
Description
I migrated this repository: http://quartet.usersys.redhat.com/pub/fake-repos/very_large_file_150k/ (apologies, it's an internal link) with this plan:
{"plugins":[{"type":"iso","repositories":[{"name":"cf332c42-b2c7-4bc2-a36f-4cdf361dd44d","repository_versions":[{"pulp2_repository_id":"cf332c42-b2c7-4bc2-a36f-4cdf361dd44d","pulp2_distributor_repository_ids":["cf332c42-b2c7-4bc2-a36f-4cdf361dd44d"]}],"pulp2_importer_repository_id":"cf332c42-b2c7-4bc2-a36f-4cdf361dd44d"}]}]}
The migration did not error, but failed to migrate almost 20,000 units:
pulp=# select count(*) from pulp_2to3_migration_pulp2content where pulp3_content_id is NULL;
 count
-------
 19346
(1 row)
Updated by dalley almost 5 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to dalley
Updated by dalley almost 5 years ago
Reproduced
In [23]: Pulp2Content.objects.count()
Out[23]: 150267
In [24]: FileContent.objects.count()
Out[24]: 90978
In [25]: Pulp2Content.objects.filter(pulp3_content_id=None).count()
Out[25]: 59289
Updated by ipanova@redhat.com almost 5 years ago
- Triaged changed from No to Yes
- Sprint set to Sprint 65
Updated by dalley almost 5 years ago
It looks like the number of migrated units is different each time, and whether you do a catch-all or a specific migration plan doesn't matter.
Updated by dalley almost 5 years ago
Re-running the migration plan does continue adding more FileContent to the database FWIW, so there doesn't appear to be any fundamental reason why the content is being skipped. It's just random.
After a second migration I now have 130,000 FileContent (out of 150,000 expected) instead of the 109,000 I had before or the 91,000 I got in the separate test up above.
Updated by dalley almost 5 years ago
- Status changed from ASSIGNED to NEW
- Assignee deleted (dalley)
I was absolutely able to reproduce this several times, but for some reason, I have stopped being able to, even after environment refreshes and everything I could think of. Given this, I'm going to step back from it for a bit.
Updated by ipanova@redhat.com almost 5 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to ipanova@redhat.com
Added by ipanova@redhat.com almost 5 years ago
Revision 97a5d04b
Problem: Migrating large repository leads to unmigrated units.
Solution: Remove slicing of the unevaluated qs.
Updated by ipanova@redhat.com almost 5 years ago
- Status changed from ASSIGNED to POST
Updated by ipanova@redhat.com almost 5 years ago
I'm posting my observations here for historical purposes.
In [96]: batch_count = 36
In [97]: batch_size = 4167
In [98]: pulp2content_qs = Pulp2Content.objects.filter(pulp2_content_type_id='iso')
In [99]: migrators = []
In [100]: s = set()
In [101]: for batch_idx in range(batch_count):
     ...:     start = batch_idx * batch_size
     ...:     end = (batch_idx + 1) * batch_size
     ...:     batch = pulp2content_qs[start:end]
     ...:     migrators.append(batch)
In [102]: for i in migrators:
     ...:     s.update(set(i))
In [103]: len(s)
Out[103]: 101194
In [114]: pulp2content_qs.count()
Out[114]: 149997
- We needed to remove the slicing. Another option is to call list() on the qs, which evaluates it, but as a side effect it loads all the values at once.
- It seems that because the qs is not evaluated, each sliced query is evaluated separately when it is requested, and this produces different results. This behaviour was not observed with a small qs, so maybe the batch size has some impact.
In [1]: pulp2content_qs = Pulp2Content.objects.filter(pulp2_content_type_id='iso')
In [2]: a=pulp2content_qs[0:10]
In [3]: b=pulp2content_qs[0:10]
In [11]: list(a)==list(b)
Out[11]: True
In [14]: a=pulp2content_qs[0:10000]
In [15]: b=pulp2content_qs[0:10000]
In [16]: list(a)==list(b)
Out[16]: False
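For readers who want to see the effect without a Pulp database, here is a minimal, self-contained sketch. It is not the code from the changeset. It assumes the underlying cause is that each slice becomes its own LIMIT/OFFSET query with no guaranteed row order (PostgreSQL leaves the order unspecified when there is no ORDER BY), which the sketch imitates by shuffling the rows on every "query". The names `unordered_slice` and `single_pass_batches` and the 1000-row data set are hypothetical.

```python
import random
from itertools import islice


def unordered_slice(rows, start, end, rng):
    """Imitate one LIMIT/OFFSET query without ORDER BY.

    Every call sees the rows in a different order, so the same
    (start, end) window can contain different rows each time.
    """
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return shuffled[start:end]


def single_pass_batches(iterable, batch_size):
    """Yield lists of up to batch_size items from ONE pass over the iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


rows = list(range(1000))  # hypothetical stand-in for the Pulp2Content rows
batch_size = 100
rng = random.Random(0)

# Independent slices, each backed by its own "query":
# some units show up in several batches, others in none.
seen_sliced = set()
for start in range(0, len(rows), batch_size):
    seen_sliced.update(unordered_slice(rows, start, start + batch_size, rng))

# One pass over the data, chunked afterwards:
# every unit lands in exactly one batch.
seen_single = set()
for batch in single_pass_batches(rows, batch_size):
    seen_single.update(batch)

print("units covered by independent slices:", len(seen_sliced))
print("units covered by single-pass batches:", len(seen_single))
```

With a real queryset, the single-pass variant could presumably be fed from `pulp2content_qs.iterator()`, and giving the queryset an explicit, unique ordering (for example by primary key) before slicing would be another way to make the windows stable. Both are assumptions about how this could be adapted, not a description of the actual fix.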
Updated by ipanova@redhat.com almost 5 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulp:pulp-2to3-migration|97a5d04bc926d0fde2919ffaf12c359ddbc71054.
Updated by ipanova@redhat.com almost 5 years ago
Applied in changeset pulp-2to3-migration|97a5d04bc926d0fde2919ffaf12c359ddbc71054.
Updated by ipanova@redhat.com over 4 years ago
Applied in changeset pulp:pulp-2to3-migrate|97a5d04bc926d0fde2919ffaf12c359ddbc71054.
Updated by ttereshc over 4 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Updated by ggainey over 4 years ago
- Tags Katello added
- Tags deleted (Katello-P2)
Problem: Migrating large repository leads to unmigrated units.
Solution: Remove slicing of the unevaluated qs.
closes #6103 https://pulp.plan.io/issues/6103