Migrating large file repository leads to unmigrated units
I migrated this repository: http://quartet.usersys.redhat.com/pub/fake-repos/very_large_file_150k/ (apologies, it's an internal link), and after migrating with this plan:
The migration did not error, but failed to migrate almost 20,000 units:
pulp=# select count(*) from pulp_2to3_migration_pulp2content where pulp3_content_id is NULL;
 count
-------
 19346
(1 row)
#7 Updated by dalley over 1 year ago
Re-running the migration plan does continue adding more FileContent to the database FWIW, so there doesn't appear to be any fundamental reason why the content is being skipped. It's just random.
After a second migration I now have 130,000 FileContent (out of 150,000 expected) instead of the 109,000 I had before or the 91,000 I got in the separate test up above.
#8 Updated by dalley over 1 year ago
- Status changed from ASSIGNED to NEW
- Assignee deleted (
I was absolutely able to reproduce this several times, but for some reason, I have stopped being able to, even after environment refreshes and everything I could think of. Given this, I'm going to step back from it for a bit.
#10 Updated by email@example.com over 1 year ago
- Status changed from ASSIGNED to POST
#11 Updated by firstname.lastname@example.org over 1 year ago
I'm posting the observations here for historical purposes.
In : batch_count = 36
In : batch_size = 4167
In : pulp2content_qs = Pulp2Content.objects.filter(pulp2_content_type_id='iso')
In : migrators = []
In : s = set()
In : for batch_idx in range(batch_count):
...:     start = batch_idx * batch_size
...:     end = (batch_idx + 1) * batch_size
...:     batch = pulp2content_qs[start:end]
...:     migrators.append(batch)
In : for i in migrators:
...:     s.update(set(i))
In : len(s)
Out: 101194
In : pulp2content_qs.count()
Out: 149997
- We needed to remove the slicing. Another option is to call list() on the qs, which evaluates it, but as a side effect all the values are loaded at once.
- It seems that because the qs is not evaluated, each sliced query is evaluated separately when it is requested, and this produces different results. This behaviour was not observed with a small qs, so maybe the batch size has some impact.
In : pulp2content_qs = Pulp2Content.objects.filter(pulp2_content_type_id='iso')
In : a=pulp2content_qs[0:10]
In : b=pulp2content_qs[0:10]
In : list(a)==list(b)
Out: True
In : a=pulp2content_qs[0:10000]
In : b=pulp2content_qs[0:10000]
In : list(a)==list(b)
Out: False
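To illustrate the mechanism suspected above, here is a minimal, self-contained sketch in plain Python. It is not the plugin's code and it does not touch Django. The helpers fetch_slice and collect_batches are hypothetical names, and the "database" is modelled as a list that is reshuffled on every unordered query, which deliberately exaggerates what a real database would do. The point is only that each slice is a separate query, so without a stable ordering the slices can overlap and leave rows out, while a deterministic ordering makes the batches partition the table. In Django terms the ordered variant would roughly correspond to calling order_by() on the queryset before slicing, which is an alternative to removing the slicing rather than what the changeset necessarily does.

```python
import random


def fetch_slice(rows, start, end, order_key=None, rng=None):
    """Simulate one independent LIMIT/OFFSET query (hypothetical helper).

    Without order_key the simulated database may return rows in any order,
    modelled here by shuffling before the slice is taken.
    """
    rows = list(rows)
    if order_key is None:
        rng.shuffle(rows)  # arbitrary order, different on every query
    else:
        rows.sort(key=order_key)  # same order on every query
    return rows[start:end]


def collect_batches(rows, batch_size, order_key=None, seed=0):
    """Fetch the table batch by batch and return the set of rows seen."""
    rng = random.Random(seed)
    seen = set()
    batch_count = -(-len(rows) // batch_size)  # ceiling division
    for batch_idx in range(batch_count):
        start = batch_idx * batch_size
        end = start + batch_size
        seen.update(fetch_slice(rows, start, end, order_key, rng))
    return seen


table = list(range(1000))  # stand-in for primary keys, an assumption
unordered = collect_batches(table, batch_size=100)
ordered = collect_batches(table, batch_size=100, order_key=lambda pk: pk)

print(len(unordered), "of", len(table), "rows seen without ordering")
print(len(ordered), "of", len(table), "rows seen with ordering")
```

The unordered run sees fewer distinct rows than the table holds, in the same way len(s) came out lower than pulp2content_qs.count() in the session above, while the ordered run sees every row exactly once.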
#12 Updated by email@example.com over 1 year ago
- Status changed from POST to MODIFIED
Applied in changeset pulp:pulp-2to3-migration|97a5d04bc926d0fde2919ffaf12c359ddbc71054.
#13 Updated by firstname.lastname@example.org over 1 year ago
Applied in changeset pulp-2to3-migration|97a5d04bc926d0fde2919ffaf12c359ddbc71054.
#14 Updated by email@example.com over 1 year ago
Applied in changeset pulp:pulp-2to3-migrate|97a5d04bc926d0fde2919ffaf12c359ddbc71054.