Issue #6103

Migrating large file repository leads to unmigrated units

Added by jsherril@redhat.com over 1 year ago. Updated about 1 year ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Platform Release:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Katello
Sprint:
Sprint 66
Quarter:

Description

I migrated this repository: http://quartet.usersys.redhat.com/pub/fake-repos/very_large_file_150k/ (apologies, it's an internal link) using this migration plan:

{"plugins":[{"type":"iso","repositories":[{"name":"cf332c42-b2c7-4bc2-a36f-4cdf361dd44d","repository_versions":[{"pulp2_repository_id":"cf332c42-b2c7-4bc2-a36f-4cdf361dd44d","pulp2_distributor_repository_ids":["cf332c42-b2c7-4bc2-a36f-4cdf361dd44d"]}],"pulp2_importer_repository_id":"cf332c42-b2c7-4bc2-a36f-4cdf361dd44d"}]}]}

The migration did not error, but failed to migrate almost 20,000 units:

pulp=# select count(*) from pulp_2to3_migration_pulp2content where pulp3_content_id is NULL;
 count
-------
 19346
(1 row)

Associated revisions

Revision 97a5d04b View on GitHub
Added by ipanova@redhat.com over 1 year ago

Problem: Migrating large repository leads to unmigrated units.

Solution: Remove slicing of the unevaluated qs.

closes #6103 https://pulp.plan.io/issues/6103

History

#1 Updated by jsherril@redhat.com over 1 year ago

  • Tags Katello-P2 added

#2 Updated by dalley over 1 year ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dalley

#3 Updated by dalley over 1 year ago

Reproduced

In [23]: Pulp2Content.objects.count()                                                                                                                                                                                                         
Out[23]: 150267

In [24]: FileContent.objects.count()                                                                                                                                                                                                          
Out[24]: 90978

In [25]: Pulp2Content.objects.filter(pulp3_content_id=None).count()                                                                                                                                                                           
Out[25]: 59289

#4 Updated by ipanova@redhat.com over 1 year ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 65

#5 Updated by dalley over 1 year ago

It looks like the number of migrated units is different each time, and whether you do a catch-all or a specific migration plan doesn't matter.

#6 Updated by rchan over 1 year ago

  • Sprint changed from Sprint 65 to Sprint 66

#7 Updated by dalley over 1 year ago

Re-running the migration plan does continue adding more FileContent to the database FWIW, so there doesn't appear to be any fundamental reason why the content is being skipped. It's just random.

After a second migration I now have 130,000 FileContent (out of 150,000 expected) instead of the 109,000 I had before or the 91,000 I got in the separate test up above.

#8 Updated by dalley over 1 year ago

  • Status changed from ASSIGNED to NEW
  • Assignee deleted (dalley)

I was absolutely able to reproduce this several times, but for some reason, I have stopped being able to, even after environment refreshes and everything I could think of. Given this, I'm going to step back from it for a bit.

#9 Updated by ipanova@redhat.com over 1 year ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ipanova@redhat.com

#10 Updated by ipanova@redhat.com over 1 year ago

  • Status changed from ASSIGNED to POST

#11 Updated by ipanova@redhat.com over 1 year ago

I am posting my observations here for historical purposes.

In [96]: batch_count = 36                                                                                                                                                                                          

In [97]: batch_size = 4167                                                                                                                                                                                         

In [98]: pulp2content_qs = Pulp2Content.objects.filter(pulp2_content_type_id='iso')                                                                                                                                

In [99]: migrators = []                                                                                                                                                                                            

In [100]: s = set()                                                                                                                                                                                                

In [101]: for batch_idx in range(batch_count): 
     ...:     start = batch_idx * batch_size 
     ...:     end = (batch_idx + 1) * batch_size 
     ...:     batch = pulp2content_qs[start:end] 
     ...:     migrators.append(batch) 
                                                                                                                                                                                                   

In [102]:  
     ...: for i in migrators: 
     ...:     s.update(set(i)) 
                                                                                                                                                                                                   

In [103]: len(s)                                                                                                                                                                                                   
Out[103]: 101194

In [114]: pulp2content_qs.count()                                                                                                                                                                                  
Out[114]: 149997
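As a sanity check on the session above, the following sketch confirms that the slice offsets themselves tile the whole range, so the shortfall has to come from the rows each slice returns rather than from the batch arithmetic. The numbers are copied from the session; the variable names `bounds`, `contiguous`, `covers_all`, and `missing` are illustrative only.

```python
import math

total = 149997       # pulp2content_qs.count() from the session above
batch_count = 36
batch_size = 4167    # matches ceil(total / batch_count)

# Same (start, end) pairs that the loop in the session builds.
bounds = [(i * batch_size, (i + 1) * batch_size) for i in range(batch_count)]

# Each slice ends exactly where the next one starts: no gaps, no overlaps.
contiguous = all(bounds[i][1] == bounds[i + 1][0] for i in range(batch_count - 1))

# The first slice starts at 0 and the last slice reaches past the final row.
covers_all = bounds[0][0] == 0 and bounds[-1][1] >= total

distinct_seen = 101194            # len(s) from the session above
missing = total - distinct_seen   # rows never returned by any slice
```

Since the offsets cover every position, the rows that are never seen must be offset by rows that appear in more than one slice.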

  1. We needed to remove the slicing. Another option is to call list() on the qs, which evaluates it, but as a side effect all of the values have to be loaded at once.
  2. It seems that because the qs is not evaluated, each sliced query is evaluated separately when it is requested, and the slices can return different results. This behaviour was not observed with a small qs, so the batch size may have some impact.

In [1]: pulp2content_qs = Pulp2Content.objects.filter(pulp2_content_type_id='iso')                                                                                                                                 

In [2]: a=pulp2content_qs[0:10]                                                                                                                                                                                    

In [3]: b=pulp2content_qs[0:10]                                                                                                                                                                                    

In [11]: list(a)==list(b)                                                                                                                                                                                          
Out[11]: True

In [14]: a=pulp2content_qs[0:10000]                                                                                                                                                                                

In [15]: b=pulp2content_qs[0:10000]                                                                                                                                                                                

In [16]: list(a)==list(b)                                                                                                                                                                                          
Out[16]: False
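One plausible reading of these observations, not confirmed in this thread, is that each slice becomes its own LIMIT/OFFSET query and, without an explicit ordering, the database is free to return rows in a different order each time. The sketch below is a pure-Python simulation of that effect, not Django code: `fetch` and `collect` are hypothetical helpers, and the shuffle stands in for an unspecified row order.

```python
import random


def fetch(rows, start, end, order_key=None, rng=None):
    """Stand-in for one LIMIT/OFFSET query (hypothetical, not a Django API).

    Without order_key the simulated database may return rows in any order,
    modelled here by shuffling before the offsets are applied.
    """
    if order_key is None:
        rows = list(rows)
        rng.shuffle(rows)
    else:
        rows = sorted(rows, key=order_key)
    return rows[start:end]


def collect(rows, batch_size, order_key=None, seed=0):
    """Walk the rows in batches, one simulated query per batch."""
    rng = random.Random(seed)
    seen = set()
    for start in range(0, len(rows), batch_size):
        seen.update(fetch(rows, start, start + batch_size, order_key, rng))
    return seen


rows = list(range(1000))  # stand-in primary keys

# Unordered batches: some rows repeat across batches, others never appear.
unordered = collect(rows, batch_size=100)

# Batches over a stable ordering: every row appears exactly once.
ordered = collect(rows, batch_size=100, order_key=lambda pk: pk)

print(len(unordered), len(ordered))
```

Under this assumption, a Django-side equivalent of the ordered variant would be to add an explicit ordering such as `order_by('pk')` before slicing. The fix actually committed for this issue removed the slicing instead.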

#12 Updated by ipanova@redhat.com over 1 year ago

  • Status changed from POST to MODIFIED

#15 Updated by ttereshc about 1 year ago

  • Sprint/Milestone set to 0.1.0

#16 Updated by ttereshc about 1 year ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE

#17 Updated by ggainey about 1 year ago

  • Tags Katello added
  • Tags deleted (Katello-P2)
