Issue #6103

closed

Migrating large file repository leads to unmigrated units

Added by jsherril@redhat.com almost 5 years ago. Updated over 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Katello
Sprint: Sprint 66
Quarter:

Description

I migrated this repository: http://quartet.usersys.redhat.com/pub/fake-repos/very_large_file_150k/ (apologies, it's an internal link) with this plan:

{"plugins":[{"type":"iso","repositories":[{"name":"cf332c42-b2c7-4bc2-a36f-4cdf361dd44d","repository_versions":[{"pulp2_repository_id":"cf332c42-b2c7-4bc2-a36f-4cdf361dd44d","pulp2_distributor_repository_ids":["cf332c42-b2c7-4bc2-a36f-4cdf361dd44d"]}],"pulp2_importer_repository_id":"cf332c42-b2c7-4bc2-a36f-4cdf361dd44d"}]}]}

The migration did not error, but failed to migrate almost 20,000 units:

pulp=# select count(*) from pulp_2to3_migration_pulp2content where pulp3_content_id is NULL;
 count
-------
 19346
(1 row)

Actions #1

Updated by jsherril@redhat.com almost 5 years ago

  • Tags Katello-P2 added
Actions #2

Updated by dalley almost 5 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dalley
Actions #3

Updated by dalley almost 5 years ago

Reproduced

In [23]: Pulp2Content.objects.count()
Out[23]: 150267

In [24]: FileContent.objects.count()
Out[24]: 90978

In [25]: Pulp2Content.objects.filter(pulp3_content_id=None).count()
Out[25]: 59289
Actions #4

Updated by ipanova@redhat.com almost 5 years ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 65
Actions #5

Updated by dalley almost 5 years ago

It looks like the number of migrated units is different each time, and whether you do a catch-all or a specific migration plan doesn't matter.

Actions #6

Updated by rchan almost 5 years ago

  • Sprint changed from Sprint 65 to Sprint 66
Actions #7

Updated by dalley almost 5 years ago

Re-running the migration plan does continue adding more FileContent to the database FWIW, so there doesn't appear to be any fundamental reason why the content is being skipped. It's just random.

After a second migration I now have 130,000 FileContent (out of 150,000 expected) instead of the 109,000 I had before or the 91,000 I got in the separate test up above.

Actions #8

Updated by dalley almost 5 years ago

  • Status changed from ASSIGNED to NEW
  • Assignee deleted (dalley)

I was absolutely able to reproduce this several times, but for some reason, I have stopped being able to, even after environment refreshes and everything I could think of. Given this, I'm going to step back from it for a bit.

Actions #9

Updated by ipanova@redhat.com almost 5 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ipanova@redhat.com

Added by ipanova@redhat.com almost 5 years ago

Revision 97a5d04b | View on GitHub

Problem: Migrating large repository leads to unmigrated units.

Solution: Remove slicing of the unevaluated qs.

closes #6103 https://pulp.plan.io/issues/6103
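The commit itself is not reproduced here. As a rough sketch of what batching without slicing the queryset could look like, the helper below cuts a single pass over an iterable into fixed-size batches. The name `iter_batches` is hypothetical and not taken from the plugin, and `range(10)` stands in for one pass over the queryset (for example `pulp2content_qs.iterator()`), which is an assumption about how it would be wired in.

```python
from itertools import islice


def iter_batches(iterable, batch_size):
    """Yield lists of up to batch_size items from one pass over iterable.

    Hypothetical helper: every item is read exactly once, so no item can be
    skipped or duplicated between batches.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for a single pass over the queryset (assumption, see lead-in).
units = range(10)
batches = list(iter_batches(units, 4))
print(batches)
```

Because the batches are cut from one underlying iteration rather than from separate offset-based queries, the union of the batches is always the full input.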

Actions #10

Updated by ipanova@redhat.com almost 5 years ago

  • Status changed from ASSIGNED to POST
Actions #11

Updated by ipanova@redhat.com almost 5 years ago

I'm posting the observations here for historical purposes.

In [96]: batch_count = 36

In [97]: batch_size = 4167

In [98]: pulp2content_qs = Pulp2Content.objects.filter(pulp2_content_type_id='iso')

In [99]: migrators = []

In [100]: s = set()

In [101]: for batch_idx in range(batch_count):
     ...:     start = batch_idx * batch_size
     ...:     end = (batch_idx + 1) * batch_size
     ...:     batch = pulp2content_qs[start:end]
     ...:     migrators.append(batch)
     ...:

In [102]: for i in migrators:
     ...:     s.update(set(i))
     ...:

In [103]: len(s)
Out[103]: 101194

In [114]: pulp2content_qs.count()
Out[114]: 149997

  1. We needed to remove the slicing. Another option is to call list() on the qs, which evaluates it, but as a side effect all the values have to be loaded at once.
  2. It seems that because the qs is not evaluated, each sliced query is evaluated separately when it is requested, and this produces different results. This behaviour was not observed with a small qs, so maybe the batch size has some impact.

In [1]: pulp2content_qs = Pulp2Content.objects.filter(pulp2_content_type_id='iso')

In [2]: a=pulp2content_qs[0:10]

In [3]: b=pulp2content_qs[0:10]

In [11]: list(a)==list(b)
Out[11]: True

In [14]: a=pulp2content_qs[0:10000]

In [15]: b=pulp2content_qs[0:10000]

In [16]: list(a)==list(b)
Out[16]: False
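
One possible reading of the sessions above, assuming the queryset has no explicit ordering: each slice becomes its own LIMIT/OFFSET query, and without an ORDER BY the database is free to return rows in a different order for each one. The pure-Python simulation below exaggerates this by fully shuffling the rows on every unordered "query"; `fetch_slice` and `migrate_in_batches` are hypothetical names used only for illustration, and no Django or database is involved.

```python
import random


def fetch_slice(rows, start, end, rng=None):
    """Simulate one LIMIT/OFFSET query over rows.

    With rng set, rows come back in an arbitrary order (no ORDER BY).
    Without it, rows are sorted first (a stable ORDER BY).
    """
    result = list(rows)
    if rng is None:
        result.sort()
    else:
        rng.shuffle(result)
    return result[start:end]


def migrate_in_batches(rows, batch_size, rng=None):
    """Collect every unit reached by offset-based batches."""
    seen = set()
    for start in range(0, len(rows), batch_size):
        seen.update(fetch_slice(rows, start, start + batch_size, rng))
    return seen


rows = list(range(1000))
unordered = migrate_in_batches(rows, 100, rng=random.Random(0))
ordered = migrate_in_batches(rows, 100)
print(len(unordered), len(ordered))
```

In this simulation the ordered run reaches all 1000 units, while the unordered run reaches fewer, because some units land in several batches and others in none. If this is indeed the cause, giving the queryset a stable ordering before slicing (for example `order_by('pk')`) would be another way to make the slices consistent, though that was not verified in this issue.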

Actions #12

Updated by ipanova@redhat.com almost 5 years ago

  • Status changed from POST to MODIFIED
Actions #15

Updated by ttereshc almost 5 years ago

  • Sprint/Milestone set to 0.1.0
Actions #16

Updated by ttereshc almost 5 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Actions #17

Updated by ggainey over 4 years ago

  • Tags Katello added
  • Tags deleted (Katello-P2)
