Issue #7887
closed
If orphaned content is removed in pulp 2 between migration re-runs, FileNotFoundError is raised
Description
This happens only if the orphaned content was not fully migrated during the first run.
To reproduce:
- have orphaned content in Pulp 2
- run the migration and stop it before the content migration is done (ensure that at least one of the orphaned content units was pre-migrated but not migrated)
- clean orphans in Pulp 2
- run the migration again
The second run fails with a task like this:
{
"child_tasks": [],
"created_resources": [
"/pulp/api/v3/task-groups/f6d728d2-61ea-49f2-b461-f8e0baf33b9b/"
],
"error": {
"description": "[Errno 2] No such file or directory: '/var/lib/pulp/content/units/iso/d7/269cf0f9afe9445d5e31c82cf13be3b17f75af57312a1638659ac592c221fc/1.iso'",
"traceback": " File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/worker.py\", line 936, in perform_job\n rv = job.perform()\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py\", line 684, in perform\n self._result = self._execute()\n File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py\", line 690, in _execute\n return self.func(*self.args, **self.kwargs)\n File \"/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/tasks/migrate.py\", line 141, in migrate_from_pulp2\n migrate_content(plan, skip_corrupted=skip_corrupted)\n File \"/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/migration.py\", line 47, in migrate_content\n plugin.migrator.migrate_content_to_pulp3(skip_corrupted=skip_corrupted)\n File \"/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/plugin/iso/migrator.py\", line 64, in migrate_content_to_pulp3\n loop.run_until_complete(dm.create())\n File \"/usr/lib64/python3.6/asyncio/base_events.py\", line 484, in run_until_complete\n return future.result()\n File \"/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/plugin/content.py\", line 89, in create\n await pipeline\n File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/api.py\", line 225, in create_pipeline\n await asyncio.gather(*futures)\n File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/api.py\", line 43, in __call__\n await self.run()\n File \"/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/plugin/content.py\", line 178, in run\n self.migrate_to_pulp3(cmodel, ctype)\n File \"/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/plugin/content.py\", line 378, in migrate_to_pulp3\n downloaded=pulp2content.downloaded\n File \"/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/plugin/content.py\", line 128, in create_artifact\n expected_size=expected_size)\n File \"/home/vagrant/devel/pulpcore/pulpcore/app/models/content.py\", line 277, in init_and_validate\n with open(file, \"rb\") as f:\n"
},
"finished_at": "2020-11-23T11:12:54.853991Z",
"name": "pulp_2to3_migration.app.tasks.migrate.migrate_from_pulp2",
"parent_task": null,
"progress_reports": [
{
"code": "creating.repositories",
"done": 0,
"message": "Creating repositories in Pulp 3",
"state": "completed",
"suffix": null,
"total": 0
},
{
"code": "migrating.importers",
"done": 0,
"message": "Migrating importers to Pulp 3",
"state": "completed",
"suffix": null,
"total": 0
},
{
"code": "migrating.content",
"done": 0,
"message": "Migrating content to Pulp 3",
"state": "failed",
"suffix": null,
"total": 0
},
{
"code": "migrating.iso.content",
"done": 13,
"message": "Migrating iso content to Pulp 3 iso",
"state": "failed",
"suffix": null,
"total": 276
},
{
"code": "processing.repositories",
"done": 4,
"message": "Processing Pulp 2 repositories, importers, distributors",
"state": "completed",
"suffix": null,
"total": 4
},
{
"code": "premigrating.content.general",
"done": 0,
"message": "Pre-migrating Pulp 2 ISO content (general info)",
"state": "completed",
"suffix": null,
"total": 0
},
{
"code": "premigrating.content.detail",
"done": 0,
"message": "Pre-migrating Pulp 2 ISO content (detail info)",
"state": "completed",
"suffix": null,
"total": 0
}
],
"pulp_created": "2020-11-23T11:12:54.394083Z",
"pulp_href": "/pulp/api/v3/tasks/e1a1a18e-bebe-4f74-bde8-804432964765/",
"reserved_resources_record": [
"pulp_2to3_migration"
],
"started_at": "2020-11-23T11:12:54.538645Z",
"state": "failed",
"task_group": "/pulp/api/v3/task-groups/f6d728d2-61ea-49f2-b461-f8e0baf33b9b/",
"worker": "/pulp/api/v3/workers/74ba07cd-4830-4527-9c70-452f16fe2de5/"
}
Updated by dalley almost 4 years ago
At the scales we are expecting, it is plausible that we could brute force this problem by:
- extracting all of the Pulp 2 content IDs from MongoDB
- extracting all of the Pulp 2 content IDs from the pre-migrated content table in PostgreSQL
- dumping them into sets
- taking the difference of the sets
- deleting the pre-migrated content which was removed from Pulp 2
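The set-difference step above could look roughly like the sketch below. It is only an illustration: `find_removed_content_ids` and the sample IDs are hypothetical, and in a real run the two inputs would come from a MongoDB query and from the Pulp2Content table rather than from literal lists. The final delete step is not shown.

```python
from typing import Iterable, Set


def find_removed_content_ids(
    pulp2_ids: Iterable[str], premigrated_ids: Iterable[str]
) -> Set[str]:
    """Return IDs that were pre-migrated but no longer exist in Pulp 2."""
    # Set difference: everything pre-migrated minus everything still in Pulp 2.
    return set(premigrated_ids) - set(pulp2_ids)


# Hypothetical sample data: 24-character hexadecimal IDs.
ids_in_pulp2 = [
    "5fbb9a1e0000000000000001",
    "5fbb9a1e0000000000000002",
]
ids_premigrated = [
    "5fbb9a1e0000000000000001",
    "5fbb9a1e0000000000000002",
    "5fbb9a1e0000000000000003",  # orphan that was cleaned up in Pulp 2
]

stale_ids = find_removed_content_ids(ids_in_pulp2, ids_premigrated)
print(sorted(stale_ids))
```

Only the IDs in `stale_ids` would then be candidates for deletion from the pre-migrated records.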
MongoDB object IDs are 24-character hexadecimal strings. 24 bytes x 2 lists x 4,000,000 content units (which would be a very large installation) would yield about 200 megabytes of memory consumption. Since Python uses string interning and the contents of the two lists are expected to be mostly duplicates, the actual number should be at most ~100 megabytes.
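As a quick check of the arithmetic above, the raw character payload can be computed directly. This counts only one byte per ID character, as the estimate does; it ignores CPython's per-object overhead for strings and sets, so real consumption would be higher than this lower bound. The constant names are illustrative.

```python
ID_LENGTH_BYTES = 24          # one byte per hexadecimal character
NUM_LISTS = 2                 # IDs from MongoDB and IDs from PostgreSQL
NUM_CONTENT_UNITS = 4_000_000  # "a very large installation"

raw_bytes = ID_LENGTH_BYTES * NUM_LISTS * NUM_CONTENT_UNITS
raw_megabytes = raw_bytes / 1_000_000

print(raw_megabytes)  # 192.0, i.e. "about 200 megabytes"
```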
If we have to do this work anyway, we might also be able to use it to get rid of some corner cases.
https://github.com/pulp/pulp-2to3-migration/blob/master/pulp_2to3_migration/app/pre_migration.py#L140-L152
https://github.com/pulp/pulp-2to3-migration/blob/master/pulp_2to3_migration/app/pre_migration.py#L120-L124
Updated by ttereshc over 3 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to ttereshc
- Sprint set to Sprint 91
- Quarter set to Q1-2021
Updated by ttereshc over 3 years ago
This won't affect Katello much because we now have skip_corrupted=True and they always use it. But it will be extra work on every re-run.
Updated by ttereshc over 3 years ago
- Status changed from ASSIGNED to POST
Added by ttereshc over 3 years ago
Revision be75d4ca | View on GitHub
Remove Pulp2Content records if content is no longer in Pulp2
If some content is only pre-migrated and not migrated to Pulp 3, and is then removed from Pulp 2, it can cause a problem with a subsequent migration run. Such content is treated as corrupted, but in reality it is no longer in Pulp 2. Now such records are removed before any pre-migration starts.
Updated by ttereshc over 3 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulp:pulp-2to3-migration|be75d4caabc5fe04a5acf331b2ece627cae463fa.
Updated by pulpbot over 3 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
closes #7887 https://pulp.plan.io/issues/7887