Issue #7887

closed

If orphaned content is removed in Pulp 2 between migration re-runs, FileNotFoundError is raised

Added by ttereshc over 3 years ago. Updated about 3 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Platform Release:
OS:
Triaged: No
Groomed: No
Sprint Candidate: No
Tags: Katello
Sprint: Sprint 93
Quarter: Q1-2021

Description

This happens only if the orphaned content was not fully migrated during the first run.

To reproduce:

  • have orphaned content in Pulp 2
  • run the migration and stop it before content migration is done (ensure that at least one of the orphaned content units was pre-migrated but not migrated)
  • clean up orphans in Pulp 2
  • run the migration again
{
    "child_tasks": [],
    "created_resources": [
        "/pulp/api/v3/task-groups/f6d728d2-61ea-49f2-b461-f8e0baf33b9b/"
    ],
    "error": {
        "description": "[Errno 2] No such file or directory: '/var/lib/pulp/content/units/iso/d7/269cf0f9afe9445d5e31c82cf13be3b17f75af57312a1638659ac592c221fc/1.iso'",
        "traceback": "  File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/worker.py\", line 936, in perform_job\n    rv = job.perform()\n  File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py\", line 684, in perform\n    self._result = self._execute()\n  File \"/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py\", line 690, in _execute\n    return self.func(*self.args, **self.kwargs)\n  File \"/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/tasks/migrate.py\", line 141, in migrate_from_pulp2\n    migrate_content(plan, skip_corrupted=skip_corrupted)\n  File \"/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/migration.py\", line 47, in migrate_content\n    plugin.migrator.migrate_content_to_pulp3(skip_corrupted=skip_corrupted)\n  File \"/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/plugin/iso/migrator.py\", line 64, in migrate_content_to_pulp3\n    loop.run_until_complete(dm.create())\n  File \"/usr/lib64/python3.6/asyncio/base_events.py\", line 484, in run_until_complete\n    return future.result()\n  File \"/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/plugin/content.py\", line 89, in create\n    await pipeline\n  File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/api.py\", line 225, in create_pipeline\n    await asyncio.gather(*futures)\n  File \"/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/api.py\", line 43, in __call__\n    await self.run()\n  File \"/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/plugin/content.py\", line 178, in run\n    self.migrate_to_pulp3(cmodel, ctype)\n  File \"/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/plugin/content.py\", line 378, in migrate_to_pulp3\n    downloaded=pulp2content.downloaded\n  File \"/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/plugin/content.py\", line 128, in create_artifact\n    expected_size=expected_size)\n  File \"/home/vagrant/devel/pulpcore/pulpcore/app/models/content.py\", line 277, in init_and_validate\n    with open(file, \"rb\") as f:\n"
    },
    "finished_at": "2020-11-23T11:12:54.853991Z",
    "name": "pulp_2to3_migration.app.tasks.migrate.migrate_from_pulp2",
    "parent_task": null,
    "progress_reports": [
        {
            "code": "creating.repositories",
            "done": 0,
            "message": "Creating repositories in Pulp 3",
            "state": "completed",
            "suffix": null,
            "total": 0
        },
        {
            "code": "migrating.importers",
            "done": 0,
            "message": "Migrating importers to Pulp 3",
            "state": "completed",
            "suffix": null,
            "total": 0
        },
        {
            "code": "migrating.content",
            "done": 0,
            "message": "Migrating content to Pulp 3",
            "state": "failed",
            "suffix": null,
            "total": 0
        },
        {
            "code": "migrating.iso.content",
            "done": 13,
            "message": "Migrating iso content to Pulp 3 iso",
            "state": "failed",
            "suffix": null,
            "total": 276
        },
        {
            "code": "processing.repositories",
            "done": 4,
            "message": "Processing Pulp 2 repositories, importers, distributors",
            "state": "completed",
            "suffix": null,
            "total": 4
        },
        {
            "code": "premigrating.content.general",
            "done": 0,
            "message": "Pre-migrating Pulp 2 ISO content (general info)",
            "state": "completed",
            "suffix": null,
            "total": 0
        },
        {
            "code": "premigrating.content.detail",
            "done": 0,
            "message": "Pre-migrating Pulp 2 ISO content (detail info)",
            "state": "completed",
            "suffix": null,
            "total": 0
        }
    ],
    "pulp_created": "2020-11-23T11:12:54.394083Z",
    "pulp_href": "/pulp/api/v3/tasks/e1a1a18e-bebe-4f74-bde8-804432964765/",
    "reserved_resources_record": [
        "pulp_2to3_migration"
    ],
    "started_at": "2020-11-23T11:12:54.538645Z",
    "state": "failed",
    "task_group": "/pulp/api/v3/task-groups/f6d728d2-61ea-49f2-b461-f8e0baf33b9b/",
    "worker": "/pulp/api/v3/workers/74ba07cd-4830-4527-9c70-452f16fe2de5/"
}
#1

Updated by ttereshc over 3 years ago

  • Description updated (diff)
#2

Updated by dalley over 3 years ago

At the scales we are expecting, it is plausible that we could brute force this problem by:

  • Extract all of the Pulp 2 content IDs from MongoDB
  • Extract all of the Pulp 2 content IDs from the pre-migrated content table in PostgreSQL
  • Dump them into sets
  • Take the difference of the sets
  • Delete the pre-migrated content that was removed from Pulp 2
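The steps above can be sketched in Python as follows. This is a hypothetical illustration, not the code from the eventual fix. `find_stale_premigrated_ids`, `remove_stale_premigrated` and the `delete_records` callback are made-up names. The two ID lists are assumed to have already been fetched from MongoDB and from the pre-migrated content table.

```python
from typing import Callable, Iterable, Set


def find_stale_premigrated_ids(
    pulp2_ids: Iterable[str], premigrated_ids: Iterable[str]
) -> Set[str]:
    """Return IDs that were pre-migrated but no longer exist in Pulp 2."""
    # Materialize both sides as sets so the difference is one hash-based pass.
    return set(premigrated_ids) - set(pulp2_ids)


def remove_stale_premigrated(
    pulp2_ids: Iterable[str],
    premigrated_ids: Iterable[str],
    delete_records: Callable[[Set[str]], None],
) -> Set[str]:
    """Delete pre-migrated records whose Pulp 2 content has disappeared.

    `delete_records` is a hypothetical callback standing in for whatever
    bulk delete would be run against the PostgreSQL table.
    """
    stale = find_stale_premigrated_ids(pulp2_ids, premigrated_ids)
    if stale:
        delete_records(stale)
    return stale
```

The direction of the difference matters. Only IDs present in the pre-migrated table but missing from MongoDB are stale. IDs that exist only in MongoDB are simply not pre-migrated yet.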

MongoDB object IDs are 24-character hexadecimal strings. 24 bytes x 2 lists x 4,000,000 content units (which would be a very large installation) is 192,000,000 bytes, i.e. about 200 megabytes of raw string data. Since the contents of the two lists are expected to be mostly duplicates, interning the strings (Python does not do this automatically for strings read at runtime, but sys.intern() can) would store each ID only once, so the raw figure should be ~100 megabytes at most.
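The arithmetic above can be checked with the sketch below, which also shows how much per-object overhead CPython adds on top of the raw characters. The constants are the figures from the comment; the function names are made up for illustration. The value `sys.getsizeof` returns depends on the CPython version and platform, so the second figure is only indicative. Neither figure includes list or set overhead.

```python
import sys

OBJECT_ID_LENGTH = 24  # MongoDB ObjectId rendered as a hex string
LISTS = 2              # one list from MongoDB, one from PostgreSQL
UNITS = 4_000_000      # the "very large installation" figure from the comment


def raw_bytes(units: int = UNITS, lists: int = LISTS,
              length: int = OBJECT_ID_LENGTH) -> int:
    """Character data only, ignoring Python object and container overhead."""
    return units * lists * length


def per_string_bytes(length: int = OBJECT_ID_LENGTH) -> int:
    """Size CPython reports for one ASCII str object of `length` characters."""
    return sys.getsizeof("0" * length)


if __name__ == "__main__":
    print(f"raw characters: {raw_bytes() / 1e6:.0f} MB")  # 192 MB
    print(f"str objects:    {UNITS * LISTS * per_string_bytes() / 1e6:.0f} MB")
```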

If we have to do this work anyway, we might also be able to use it to get rid of some corner cases.

https://github.com/pulp/pulp-2to3-migration/blob/master/pulp_2to3_migration/app/pre_migration.py#L140-L152
https://github.com/pulp/pulp-2to3-migration/blob/master/pulp_2to3_migration/app/pre_migration.py#L120-L124

#3

Updated by ttereshc about 3 years ago

  • Sprint/Milestone set to 0.9.0
#4

Updated by jsherril@redhat.com about 3 years ago

  • Tags Katello added
#6

Updated by ttereshc about 3 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ttereshc
  • Sprint set to Sprint 91
  • Quarter set to Q1-2021
#7

Updated by ttereshc about 3 years ago

This won't affect Katello much because we now have skip_corrupted=True and they always use it, but it does mean extra work on every re-run.
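The behaviour described here can be illustrated with a simplified model. `migrate_units` is a hypothetical function, not part of pulp-2to3-migration. It assumes only what the traceback in the description shows: migrating a unit ends up opening its file from Pulp 2 storage. Without `skip_corrupted`, a missing file raises `FileNotFoundError`. With it, the unit is skipped as corrupted. Because its pre-migrated record is not removed, the same check repeats on every re-run.

```python
import os
import tempfile
from typing import Iterable, List, Tuple


def migrate_units(paths: Iterable[str],
                  skip_corrupted: bool = False) -> Tuple[List[str], List[str]]:
    """Hypothetical stand-in for the content-migration step.

    Each path is the Pulp 2 file of a pre-migrated unit. Opening it mirrors
    the `open(file, "rb")` call at the bottom of the reported traceback.
    """
    migrated: List[str] = []
    skipped: List[str] = []
    for path in paths:
        try:
            with open(path, "rb") as f:
                f.read()  # stand-in for checksum validation / artifact creation
        except FileNotFoundError:
            if not skip_corrupted:
                raise  # the failure reported in this issue
            # Treated as corrupted; the stale record stays, so this repeats next run.
            skipped.append(path)
        else:
            migrated.append(path)
    return migrated, skipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        kept = os.path.join(tmp, "kept.iso")
        orphan = os.path.join(tmp, "orphan.iso")
        for p in (kept, orphan):
            with open(p, "wb") as f:
                f.write(b"data")
        os.remove(orphan)  # orphan cleanup in Pulp 2 between migration runs
        print(migrate_units([kept, orphan], skip_corrupted=True))
```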

#8

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 91 to Sprint 92
#9

Updated by ttereshc about 3 years ago

  • Sprint/Milestone deleted (0.9.0)
#10

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 92 to Sprint 93
#11

Updated by ttereshc about 3 years ago

  • Status changed from ASSIGNED to POST

Added by ttereshc about 3 years ago

Revision be75d4ca | View on GitHub

Remove Pulp2Content records if content is no longer in Pulp2

If some content is only pre-migrated and not migrated to Pulp 3, and then removed from Pulp 2, it can cause a problem with a subsequent migration run. Such content is treated as corrupted, but in reality it is no longer in Pulp 2. Such records are now removed before any pre-migration starts.

closes #7887 https://pulp.plan.io/issues/7887

#12

Updated by ttereshc about 3 years ago

  • Status changed from POST to MODIFIED
#13

Updated by ttereshc about 3 years ago

  • Sprint/Milestone set to 0.10.0
#14

Updated by pulpbot about 3 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
