Story #7538

As a user, I can provide an option to log and skip missing or corrupt files during artifact creation

Added by jsherril@redhat.com 7 months ago. Updated 4 months ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Estimated time:
Platform Release:
Groomed: No
Sprint Candidate: No
Tags: Katello
Sprint: Sprint 86
Quarter:

Description

A Pulp 2 user may have hundreds of thousands of RPMs synced or uploaded. Right now, if a single RPM is missing or corrupt, the entire migration fails. The user may not care about that particular RPM (for example, it could be the 3rd of 50 kernels released) and may otherwise be unaware that it is corrupt or missing.

We should provide an option to skip the affected content and warn the user in this situation. Here's the traceback:

    Sep 21 15:30:02 content-migration pulpcore-worker-2: Traceback (most recent call last):
    Sep 21 15:30:02 content-migration pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/rq/worker.py", line 883, in perform_job
    Sep 21 15:30:02 content-migration pulpcore-worker-2: rv = job.perform()
    Sep 21 15:30:02 content-migration pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/rq/job.py", line 657, in perform
    Sep 21 15:30:02 content-migration pulpcore-worker-2: self._result = self._execute()
    Sep 21 15:30:02 content-migration pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/rq/job.py", line 663, in _execute
    Sep 21 15:30:02 content-migration pulpcore-worker-2: return self.func(*self.args, **self.kwargs)
    Sep 21 15:30:02 content-migration pulpcore-worker-2: File "/usr/local/lib/python3.6/site-packages/pulp_2to3_migration/app/tasks/migrate.py", line 141, in migrate_from_pulp2
    Sep 21 15:30:02 content-migration pulpcore-worker-2: migrate_content(plan)
    Sep 21 15:30:02 content-migration pulpcore-worker-2: File "/usr/local/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 36, in migrate_content
    Sep 21 15:30:02 content-migration pulpcore-worker-2: plugin.migrator.migrate_content_to_pulp3()
    Sep 21 15:30:02 content-migration pulpcore-worker-2: File "/usr/local/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/rpm/migrator.py", line 142, in migrate_content_to_pulp3
    Sep 21 15:30:02 content-migration pulpcore-worker-2: loop.run_until_complete(dm.create())
    Sep 21 15:30:02 content-migration pulpcore-worker-2: File "/usr/lib64/python3.6/asyncio/base_events.py", line 484, in run_until_complete
    Sep 21 15:30:02 content-migration pulpcore-worker-2: return future.result()
    Sep 21 15:30:02 content-migration pulpcore-worker-2: File "/usr/local/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/content.py", line 86, in create
    Sep 21 15:30:02 content-migration pulpcore-worker-2: await pipeline
    Sep 21 15:30:02 content-migration pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/api.py", line 225, in create_pipeline
    Sep 21 15:30:02 content-migration pulpcore-worker-2: await asyncio.gather(*futures)
    Sep 21 15:30:02 content-migration pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/api.py", line 43, in __call__
    Sep 21 15:30:02 content-migration pulpcore-worker-2: await self.run()
    Sep 21 15:30:02 content-migration pulpcore-worker-2: File "/usr/local/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/content.py", line 191, in run
    Sep 21 15:30:02 content-migration pulpcore-worker-2: self.migrate_to_pulp3(cmodel, ctype)
    Sep 21 15:30:02 content-migration pulpcore-worker-2: File "/usr/local/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/content.py", line 376, in migrate_to_pulp3
    Sep 21 15:30:02 content-migration pulpcore-worker-2: downloaded=pulp2content.downloaded
    Sep 21 15:30:02 content-migration pulpcore-worker-2: File "/usr/local/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/content.py", line 147, in create_artifact
    Sep 21 15:30:02 content-migration pulpcore-worker-2: expected_size=expected_size)
    Sep 21 15:30:02 content-migration pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulpcore/app/models/content.py", line 231, in init_and_validate
    Sep 21 15:30:02 content-migration pulpcore-worker-2: with open(file, "rb") as f:
    Sep 21 15:30:02 content-migration pulpcore-worker-2: FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/pulp/content/units/rpm/b1/4d69e4e6065886d8ed2c9976782e02113ce7146cc54c7f7ceece50eb5f31e5/libsss_certmap-devel-1.16.4-37.el7.s390.rpm'
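
A minimal sketch of the requested skip-and-warn behaviour, assuming units are processed one at a time. The names `create_artifact`, `migrate_units`, and `skip_corrupted` are hypothetical stand-ins for illustration and are not the plugin's actual API; the real pipeline works in bulk.

```python
import hashlib
import logging

log = logging.getLogger("migration")


def create_artifact(path, expected_sha256):
    """Hypothetical stand-in for artifact creation: read the file and verify it."""
    with open(path, "rb") as f:  # raises FileNotFoundError if the file is missing
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"checksum mismatch for {path}")
    return {"file": path, "sha256": digest}


def migrate_units(units, skip_corrupted=False):
    """Create artifacts for (path, expected_sha256) pairs.

    With skip_corrupted=False the first bad unit aborts the run, as in the
    traceback above. With skip_corrupted=True bad units are logged and skipped.
    """
    artifacts, skipped = [], []
    for path, expected in units:
        try:
            artifacts.append(create_artifact(path, expected))
        except (FileNotFoundError, ValueError) as exc:
            if not skip_corrupted:
                raise
            log.warning("Skipping %s: %s", path, exc)
            skipped.append(path)
    return artifacts, skipped
```

Returning the skipped paths alongside the artifacts lets the caller report them to the user once the run finishes.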

Associated revisions

Revision 3b61fdf4 View on GitHub
Added by ttereshc 5 months ago

Add an option to skip corrupted or missing Pulp 2 content.

closes #7538 https://pulp.plan.io/issues/7538

History

#1 Updated by ttereshc 7 months ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 82

#2 Updated by ttereshc 7 months ago

  • Tracker changed from Issue to Story
  • Subject changed from during migration, provide option to log and skip missing or corrupt files during artifcate creation to As a user, I can provide an option to log and skip missing or corrupt files during artifact creation
  • % Done set to 0
  • Severity deleted (2. Medium)
  • Triaged deleted (Yes)

#3 Updated by rchan 7 months ago

  • Sprint changed from Sprint 82 to Sprint 83

#4 Updated by ttereshc 6 months ago

This should cover all content that the migration relies on being present on disk.

  • RPM/SRPM (if downloaded=True)
  • .treeinfo file for DistributionTrees
  • module and module-default snippets (I think we use files with snippets to migrate it)
  • yum_repo_metadata_file

Feel free to update this list if anyone recalls more.

#5 Updated by rchan 6 months ago

  • Sprint changed from Sprint 83 to Sprint 84

#6 Updated by rchan 6 months ago

  • Sprint changed from Sprint 84 to Sprint 85

#7 Updated by ipanova@redhat.com 6 months ago

ttereshc wrote:

This should cover all content that the migration relies on being present on disk.

  • RPM/SRPM (if downloaded=True)
  • .treeinfo file for DistributionTrees
  • module and module-default snippets (I think we use files with snippets to migrate it)

Feel free to update this list if anyone recalls more.

I think it might get a bit more complicated than this. We need to think about how to handle content that has relations. In Pulp 3, modulemd has a relation to modular RPMs; if we skip a corrupted RPM, we end up with a migrated modulemd that is missing some RPMs. The same applies to other content with relations, for example a docker image. It is composed of blobs and a manifest, and if one of the blobs is corrupted we risk migrating a partial image. While a partial modulemd can still be usable to a certain extent, that's not the case for a docker image. Should we then skip migrating that image entirely? Or do we migrate it as it is, issue a warning, and ask the user to resync repositories so the missing data is fetched?

#8 Updated by ttereshc 6 months ago

That's a valid concern.

I see 2 options:

  • we skip only content which is safe to skip, e.g. non-modular rpms, distribution trees, etc
  • in theory, we can skip all corrupted content and after the migration identify and remove incomplete content, e.g. modules with only some rpms migrated, docker images with some blobs missing, etc.

I'm definitely voting for the former. The latter seems to me complicated and error-prone.
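
A rough sketch of what the first option could look like, assuming each unit carries a type name and a modular flag. The type names, the `SAFE_TO_SKIP` set, and `may_skip` are hypothetical and only illustrate the idea; which types are really safe to skip is exactly what is under discussion here.

```python
# Hypothetical set of content types that have no dependants in Pulp 3.
SAFE_TO_SKIP = {"rpm", "srpm", "distribution"}


class CorruptContentError(Exception):
    """Raised when corrupt content cannot be skipped safely."""


def may_skip(content_type, is_modular=False):
    """Return True if a corrupt unit of this type can be skipped.

    Raises CorruptContentError for content that other content depends on,
    e.g. modular RPMs or docker blobs.
    """
    if content_type in SAFE_TO_SKIP and not is_modular:
        return True
    raise CorruptContentError(
        f"corrupt {content_type} content cannot be skipped safely"
    )
```

For example, `may_skip("rpm")` returns `True`, while `may_skip("rpm", is_modular=True)` raises.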

@jsherrill , do you have any thoughts about ipanova's concern and the options I listed?

Any other ideas?

#9 Updated by ipanova@redhat.com 6 months ago

ttereshc wrote:

That's a valid concern.

I see 2 options:

  • we skip only content which is safe to skip, e.g. non-modular rpms, distribution trees, etc
  • in theory, we can skip all corrupted content and after the migration identify and remove incomplete content, e.g. modules with only some rpms migrated, docker images with some blobs missing, etc.

I'm definitely voting for the former. The latter seems to me complicated and error-prone.

I would 100% not go with the second option; it will be hell to identify missing blobs, up to parsing each manifest JSON file and comparing that data to the number of relations. Option 1 sounds viable.

@jsherrill , do you have any thoughts about ipanova's concern and the options I listed?

Any other ideas?

#10 Updated by ttereshc 5 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ttereshc

#11 Updated by ttereshc 5 months ago

Just adding this here so we don't forget what was discussed:

[17:23:16] <ttereshc> jsherrill, is it ok if we skip only some content and some is not possible to skip https://pulp.plan.io/issues/7538#note-8? E.g. we can skip regular corrupted/unavailable rpms but we don't want to skip modular ones because we'll have issues with module integrity in pulp3
[17:30:40] <jsherrill> ttereshc: assuming this is a repository with a url its syncing from (and for modular repos i think thats probably a given), and that next sync should fetch the missing items
[17:30:42] <jsherrill> i think that is perfectly fine
[17:30:59] <jsherrill> ttereshc: but i guess you're saying to error
[17:31:05] <jsherrill> instead of skipping
[17:32:30] <ttereshc> jsherrill, yes, I want to error because client won't work properly right after the migration
[17:32:53] <ttereshc> next sync might fix it if url is a working one
[17:33:21] <ttereshc> it's a good test case to have
[17:34:30] <ttereshc> I'm also worrying that it might be hard to troubleshoot customer cases related to that
[17:35:26] <jsherrill> yeah, and what are they supposed to do if that content isn't available upstream anymore?
[17:35:55] <ttereshc> they should cleanup their pulp2
[17:36:21] <ttereshc> if upstream url is unavailable, remove that module, it's not working properly anyway
[17:37:35] <jsherrill> i'm not sure we give the user the ability to remove a module
[17:37:39] <jsherrill> let me see...
[17:38:35] <ttereshc> jsherrill, what about pulp3? if we migrate a broken module, will they be able to get rid of it in pulp 3?
[17:40:31] <jsherrill> i'll have to check, but i don't think we let you delete a module
[17:40:38] <jsherrill> we don't let you delete anything manually for RH repos
[17:40:51] <jsherrill> but you can use mirror on sync/remove missing to 'sync it up' with the upstream repo
[17:41:17] <ttereshc> jsherrill, ipanova had the same concerns for the docker content, for the corrupted blobs
[17:42:10] <ttereshc> we should either fail or not migrate the rest of the image
[17:47:10] <jsherrill> i think not migrating the rest of the image is fine
[17:47:24] <jsherrill> i'm okay with broken stuff not ended up in the resulting migrated repo
[17:48:06] <ttereshc> jsherrill, so basically you are against any failure which would require user actions on pulp 2 side, correct?
[17:49:13] <jsherrill> i guess, maybe we should give them the option
[17:49:28] <jsherrill> initial run would check all checksums and fail
[17:49:44] <jsherrill> and then they can fix it or they can just 'skip' failed/missing content
[17:50:01] <jsherrill> as a user, really all i'm going to do is try to go and remove all that content anyways
[17:50:05] <jsherrill> i would imagine
[17:50:09] <jsherrill> so if the migration can just ignore it
[17:50:15] <jsherrill> that saves a lot of effort ?
[17:51:40] <ttereshc> it's not easy to ignore corrupted content because of relations between different types and due to perf reasons all the operations being in bulk fashion
[17:51:53] <ttereshc> I'll see what I can do
[17:52:03] <ttereshc> thanks for the feedback, it's helpful
[17:59:51] <jsherrill> ttereshc: yeah, if its not possible or really really hard, we can maybe brainstorm alternative options 
[17:59:57] <jsherrill> (such as katello zapping all the content in pulp2)
[18:00:11] <jsherrill> if we know which content is corrupt
[18:16:16] <ttereshc> ty
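
The two-step flow suggested in the chat (an initial run that checks all checksums and fails, then a rerun that skips what failed) could be sketched as below. This is only an illustration under assumptions: `find_corrupt`, `plan_migration`, and `skip_corrupted` are hypothetical names, and units are modelled as plain `(path, expected_sha256)` pairs.

```python
import hashlib
import os


def find_corrupt(units):
    """Return a list of (path, reason) for every missing or corrupt file."""
    problems = []
    for path, expected in units:
        if not os.path.isfile(path):
            problems.append((path, "missing"))
            continue
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in 1 MiB chunks so large files are not loaded into memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected:
            problems.append((path, "checksum mismatch"))
    return problems


def plan_migration(units, skip_corrupted=False):
    """Validate everything up front, then fail or skip depending on the flag.

    Returns (paths_to_migrate, problems).
    """
    units = list(units)
    problems = find_corrupt(units)
    if problems and not skip_corrupted:
        report = "\n".join(f"{path}: {reason}" for path, reason in problems)
        raise RuntimeError(
            f"{len(problems)} unit(s) failed validation:\n{report}"
        )
    bad = {path for path, _ in problems}
    return [path for path, _ in units if path not in bad], problems
```

Failing with the full report on the first run tells the user everything that is broken at once, so they can either fix it in Pulp 2 or rerun with the skip flag set.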

#12 Updated by rchan 5 months ago

  • Sprint changed from Sprint 85 to Sprint 86

#13 Updated by ttereshc 5 months ago

Modular RPMs and container blobs, if skipped, can be fixed with a subsequent sync; the relations to other content are created correctly as well. So we introduce no special handling here and simply skip the migration of corrupted content entirely.

#14 Updated by ttereshc 5 months ago

  • Status changed from ASSIGNED to MODIFIED
  • % Done changed from 0 to 100

#15 Updated by ttereshc 5 months ago

  • Sprint/Milestone set to 0.6.0

#16 Updated by ttereshc 4 months ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
