Project

Profile

Help

Issue #8305

closed

Deleting a remote used as source for live content corrupts ContentArtifact records

Added by dalley about 3 years ago. Updated about 2 years ago.

Status:
CLOSED - DUPLICATE
Priority:
Normal
Assignee:
-
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
3. High
Version:
Platform Release:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Sprint:
Sprint 112
Quarter:

Description

Ticket moved to GitHub: "pulp/pulpcore/1975":https://github.com/pulp/pulpcore/issues/1975


  • Create a repository and on_demand remote, and sync them.
  • Delete the remote

The deletion of the Remote deletes the RemoteArtifacts, leaving behind ContentArtifact attached to neither Artifacts nor Remotes, making them effectively corrupted and unpublishable.

# create repository, remote
remote = remote_api.create(gen_file_remote(policy='on_demand'))
repo = repo_api.create(gen_repo())

# sync the repository
repository_sync_data = RepositorySyncURL(remote=remote.pulp_href)
sync_response = repo_api.sync(repo.pulp_href, repository_sync_data)
task = monitor_task(sync_response.task)

# delete the remote
monitor_task(remote_api.delete(remote.pulp_href).task)

# ^---- problem occurs here, now RemoteArtifacts deleted, now ContentArtifact is broken

publish_response = publications_api.create({"repository_version": task.created_resources[0]})
monitor_task(publish_response.task)  # boom publish failure

This is more pernicious because content units can move throughout repositories, and if the remote is ever deleted, every repo can be broken at once with no safeguards.


Files

reproduce_publish_error.py (1.28 KB) reproduce_publish_error.py dalley, 02/25/2021 08:58 PM

Related issues

Related to Pulp - Issue #9101: Content_artifact is not updatedCLOSED - CURRENTRELEASEgerrodActions
Has duplicate Ansible Plugin - Issue #7924: Sync doesn't create RemoteArtifactsCLOSED - DUPLICATEActions
Actions #1

Updated by dalley about 3 years ago

We should probably introduce an actual constraint that enforces this, so it blows up immediately if it ever occurs.

Actions #2

Updated by dalley about 3 years ago

  • File reproduce_publish_error.py reproduce_publish_error.py added
  • File deleted (reproduce_publish_error.py)
  • Subject changed from Some sequences of events can result in invalid ContentArtifact records to Deleting a remote used as source for live content corrupts ContentArtifact records
  • Description updated (diff)
Actions #3

Updated by dalley about 3 years ago

  • Priority changed from Normal to High
Actions #4

Updated by dalley about 3 years ago

  • Description updated (diff)
Actions #5

Updated by dalley about 3 years ago

Recreating the remote and re-syncing does fix the content artifacts, but if you don't notice the problem immediately, or if you copied the content around between repos, it would be practically impossible to know how to resolve the issue.

There's some discussion on solutions here: https://hackmd.io/_3gsUVdyQwy50Nc5pMXN-g

Actions #6

Updated by fao89 about 3 years ago

  • Triaged changed from No to Yes
Actions #7

Updated by mdellweg about 3 years ago

Idea to solve the problem: Add a force flag to the DELETE call. If force is not specified, and the remote is referenced by either a repository or a remote artifact, the call will fail.

Actions #8

Updated by dalley about 3 years ago

  • Priority changed from High to Normal
Actions #9

Updated by dalley almost 3 years ago

  • Related to Issue #7924: Sync doesn't create RemoteArtifacts added
Actions #10

Updated by bmbouter almost 3 years ago

  • Related to deleted (Issue #7924: Sync doesn't create RemoteArtifacts)
Actions #11

Updated by bmbouter almost 3 years ago

  • Has duplicate Issue #7924: Sync doesn't create RemoteArtifacts added
Actions #12

Updated by dalley over 2 years ago

  • Sprint set to Sprint 100
Actions #13

Updated by ggainey over 2 years ago

I'd go for 1c) from the associated hackmd. I wouldn't even give the option of --force - the state your pulp-instance gets left in is pretty horrific, even if you meant to do it. Having some way to list the content/repo-versions/artifacts, or a list of "do an immediate sync on the following repos before attempting" would be great.

Actions #14

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 100 to Sprint 101
Actions #15

Updated by ipanova@redhat.com over 2 years ago

during review of a PR there has been raised an idea about what to do with the content that does not have RA not Artifact

Do you think it would be reasonable to prevent content with no Artifact and no RemoteArtifact from being added to new repository versions via one of the validation hooks? We want to keep the historical records around, but the content is no longer useful or functional, so it doesn't make sense to let it spread into new repository versions.

EDIT: add a --force flag which can be specified when new repo-version is being created to allow such content

Actions #16

Updated by dalley over 2 years ago

More discussion from the PR


I have not manually tested rpm plugin yet, but skimmed through pulpcore code - plugins that use directly pulp's content app handler, should be able to gracefully handle this. Uploaded content will have artifact set to none as well it won't have any remoteartifacts so a 404 will just be raised https://github.com/pulp/pulpcore/blob/master/pulpcore/content/handler.py#L681. Since pulp-container has a subclassed version of handler, i needed to modify couple of lines https://gist.github.com/ipanova/bd5821b55a1e01245fe7556dc3791ddd which led to this output:

$ podman pull localhost:24817/test/repo --tls-verify=false
Trying to pull localhost:24817/test/repo:latest...
Error: Error initializing source docker://localhost:24817/test/repo:latest: Error reading manifest latest in localhost:24817/test/repo: StatusCode: 404, 404: Not Found
(pulp) [vagrant@pulp3-source-fedora34 ~]$ 

tldr, i think we're fine just need to audit plugins that subclass the Handler.

EDIT: some plugins contain content that is expected to always have artifact. We should not touch those content types during reclaim disk space. For example: rpm modules and defaults, container tags, manifests and config blobs. For the rest of the content types for which disk space was supposed to correctly reclaim the artifact, the code needs to be adjusted so it takes into account situation when ca.artifact is None content._artifacts.get() returns ObjectDoesNotExist

[ipanova@fluffy pulp_rpm]$ git grep '\._artifacts'
pulp_rpm/app/migrations/0003_DATA_incorrect_json.py:            modulemd_index.update_from_string(module._artifacts.first().file.read().decode(), True)
pulp_rpm/app/tasks/publishing.py:            mod_yml.write(modulemd._artifacts.get().file.read())
pulp_rpm/app/tasks/publishing.py:            mod_yml.write(default._artifacts.get().file.read())
[ipanova@fluffy pulp_rpm]$ 
[ipanova@fluffy pulp_rpm]$ 
[ipanova@fluffy pulp_rpm]$ cd ..
[ipanova@fluffy pulp3]$ cd pulp_container/
[ipanova@fluffy pulp_container]$ git grep '\._artifacts'
pulp_container/app/migrations/0007_clear_tags_artifacts_refs.py:        tag._artifacts.clear()
pulp_container/app/migrations/0007_clear_tags_artifacts_refs.py:            tag._artifacts.add(tag.tagged_manifest._artifacts.get())
pulp_container/app/redirects.py:            artifact = manifest._artifacts.get()
pulp_container/app/redirects.py:            artifact = blob._artifacts.get()
pulp_container/app/registry.py:            artifact = tag.tagged_manifest._artifacts.get()
pulp_container/app/registry_api.py:        artifact = manifest._artifacts.get()
pulp_container/app/registry_api.py:        artifact = blob._artifacts.get()
pulp_container/app/schema_convert.py:        config_artifact = manifest.config_blob._artifacts.get()
pulp_container/app/schema_convert.py:    manifest_artifact = manifest._artifacts.get()
pulp_container/app/tasks/sync_stages.py:            with man._artifacts.get().file.open() as content_file:
pulp_container/app/tasks/tag.py:    artifact = manifest._artifacts.all()[0]
[ipanova@fluffy pulp_container]$ 
Actions #17

Updated by ipanova@redhat.com over 2 years ago

  • Sprint changed from Sprint 101 to Sprint 102
Actions #18

Updated by dalley over 2 years ago

  • Related to Issue #9101: Content_artifact is not updated added
Actions #19

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 102 to Sprint 103
Actions #20

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 103 to Sprint 104
Actions #21

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 104 to Sprint 105
Actions #22

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 105 to Sprint 106
Actions #23

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 106 to Sprint 107
Actions #24

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 107 to Sprint 108
Actions #25

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 108 to Sprint 109
Actions #26

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 109 to Sprint 110
Actions #27

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 110 to Sprint 111
Actions #28

Updated by rchan about 2 years ago

  • Sprint changed from Sprint 111 to Sprint 112
Actions #29

Updated by pulpbot about 2 years ago

  • Description updated (diff)
  • Status changed from NEW to CLOSED - DUPLICATE

Also available in: Atom PDF