Issue #9395

closed

RemoteArtifacts are not being saved properly

Added by dalley over 2 years ago. Updated over 2 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 4. Urgent
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags:
Sprint: Sprint 105
Quarter:

Description

This is the result of a long discussion on the Katello forums: https://community.theforeman.org/t/katello-4-1-2-1-404-error-through-content-proxy-due-to-incorrect-location-href/24812/26?u=dralley

TL;DR: if you sync a repository on-demand multiple times against different repos, only the first set of RemoteArtifacts is saved. If the layout of the repository changes or the repository disappears, all of the saved URLs break, so the RemoteArtifacts are broken even though they were supposed to have multiple potential sources.

I've confirmed this by syncing a single repository, changing the layout of that repository, and resyncing. Only the original RemoteArtifact exists afterwards; the new ones are not created. The attached script demonstrates this (inspect the DB afterwards to see the result).

This is an especially severe issue because "metadata mirroring" and standard syncs have entirely different layouts, so re-publishing a mirrored repository, or switching a repository to mirroring when it was not mirrored before, results in broken repositories because all of the URLs change.


Files

no_new_remoteartifact.py (2.43 KB), dalley, 09/14/2021 05:33 AM

Related issues

Copied to Pulp - Backport #9400: Backport #9395 "RemoteArtifacts are not being saved properly" to 3.14.z (CLOSED - CURRENTRELEASE, dalley)

Actions #1

Updated by dalley over 2 years ago

Actions #2

Updated by dalley over 2 years ago

  • Description updated (diff)
Actions #3

Updated by dalley over 2 years ago

  • Description updated (diff)
Actions #4

Updated by dalley over 2 years ago

  • Description updated (diff)
Actions #5

Updated by dkliban@redhat.com over 2 years ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 105
Actions #6

Updated by dalley over 2 years ago

More context: the way the uniqueness constraint for RemoteArtifact is set up precludes us from having more than one RemoteArtifact saved per content artifact for any given remote.

https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/content.py#L652

This means that if the remote URL changes but the "relative path" stays the same, new RemoteArtifacts cannot be created because of this constraint. As mentioned above, this is a bigger problem for the RPM plugin, which uses the filename as the relative_path, since the layout of the repository can change unexpectedly.
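
To make the effect of the constraint concrete, here is a minimal sketch using an in-memory SQLite table. The table and column names are hypothetical stand-ins, not the actual pulpcore schema, and the uniqueness key on (content artifact, remote) is an assumption inferred from the proposed change below. The second insert uses the same pair with a different URL and is rejected, so only the original URL survives.

```python
import sqlite3

# Hypothetical, simplified stand-in for the RemoteArtifact table.
# This is NOT the real pulpcore schema; names are for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE remote_artifact (
        content_artifact_id INTEGER NOT NULL,
        remote_id INTEGER NOT NULL,
        url TEXT NOT NULL,
        UNIQUE (content_artifact_id, remote_id)
    )
    """
)


def save_remote_artifact(content_artifact_id, remote_id, url):
    """Insert a row; return True if saved, False if the constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO remote_artifact VALUES (?, ?, ?)",
                (content_artifact_id, remote_id, url),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# First sync: the package lives under one layout (example URLs are made up).
first = save_remote_artifact(1, 1, "https://example.com/repo/Packages/f/foo-1.0.rpm")
# Resync after the layout changed: same content artifact and remote, new URL.
second = save_remote_artifact(1, 1, "https://example.com/repo/foo-1.0.rpm")

stored_urls = [row[0] for row in conn.execute("SELECT url FROM remote_artifact")]
```

After both calls, `stored_urls` still holds only the first URL, which matches the behavior described in this issue.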

There are really only two potential solutions for this:

  • Allow the RemoteArtifactSaver stage to update the URLs of existing RemoteArtifacts to match the current URL of the remote during a resync
    • This feels like the more correct option and is not too difficult to implement
  • Relax the uniqueness constraint to allow more than one RemoteArtifact to be stored for a given remote e.g. change the constraint to ("content_artifact", "url", "remote").
    • This would probably not be backportable due to the migration.
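
The two options above can be contrasted with a rough sketch in plain Python. Everything here (`RemoteArtifactRow`, `save_with_url_update`, `save_with_relaxed_key`, the dict-based "table") is hypothetical and only models the bookkeeping; it is not the RemoteArtifactSaver stage or the pulpcore models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteArtifactRow:
    """Hypothetical stand-in for a RemoteArtifact record."""
    content_artifact: str
    remote: str
    url: str


def save_with_url_update(existing, incoming):
    """Option 1: keep one row per (content_artifact, remote), refreshing its URL."""
    for row in incoming:
        key = (row.content_artifact, row.remote)
        current = existing.get(key)
        if current is None or current.url != row.url:
            # New row, or the URL changed on resync: store the current URL.
            existing[key] = row
    return existing


def save_with_relaxed_key(existing, incoming):
    """Option 2: key on (content_artifact, url, remote) so several URLs coexist."""
    for row in incoming:
        existing.setdefault((row.content_artifact, row.url, row.remote), row)
    return existing


# Made-up example data: the same content seen at two layouts of one remote.
old = RemoteArtifactRow("foo.rpm", "remote-1", "https://example.com/old/foo.rpm")
new = RemoteArtifactRow("foo.rpm", "remote-1", "https://example.com/new/foo.rpm")

updated = save_with_url_update({}, [old])
updated = save_with_url_update(updated, [new])  # resync after layout change

relaxed = save_with_relaxed_key({}, [old, new])
```

With option 1, `updated` ends up with a single row pointing at the new URL; with option 2, `relaxed` keeps both URLs as separate rows, which in a real database would require changing the constraint via a migration.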
Actions #7

Updated by pulpbot over 2 years ago

  • Status changed from ASSIGNED to POST
Actions #9

Updated by dalley over 2 years ago

  • Sprint/Milestone set to 3.16.0
Actions #10

Updated by dalley over 2 years ago

  • Copied to Backport #9400: Backport #9395 "RemoteArtifacts are not being saved properly" to 3.14.z added

Added by dalley over 2 years ago

Revision 489156eb | View on GitHub

Update remote artifact urls on sync if the remote or repo changes

closes: #9395 https://pulp.plan.io/issues/9395

Actions #11

Updated by dalley over 2 years ago

  • Status changed from POST to MODIFIED
Actions #13

Updated by pulpbot over 2 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
