Issue #9395
RemoteArtifacts are not being saved properly
Description
This is the result of a long discussion on the Katello forums: https://community.theforeman.org/t/katello-4-1-2-1-404-error-through-content-proxy-due-to-incorrect-location-href/24812/26?u=dralley
TL;DR: if you sync a repository on-demand multiple times against different repos, only the first set of RemoteArtifacts is saved. If the layout of the repository changes or the repository disappears, all of those URLs break, so the RemoteArtifacts are broken even though they were supposed to have multiple potential sources.
I've confirmed this by syncing a single repository, changing the layout of that repository, and resyncing. Only the original RemoteArtifact exists afterwards; the new ones are never created. A script is attached that demonstrates this (check the DB after running it).
This is an especially severe issue because "metadata mirroring" and standard syncs have entirely different layouts, so re-publishing a mirrored repository, or switching a repository to mirroring after it was previously synced without it, results in a broken repository because all of the URLs change.
Updated by dalley over 3 years ago
- File no_new_remoteartifact.py added
- Description updated (diff)
Updated by dkliban@redhat.com over 3 years ago
- Triaged changed from No to Yes
- Sprint set to Sprint 105
Updated by dalley over 3 years ago
More context: the way the uniqueness constraint for RemoteArtifact is set up precludes us from having more than one RemoteArtifact saved for the same ContentArtifact and any given remote.
https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/content.py#L652
This means that if the remote URL changes but the "relative path" stays the same, new RemoteArtifacts cannot be created because of this constraint. As mentioned above, this is a bigger problem for the RPM plugin, which uses the filename as the relative_path, so the layout of the repository can change underneath you.
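To make the failure mode concrete, here is a minimal in-memory sketch. It is not pulpcore code: `RemoteArtifactTable`, `save_if_missing`, and the example.com URLs are hypothetical stand-ins, and the uniqueness key of (content_artifact, remote) is assumed from the discussion above.

```python
from dataclasses import dataclass


@dataclass
class RemoteArtifactRow:
    content_artifact: str  # stand-in for a ContentArtifact, identified by relative_path
    remote: str
    url: str


class RemoteArtifactTable:
    """Toy table that enforces uniqueness on (content_artifact, remote)."""

    def __init__(self):
        self.rows = {}

    def save_if_missing(self, content_artifact, remote, url):
        """Create a row only when no row exists for the key; never touch existing rows."""
        key = (content_artifact, remote)
        if key in self.rows:
            # The new URL is silently dropped; the stale one stays.
            return False
        self.rows[key] = RemoteArtifactRow(content_artifact, remote, url)
        return True


table = RemoteArtifactTable()

# First sync: flat repository layout.
first = table.save_if_missing(
    "foo-1.0.rpm", "remote-a", "https://example.com/repo/foo-1.0.rpm"
)

# Resync after the upstream layout changed: same relative_path, new URL.
second = table.save_if_missing(
    "foo-1.0.rpm", "remote-a", "https://example.com/repo/Packages/f/foo-1.0.rpm"
)

stored_url = table.rows[("foo-1.0.rpm", "remote-a")].url
```

After the second call, `stored_url` still points at the old flat layout, which is the broken state described in this issue.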
There are really only two potential solutions for this:
- Allow the RemoteArtifactSaver stage to update the URLs of existing RemoteArtifacts to match the current URL of the remote during a resync
  - This feels like the more correct option and is not too difficult to implement
- Relax the uniqueness constraint to allow more than one RemoteArtifact to be stored for a given remote, e.g. change the constraint to ("content_artifact", "url", "remote").
  - This would probably not be backportable due to the migration.
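The first option can be sketched roughly as follows. This is an illustrative in-memory model, not the RemoteArtifactSaver implementation: `reconcile_remote_artifacts` is a hypothetical helper, rows are plain dicts keyed by (content_artifact, remote), and the URLs are made up.

```python
def reconcile_remote_artifacts(existing, needed):
    """Split the URLs seen in the current sync into rows to create and rows to update.

    Both arguments map (content_artifact, remote) -> url. `existing` is what is
    already stored; `needed` is what the current sync discovered.
    """
    to_create = {}
    to_update = {}
    for key, url in needed.items():
        if key not in existing:
            to_create[key] = url
        elif existing[key] != url:
            # Same content and remote, but the upstream location moved.
            to_update[key] = url
    return to_create, to_update


existing = {
    ("foo-1.0.rpm", "remote-a"): "https://example.com/repo/foo-1.0.rpm",
}
needed = {
    ("foo-1.0.rpm", "remote-a"): "https://example.com/repo/Packages/f/foo-1.0.rpm",
    ("bar-2.0.rpm", "remote-a"): "https://example.com/repo/Packages/b/bar-2.0.rpm",
}

to_create, to_update = reconcile_remote_artifacts(existing, needed)

# Apply both sets; the uniqueness key is unchanged, so no schema change is implied.
existing.update(to_create)
existing.update(to_update)
```

Because the key stays (content_artifact, remote), this approach keeps one row per pair and only rewrites its URL, which is why it would not need a migration. The trade-off is that the previous URL is forgotten rather than kept as an alternative source.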
Updated by pulpbot over 3 years ago
- Status changed from ASSIGNED to POST
Updated by dalley over 3 years ago
- Copied to Backport #9400: Backport #9395 "RemoteArtifacts are not being saved properly" to 3.14.z added
Added by dalley over 3 years ago
Updated by dalley over 3 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulpcore|489156ebed3d6e6a83b8442827ba6f2560ef40b8.
Updated by pulpbot over 3 years ago
Updated by pulpbot over 3 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Update remote artifact urls on sync if the remote or repo changes
closes: #9395 https://pulp.plan.io/issues/9395