Refactor #3341

Remove the FK from publishers/importers to repos

Added by daviddavis almost 3 years ago. Updated about 1 year ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Estimated time:
Platform Release:
Groomed: Yes
Sprint Candidate: No
Tags:
Sprint: Sprint 32
Quarter:

Associated revisions

Revision 7e10351b
Added by daviddavis almost 3 years ago

Remove relationship between publishers/importers and repos

closes #3341 https://pulp.plan.io/issues/3341

History

#2 Updated by daviddavis almost 3 years ago

A few todo items:

  • We need an association from RemoteArtifact to RepositoryVersion now.
  • Distribution needs a repository association. Since publishers are shared across repos, we don't want to auto-distribute different repos to the same location on the filesystem.
  • Consider adding back ContentAdaptor

Also, looks like pulp_file needs the auto-distribute code. Might have to look in git history to find it.

Edit: Found distribution code - https://git.io/vNFiH

#3 Updated by amacdona@redhat.com almost 3 years ago

How would the following situation be handled by the proposed design?

Repo A is synced (deferred downloading) from external source X, creating Repo Av1
Repo B is synced (deferred downloading) from external source X, creating Repo Bv1

Each repo is published.

Before this story, RemoteArtifacts were associated with the Importer. I think this actually makes the most sense conceptually, because the RemoteArtifact lives on the "remote source" that is represented by the Importer. In the situation above, Av1 and Bv1 would have been synced using the same Importer, which would have allowed them to reference a single set of RemoteArtifacts. The current design suggests that RemoteArtifacts will be associated with a RepositoryVersion.

Problem: If RemoteArtifacts are associated with a RepositoryVersion, we lose the deduplication feature. Let's say a client tries to fetch PackageZ from Av1. It was synced with deferred downloading, so the streamer does its magic and PackageZ is added to Pulp. Now a client tries to fetch PackageZ from Bv1, which references different RemoteArtifacts that are part of the same ContentUnit. The artifact will be streamed again.

AFAICT, this means that for deferred repositories, once they are published, there is no deduplication of RemoteArtifacts.

#4 Updated by amacdona@redhat.com almost 3 years ago

To take it a step further, if we associate with RepositoryVersion, we could have the same problem with multiple versions of a single repo.

Repo A is synced (deferred) 3 times forming 3 versions that are mostly the same, but with some changes in each.
Av3 -> dev
Av2 -> testing
Av1 -> production

All 3 are published. Given that they share most content, it would be expected that after users stream a package from production, the dev and testing copies of the same content would not need to be downloaded again.

#5 Updated by daviddavis almost 3 years ago

Talked with @asmacdo and @jortel about the question @asmacdo asked above. The artifact should only get streamed once, because once the content_artifact has an artifact, that artifact will be used instead of the remote_artifact. Moreover, a publication points to the content_artifact and not to a remote_artifact. So duplication of remote_artifacts is OK.
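
To make the reasoning above concrete, here is a minimal sketch in plain Python rather than the real Pulp models. The class names mirror the tables mentioned in this comment, but the fields, the `serve` function, and the `fake_fetch` helper are assumptions made only for illustration. It shows two duplicate remote artifacts pointing at one shared content_artifact, so that the second request finds the stored artifact and nothing is streamed again.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ContentArtifact:
    """Hypothetical stand-in for a content_artifact row shared across repo versions."""
    relative_path: str
    artifact: Optional[bytes] = None  # None until the content has been streamed once


@dataclass
class RemoteArtifact:
    """Hypothetical stand-in for a remote_artifact row; duplicates may share one ContentArtifact."""
    url: str
    content_artifact: ContentArtifact


def serve(remote: RemoteArtifact, fetch: Callable[[str], bytes]) -> bytes:
    """Return the stored artifact if present, otherwise stream it once and store it."""
    content_artifact = remote.content_artifact
    if content_artifact.artifact is None:
        content_artifact.artifact = fetch(remote.url)
    return content_artifact.artifact


downloads: List[str] = []  # records every URL that was actually streamed


def fake_fetch(url: str) -> bytes:
    """Illustrative downloader that only records the call."""
    downloads.append(url)
    return b"PackageZ"


package_z = ContentArtifact("PackageZ.rpm")
# Two duplicate remote artifacts, e.g. one per repository version.
remote_for_a = RemoteArtifact("https://example.com/PackageZ.rpm", package_z)
remote_for_b = RemoteArtifact("https://example.com/PackageZ.rpm", package_z)

first = serve(remote_for_a, fake_fetch)
second = serve(remote_for_b, fake_fetch)
```

In this sketch the duplication of remote artifacts costs only extra rows; the download count stays at one because both rows resolve to the same content_artifact.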

We also talked about joining remote_artifact to repository_content instead of repository_version. The benefit here is that repository_version can be looked up from repository_content and when a repository_version is deleted, we don't have to worry about re-associating remote_artifacts with a newer (aka next) repository_version. Going to make that change.
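
A rough sketch of that join, again in plain Python and not the actual Pulp schema. It assumes that a repository_content row records the version in which content was added and, optionally, the version in which it was removed; the `version_added` and `version_removed` fields and the `versions_for` helper are hypothetical names used for illustration. The point is that the repository versions are derived from repository_content, so deleting an intermediate version needs no change to the remote artifact itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RepositoryVersion:
    number: int


@dataclass
class RepositoryContent:
    """Hypothetical repository_content row: when a content unit entered and left the repo."""
    content_name: str
    version_added: RepositoryVersion
    version_removed: Optional[RepositoryVersion] = None


@dataclass
class RemoteArtifact:
    """Hypothetical remote_artifact row joined to repository_content, not repository_version."""
    url: str
    repository_content: RepositoryContent


def versions_for(remote: RemoteArtifact, versions: List[RepositoryVersion]) -> List[RepositoryVersion]:
    """Look up the repository versions that contain this remote artifact's content."""
    rc = remote.repository_content
    return [
        v for v in versions
        if v.number >= rc.version_added.number
        and (rc.version_removed is None or v.number < rc.version_removed.number)
    ]


v1, v2, v3 = RepositoryVersion(1), RepositoryVersion(2), RepositoryVersion(3)
remote = RemoteArtifact(
    "https://example.com/PackageZ.rpm",
    RepositoryContent("PackageZ", version_added=v1),
)

before_delete = [v.number for v in versions_for(remote, [v1, v2, v3])]
# Deleting v2 leaves the remote artifact untouched; nothing has to be re-associated.
after_delete = [v.number for v in versions_for(remote, [v1, v3])]
```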

#6 Updated by jortel@redhat.com almost 3 years ago

daviddavis wrote:

A few todo items:

  • We need an association from RemoteArtifact to RepositoryVersion now.
  • Distribution needs a repository association. Since publishers are shared across repos, we don't want to auto-distribute different repos to the same location on the filesystem.
  • Consider adding back ContentAdaptor

Also, looks like pulp_file needs the auto-distribute code. Might have to look in git history to find it.

Auto distribution is taken care of by the Publication context manager.

Edit: Found distribution code - https://git.io/vNFiH

#7 Updated by jortel@redhat.com almost 3 years ago

  • Groomed changed from No to Yes

#8 Updated by jortel@redhat.com almost 3 years ago

  • Sprint/Milestone set to 54

#9 Updated by daviddavis almost 3 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

#10 Updated by bmbouter almost 3 years ago

  • Sprint set to Sprint 32

#11 Updated by bmbouter almost 3 years ago

  • Sprint/Milestone deleted (54)

#12 Updated by dkliban@redhat.com almost 3 years ago

  • Sprint/Milestone set to 3.0.0

#13 Updated by bmbouter over 1 year ago

  • Tags deleted (Pulp 3, Pulp 3 MVP)

#14 Updated by bmbouter about 1 year ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
