Story #3894

Story #3693: Lazy for Pulp3

As a user, I can use Pulp as a pass-through cache

Added by dkliban@redhat.com about 1 year ago. Updated 6 months ago.

Status: MODIFIED
Priority: Normal
Category: -
Sprint/Milestone:
Start date:
Due date:
% Done: 0%
Platform Release:
Blocks Release:
Backwards Incompatible: No
Groomed: Yes
Sprint Candidate: Yes
Tags:
QA Contact:
Complexity:
Smash Test:
Verified: No
Verification Required: No
Sprint: Sprint 50

Description

Problem

Lazy sync requires the plugin to create Content, ContentArtifact, and RemoteArtifact objects for every piece of content discovered at the remote. Indexing on this scale is prohibitively resource-intensive for repositories with millions of artifacts.

Solution

Data model change

Make content indexing optional by enabling users to configure a Distribution with a Remote. This would be a new relation field on Distribution, named 'remote'.

Integration with the Content serving app and Streamer

The content app would then do the following to find an artifact to serve to the user:

1. Match path to a distribution.
2. Try to find an artifact by looking at the Publication associated with the Distribution.
3. If an Artifact is found, send the content of it as a response.
4. If an Artifact is not found and a RemoteArtifact is found, fetch the artifact using the 'remote' and stream the response back to the user.
5. If remote's policy is 'on_demand' or 'immediate', create an Artifact.
6. If neither an Artifact nor a RemoteArtifact is found, check if there is a 'remote' associated with the Distribution.
7. If a 'remote' is not set for the Distribution, return a 404.
8. If a 'remote' is associated, fetch the artifact using the 'remote' and stream the response back to the user.
9. Create a RemoteArtifact.
10. If remote's policy is 'on_demand' or 'immediate', create an Artifact.
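
The lookup order in the steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the content app's code: the `Remote` and `Distribution` classes, the in-memory dictionaries that stand in for database lookups, and the `fetch` placeholder are all hypothetical, and the sketch reads the policy from the Distribution's remote in every branch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Remote:
    url: str
    policy: str = "on_demand"  # the story also names 'immediate'


@dataclass
class Distribution:
    base_path: str
    remote: Optional[Remote] = None
    # Hypothetical in-memory stand-ins for Artifact / RemoteArtifact lookups.
    artifacts: Dict[str, bytes] = field(default_factory=dict)
    remote_artifacts: Dict[str, str] = field(default_factory=dict)


def fetch(url: str) -> bytes:
    """Placeholder for the streamer; a real one would download from url."""
    return f"bytes from {url}".encode()


def serve(
    dist: Distribution,
    rel_path: str,
    fetcher: Callable[[str], bytes] = fetch,
) -> Tuple[int, bytes]:
    """Return (status, body) following the numbered steps."""
    # Steps 2-3: a saved Artifact is served directly.
    if rel_path in dist.artifacts:
        return 200, dist.artifacts[rel_path]

    # Step 4: a known RemoteArtifact is fetched from its recorded URL.
    url = dist.remote_artifacts.get(rel_path)
    if url is None:
        # Steps 6-7: nothing is indexed, so a missing remote means 404.
        if dist.remote is None:
            return 404, b""
        # Steps 8-9: build the upstream URL and record a RemoteArtifact.
        url = dist.remote.url.rstrip("/") + "/" + rel_path
        dist.remote_artifacts[rel_path] = url

    body = fetcher(url)
    # Steps 5 and 10: save an Artifact when the policy calls for it.
    if dist.remote is not None and dist.remote.policy in ("on_demand", "immediate"):
        dist.artifacts[rel_path] = body
    return 200, body
```

With this shape, a second request for the same path is answered from `dist.artifacts` without calling the fetcher again.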

Example use case with Maven

Maven Central hosts millions of Maven artifacts at https://repo1.maven.org/maven2/. The user will take the following steps to create a pass-through cache of Maven Central in Pulp.

1. Create a Maven Remote that points to "https://repo1.maven.org/maven2/".
2. Create a Distribution that has 'remote' set to the remote created in step 1 and relative path set to 'foo'.

When mvn is building a project and requests an artifact from Pulp at "http://hostname/pulp/content/foo/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar", the Content app asks the streamer to fetch the artifact from "https://repo1.maven.org/maven2/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar" and then creates an Artifact and a RemoteArtifact for it. The next time the same path is requested, the Content app should be able to locate the Artifact via the RemoteArtifact's URL.
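
The mapping from the request URL to the upstream URL in this example can be sketched as follows. The function name `upstream_url` and the `CONTENT_PREFIX` constant are assumptions made for illustration; the prefix is simply read off the example request URL above.

```python
from urllib.parse import urlsplit

# Assumption: content is served under this prefix, as in the example URL.
CONTENT_PREFIX = "/pulp/content/"


def upstream_url(request_url: str, base_path: str, remote_url: str) -> str:
    """Map a request under a distribution's path to a URL on the remote."""
    path = urlsplit(request_url).path
    prefix = CONTENT_PREFIX + base_path.strip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} is not served under {prefix!r}")
    # Everything after the distribution's path is relative to the remote.
    relative_path = path[len(prefix):]
    return remote_url.rstrip("/") + "/" + relative_path


print(
    upstream_url(
        "http://hostname/pulp/content/foo/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar",
        "foo",
        "https://repo1.maven.org/maven2/",
    )
)
# prints https://repo1.maven.org/maven2/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar
```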


Related issues

Blocks Maven Plugin - Story #4183: As a user, I can use Pulp as a pull through cache for Maven Central (MODIFIED)

Associated revisions

Revision 22266783
Added by dkliban@redhat.com 8 months ago

Problem: plugin API missing pull-through cache interfaces

Solution: add interfaces that enable pull-through cache for plugins

This patch extends the Remote API. It adds an interface for determining the full URL of a RemoteArtifact
by examining the relative path and the Remote's url. It also adds an interface for determining the type of
content that should be available at a relative path of a remote. These interfaces are used by the content
app when serving content in the pull-through cache mode. The content app needs to determine what URL is
actually being requested and what kind of content is being requested by the client.

This patch also extends the Content API. It adds a new constructor method to the Content model. This new
interface is a constructor that takes an Artifact and a relative path and produces an unsaved instance of a
specific content type.

Plugin writers are expected to provide specific implementations of these methods.

re: #3894
https://pulp.plan.io/issues/3894
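
The three interfaces described in this commit message could look roughly like the sketch below. It is self-contained and does not import pulpcore; every class and method name here (`BaseRemote.artifact_url`, `BaseRemote.content_type_for`, `BaseContent.from_artifact`, `ExampleRemote`, `JarContent`) is a hypothetical stand-in and may differ from what the patch actually adds. The path parsing assumes a Maven-style layout purely for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Artifact:
    sha256: str


class BaseContent(ABC):
    @classmethod
    @abstractmethod
    def from_artifact(cls, artifact: Artifact, relative_path: str) -> "BaseContent":
        """Build an unsaved content instance from an Artifact and a path."""


class BaseRemote(ABC):
    def __init__(self, url: str) -> None:
        self.url = url

    @abstractmethod
    def artifact_url(self, relative_path: str) -> str:
        """Full URL of the remote artifact for a relative path."""

    @abstractmethod
    def content_type_for(self, relative_path: str) -> type:
        """Content class expected at a relative path of this remote."""


@dataclass
class JarContent(BaseContent):
    group_id: str
    artifact_id: str
    version: str
    filename: str
    artifact: Artifact

    @classmethod
    def from_artifact(cls, artifact: Artifact, relative_path: str) -> "JarContent":
        # Assumed layout: <group dirs>/<artifact_id>/<version>/<filename>
        *group, artifact_id, version, filename = relative_path.split("/")
        return cls(".".join(group), artifact_id, version, filename, artifact)


class ExampleRemote(BaseRemote):
    def artifact_url(self, relative_path: str) -> str:
        return self.url.rstrip("/") + "/" + relative_path

    def content_type_for(self, relative_path: str) -> type:
        return JarContent


remote = ExampleRemote("https://repo1.maven.org/maven2/")
path = "com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar"
content = remote.content_type_for(path).from_artifact(Artifact("<digest>"), path)
print(remote.artifact_url(path), content.group_id, content.version)
```

In this arrangement the content app only needs the relative path and the remote; what is fetched and how it is modelled is left to the plugin.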

Revision 1558cf46
Added by dkliban@redhat.com 8 months ago

Problem: pulpcore does not have pull-through cache feature

Solution: extend the content app to support pull-through cache

This patch enables the content app handler to look for a remote on a distribution. When a remote
is associated with a distribution, that remote is used to download content missing from Pulp.

re: #3894
https://pulp.plan.io/issues/3894

History

#1 Updated by dkliban@redhat.com about 1 year ago

  • Description updated (diff)
  • Sprint Candidate changed from No to Yes
  • Tags Pulp 3 added

#2 Updated by dkliban@redhat.com about 1 year ago

  • Description updated (diff)

#3 Updated by bmbouter about 1 year ago

  • Description updated (diff)

I think naming the user-facing feature specifically would improve this story. I put a placeholder for it in all caps, but I'm not sure what to call it.

#4 Updated by dkliban@redhat.com about 1 year ago

  • Description updated (diff)

Let's call it 'fallback_remote' for now.

#5 Updated by dkliban@redhat.com about 1 year ago

  • Description updated (diff)
  • Parent task set to #3693

#6 Updated by dkliban@redhat.com about 1 year ago

  • Description updated (diff)

#7 Updated by jortel@redhat.com about 1 year ago

An optional association between the Distribution and a Remote seems reasonable. I think the fallback prefix is unnecessary and a little confusing. Let's just name it: Distribution.remote and document as:

"The distribution may be optionally associated with a Remote to support Passthru Caching."

The Passthru Caching concept/feature can be explained in detail elsewhere and linked.

Looking at the description, I think the design can be described more concisely by describing only the changes instead of all of the logic steps. (I don't think the steps are exactly correct anyway).

@dkliban, what do you think of:

Add Distribution.remote as (blank=True, null=True, db_index=True, on_delete=SET_NULL)

The content application will redirect to the streamer when: the publication cannot be matched; neither a published-artifact nor published-metadata can be matched; redirect is enabled; Distribution.remote is set.

The streamer will get the downloader from the remote directly when: the publication cannot be matched; neither a published-artifact nor published-metadata can be matched; Distribution.remote is set. Else, 404.

#8 Updated by amacdona@redhat.com about 1 year ago

  • Sprint Candidate changed from Yes to No

#9 Updated by dkliban@redhat.com 11 months ago

  • Blocks Story #4183: As a user, I can use Pulp as a pull through cache for Maven Central added

#10 Updated by dkliban@redhat.com 10 months ago

  • Description updated (diff)

#11 Updated by jortel@redhat.com 9 months ago

  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes

#12 Updated by dkliban@redhat.com 9 months ago

  • Sprint set to Sprint 47

#13 Updated by rchan 9 months ago

  • Sprint changed from Sprint 47 to Sprint 48

#14 Updated by dkliban@redhat.com 9 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dkliban@redhat.com

#15 Updated by rchan 8 months ago

  • Sprint changed from Sprint 48 to Sprint 49

#16 Updated by jortel@redhat.com 8 months ago

The content-app should delegate content creation to the plugin completely. I would expect a single new method on Remote.

#17 Updated by dkliban@redhat.com 8 months ago

  • Tags Pulp 3 RC Blocker added

#19 Updated by rchan 8 months ago

  • Sprint changed from Sprint 49 to Sprint 50

#20 Updated by dkliban@redhat.com 7 months ago

  • Status changed from POST to MODIFIED

#21 Updated by daviddavis 6 months ago

  • Sprint/Milestone set to 3.0

#22 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3, Pulp 3 RC Blocker)
