Story #3894
Updated by dkliban@redhat.com almost 6 years ago
h2. Problem

Lazy sync requires the plugin to create Content, ContentArtifact, and RemoteArtifact objects for every piece of content discovered at the remote. This kind of indexing is prohibitively resource-intensive for repositories with millions of artifacts.

h2. Solution

h5. Data model change

Make content indexing optional by enabling users to configure a Distribution with a Remote. This would be a new relation field added to "Distribution" and called "fallback_remote".

h5. Integration with the Content serving app and Streamer

The content app would then do the following to find an artifact to serve to the user:

1. Match path to a distribution.
2. Try to find an artifact by looking at the Publication associated with the Distribution.
3. If an Artifact is found, send the content of it as a response.
4. If an Artifact is not found and a RemoteArtifact is found, redirect to the streamer.
5. If neither an Artifact nor a RemoteArtifact is found, check if there is a fallback_remote associated with the Distribution.
6. If a fallback_remote is not set for the Distribution, return a 404.
7. If a fallback_remote is associated, redirect to the streamer.

The streamer would then do the following to find the artifact to serve to the user:

1. Match path to a distribution.
2. Check if fallback_remote is set for the Distribution.
3. If fallback_remote is not set, find the RemoteArtifact by looking at the Publication associated with the Distribution.
4. If fallback_remote is set, concatenate the Remote's URL with the relative path requested and try to find a matching RemoteArtifact with that URL.
5. If a RemoteArtifact is found, check if it has an Artifact associated with it.
6. If an Artifact is associated with the RemoteArtifact, return the content of the Artifact as the response.
7. If a RemoteArtifact is not found and fallback_remote is set, request the artifact from squid using the URL.
8. If a RemoteArtifact is not found and fallback_remote is NOT set, return a 404.

h5. Example use case with Maven

Maven Central hosts millions of Maven Artifacts. It's hosted at https://repo1.maven.org/maven2/. The user will take the following steps to create a pass-through cache of Maven Central in Pulp.

1. Create a Maven Remote that points to "https://repo1.maven.org/maven2/".
2. Create a Distribution that has 'fallback_remote' set to the remote created in step 1 and relative path set to 'foo'.

When mvn is building a project and requests an artifact from Pulp at "http://hostname/pulp/content/foo/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar", the Content app is going to ask the streamer to fetch the artifact from "https://repo1.maven.org/maven2/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar" and then create an Artifact and a RemoteArtifact for this artifact. The next time the artifact is requested, the Content app should be able to locate the Artifact via the RemoteArtifact's URL.
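
The streamer's lookup order can be illustrated with a minimal Python sketch. It is not Pulp code: the @Remote@ and @Distribution@ dataclasses, the @published@ mapping (standing in for the RemoteArtifacts reachable through the Publication), the @remote_artifacts@ dictionary, and the action names are all hypothetical. The sketch assumes the path has already been matched to a Distribution, and it assumes that a RemoteArtifact without an associated Artifact is fetched from its URL, a case the steps above do not spell out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Remote:
    url: str


@dataclass
class Distribution:
    base_path: str
    fallback_remote: Optional[Remote] = None
    # Hypothetical stand-in for the Publication: relative path -> upstream URL.
    published: Dict[str, str] = field(default_factory=dict)


def upstream_url(remote: Remote, relative_path: str) -> str:
    """Concatenate the Remote's URL with the requested relative path."""
    return remote.url.rstrip("/") + "/" + relative_path.lstrip("/")


def streamer_decision(
    dist: Distribution,
    relative_path: str,
    remote_artifacts: Dict[str, Optional[bytes]],
) -> Tuple[str, Optional[str]]:
    """Return an (action, url) pair.

    remote_artifacts maps a RemoteArtifact URL to the content of its
    Artifact, or to None when no Artifact is associated yet.
    """
    if dist.fallback_remote is not None:
        # fallback_remote set: build the URL, then look for a match.
        url = upstream_url(dist.fallback_remote, relative_path)
    else:
        # No fallback_remote: look the path up via the Publication.
        url = dist.published.get(relative_path)

    if url is not None and url in remote_artifacts:
        if remote_artifacts[url] is not None:
            # An Artifact is associated: serve its content.
            return ("serve_artifact", url)
        # Assumption: RemoteArtifact without an Artifact is fetched upstream.
        return ("fetch_upstream", url)

    if dist.fallback_remote is not None:
        # No RemoteArtifact, but a fallback exists: go through squid.
        return ("fetch_via_squid", url)
    # No RemoteArtifact and no fallback_remote.
    return ("not_found", None)
```

With the Maven example, a Distribution whose fallback_remote points at "https://repo1.maven.org/maven2/" yields @fetch_via_squid@ on the first request for the jsr305 jar, and @serve_artifact@ once a RemoteArtifact with an Artifact exists for that URL.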