Story #7252: As a plugin writer, I have a way to map Content to Repositories in Pulp exports - Pulp

Story #7252

closed

Story #6134: [EPIC] Pulp import/export

As a plugin writer, I have a way to map Content to Repositories in Pulp exports

Added by daviddavis over 4 years ago. Updated over 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee: daviddavis
Sprint/Milestone: 3.6.0
% Done: 100%
Groomed: Yes
Sprint: Sprint 79

Description

Problem

Currently, the Pulp import/export code splits Content (and its related information) into RepositoryVersion folders when exporting. At import time, a new task is launched for each folder to import this Content and automatically create a new version of the Repository with all the Content that gets imported [0].

The problem is that DistributionTree Repositories have Content that is not directly tied to the new RepositoryVersion. Instead, this Content belongs to subrepos.

Because pulpcore handles the association of Content to Repositories, pulpcore needs to provide a way for plugin writers to map Content to particular Repositories.

Solutions

The naïve solution would be to export RepositoryContent. But this is somewhat problematic as RepositoryContent has no helpful information: it has a repository id (instead of the repo natural key), a content id (another non-natural key), and version information (which probably won't match the downstream).

Option 1

Create a JSON dump that maps each repository name to a list of Content upstream_ids. After the Content is imported, the upstream_ids should match those of the upstream Pulp instance, so the import code could simply go through the dump and create a new repo version for each repository name. If there are no changes, Pulp automatically skips creating a new repo version (which is the current behavior).
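A minimal sketch of the mapping Option 1 describes, using plain dictionaries rather than pulpcore models. The function names (`build_mapping`, `plan_repo_versions`), the input shapes, the repository names, and the short ids standing in for real upstream_ids are all hypothetical; none of them come from pulpcore.

```python
import json


def build_mapping(repo_contents):
    """Export side: map each repository name to a sorted list of upstream_ids.

    `repo_contents` is an assumed input shape:
    {repo_name: iterable of upstream_id strings}.
    """
    return {name: sorted(ids) for name, ids in repo_contents.items()}


def plan_repo_versions(mapping, imported_ids, current_contents):
    """Import side: decide which repositories need a new version.

    mapping          -- {repo_name: [upstream_id, ...]} read from the JSON dump
    imported_ids     -- set of upstream_ids that were actually imported
    current_contents -- {repo_name: set of upstream_ids in the latest version}
    Returns {repo_name: set of upstream_ids} for repositories whose content
    would change; unchanged repositories are left out.
    """
    plan = {}
    for name, ids in mapping.items():
        wanted = set(ids) & imported_ids
        if wanted != current_contents.get(name, set()):
            plan[name] = wanted  # content differs, so a new version is warranted
    return plan


# Hypothetical export: a DistributionTree repository and one of its subrepos.
exported = build_mapping({
    "kickstart-base": ["c1", "c2"],
    "kickstart-addon": ["c3"],
})
dump = json.dumps(exported, indent=2)

# Hypothetical import: the addon subrepo is already up to date.
plan = plan_repo_versions(
    json.loads(dump),
    imported_ids={"c1", "c2", "c3"},
    current_contents={"kickstart-addon": {"c3"}},
)
print(plan)
```

In this sketch only `kickstart-base` ends up in the plan, which mirrors the behavior noted above: a repository with no changes gets no new version.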

Option 2

Add the repository name to each exported Content unit. This seems like a simple solution, but its implementation is a bit more complex. There are a few obstacles, like figuring out how to handle Content units that belong to multiple Repositories (do we export the Content multiple times?). Also, we'd have to tell the Content model resource to populate the field but ignore it during runtime, and then extract this mapping of Repository to Content to build our new repo versions.
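The extraction step for Option 2 could look roughly like the sketch below. It assumes each exported row carries a list-valued `repositories` field, which is one possible way to avoid exporting a unit several times; that field name, the row layout, and `extract_mapping` are hypothetical and not part of pulpcore or its model resources.

```python
from collections import defaultdict


def extract_mapping(rows):
    """Pop the export-only 'repositories' field from each row and invert it.

    Returns (clean_rows, mapping), where mapping is
    {repo_name: [upstream_id, ...]} and clean_rows no longer carry the field.
    """
    mapping = defaultdict(list)
    clean_rows = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        for repo_name in row.pop("repositories", []):
            mapping[repo_name].append(row["upstream_id"])
        clean_rows.append(row)
    return clean_rows, dict(mapping)


# Hypothetical exported rows; "c2" belongs to two repositories.
rows = [
    {"upstream_id": "c1", "name": "kernel", "repositories": ["kickstart-base"]},
    {"upstream_id": "c2", "name": "bash",
     "repositories": ["kickstart-base", "kickstart-addon"]},
]
clean_rows, mapping = extract_mapping(rows)
```

The result is the same repository-to-Content mapping that Option 1 writes out directly, which is part of why this route adds work without adding information.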

[0] https://github.com/pulp/pulpcore/blob/7157e4452bb3e1c480217a037829c34dcbf5bc75/pulpcore/app/tasks/importer.py#L135-L150

Updated by daviddavis over 4 years ago

After talking this over with ttereshc, it looks like this will probably increase the size of the exports. We should try to make this change optional and apply it only to repos that have subrepos.

Updated by ggainey over 4 years ago

Status changed from NEW to ASSIGNED
Assignee set to daviddavis
Groomed changed from No to Yes
Sprint set to Sprint 78

David and I have spent a number of sessions talking this over. The takeaway for me is that Option 1 above is the most reliable and straightforward way to solve this problem.

We already have the upstream_id available at the Content level; it exists specifically to handle the relinking case where we don't have a 'natural key' we can rely on. If we build into the design ttereshc's observation that we only want to do this when we know subrepos are involved (as opposed to for every repo, when most don't need it), we can be more efficient with our disk usage and our execution time at import.

We've struggled to find a cleaner way to rebuild DistributionTrees on import. The real-world of kickstarts is moderately ugly; it is, I suppose, only appropriate that dealing with them be a little less than elegant as well.
