Story #6134: [EPIC] Pulp import/export
As a plugin writer, I have a way to map Content to Repositories in Pulp exports
Currently, the Pulp import/export code divides up Content (and its related information) into RepositoryVersion folders when exporting. At import time, a new task is launched for each folder to import this Content and automatically create a new version of the Repository with all the Content that gets imported.
The problem is that DistributionTree Repositories have Content that is not directly tied to the new RepositoryVersion. Instead, this Content belongs to subrepos.
Because pulpcore handles the association of Content to Repositories, pulpcore needs to provide a way for plugin writers to map Content to particular Repositories.
The naïve solution would be to export RepositoryContent. This is problematic because RepositoryContent carries no helpful information: it has a repository id (instead of the repo natural key), a content id (another non-natural key), and version information (which probably won't match the downstream).
Option 1: Create a JSON dump that maps each repository name to a list of content upstream_ids. After the content is imported, its upstream_ids should match the ids in the upstream Pulp instance, so the import code could simply go through the dump and create a new repo version for each repository name. If there are no changes, Pulp will automatically not create a new repo version (which is the current behavior).
Option 2: Add the repository name to each exported Content unit. This seems like a simple solution, but its implementation is a bit more complex. There are a few obstacles, like figuring out how to handle Content units that might belong to multiple Repositories (do we export the Content multiple times?). Also, we'd have to tell the Content model resource to populate the field but ignore it at runtime, and then extract this mapping of Repository to Content to build our new repo versions.
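As a rough illustration of the JSON dump approach (repository name mapped to a list of content upstream_ids), here is a minimal sketch in plain Python. It does not use any pulpcore API: the function names, the repository names, and the `imported` lookup table (upstream_id to local content id) are all hypothetical stand-ins for whatever the real export and import code would use.

```python
import json
from typing import Dict, Iterable, List, Set, Tuple


def build_mapping(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Export side: group (repository name, content upstream_id) pairs by repository."""
    grouped: Dict[str, Set[str]] = {}
    for repo_name, upstream_id in pairs:
        # A set drops duplicates if the same unit is listed twice for one repo.
        grouped.setdefault(repo_name, set()).add(upstream_id)
    # Sort names and ids so the dump is deterministic.
    return {name: sorted(ids) for name, ids in sorted(grouped.items())}


def resolve_mapping(
    mapping: Dict[str, List[str]], imported: Dict[str, str]
) -> Dict[str, List[str]]:
    """Import side: translate upstream_ids into local content ids per repository.

    `imported` is a hypothetical lookup of upstream_id -> local content id,
    built after the Content itself has been imported. Ids that were not
    imported are skipped in this sketch; real code might prefer to fail.
    """
    return {
        repo_name: [imported[uid] for uid in upstream_ids if uid in imported]
        for repo_name, upstream_ids in mapping.items()
    }


# Hypothetical data: a DistributionTree repository and one of its subrepos.
pairs = [
    ("rhel-8-tree", "uuid-b"),
    ("rhel-8-tree", "uuid-a"),
    ("rhel-8-tree-addons", "uuid-c"),
    ("rhel-8-tree", "uuid-a"),
]
mapping = build_mapping(pairs)
dumped = json.dumps(mapping, indent=2)  # what would be written into the export

imported = {"uuid-a": "local-1", "uuid-b": "local-2", "uuid-c": "local-3"}
plan = resolve_mapping(json.loads(dumped), imported)
```

Each entry in `plan` is the content that the import would place into a new version of the named repository. Keying on the repository name rather than an id is what lets the dump survive the move between Pulp instances.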
Updated by ggainey about 2 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to daviddavis
- Groomed changed from No to Yes
- Sprint set to Sprint 78
David and I have spent a number of sessions talking this over. The takeaway for me is that Option 1 above is the most reliable and straightforward way to solve this problem.
We already have the upstream_id available at the Content level; it exists specifically to handle the relinking case where we don't have a 'natural key' we can rely on. If we build into the design the observation made by ttereshc that we only want to do this when we know there are subrepos involved (as opposed to 'for every repo', when most don't need it), we can be more efficient with our disk usage and with our execution time at import.
We've struggled to find a cleaner way to rebuild DistributionTrees on import. The real world of kickstarts is moderately ugly; it is, I suppose, only appropriate that dealing with them be a little less than elegant as well.