Story #3822
Story #3821 (closed): As a user, I can migrate from Pulp 2 to Pulp 3
As a user, I can migrate all content units of a specific type from Pulp 2 to Pulp 3
% Done: 100%
Description
Pulp 3 will provide a tool called pulp-2to3-migrate. The tool will be a modular Pulp 3 plugin: some functionality will come from the core, while the implementation for each specific content type will be provided by the plugins.
Users of this tool will be required to write a Migration Plan, a JSON document that describes which content, repositories, importers, and distributors get migrated from Pulp 2 to Pulp 3. Users will also be able to have Pulp 2 generate a Migration Plan that migrates everything.
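The story does not define the Migration Plan's schema. As a minimal sketch, assuming a hypothetical layout with one top-level list of Pulp 2 identifiers per section (the section keys and the helpers `default_plan` and `load_plan` are illustrative, not the tool's API), generating the "migrate everything" plan and validating a user-written one could look like this:

```python
import json

# Hypothetical plan layout: one top-level list per section named in the story.
PLAN_SECTIONS = ("content", "repositories", "importers", "distributors")


def default_plan(pulp2_inventory):
    """Build a 'migrate everything' plan from a {section: [Pulp 2 identifiers]} inventory."""
    return {section: sorted(pulp2_inventory.get(section, [])) for section in PLAN_SECTIONS}


def load_plan(text):
    """Parse a user-written plan, rejecting unknown sections and non-list values."""
    plan = json.loads(text)
    if not isinstance(plan, dict):
        raise ValueError("a Migration Plan must be a JSON object")
    unknown = sorted(set(plan) - set(PLAN_SECTIONS))
    if unknown:
        raise ValueError(f"unknown plan sections: {unknown}")
    for section, value in plan.items():
        if not isinstance(value, list):
            raise ValueError(f"section {section!r} must be a list")
    return plan


if __name__ == "__main__":
    plan = default_plan({"content": ["rpm"], "repositories": ["repo_a"]})
    print(json.dumps(plan, indent=2))
```

Keeping every section present (even as an empty list) in the generated plan makes it easy for a user to edit the default plan down to a subset.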
Features
Pulp 2 will provide a REST API that returns a default Migration Plan.
This Migration Plan will migrate all content, all repositories, all importers, and all distributors.
Pulp 3 will provide a REST API that will accept two parameters: Migration Plan and concurrency.
This API will dispatch one or more tasks to perform the migration. The concurrency parameter determines how many of these tasks are dispatched in parallel; it defaults to (Pulp 3 worker count) - 1. The tasking system will allow only one migration "operation" at a time, although many tasks may run concurrently for that operation.
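The concurrency rules can be sketched in-process with the standard library. This is only an illustration, assuming a thread pool and a module-level lock as stand-ins for Pulp 3's tasking system; `run_migration`, `default_concurrency`, and clamping the default to at least 1 (for a single-worker installation) are assumptions, not part of the story:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Only one migration "operation" may run at a time (stand-in for the tasking system).
_operation_lock = threading.Lock()


def default_concurrency(worker_count):
    """(Pulp 3 worker count) - 1, clamped to at least 1 (the clamp is an assumption)."""
    return max(1, worker_count - 1)


def run_migration(tasks, worker_count, concurrency=None):
    """Run zero-argument task callables, at most `concurrency` at once; results keep task order."""
    if concurrency is None:
        concurrency = default_concurrency(worker_count)
    if not _operation_lock.acquire(blocking=False):
        raise RuntimeError("a migration operation is already running")
    try:
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            futures = [pool.submit(task) for task in tasks]
            return [future.result() for future in futures]
    finally:
        _operation_lock.release()
```

Leaving one worker free by default presumably keeps the installation able to process other tasks while a migration runs.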
Content Migration implementation details
For each content unit discovered during content migration, the following steps will be performed:
- create a hard link in the /var/lib/pulp/artifacts/ directory
- create an Artifact in the database
- create the Content in the database
- record progress by storing references to the Pulp 2 and Pulp 3 content units
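The four steps above can be sketched for a single on-disk unit. This is a minimal sketch, assuming a hypothetical SQLite progress table `migrated_content(pulp2_id, pulp3_id)`, artifacts named by their SHA-256 digest, and a random UUID standing in for the Pulp 3 Artifact and Content records (in the real tool those would be database rows). Checking the table first is what lets a re-run resume where the previous one stopped, and the same table doubles as the Pulp 2 to Pulp 3 content mapping:

```python
import hashlib
import os
import sqlite3
import uuid


def open_progress_db(path=":memory:"):
    """Open (or create) the hypothetical table mapping Pulp 2 units to Pulp 3 content."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS migrated_content "
        "(pulp2_id TEXT PRIMARY KEY, pulp3_id TEXT NOT NULL)"
    )
    return db


def sha256_of(path):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_unit(db, artifacts_dir, pulp2_id, storage_path):
    """Migrate one downloaded Pulp 2 unit and return its Pulp 3 ID."""
    row = db.execute(
        "SELECT pulp3_id FROM migrated_content WHERE pulp2_id = ?", (pulp2_id,)
    ).fetchone()
    if row is not None:
        return row[0]  # migrated by an earlier run: skip, which makes re-runs resumable

    # Step 1: hard link the file into artifact storage (no data is copied).
    artifact_path = os.path.join(artifacts_dir, sha256_of(storage_path))
    if not os.path.exists(artifact_path):
        os.link(storage_path, artifact_path)

    # Steps 2-3: stand-in for creating the Artifact and Content rows.
    pulp3_id = str(uuid.uuid4())

    # Step 4: record the Pulp 2 -> Pulp 3 reference; `with db` commits the transaction.
    with db:
        db.execute("INSERT INTO migrated_content VALUES (?, ?)", (pulp2_id, pulp3_id))
    return pulp3_id
```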
Updated by dkliban@redhat.com over 6 years ago
- Tracker changed from Issue to Story
- % Done set to 0
Updated by bmbouter about 6 years ago
I've been thinking about the hardlinking aspects. If I have a file at /my/path/to/file and I make a hardlink to it at /new/path/file, is any data copied? Is that a fast operation?
Specifically, I'm wondering about the questions above for NFSv3, NFSv4, and local POSIX environments.
Also, it would be helpful to write out the Python package name you plan to use for this and check whether it's available on PyPI.
Consider calling 'local' mode 'in-place' mode instead; since the storage may actually be remote to the Pulp server, 'local' is a bit confusing to me.
Updated by ttereshc about 6 years ago
- Related to Story #3810: As a user, I can migrate ISO content units from Pulp 2 into Pulp 3 as File Content added
Updated by ttereshc about 6 years ago
1. I believe that creating a hard link is a very quick operation: nothing is copied; the existing inode is simply associated with another filename.
The main requirement here is that both names be on the same filesystem, since inode numbers are only unique within a filesystem. I think the name of the mode should reflect that.
2. At the moment the story covers only downloaded content. My understanding is that all content units have to be migrated, not only downloaded ones. Is that the case?
3. What about creating a RemoteArtifact? I'm not sure I understand why we should not create them in Pulp 3 (for downloaded content as well).
4. I suggest extending the description of storing references for Pulp 2 and Pulp 3 content units.
As mentioned in the description, one goal is to be able to resume from any point.
Another goal: we need a mapping between Pulp 2 and Pulp 3 content so that external(?) tools can migrate the repositories themselves.
5. I agree that SQLite is the lightest solution; however, in many cases PostgreSQL will likely already be installed. Does it make sense to use PostgreSQL for the tool?
6. The tool should be able to migrate by content type.
7. The tool should be able to report the status/progress of the migration, e.g. RPMs migrated - 95% or 95000/100000.
Updated by dkliban@redhat.com about 6 years ago
bmbouter wrote:
I've been thinking about the hardlinking aspects. If I have a file at /my/path/to/file and I make a hardlink to it at /new/path/file, is any data copied? Is that a fast operation?
A hard link is just another name pointing to an existing inode, so no data is copied or moved when one is created. Because of this, the hard link needs to reside on the same filesystem as the inode it points to.
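The inode explanation can be checked with a short standard-library sketch (`os.link`, `os.stat`); the temporary files and the helper name `demo_hard_link` are illustrative:

```python
import os
import tempfile


def demo_hard_link():
    """Show that a hard link is a second name for the same inode, not a copy."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "file")
        link = os.path.join(tmp, "file-link")
        with open(original, "wb") as f:
            f.write(b"x" * (1 << 20))  # 1 MiB of data
        os.link(original, link)  # raises OSError (EXDEV) if the names are on different filesystems

        a, b = os.stat(original), os.stat(link)
        same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

        # A write through one name is visible through the other: there is one copy of the data.
        with open(link, "ab") as f:
            f.write(b"y")
        grew = os.path.getsize(original) == (1 << 20) + 1
        return same_inode, b.st_nlink, grew
```

On a local POSIX filesystem, `demo_hard_link()` returns `(True, 2, True)`: both names share one inode, the link count is 2, and no data was duplicated.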
Specifically, I'm wondering about the questions above for NFSv3, NFSv4, and local POSIX environments.
The NFSv3 specification[0] says that hard links should work if the underlying filesystem supports them; if it doesn't, the server is supposed to return a specific error.
[0] http://www.faqs.org/rfcs/rfc1813.html
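Because linking can fail on some filesystems, the tool has to decide what to do when `os.link` raises. The story does not specify this; the sketch below assumes a hypothetical fallback to a full copy for errno values that mean "linking is not possible here" (the errno set and `link_or_copy` are illustrative choices), while re-raising genuine errors:

```python
import errno
import os
import shutil

# errno names meaning "a hard link is not possible here" rather than a real I/O failure:
# EXDEV (different filesystems), EPERM / ENOTSUP / EOPNOTSUPP (filesystem refuses hard links),
# EMLINK (source already has the maximum number of links). Not every platform defines all names.
_NO_LINK_NAMES = ("EXDEV", "EPERM", "EMLINK", "ENOTSUP", "EOPNOTSUPP")
LINK_UNSUPPORTED = {getattr(errno, name) for name in _NO_LINK_NAMES if hasattr(errno, name)}


def link_or_copy(src, dst):
    """Hard link src to dst, falling back to a full copy when linking is not possible."""
    try:
        os.link(src, dst)
        return "linked"
    except OSError as exc:
        if exc.errno not in LINK_UNSUPPORTED:
            raise  # e.g. ENOENT or ENOSPC: a real error, not a missing feature
    shutil.copy2(src, dst)
    return "copied"
```

A copy doubles disk usage, so an 'in-place' mode might reasonably prefer to fail loudly instead of falling back; that trade-off would need to be decided for the tool.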
Also, it would be helpful to write out the Python package name you plan to use for this and check whether it's available on PyPI.
I'll add some package names.
Consider calling 'local' mode 'in-place' mode instead; since the storage may actually be remote to the Pulp server, 'local' is a bit confusing to me.
Yeah, in-place sounds better.
Updated by dkliban@redhat.com about 6 years ago
ttereshc wrote:
1. I believe that creating a hard link is a very quick operation: nothing is copied; the existing inode is simply associated with another filename.
The main requirement here is that both names be on the same filesystem, since inode numbers are only unique within a filesystem. I think the name of the mode should reflect that.
2. At the moment the story covers only downloaded content. My understanding is that all content units have to be migrated, not only downloaded ones. Is that the case?
I agree. I'll add language making sure that if a content unit does not exist on disk, it still gets created in the Pulp 3 database using information from the LazyCatalogEntry model associated with the content unit. We would definitely create a RemoteArtifact in this case.
3. What about creating a RemoteArtifact? I'm not sure I understand why we should not create them in Pulp 3 (for downloaded content as well).
A RemoteArtifact needs to be associated with a Remote, so we would also need to create Remotes in order to create a RemoteArtifact for each Artifact. This implies that the tool will discover Importers and create equivalent Remotes in Pulp 3. I like this idea, and I don't think we can get away with not migrating Importers.
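The rule discussed here (an Artifact when the file is on disk, a RemoteArtifact tied to a Remote migrated from the importer whenever a download URL is known) can be sketched with plain dataclasses. These are hypothetical stand-ins, not Pulp 3's models: the record keys `id`, `feed`, and `storage_path`, the `lazy_urls` lookup standing in for LazyCatalogEntry, and all helper names are assumptions:

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Remote:
    """Stand-in for a Pulp 3 Remote created from a Pulp 2 importer."""
    name: str
    url: str


@dataclass
class Pulp3Unit:
    pulp2_id: str
    artifact_path: Optional[str] = None  # set only when the file exists on disk
    remote_artifacts: List[Tuple[str, Remote]] = field(default_factory=list)  # (url, remote)


def remote_from_importer(importer):
    """Hypothetical mapping of a Pulp 2 importer record to a Remote."""
    return Remote(name=importer["id"], url=importer["feed"])


def build_pulp3_unit(unit, remote, lazy_urls):
    """Create an Artifact for on-disk units and a RemoteArtifact whenever a URL is known.

    lazy_urls stands in for the LazyCatalogEntry lookup (pulp2_id -> download URL).
    """
    result = Pulp3Unit(pulp2_id=unit["id"])
    path = unit.get("storage_path")
    if path and os.path.exists(path):
        result.artifact_path = path
    url = lazy_urls.get(unit["id"])
    if url is not None:
        result.remote_artifacts.append((url, remote))
    if result.artifact_path is None and not result.remote_artifacts:
        raise ValueError(f"unit {unit['id']} is neither on disk nor downloadable")
    return result
```

A unit that is neither on disk nor in the lazy catalog cannot be represented in Pulp 3 at all, so the sketch treats it as an error rather than silently skipping it.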
4. I suggest extending the description of storing references for Pulp 2 and Pulp 3 content units.
As mentioned in the description, one goal is to be able to resume from any point.
Another goal: we need a mapping between Pulp 2 and Pulp 3 content so that external(?) tools can migrate the repositories themselves.
Yes, that is another goal. I'll mention it in the description.
5. I agree that SQLite is the lightest solution; however, in many cases PostgreSQL will likely already be installed. Does it make sense to use PostgreSQL for the tool?
The user should be able to configure this with either sqlite or postgresql.
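Assuming the tool uses Django's ORM, the choice between the two databases could be a small settings helper. The `ENGINE` strings are Django's built-in backends; the `MIGRATION_DB_*` environment variable names and the default SQLite path are hypothetical:

```python
import os


def migration_databases(env=os.environ):
    """Return a Django DATABASES dict for SQLite (default) or PostgreSQL.

    The MIGRATION_DB_* variable names and defaults are illustrative assumptions.
    """
    engine = env.get("MIGRATION_DB_ENGINE", "sqlite")
    if engine == "sqlite":
        return {"default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": env.get("MIGRATION_DB_NAME", "/var/lib/pulp/migration.sqlite3"),
        }}
    if engine == "postgresql":
        return {"default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env.get("MIGRATION_DB_NAME", "pulp_migration"),
            "USER": env.get("MIGRATION_DB_USER", "pulp"),
            "PASSWORD": env.get("MIGRATION_DB_PASSWORD", ""),
            "HOST": env.get("MIGRATION_DB_HOST", "localhost"),
            "PORT": env.get("MIGRATION_DB_PORT", "5432"),
        }}
    raise ValueError(f"unsupported MIGRATION_DB_ENGINE: {engine!r}")
```

In the tool's settings module this would be used as `DATABASES = migration_databases()`.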
6. The tool should be able to migrate by content type.
Yes, I'll put in some examples of commands.
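Migrating by content type could be exposed as a repeatable command-line option. This is a hypothetical sketch: the `--type` flag and the `KNOWN_TYPES` list are assumptions (in practice the types would come from the installed plugins), and only the program name comes from the story:

```python
import argparse

# Hypothetical: in practice the list would come from the installed migration plugins.
KNOWN_TYPES = ("rpm", "iso")


def build_parser():
    parser = argparse.ArgumentParser(prog="pulp-2to3-migrate")
    parser.add_argument(
        "--type",
        dest="types",
        action="append",
        choices=KNOWN_TYPES,
        help="content type to migrate; repeat for several types (default: all)",
    )
    return parser


def selected_types(argv):
    """Return the content types to migrate for the given command-line arguments."""
    args = build_parser().parse_args(argv)
    return args.types or list(KNOWN_TYPES)
```

With this sketch, `pulp-2to3-migrate --type rpm` would migrate only RPMs and a bare `pulp-2to3-migrate` would migrate every known type. A Django management command receives an `argparse`-based parser in its `add_arguments` method, so the same option could be registered there.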
7. The tool should be able to report the status/progress of the migration, e.g. RPMs migrated - 95% or 95000/100000.
It needs to report how many remotes got migrated and then how many of each type of content got migrated.
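A progress report in the format suggested above (remotes first, then each content type) can be sketched as a pair of formatting helpers; the function names and the `(done, total)` input shape are illustrative assumptions:

```python
def progress_line(label, done, total):
    """Format one line such as 'RPMs migrated - 95% (95000/100000)'."""
    pct = 100 * done // total if total else 100  # floor, so 100% only when everything is done
    return f"{label} migrated - {pct}% ({done}/{total})"


def report(remotes, content_counts):
    """remotes is (done, total); content_counts maps a content type to (done, total)."""
    lines = [progress_line("Remotes", *remotes)]
    lines += [progress_line(label, *counts) for label, counts in content_counts.items()]
    return "\n".join(lines)
```

For example, `progress_line("RPMs", 95000, 100000)` returns `'RPMs migrated - 95% (95000/100000)'`.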
Updated by bmbouter about 6 years ago
+1 to supporting both SQLite and PostgreSQL. I guess the easiest way to do that is for the tool to be its own Django project that doesn't use the web UI and only uses management commands and models. Is that what you all were thinking?
Updated by dkliban@redhat.com about 6 years ago
bmbouter wrote:
+1 to supporting both SQLite and PostgreSQL. I guess the easiest way to do that is for the tool to be its own Django project that doesn't use the web UI and only uses management commands and models. Is that what you all were thinking?
Yes, that's exactly what I had in mind.
Updated by jsherril@redhat.com almost 6 years ago
From a katello perspective, we'd love to see a tool that does the following:
1) migrates all content units and ideally provides us the mapping of uuids to hrefs.
It's possible we might be able to do the mapping ourselves based on some attributes depending on the unit (such as checksum for RPMs), but a more direct mapping would be preferable.
2a) For our 'normal syncable' repos, provide the ability to create a repo with a version based on it:
  repo_a
would map to:
  Repository A
    Version 1
In this case, we'd want to publish a new version on every re-run, since new content might have entered the repository.
2b) For our 'content view' repos, provide the ability to map a set of Pulp 2 repositories to a single Pulp 3 repository with multiple versions, for example:
  repo_a
  repo_b
  repo_c
would map to:
  Repository A:
    Version 1
    Version 2
    Version 3
Ideally order would be preserved. In our use case, these versions never change, so we wouldn't need to handle updates once a version is created (but may want to add more repos as new versions on re-runs, so a re-run with repo_d would create Version 4).
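Requests 2a and 2b can be sketched with plain data structures. This is a hypothetical model, not Pulp 3's API: `Pulp3Repository`, its `versions` and `sources` fields, and both helpers are illustrative names, and each version is reduced to a frozenset of content IDs. Recording which Pulp 2 repositories have already become versions is what lets a 2b re-run append only new ones (repo_d becomes Version 4) while leaving existing versions untouched:

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Pulp3Repository:
    name: str
    versions: List[FrozenSet[int]] = field(default_factory=list)  # versions[i] is Version i+1
    sources: List[str] = field(default_factory=list)  # Pulp 2 repo IDs already turned into versions


def migrate_syncable(repo3, pulp2_content):
    """2a) One Pulp 2 repo -> one Pulp 3 repo; every re-run adds a new version."""
    repo3.versions.append(frozenset(pulp2_content))
    return len(repo3.versions)  # number of the version just created


def migrate_content_view(repo3, pulp2_repos):
    """2b) Ordered (repo_id, content_ids) pairs -> successive versions of one Pulp 3 repo.

    Repos that already became a version are skipped, because those versions never change.
    """
    for repo_id, content in pulp2_repos:
        if repo_id in repo3.sources:
            continue
        repo3.versions.append(frozenset(content))
        repo3.sources.append(repo_id)
    return list(repo3.sources)
```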
Are the following intended to be covered by this issue? We might need to brainstorm some if they are.
3) remotes
4) publishers
5) publications
6) distributions
7) We will need to worry about migrating or handling consumer profiles and applicability at some point as well, but I assume that will be handled via another issue?
Updated by jsherril@redhat.com over 5 years ago
- Tags Katello-P3 added
- Tags deleted (Katello-P1)
Updated by jsherril@redhat.com over 5 years ago
- Tags Katello-P2 added
- Tags deleted (Katello-P3)
Updated by dkliban@redhat.com about 5 years ago
- Status changed from NEW to MODIFIED
Updated by bmbouter about 5 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Updated by ttereshc about 5 years ago
- Project changed from Pulp to Migration Plugin
- Sprint/Milestone deleted (3.0.0)
Updated by ttereshc about 5 years ago
- Status changed from CLOSED - CURRENTRELEASE to MODIFIED
Updated by ttereshc almost 5 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
- Sprint/Milestone set to 0.1.0
Updated by ggainey over 4 years ago
- Tags Katello added
- Tags deleted (Katello-P2)