Issue #7842
closed
'table db_info already exists' on consecutive migrations
Description
On a fresh pulp3 installation, running consecutive 2to3 migrations results in the following error (the first migration ran without error):
Nov 16 12:48:54 rq[657971]: Traceback (most recent call last):
Nov 16 12:48:54 rq[657971]: File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
Nov 16 12:48:54 rq[657971]: rv = job.perform()
Nov 16 12:48:54 rq[657971]: File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
Nov 16 12:48:54 rq[657971]: self._result = self._execute()
Nov 16 12:48:54 rq[657971]: File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
Nov 16 12:48:54 rq[657971]: return self.func(*self.args, **self.kwargs)
Nov 16 12:48:54 rq[657971]: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/tasks/migrate.py", line 140, in migrate_from_pulp2
Nov 16 12:48:54 rq[657971]: create_repoversions_publications_distributions(plan)
Nov 16 12:48:54 rq[657971]: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 293, in create_repoversions_publications_distributions
Nov 16 12:48:54 rq[657971]: task_func(*task_args)
Nov 16 12:48:54 rq[657971]: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 187, in simple_plugin_migration
Nov 16 12:48:54 rq[657971]: migrate_repo_distributor(dist_migrator, progress_dist, pulp2_dist)
Nov 16 12:48:54 rq[657971]: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 391, in migrate_repo_distributor
Nov 16 12:48:54 rq[657971]: pulp2dist, repo_version)
Nov 16 12:48:54 rq[657971]: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/rpm/repository.py", line 74, in migrate_to_pulp3
Nov 16 12:48:54 rq[657971]: publish(repo_version.pk, checksum_types=checksum_types)
Nov 16 12:48:54 rq[657971]: File "/usr/lib/python3.6/site-packages/pulp_rpm/app/tasks/publishing.py", line 290, in publish
Nov 16 12:48:54 rq[657971]: metadata_signing_service=metadata_signing_service
Nov 16 12:48:54 rq[657971]: File "/usr/lib/python3.6/site-packages/pulp_rpm/app/tasks/publishing.py", line 343, in create_repomd_xml
Nov 16 12:48:54 rq[657971]: pri_db = cr.PrimarySqlite(pri_db_path)
Nov 16 12:48:54 rq[657971]: File "/usr/lib64/python3.6/site-packages/createrepo_c/__init__.py", line 202, in __init__
Nov 16 12:48:54 rq[657971]: Sqlite.__init__(self, path, DB_PRIMARY)
Nov 16 12:48:54 rq[657971]: createrepo_c.CreaterepoCError: Can not create db_info table: table db_info already exists
This is while publishing a 'frozen' rpm repository, i.e. a repo without a feed which we manually copy content to when needed. The content in the repo had changed (content added) between migrations but I don't see how that could be a problem. The error appears at every migration attempt now, while trying to publish the same repo.
Running in an rpm-based installation on RHEL8: python3-pulp-rpm-3.7.0-1.el8.noarch python3-pulpcore-3.7.3-1.el8.noarch python3-pulp-2to3-migration-0.5.1-1.el8.noarch
Related issues
Updated by dalley about 4 years ago
If I had to guess, this is a problem with working directory management. We give createrepo_c a path to init a database file but the path already exists.
But beyond that, we probably shouldn't be generating the sqlite databases to begin with. https://pulp.plan.io/issues/7851 will likely fix the problem for your use case.
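To illustrate the guess above, here is a minimal sketch using Python's standard sqlite3 module rather than createrepo_c itself. It assumes the failure comes from initialising a database at a path where an earlier run already created the db_info table. The init_db helper and the column names are illustrative assumptions, not createrepo_c's actual code.

```python
import os
import sqlite3
import tempfile


def init_db(path):
    """Create a db_info table in the SQLite file at `path`.

    Illustrative stand-in for a database initialiser that does not use
    CREATE TABLE IF NOT EXISTS. The column names are assumptions.
    """
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE db_info (dbversion INTEGER, checksum TEXT)")
        conn.commit()
    finally:
        conn.close()


def demo():
    """Initialise the same database file twice and return the error text."""
    with tempfile.TemporaryDirectory() as workdir:
        db_path = os.path.join(workdir, "primary.sqlite")
        init_db(db_path)  # first run: the file is new, so this succeeds
        try:
            init_db(db_path)  # second run: the table is already in the file
        except sqlite3.OperationalError as exc:
            return str(exc)
    return None


if __name__ == "__main__":
    print(demo())
```

If the guess is right, the second initialisation fails with the same "table db_info already exists" message that SQLite reports in the traceback.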
Updated by dalley about 4 years ago
- Related to Issue #7851: don't generate sqlite db files for yum metadata if pulp2 exporter didn't use generate them added
Updated by dalley about 4 years ago
I attempted to make a reproducer script, but wasn't able to reproduce. Am I missing some steps?
export BASE_ADDR=http://localhost:24817
pulp-admin rpm repo create --download-policy=on_demand --repo-id zoo --feed https://fixtures.pulpproject.org/rpm-unsigned/
pulp-admin rpm repo sync run --repo-id zoo
pulp-admin rpm repo create --repo-id new
pulp-admin rpm repo copy rpm --from-repo-id zoo --to-repo-id new --str-eq name=dog
http POST :24817/pulp/api/v3/migration-plans/ plan='{"plugins": [{"type": "rpm"}]}'
export PLAN_HREF=$(http $BASE_ADDR/pulp/api/v3/migration-plans/ | jq -r '.results[0] | .pulp_href')
http POST :24817${PLAN_HREF}run/
pulp-admin rpm repo copy rpm --from-repo-id zoo --to-repo-id new --str-eq name=bear
http POST :24817${PLAN_HREF}run/
Updated by adam.winberg@smhi.se about 4 years ago
Possibly a 'publish' in pulp2 after your last 'pulp-admin' command?
I'm not sure what triggers this, but when I encountered it, the repo that pulp3 was trying to publish had been updated with new content and published in pulp2.
Updated by adam.winberg@smhi.se almost 4 years ago
I've rerun migrations and have not been able to reproduce this again. Not sure what triggers it. Now I instead run into #7876, which happens before publishing, so it may be that the migration fails before it gets to the stage where this issue occurs.
Updated by adam.winberg@smhi.se almost 4 years ago
nvm, now I got the error again. The big change this time compared to the successful migration I ran yesterday is that we added two new repos to pulp2 - one with a feed and one without a feed ('postgres13' and 'frozen-postgres13'). 'postgres13' was synced and content was copied to the frozen repo and both repos were published.
And this morning when I tried a migration I once again get createrepo_c.CreaterepoCError: Can not create db_info table: table db_info already exists
Updated by dalley almost 4 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to dalley
Updated by dalley almost 4 years ago
Hey, sorry for the delay. I picked this up just before leaving for winter shutdown.
The second case seems like it might be simpler so let's focus on that one. Just to be sure I'm understanding the sequence of events correctly:
- You added a new repository with a feed, and synced it into Pulp 2
- You created a new repository without a feed, and copied content from the first repo.
- Both repositories were published (for the first time)
- Neither repository has ever been migrated at this point, they were both created after the last migration.
- They are migrated to Pulp 3, and the migration fails, and fails constantly ever after
I updated my script and unfortunately I'm still having trouble reproducing.
Questions:
- Can you provide the postgresql 13 repo URL that you used?
- Did you copy all content into the frozen repo, or just some of the content?
- Does the migration plan just ask for all RPM repositories to be migrated, or does it explicitly list them?
Assumptions (let me know if they aren't valid):
- Pulp 3 doesn't have any "virgin" repos that were not migrated from Pulp 2?
- Pulp 2 has several repos besides these two in question, and Pulp 3 has migrated copies of those?
pulp-admin rpm repo create --download-policy=on_demand --repo-id postgresql13 --feed https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-8.3-x86_64/
pulp-admin rpm repo sync run --repo-id postgresql13
pulp-admin rpm repo create --repo-id frozen-postgresql13
pulp-admin rpm repo copy rpm --from-repo-id postgresql13 --to-repo-id frozen-postgresql13
pulp-admin rpm repo publish run --repo-id frozen-postgresql13
http POST :24817/pulp/api/v3/migration-plans/ plan='{"plugins": [{"type": "rpm"}]}'
export PLAN_HREF=$(http $BASE_ADDR/pulp/api/v3/migration-plans/ | jq -r '.results[0] | .pulp_href')
http POST :24817${PLAN_HREF}run/
Updated by adam.winberg@smhi.se almost 4 years ago
- You added a new repository with a feed, and synced it into Pulp 2
- You created a new repository without a feed, and copied content from the first repo.
- Both repositories were published (for the first time)
- Neither repository has ever been migrated at this point, they were both created after the last migration.
- They are migrated to Pulp 3, and the migration fails, and fails constantly ever after
Yes, this is correct.
Questions:
- Can you provide the postgresql 13 repo URL that you used?
https://yum.postgresql.org/13/redhat/rhel-8-x86_64/
- Did you copy all content into the frozen repo, or just some of the content?
All content.
- Does the migration plan just ask for all RPM repositories to be migrated, or does it explicitly list them?
No explicit list in the migration plan, just everything from the rpm plugin.
Assumptions (let me know if they aren't valid):
- Pulp 3 doesn't have any "virgin" repos that were not migrated from Pulp 2?
Correct, no actions made to the pulp3 installation besides the migrations.
- Pulp 2 has several repos besides these two in question, and Pulp 3 has migrated copies of those?
Correct.
Updated by dalley almost 4 years ago
So I still haven't reproduced it but I think I figured out what is going on anyways.
The publish task creates a temporary working directory for the metadata files it's constructing. The name of this temporary working directory is constructed from the hostname of the worker and the task ID. During normal publishes and complex migrations this will always be unique, because a new task is spawned for each individual publish operation. That is not the case for "simple" migrations, i.e. plans that "just migrate everything". In that case publish() runs repeatedly from the same task, so the working directory is constructed with the same name each time, which is probably why we're getting file name collisions.
If this is what is happening, the next time it happens, try clearing /tmp/ and see if it works.
Refactoring the codepaths for "simple" migrations was planned anyways so we'll keep this in mind when doing so.
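The suspected collision can be sketched roughly as follows. This is not the pulpcore implementation: working_dir_path and publish are hypothetical stand-ins, and the only assumption taken from the analysis above is that the directory name depends solely on the worker hostname and the task ID.

```python
import os
import tempfile


def working_dir_path(base, hostname, task_id):
    """Hypothetical naming scheme: one directory per (hostname, task ID)."""
    return os.path.join(base, hostname, str(task_id))


def publish(base, hostname, task_id):
    """Stand-in for a publish step that writes one metadata file.

    Returns True if the metadata file was already present, i.e. a previous
    publish left it behind in the same working directory.
    """
    workdir = working_dir_path(base, hostname, task_id)
    os.makedirs(workdir, exist_ok=True)  # silently re-uses an existing directory
    db_path = os.path.join(workdir, "primary.sqlite")
    already_there = os.path.exists(db_path)
    with open(db_path, "ab"):
        pass  # create the file if it is missing, leave it alone otherwise
    return already_there


def demo():
    with tempfile.TemporaryDirectory() as base:
        # One task per publish: every working directory is fresh.
        separate_tasks = [publish(base, "worker1", t) for t in ("task-a", "task-b")]
        # Several publishes inside one task: the second one finds leftovers.
        same_task = [publish(base, "worker1", "task-c") for _ in range(2)]
    return separate_tasks, same_task


if __name__ == "__main__":
    print(demo())
```

With one task per publish, neither call finds a leftover file. With two publishes in the same task, the second one does, which matches the situation described for "simple" migrations.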
Updated by dalley almost 4 years ago
Alternatively (or additionally), we should probably make sure that WorkingDirectory() doesn't silently re-use an existing directory.
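One possible shape for such a guard is sketched below. StrictWorkingDirectory is a hypothetical class written for illustration, not the actual pulpcore WorkingDirectory. It assumes that failing loudly with FileExistsError is preferable to re-using a directory that may still hold stale files.

```python
import os
import shutil
import tempfile


class StrictWorkingDirectory:
    """Hypothetical context manager that refuses to re-use a directory."""

    def __init__(self, path):
        self.path = path

    def __enter__(self):
        # exist_ok=False raises FileExistsError if the leaf directory exists.
        os.makedirs(self.path, exist_ok=False)
        return self.path

    def __exit__(self, exc_type, exc, tb):
        # Remove the directory on exit so that a later run starts clean.
        shutil.rmtree(self.path, ignore_errors=True)
        return False


def demo():
    with tempfile.TemporaryDirectory() as base:
        path = os.path.join(base, "worker1", "task-c")
        with StrictWorkingDirectory(path):
            try:
                # Second use of the same path while the first is still open.
                with StrictWorkingDirectory(path):
                    nested = "reused"
            except FileExistsError:
                nested = "refused"
        cleaned = not os.path.exists(path)
        # After cleanup the same path can be used again.
        with StrictWorkingDirectory(path):
            again = "ok"
    return nested, cleaned, again


if __name__ == "__main__":
    print(demo())
```

A design like this would turn a silent collision into an immediate, explicit error at the point where the directory is created, rather than a later failure inside the metadata library.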
Updated by dalley almost 4 years ago
Eh, there might be a little more to this. It looks like the directories should be cleaned up automatically. I'll keep investigating.
Updated by dalley almost 4 years ago
- Status changed from ASSIGNED to CLOSED - DUPLICATE
This PR should fix the problem if my analysis was correct. I'll close this issue for now but if you experience it again (once the new version lands and you upgrade) please re-open it.