Issue #7809
closed
timeout on `Pulp2to3MigrationClient::Pulp2RepositoriesApi list({"offset"=>0, "limit"=>2000})`
Status:
CLOSED - CURRENTRELEASE
Description
My Pulpcore worker has been dying with [CRITICAL] WORKER TIMEOUT (pid:15519)
when Katello tries to list 402 migrated repositories with the limit set to 2000. Reproducing the issue is a little strange: I can list with the 2000 limit in the foreman Rails console, but trying the same thing during our ImportMigration task results in a killed worker and a 504 proxy error. I'm still trying to figure out what the difference is, but I do believe there's a Pulp bug in there somewhere, given the dead Pulpcore worker.
If it's necessary to reproduce with Katello:
- Create a Katello nightly production VM
- Sync the RHEL 7 Extras repo and put it in a content view
- Publish the resulting content view 400 times
- Run
foreman-rake katello:pulp3_migration
I'll also note that memory usage is almost certainly not the issue here; I haven't caught any instances of high memory usage, at least.
(found a typo, I meant to say 502 instead of 504)
I tried on a Katello box, but I used the python bindings: Pulp2RepositoriesApi(migration_client).list(offset=0, limit=2000).
It took a few seconds and I didn't notice any issues, though it does create a short spike in CPU load. I have 659 repos to list.
I also tried it after kicking off a migration task.
Nothing is failing so far, but I'll keep trying at different stages.
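For reference, here is roughly what that bindings call looks like end to end. This is only a minimal sketch: the module path, host, and credentials are assumptions for illustration, not values from the box above.

# Minimal sketch of the python-bindings listing call above; connection details are assumed.
from pulpcore.client.pulp_2to3_migration import ApiClient, Configuration, Pulp2RepositoriesApi

config = Configuration(host="https://localhost", username="admin", password="password")
migration_client = ApiClient(config)
pulp2_repositories_api = Pulp2RepositoriesApi(migration_client)

response = pulp2_repositories_api.list(offset=0, limit=2000)
print(response.count, len(response.results))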
I've discovered that I am only able to reproduce this on Katello nightly production boxes after migrating >= 400 repositories; it is not reproducible on a development box. I'm not sure yet which difference causes the issue, but it isn't the Pulp plugin versions, at least.
I've found it's only reproducible if you do the following:
foreman-rake console
def api_client
  Pulp2to3MigrationClient::ApiClient.new(SmartProxy.pulp_primary!.pulp3_configuration(Pulp2to3MigrationClient::Configuration))
end

def pulp2_repositories_api
  Pulp2to3MigrationClient::Pulp2RepositoriesApi.new(api_client)
end

# fetch the full list of migrated pulp2 repositories through the 2to3 migration API
imported = Katello::Pulp3::Api::Core.fetch_from_list { |opts| pulp2_repositories_api.list(opts) }
Still trying to figure out why fetching with the above code causes the pulpcore worker to die.
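The Ruby helper above just drives the same list endpoint with offset/limit options; in python-bindings terms the fetch has roughly this shape (again an illustrative sketch with assumed connection details, not the Katello helper itself):

# Illustrative paged fetch over the migration API; host/credentials are assumptions.
from pulpcore.client.pulp_2to3_migration import ApiClient, Configuration, Pulp2RepositoriesApi

config = Configuration(host="https://localhost", username="admin", password="password")
api = Pulp2RepositoriesApi(ApiClient(config))

imported = []
offset, limit = 0, 2000
while True:
    page = api.list(offset=offset, limit=limit)  # the whole page is serialized server-side
    imported.extend(page.results)
    if page.next is None:
        break
    offset += limit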
Attaching some files showing:
- a migration plan
- the resulting listing of 2 pulp2repositories. Notice that all distributions are listed for each one
- a fetching of the distributions showing that they are using the same publication
It appears that the API is showing all distributions associated with the Pulp 3 repo version, not just the ones that are part of that Pulp 2 repository. I suspect this is slowing down the API considerably.
- Subject changed from Worker dies on `Pulp2to3MigrationClient::Pulp2RepositoriesApi list({"offset"=>0, "limit"=>2000})` to timeout on `Pulp2to3MigrationClient::Pulp2RepositoriesApi list({"offset"=>0, "limit"=>2000})`
Thanks, jsherrill. Agreed.
Just adding a bit more detail here.
The likely problem is serialization; it is very heavy in certain situations because of the following incorrect behaviour:
If Pulp 2 has N copies of the same repository, they map to one repo version and one publication in Pulp 3, which is good, and to N distributions, which is also correct. However, we serialize each pulp2repository in a way that shows all the distributions for its publication, so every one of the N Pulp 2 repos lists all N distributions, which is wrong.
The migration plugin needs to show only the distributions relevant to that Pulp 2 repo. We might need to add a pulp3_distributions relation to the pulp2repository model to resolve that.
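To make the blow-up concrete, here is a small self-contained sketch of the difference between the two serializations. Every name in it is hypothetical and for illustration only; this is not the migration plugin's actual model or serializer code.

# Toy sketch of the buggy vs. intended serialization; all names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Distribution:
    pulp_href: str

@dataclass
class Publication:
    pulp_href: str
    distributions: list = field(default_factory=list)  # every distribution sharing this publication

@dataclass
class Pulp2Repository:
    pulp2_repo_id: str
    publication: Publication
    pulp3_distribution: Distribution  # the one distribution created for this pulp2 repo

pub = Publication(pulp_href="/pulp/api/v3/publications/rpm/rpm/1/")
repos = []
for i in range(3):  # N pulp2 copies of the same repository content
    dist = Distribution(pulp_href=f"/pulp/api/v3/distributions/rpm/rpm/{i}/")
    pub.distributions.append(dist)
    repos.append(Pulp2Repository(f"repo-{i}", pub, dist))

# Buggy behaviour: every pulp2 repo reports all N distributions of the shared publication,
# so the listing payload grows roughly as N * N.
buggy = {r.pulp2_repo_id: [d.pulp_href for d in r.publication.distributions] for r in repos}

# Intended behaviour: each pulp2 repo reports only its own distribution(s).
fixed = {r.pulp2_repo_id: [r.pulp3_distribution.pulp_href] for r in repos}

print(buggy)  # 3 hrefs per repo
print(fixed)  # 1 href per repo

With the reproducer above publishing the same content view 400 times, that roughly quadratic growth in the response body would plausibly explain a single 2000-item page timing out during serialization.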
- Triaged changed from No to Yes
- Status changed from NEW to ASSIGNED
- Assignee set to ipanova@redhat.com
- Sprint set to Sprint 85
- Sprint changed from Sprint 85 to Sprint 86
- Status changed from ASSIGNED to MODIFIED
- Sprint/Milestone set to 0.6.0
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Fix distribution serialization.
closes #7809 https://pulp.plan.io/issues/7809