Issue #5813
Closed
syncing a very large file repository takes a VERY long time
Description
I tried syncing a file repository with pulp3, and it took ~7 hours and was still not done. It had finished all the steps in the progress report but was still doing 'something'. Worker CPU usage was high, but I'm not sure what it was doing (duplicate detection, maybe?).
I tried this with pulp2 and it took ~55 minutes.
Updated by daviddavis about 5 years ago
- Sprint set to Sprint 62
Adding to the sprint to hopefully resolve before 3.0 GA.
Updated by daviddavis about 5 years ago
I talked to @partha since @jsherrill is out. Sounds like the file repo had 150K small files.
We can probably use the pulp-fixtures script to generate such a repo:
https://github.com/PulpQE/pulp-fixtures/blob/master/file/gen-fixtures.sh
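If the pulp-fixtures script isn't handy, a comparable fixture can be sketched in Python. This is a hypothetical stand-in, not gen-fixtures.sh itself: it assumes pulp_file's PULP_MANIFEST format of one relative_path,sha256,size line per file, and the function name generate_file_repo and the file naming scheme are illustrative only.

```python
"""Hypothetical stand-in for pulp-fixtures' gen-fixtures.sh: build a
file-repository fixture of many small files plus a PULP_MANIFEST."""
import hashlib
import os
from pathlib import Path


def generate_file_repo(dest: Path, count: int, size: int = 64) -> Path:
    """Write `count` files of `size` random bytes into `dest` and return the
    path of a PULP_MANIFEST with one `relative_path,sha256,size` line per file."""
    dest.mkdir(parents=True, exist_ok=True)
    lines = []
    for i in range(count):
        name = f"{i:06d}.bin"  # arbitrary naming scheme (assumption)
        data = os.urandom(size)
        (dest / name).write_bytes(data)
        digest = hashlib.sha256(data).hexdigest()
        lines.append(f"{name},{digest},{size}")
    manifest = dest / "PULP_MANIFEST"
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

Calling generate_file_repo(Path("very_large_file_150k"), 150_000) and serving that directory over HTTP should give a repo of roughly the shape described above (150K small files).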
Updated by bmbouter about 5 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to bmbouter
Updated by bmbouter about 5 years ago
I applied the WIP PR from @fabricio, which adjusts the batch sizes and provides a great speedup. This run syncs with policy='immediate' from http://quartet.usersys.redhat.com/pub/fake-repos/very_large_file_150k/
It shows a runtime of ~66 minutes even with cProfile recording the run, which is great. We're going to merge that PR, and I think it will resolve this issue.
{
"pulp_href": "/pulp/api/v3/tasks/a60bd695-8129-406f-8ee5-c13fb3a6b680/",
"pulp_created": "2019-12-05T15:36:32.725152Z",
"state": "completed",
"name": "pulp_file.app.tasks.synchronizing.synchronize",
"started_at": "2019-12-05T15:36:33.024143Z",
"finished_at": "2019-12-05T16:42:41.263819Z",
"error": null,
"worker": "/pulp/api/v3/workers/5eb9dcb6-7561-4ef2-aa1e-6b9f9468f7d8/",
"progress_reports": [
{
"message": "Downloading Metadata",
"code": "downloading.metadata",
"state": "completed",
"total": null,
"done": 1,
"suffix": null
},
{
"message": "Parsing Metadata Lines",
"code": "parsing.metadata",
"state": "completed",
"total": 150001,
"done": 150001,
"suffix": null
},
{
"message": "Downloading Artifacts",
"code": "downloading.artifacts",
"state": "completed",
"total": null,
"done": 150001,
"suffix": null
},
{
"message": "Associating Content",
"code": "associating.content",
"state": "completed",
"total": null,
"done": 150001,
"suffix": null
}
],
"created_resources": [
"/pulp/api/v3/repositories/file/file/cbfdbf42-a3cd-4e62-8a28-e8f55957c469/versions/1/"
],
"reserved_resources_record": [
"/pulp/api/v3/repositories/file/file/cbfdbf42-a3cd-4e62-8a28-e8f55957c469/",
"/pulp/api/v3/remotes/file/file/03c91c3f-c750-407a-8cd5-7b182e9680c6/"
]
}
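The speedup above came from adjusting batch sizes. One plausible reason batch size matters so much, sketched generically: if each batch of content costs one database round trip, 150K items handled one at a time cost 150K round trips, while batches of 1000 cost 150. FakeDB, batched, and find_new are hypothetical names; this is an illustration, not pulpcore's stages code.

```python
"""Illustrative only: why batch size matters when a sync pipeline checks
which of many content units already exist. FakeDB is hypothetical and
simply counts round trips; it is not pulpcore's API."""
from itertools import islice
from typing import Iterable, Iterator, List, Set


class FakeDB:
    """Hypothetical store that charges one round trip per query."""

    def __init__(self, existing: Set[str]) -> None:
        self.existing = existing
        self.round_trips = 0

    def lookup(self, keys: List[str]) -> Set[str]:
        """Return the subset of `keys` already stored (one round trip)."""
        self.round_trips += 1
        return self.existing.intersection(keys)


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def find_new(db: FakeDB, keys: Iterable[str], batch_size: int) -> List[str]:
    """Return keys not yet in `db`, querying it once per batch."""
    new = []
    for chunk in batched(keys, batch_size):
        found = db.lookup(chunk)
        new.extend(k for k in chunk if k not in found)
    return new
```

With 150,000 keys, find_new(db, keys, 1) makes 150,000 lookups, while find_new(db, keys, 1000) makes 150; the per-query overhead, not the work per item, is what the larger batch amortizes.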
Updated by bmbouter about 5 years ago
Here's a sync with policy='on_demand' and cProfile enabled: ~26 minutes.
{
"pulp_href": "/pulp/api/v3/tasks/d51a63a3-f737-4898-a294-44198f378823/",
"pulp_created": "2019-12-05T17:33:46.837539Z",
"state": "completed",
"name": "pulp_file.app.tasks.synchronizing.synchronize",
"started_at": "2019-12-05T17:33:46.950234Z",
"finished_at": "2019-12-05T17:59:30.521306Z",
"error": null,
"worker": "/pulp/api/v3/workers/cf4b2e6e-3e81-4968-ba14-8dab7edbb6b3/",
"progress_reports": [
{
"message": "Downloading Metadata",
"code": "downloading.metadata",
"state": "completed",
"total": null,
"done": 1,
"suffix": null
},
{
"message": "Parsing Metadata Lines",
"code": "parsing.metadata",
"state": "completed",
"total": 150001,
"done": 150001,
"suffix": null
},
{
"message": "Downloading Artifacts",
"code": "downloading.artifacts",
"state": "completed",
"total": null,
"done": 0,
"suffix": null
},
{
"message": "Associating Content",
"code": "associating.content",
"state": "completed",
"total": null,
"done": 150001,
"suffix": null
}
],
"created_resources": [
"/pulp/api/v3/repositories/file/file/090ed56e-4f30-450c-829b-9f861397bc21/versions/1/"
],
"reserved_resources_record": [
"/pulp/api/v3/remotes/file/file/780e65aa-d66b-47ed-98cd-d0b1bf32f18c/",
"/pulp/api/v3/repositories/file/file/090ed56e-4f30-450c-829b-9f861397bc21/"
]
}
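The runtimes quoted in this thread can be read straight off the task records. A minimal sketch, assuming the task JSON has already been loaded into a dict (e.g. with json.loads); summarize is a hypothetical helper name.

```python
"""Summarize a Pulp task record like the ones pasted above: wall-clock
runtime plus the `done` count of each progress report."""
from datetime import datetime
from typing import Dict, Tuple


def _parse(ts: str) -> datetime:
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z",
    # so rewrite it as an explicit UTC offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def summarize(task: dict) -> Tuple[float, Dict[str, int]]:
    """Return (runtime in minutes, {progress-report code: done})."""
    elapsed = _parse(task["finished_at"]) - _parse(task["started_at"])
    stages = {r["code"]: r["done"] for r in task["progress_reports"]}
    return elapsed.total_seconds() / 60, stages
```

For the two records above, this gives about 66.1 minutes for the immediate sync and about 25.7 minutes for on_demand. It also shows downloading.artifacts done = 0 for on_demand, because on_demand defers artifact downloads instead of fetching them during sync.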
Updated by bmbouter about 5 years ago
- File cprofile_150k_repo added
- File cprofile_150k_repo_on_demand added
Adding the cProfile outputs so anyone can analyze them.
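Assuming the attached cprofile_150k_repo files are standard cProfile dumps (as written by cProfile.Profile.dump_stats or python -m cProfile -o), the standard-library pstats module can list the most expensive call paths. top_functions is a hypothetical helper name.

```python
"""Print the hottest entries of a cProfile dump using the stdlib pstats module."""
import io
import pstats


def top_functions(path: str, limit: int = 25, sort: str = "cumulative") -> str:
    """Return a text report of the `limit` most expensive entries, sorted by `sort`."""
    out = io.StringIO()
    stats = pstats.Stats(path, stream=out)
    # strip_dirs() shortens file paths so the report stays readable.
    stats.strip_dirs().sort_stats(sort).print_stats(limit)
    return out.getvalue()
```

For example, print(top_functions("cprofile_150k_repo")). Sorting by "tottime" instead of "cumulative" highlights functions that are expensive in their own right rather than through their callees.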
Updated by bmbouter about 5 years ago
- Status changed from ASSIGNED to MODIFIED
This was fixed by: https://github.com/pulp/pulpcore/pull/440
Updated by bmbouter about 5 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Updated by ggainey over 4 years ago
- Tags Katello added
- Tags deleted (Katello-P2)