Issue #5813

closed

syncing a very large file repository takes a VERY long time

Added by jsherril@redhat.com almost 5 years ago. Updated over 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Katello
Sprint: Sprint 63
Quarter:

Description

I tried syncing a file repository with Pulp 3; after ~7 hours it was still not done. All the steps in the progress report had finished, but the task was still doing 'something'. Worker CPU usage was high, but I'm not sure what it was doing (duplicate detection, maybe?).

The same sync with Pulp 2 took ~55 minutes.


Files

cprofile_150k_repo (234 KB), added by bmbouter, 12/05/2019 07:35 PM
cprofile_150k_repo_on_demand (210 KB), added by bmbouter, 12/05/2019 07:35 PM
Actions #1

Updated by jsherril@redhat.com almost 5 years ago

  • Tags Katello-P2 added
Actions #2

Updated by daviddavis almost 5 years ago

  • Sprint set to Sprint 62

Adding to the sprint to hopefully resolve before 3.0 GA.

Actions #3

Updated by daviddavis almost 5 years ago

I talked to @partha since @jsherrill is out. Sounds like the file repo had 150K small files.

We can probably use the pulp-fixtures script to generate such a repo:

https://github.com/PulpQE/pulp-fixtures/blob/master/file/gen-fixtures.sh

Actions #5

Updated by bmbouter almost 5 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to bmbouter
Actions #6

Updated by fao89 almost 5 years ago

  • Triaged changed from No to Yes
Actions #7

Updated by bmbouter almost 5 years ago

I applied the WIP PR from @fabricio here: which adjusts the batch sizes, and it provides a great speedup. This sync ran with policy='immediate' against http://quartet.usersys.redhat.com/pub/fake-repos/very_large_file_150k/

It shows a runtime of ~66 minutes even with cProfile recording the run, which is great. We're going to merge that PR, and I think it will resolve this issue.

{
    "pulp_href": "/pulp/api/v3/tasks/a60bd695-8129-406f-8ee5-c13fb3a6b680/",
    "pulp_created": "2019-12-05T15:36:32.725152Z",
    "state": "completed",
    "name": "pulp_file.app.tasks.synchronizing.synchronize",
    "started_at": "2019-12-05T15:36:33.024143Z",
    "finished_at": "2019-12-05T16:42:41.263819Z",
    "error": null,
    "worker": "/pulp/api/v3/workers/5eb9dcb6-7561-4ef2-aa1e-6b9f9468f7d8/",
    "progress_reports": [
        {
            "message": "Downloading Metadata",
            "code": "downloading.metadata",
            "state": "completed",
            "total": null,
            "done": 1,
            "suffix": null
        },
        {
            "message": "Parsing Metadata Lines",
            "code": "parsing.metadata",
            "state": "completed",
            "total": 150001,
            "done": 150001,
            "suffix": null
        },
        {
            "message": "Downloading Artifacts",
            "code": "downloading.artifacts",
            "state": "completed",
            "total": null,
            "done": 150001,
            "suffix": null
        },
        {
            "message": "Associating Content",
            "code": "associating.content",
            "state": "completed",
            "total": null,
            "done": 150001,
            "suffix": null
        }
    ],
    "created_resources": [
        "/pulp/api/v3/repositories/file/file/cbfdbf42-a3cd-4e62-8a28-e8f55957c469/versions/1/"
    ],
    "reserved_resources_record": [
        "/pulp/api/v3/repositories/file/file/cbfdbf42-a3cd-4e62-8a28-e8f55957c469/",
        "/pulp/api/v3/remotes/file/file/03c91c3f-c750-407a-8cd5-7b182e9680c6/"
    ]
}
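
For anyone who wants to check the ~66 minute figure, the runtime can be derived from the started_at and finished_at fields of the task. The following is only a minimal sketch: the helper name task_runtime_minutes is made up for illustration, and it assumes the timestamps always carry microseconds and a trailing "Z", as they do in the output above.

```python
from datetime import datetime

# Assumption: timestamps look like "2019-12-05T15:36:33.024143Z" (UTC, microseconds).
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def task_runtime_minutes(task):
    """Return the wall-clock runtime of a task dict in minutes (hypothetical helper)."""
    started = datetime.strptime(task["started_at"], TIMESTAMP_FORMAT)
    finished = datetime.strptime(task["finished_at"], TIMESTAMP_FORMAT)
    return (finished - started).total_seconds() / 60


# Values copied from the task output above (policy='immediate').
immediate_task = {
    "started_at": "2019-12-05T15:36:33.024143Z",
    "finished_at": "2019-12-05T16:42:41.263819Z",
}

print(round(task_runtime_minutes(immediate_task), 1))  # 66.1
```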
Actions #8

Updated by bmbouter almost 5 years ago

Here's a sync with policy='on_demand' and cProfile enabled; it took ~26 minutes:

{
    "pulp_href": "/pulp/api/v3/tasks/d51a63a3-f737-4898-a294-44198f378823/",
    "pulp_created": "2019-12-05T17:33:46.837539Z",
    "state": "completed",
    "name": "pulp_file.app.tasks.synchronizing.synchronize",
    "started_at": "2019-12-05T17:33:46.950234Z",
    "finished_at": "2019-12-05T17:59:30.521306Z",
    "error": null,
    "worker": "/pulp/api/v3/workers/cf4b2e6e-3e81-4968-ba14-8dab7edbb6b3/",
    "progress_reports": [
        {
            "message": "Downloading Metadata",
            "code": "downloading.metadata",
            "state": "completed",
            "total": null,
            "done": 1,
            "suffix": null
        },
        {
            "message": "Parsing Metadata Lines",
            "code": "parsing.metadata",
            "state": "completed",
            "total": 150001,
            "done": 150001,
            "suffix": null
        },
        {
            "message": "Downloading Artifacts",
            "code": "downloading.artifacts",
            "state": "completed",
            "total": null,
            "done": 0,
            "suffix": null
        },
        {
            "message": "Associating Content",
            "code": "associating.content",
            "state": "completed",
            "total": null,
            "done": 150001,
            "suffix": null
        }
    ],
    "created_resources": [
        "/pulp/api/v3/repositories/file/file/090ed56e-4f30-450c-829b-9f861397bc21/versions/1/"
    ],
    "reserved_resources_record": [
        "/pulp/api/v3/remotes/file/file/780e65aa-d66b-47ed-98cd-d0b1bf32f18c/",
        "/pulp/api/v3/repositories/file/file/090ed56e-4f30-450c-829b-9f861397bc21/"
    ]
}
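
The main difference from the immediate run is visible in the progress reports: "Downloading Artifacts" finished with done at 0, which is what I would expect if on_demand defers artifact downloads. A rough sketch for pulling those counters out of a task follows; summarize_progress is a hypothetical helper, and the dict below is trimmed to the fields it reads.

```python
def summarize_progress(task):
    """Map each progress report code to its 'done' counter (hypothetical helper)."""
    return {report["code"]: report["done"] for report in task["progress_reports"]}


# Trimmed copy of the on_demand task output above; only 'code' and 'done' are kept.
on_demand_task = {
    "progress_reports": [
        {"code": "downloading.metadata", "done": 1},
        {"code": "parsing.metadata", "done": 150001},
        {"code": "downloading.artifacts", "done": 0},
        {"code": "associating.content", "done": 150001},
    ]
}

summary = summarize_progress(on_demand_task)
for code, done in summary.items():
    print(f"{code}: {done}")
```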
Actions #9

Updated by bmbouter almost 5 years ago

Attaching the cProfile outputs so anyone can analyze them.
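
One possible way to look at the attachments, assuming they are binary stats dumps as written by cProfile (I have not verified the file format here), is the standard library pstats module. The function name top_cumulative and the default limit are illustrative choices only.

```python
import io
import pstats


def top_cumulative(stats_path, limit=15):
    """Return a text report of the functions with the highest cumulative time.

    Assumes stats_path points at a stats dump produced by cProfile.
    """
    buffer = io.StringIO()
    stats = pstats.Stats(stats_path, stream=buffer)
    # Drop directory prefixes, sort by cumulative time, print the top entries.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()
```

With a downloaded copy of an attachment, print(top_cumulative("cprofile_150k_repo")) should list where the sync spent most of its time; sorting by "tottime" instead would highlight the functions that are expensive on their own rather than through their callees.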

Actions #10

Updated by rchan almost 5 years ago

  • Sprint changed from Sprint 62 to Sprint 63
Actions #11

Updated by bmbouter almost 5 years ago

  • Status changed from ASSIGNED to MODIFIED
Actions #12

Updated by bmbouter almost 5 years ago

  • Sprint/Milestone set to 0.1.0
Actions #13

Updated by bmbouter almost 5 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Actions #14

Updated by ggainey over 4 years ago

  • Tags Katello added
  • Tags deleted (Katello-P2)
