Issue #4505

closed

Slow syncs on large repositories

Added by jsherril@redhat.com about 5 years ago. Updated over 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Severity: 2. Medium
Triaged: Yes
Groomed: No
Sprint Candidate: No
Sprint: Sprint 50

Description

When syncing a large repository such as http://mirror.math.princeton.edu/pub/fedora/linux/releases/28/, the sync is very slow: after 40 minutes, only ~5000 RPMs had been processed, according to the associated progress report. This was with an on_demand repository.

I turned on slow SQL logging (log_min_duration_statement set to 200 ms) and saw:

    cat /var/lib/pgsql/data/pg_log/postgresql-Tue.log | grep duration | cut -c 1-100
    LOG: parameter "log_min_duration_statement" changed to "200"
    LOG: duration: 36385.807 ms statement: UPDATE "pulp_app_repositorycontent" SET "version_removed_id
    LOG: duration: 36158.991 ms statement: UPDATE "pulp_app_repositorycontent" SET "version_removed_id
    LOG: duration: 37753.231 ms statement: UPDATE "pulp_app_repositorycontent" SET "version_removed_id
    LOG: duration: 38638.415 ms statement: UPDATE "pulp_app_repositorycontent" SET "version_removed_id
    LOG: duration: 41236.294 ms statement: UPDATE "pulp_app_repositorycontent" SET "version_removed_id
    LOG: duration: 41611.244 ms statement: UPDATE "pulp_app_repositorycontent" SET "version_removed_id
    LOG: duration: 41071.438 ms statement: UPDATE "pulp_app_repositorycontent" SET "version_removed_id
    LOG: duration: 41803.944 ms statement: UPDATE "pulp_app_repositorycontent" SET "version_removed_id
    LOG: duration: 41617.086 ms statement: UPDATE "pulp_app_repositorycontent" SET "version_removed_id
    LOG: duration: 41983.358 ms statement: UPDATE "pulp_app_repositorycontent" SET "version_removed_id
    LOG: duration: 44623.680 ms statement: UPDATE "pulp_app_repositorycontent" SET "version_removed_id
    LOG: duration: 44442.117 ms statement: UPDATE "pulp_app_repositorycontent" SET "version_removed_id
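Because the lines above are cut to 100 characters, a quick summary over the full log can help quantify how much time these statements take. The following is a minimal sketch, not part of Pulp: the function name summarize_slow_updates is hypothetical, and it assumes the "duration: N ms statement: ..." line format shown above (with one or more spaces after "ms").

```shell
# Hypothetical helper: summarize the slow UPDATEs on pulp_app_repositorycontent
# that log_min_duration_statement wrote to a PostgreSQL log file.
# Assumes lines shaped like: LOG:  duration: 44922.590 ms  statement: UPDATE ...
summarize_slow_updates() {
  grep -o 'duration: [0-9.]* ms  *statement: UPDATE "pulp_app_repositorycontent"' "$1" |
    awk '
      {
        ms = $2 + 0              # field 2 is the duration in milliseconds
        total += ms
        n++
        if (ms > max) max = ms
      }
      END {
        if (n > 0) printf "count=%d avg_ms=%.1f max_ms=%.1f\n", n, total / n, max
      }'
}
```

Running summarize_slow_updates /var/lib/pgsql/data/pg_log/postgresql-Tue.log prints one line with the number of matching statements and their average and maximum durations; other slow statements (e.g. SELECTs) are ignored.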

These appear roughly once a minute and are likely a major source of the slowness. Each statement is very large:

cat /var/lib/pgsql/data/pg_log/postgresql-Tue.log | grep duration |  tail -n1| cut -c 1-1000

LOG:  duration: 44922.590 ms  statement: UPDATE "pulp_app_repositorycontent" SET "version_removed_id" = 2 WHERE ("pulp_app_repositorycontent"."content_id" IN (SELECT U0."content_ptr_id" FROM "rpm_package" U0 WHERE ((U0."arch" = 'x86_64' AND U0."epoch" = '0' AND U0."name" = '0ad' AND U0."release" = '5.fc28' AND U0."version" = '0.0.22' AND NOT (U0."content_ptr_id" = 51)) OR (U0."arch" = 'noarch' AND U0."epoch" = '0' AND U0."name" = '0ad-data' AND U0."release" = '2.fc28' AND U0."version" = '0.0.22' AND NOT (U0."content_ptr_id" = 52)) OR (U0."arch" = 'x86_64' AND U0."epoch" = '0' AND U0."name" = '0install' AND U0."release" = '1.fc27' AND U0."version" = '2.12.1' AND NOT (U0."content_ptr_id" = 53)) OR (U0."arch" = 'x86_64' AND U0."epoch" = '0' AND U0."name" = '0xFFFF' AND U0."release" = '15.fc26' AND U0."version" = '0.3.9' AND NOT (U0."content_ptr_id" = 54)) OR (U0."arch" = 'x86_64' AND U0."epoch" = '0' AND U0."name" = '2048-cli' AND U0."release" = '5.fc28' AND U0."version" = '0.9.1' AND NOT

cat /var/lib/pgsql/data/pg_log/postgresql-Tue.log | grep duration | tail -n1| wc -c
1147415

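I'm not sure whether the problem is the size of the query (1147415 bytes, about 1.1 MB, for the statement above) or its structure: a single UPDATE whose IN subquery ORs together one arch/epoch/name/release/version condition per package, each excluding that package's own content_ptr_id.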
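One way to separate size from structure is to check how many per-package branches a single statement carries. The sketch below is hypothetical (the helper name inspect_last_update is made up for illustration) and assumes, based only on the excerpt above, that every branch ends with NOT (U0."content_ptr_id" = <id>). Comparing the package count with the number of RPMs processed so far would show whether the statement grows as the sync progresses.

```shell
# Hypothetical helper: report the size of the most recent slow UPDATE on
# pulp_app_repositorycontent and how many per-package conditions it contains.
inspect_last_update() {
  last=$(grep 'duration: .*statement: UPDATE "pulp_app_repositorycontent"' "$1" | tail -n 1)
  # Byte count including the trailing newline, like "tail -n1 | wc -c".
  bytes=$(printf '%s\n' "$last" | wc -c | tr -d ' ')
  # Assumed: every package branch ends with NOT (U0."content_ptr_id" = <id>),
  # so counting that fragment counts the packages in the subquery.
  packages=$(printf '%s\n' "$last" | grep -o 'NOT (U0\."content_ptr_id" =' | wc -l | tr -d ' ')
  echo "bytes=$bytes packages=$packages"
}
```

For example, inspect_last_update /var/lib/pgsql/data/pg_log/postgresql-Tue.log prints a bytes=... packages=... line for the last matching statement.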


Related issues

Related to RPM Support - Test #4292: Add Travis job to nightly sync EPEL7 repo (CLOSED - COMPLETE, kersom)
