Issue #4404

closed

Sync performance degradation with RemoteArtifactSaver stage

Added by gmbnomis almost 6 years ago. Updated almost 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags:
Sprint: Sprint 48
Quarter:

Description

Problem

The new RemoteArtifactSaver stage (see issue #4246) slows down sync, especially
when re-syncing an existing repo.

Some performance impact was expected, as the RemoteArtifactSaver stage has
to query for existing RemoteArtifacts in all cases (previously, the
RemoteArtifacts were saved unconditionally when saving content units).

Measurements

Duration of a lazy sync of Chef Supermarket (ca. 23700 content units &
remote artifacts) using pulp_cookbook. Uses a fresh Pulp instance (just started) with an empty
database.

Before RemoteArtifactSaver stage (i.e. before https://github.com/pulp/pulpcore-plugin/pull/36):

Initial sync: ca. 90 seconds
Re-sync: ca. 14 seconds

With separate RemoteArtifactSaver stage:

Initial sync: ca. 160 seconds
Re-sync: ca. 140 seconds

Initial sync time almost doubles, re-sync time increases by a factor of 10!

Solution

The root cause is the high number of database operations in the new stage.
In a test (see https://gist.github.com/gmbnomis/07b6c7d13a313dbcfcaa81ff026b96f8), the stage causes ca. 300 DB queries for a batch of 100 content units (to create around 50 RemoteArtifacts).

Using prefetching, the stage can be implemented using 2 DB queries per
batch. The WHERE clauses use pks only. For example:

[{'sql': 'SELECT "pulp_app_contentartifact"."_id", '
         '"pulp_app_contentartifact"."_created", '
         '"pulp_app_contentartifact"."_last_updated", '
         '"pulp_app_contentartifact"."artifact_id", '
         '"pulp_app_contentartifact"."content_id", '
         '"pulp_app_contentartifact"."relative_path" FROM '
         '"pulp_app_contentartifact" WHERE '
         '"pulp_app_contentartifact"."content_id" IN (1, 2, 3, 4, 5, 6, 7, 8, '
         '9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, '
         '26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, '
         '43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, '
         '60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, '
         '77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, '
         '94, 95, 96, 97, 98, 99, 100)',
  'time': '0.008'},
 {'sql': 'SELECT "pulp_app_remoteartifact"."_id", '
         '"pulp_app_remoteartifact"."_created", '
         '"pulp_app_remoteartifact"."_last_updated", '
         '"pulp_app_remoteartifact"."url", "pulp_app_remoteartifact"."size", '
         '"pulp_app_remoteartifact"."md5", "pulp_app_remoteartifact"."sha1", '
         '"pulp_app_remoteartifact"."sha224", '
         '"pulp_app_remoteartifact"."sha256", '
         '"pulp_app_remoteartifact"."sha384", '
         '"pulp_app_remoteartifact"."sha512", '
         '"pulp_app_remoteartifact"."content_artifact_id", '
         '"pulp_app_remoteartifact"."remote_id" FROM "pulp_app_remoteartifact" '
         'WHERE ("pulp_app_remoteartifact"."remote_id" IN (1, 3) AND '
         '"pulp_app_remoteartifact"."content_artifact_id" IN (1, 2, 3, 4, 5, '
         '6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, '
         '24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, '
         '41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, '
         '58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, '
         '75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, '
         '92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, '
         '107, 108, 109, 110, 111, 112, 113, 114, 115))',
  'time': '0.001'}]

The queries are a little bit broader than before: the query for the
RemoteArtifacts includes RemoteArtifacts for all remotes seen in the current
batch (previously, it included only the remotes seen per declarative_content).
For all practical purposes, this difference should be negligible (and there is
no difference at all for batches using a single remote).
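The two-queries-per-batch idea can be illustrated outside of Pulp. The following is only a minimal sketch using the standard library's sqlite3 module, not the actual stage implementation: the simplified table layout and the helper names (make_db, placeholders, missing_remote_artifacts) are assumptions made for illustration. It fetches all content artifacts of a batch in one query, then all existing remote artifacts for the remotes seen in the batch in a second query, and decides in memory which remote artifacts still have to be created.

```python
import sqlite3


def make_db():
    """Build a toy in-memory schema (hypothetical, much simpler than Pulp's)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE content_artifact (
            id INTEGER PRIMARY KEY, content_id INTEGER, relative_path TEXT);
        CREATE TABLE remote_artifact (
            id INTEGER PRIMARY KEY, content_artifact_id INTEGER,
            remote_id INTEGER, url TEXT);
        """
    )
    # 100 content units with one content artifact each.
    conn.executemany(
        "INSERT INTO content_artifact VALUES (?, ?, ?)",
        [(i, i, f"unit-{i}.tgz") for i in range(1, 101)],
    )
    # Every second content artifact already has a remote artifact for remote 1.
    conn.executemany(
        "INSERT INTO remote_artifact (content_artifact_id, remote_id, url) "
        "VALUES (?, ?, ?)",
        [(i, 1, f"https://example.invalid/{i}") for i in range(2, 101, 2)],
    )
    return conn


def placeholders(values):
    """Return one '?' per value for use inside an IN (...) clause."""
    return ", ".join("?" for _ in values)


def missing_remote_artifacts(conn, wanted):
    """Find remote artifacts that do not exist yet for one batch.

    wanted is a list of (content_id, remote_id) pairs. Returns the missing
    (content_artifact_id, remote_id) pairs and the list of SQL statements run.
    """
    queries = []
    content_ids = sorted({content_id for content_id, _ in wanted})
    remote_ids = sorted({remote_id for _, remote_id in wanted})

    # Query 1: all content artifacts of the batch, grouped by content pk.
    sql = (
        "SELECT id, content_id FROM content_artifact "
        f"WHERE content_id IN ({placeholders(content_ids)})"
    )
    queries.append(sql)
    by_content = {}
    for ca_id, content_id in conn.execute(sql, content_ids):
        by_content.setdefault(content_id, []).append(ca_id)

    # Query 2: existing remote artifacts for every remote seen in the batch.
    ca_ids = sorted(ca for cas in by_content.values() for ca in cas)
    sql = (
        "SELECT content_artifact_id, remote_id FROM remote_artifact "
        f"WHERE remote_id IN ({placeholders(remote_ids)}) "
        f"AND content_artifact_id IN ({placeholders(ca_ids)})"
    )
    queries.append(sql)
    existing = set(conn.execute(sql, remote_ids + ca_ids))

    # Decide in memory, without any further database round trips.
    missing = [
        (ca_id, remote_id)
        for content_id, remote_id in wanted
        for ca_id in by_content.get(content_id, [])
        if (ca_id, remote_id) not in existing
    ]
    return missing, queries


conn = make_db()
batch = [(content_id, 1) for content_id in range(1, 101)]
missing, queries = missing_remote_artifacts(conn, batch)
print(len(queries), "queries,", len(missing), "remote artifacts to create")
```

As in the query log above, both WHERE clauses filter on primary keys only, and the second query is slightly broader than strictly needed when a batch mixes several remotes.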

Performance measurement:

Initial sync: ca. 100 seconds
Re-sync: ca. 23 seconds
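
To put the three sets of measurements side by side, the slowdown relative to the state before the RemoteArtifactSaver stage can be computed from the approximate timings reported in this issue. This is just arithmetic on those figures; the variant labels and the slowdown helper are made up for illustration.

```python
# Approximate sync durations in seconds, as reported in this issue.
timings = {
    "before": {"initial": 90, "resync": 14},
    "separate stage": {"initial": 160, "resync": 140},
    "prefetching": {"initial": 100, "resync": 23},
}


def slowdown(variant, kind):
    """Factor by which a variant is slower than the 'before' baseline."""
    return timings[variant][kind] / timings["before"][kind]


for variant in ("separate stage", "prefetching"):
    for kind in ("initial", "resync"):
        print(f"{variant:15s} {kind:8s} x{slowdown(variant, kind):.2f}")
```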
