Test #4566

closed

Improve performance of rpm duplicate nevra check

Added by bherring almost 6 years ago. Updated over 5 years ago.

Status:
CLOSED - COMPLETE
Priority:
High
Assignee:
Sprint/Milestone:
-
Version:
Platform Release:
Tags:
Pulp 2
Sprint:
Quarter:

Description

In current versions of Pulp 2.x, uploading an RPM to a repo will remove other RPMs with the same NEVRA.

We are currently upgrading from an old Pulp 2.7 release, and I've found that the performance of import_uploaded_unit tasks for RPMs has regressed significantly. In Pulp 2.7, imports usually took around 0.5s. In Pulp 2-master, imports to the same repos have taken from 8 to 130 seconds, depending on the size of the repo.

By debugging I've found most of the time is spent in this duplicate check (remove_unit_duplicate_nevra).

This issue is for improving the performance of remove_unit_duplicate_nevra to reduce the severity of the performance regression.
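
To illustrate what a duplicate NEVRA check involves, here is a minimal Python sketch. It is not Pulp's code and does not claim to reflect how remove_unit_duplicate_nevra works or how it was fixed; the `Rpm` tuple and all function names are hypothetical. It only contrasts a check that scans every unit in the repo with one that looks up units by their NEVRA key, which is one plausible reason cost could grow with repo size.

```python
from collections import namedtuple

# Hypothetical stand-in for an RPM unit; this is not Pulp's model class.
Rpm = namedtuple("Rpm", "name epoch version release arch checksum")


def nevra(unit):
    """Return the (name, epoch, version, release, arch) tuple for a unit."""
    return (unit.name, unit.epoch, unit.version, unit.release, unit.arch)


def find_duplicates_scan(repo_units, new_unit):
    """Naive check: compare the new unit against every unit in the repo."""
    return [u for u in repo_units
            if nevra(u) == nevra(new_unit) and u.checksum != new_unit.checksum]


def build_nevra_index(repo_units):
    """Group units by NEVRA once so later lookups avoid a full scan."""
    index = {}
    for u in repo_units:
        index.setdefault(nevra(u), []).append(u)
    return index


def find_duplicates_indexed(index, new_unit):
    """Indexed check: inspect only units sharing the new unit's NEVRA."""
    return [u for u in index.get(nevra(new_unit), [])
            if u.checksum != new_unit.checksum]
```

Both functions return the units that share the uploaded unit's NEVRA but differ in content, which are the ones an upload would remove from the repo.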


Related issues

Copied from RPM Support - Story #4527: Improve performance of rpm duplicate nevra check (CLOSED - CURRENTRELEASE, rmcgover)

Actions #1

Updated by bherring almost 6 years ago

  • Copied from Story #4527: Improve performance of rpm duplicate nevra check added
Actions #2

Updated by bherring almost 6 years ago

  • Priority changed from Normal to High
Actions #3

Updated by bherring almost 6 years ago

  • Assignee set to bherring

Notes

Verification will include:

  • No Regression in automation testing for 2.19RC
  • Manual comparison testing to confirm the earlier notes, assumptions and pre-build findings (verify there is an improvement).
Actions #4

Updated by bherring almost 6 years ago

  • Status changed from NEW to ASSIGNED
Actions #5

Updated by bherring almost 6 years ago

Notes

Manual verification of uploaded units confirmed the roughly order-of-magnitude speedup when uploading to a pre-populated repo without duplicating RPMs, both via the Python script and, more functionally, through the CLI.

CLI-based script

## Test Scenario:
## Resources:
#    Box 1: 2.17.1 
#    Box 2: 2-master
## Tests:
#    1. Dev provided Python based test
#    2. CLI based test below

# Login 
pulp-admin login -u admin -p admin

# Create Repo A and B
pulp-admin rpm repo create --repo-id a --feed https://fedora.mirror.constant.com/fedora/linux/releases/29/Everything/x86_64/os/ --download-policy=on_demand

pulp-admin rpm repo create --repo-id b --feed https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm-with-modules/

# Sync Up 
pulp-admin rpm repo sync run --repo-id b
pulp-admin rpm repo sync run --repo-id a

# Test Upload of content to A (B is no longer used)
yum install wget -y
wget --recursive --no-parent https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm-with-modules/

# Time the upload of each unit to A
## declare an array
declare -a elementArray=("bear" "camel" "cat" "cheetah" "chimpanzee" "cockateel" "cow" "crow" "dog" "duck" "elephant" "fox" "frog")

## loop through all elements
for element in "${elementArray[@]}"
do
   echo -e "$element\n"
   time pulp-admin rpm repo uploads rpm --repo-id a --file $element* --recursive
done

Python Test

Running `python test.py` locally on each box returned the same results as the gist provided in the original PR for verification.

Actions #6

Updated by bherring almost 6 years ago

  • Status changed from ASSIGNED to MODIFIED
Actions #7

Updated by bmbouter almost 6 years ago

  • Tags Pulp 2 added
Actions #8

Updated by bherring over 5 years ago

  • Status changed from MODIFIED to CLOSED - COMPLETE
