Performance Test Plan

Overview

This idea is inspired by reports like these published for the Lucene project.

Idea

Add performance tests to pulp-smash in a dedicated Python module. They will not be run by default; they will run only when that performance module is invoked explicitly. The runtime measurement will be built into these tests, and runtimes will be reported in seconds.

By default, these tests will write their runtime output to stdout as a table.

If an environment variable (or option?) is set, the output will also be written to a JSON file keyed on test name, with each value being that test's runtime in seconds.

Create a Jenkins job which installs the version of Pulp under test and runs the performance tests with the environment variable set. The install and all test runs happen on a single machine. If any test fails to complete, the Jenkins job takes no further action and reports via e-mail and IRC that the performance tests did not run.

Longitudinal raw data will be stored outside of Jenkins in a sqlite database. Assuming the tests completed, after the runtime JSON file is produced, the Jenkins job will download the sqlite file via scp, add that run's data to it, and then scp the updated file back to its original location, overwriting the file it downloaded.
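A minimal sketch of the "add that run's data" step, assuming a simple `runs` table; the schema and column names here are assumptions, not an existing Pulp schema:

```python
import sqlite3
import time


def record_run(db_path, pulp_version, results):
    """Append one Jenkins run's timings to the longitudinal sqlite file.

    ``results`` is the runtime JSON as a dict: test name -> seconds.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS runs '
        '(test_name TEXT, runtime REAL, pulp_version TEXT, timestamp REAL)'
    )
    now = time.time()  # one timestamp for the whole run
    conn.executemany(
        'INSERT INTO runs VALUES (?, ?, ?, ?)',
        [(name, runtime, pulp_version, now)
         for name, runtime in results.items()],
    )
    conn.commit()
    conn.close()
```

The Jenkins job would call this between the scp download and the scp upload.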

Using the updated sqlite database, one summarized timeseries chart will be produced with matplotlib for each test in the database. Each chart will apply a summarization (see statistics below) to consolidate several data points into one. These charts will be SCP'd to a web-accessible filesystem to be viewed by the community.
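One possible shape for the charting step, using a daily median as the summarization (the exact statistic is still to be decided, and the `runs` table with `test_name`, `runtime`, and `timestamp` columns is an assumed schema):

```python
import sqlite3
import statistics

import matplotlib
matplotlib.use('Agg')  # render without a display, as a Jenkins job would
import matplotlib.pyplot as plt


def chart_test(db_path, test_name, out_path):
    """Write a summarized timeseries chart for one test to ``out_path``."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        'SELECT timestamp, runtime FROM runs WHERE test_name = ? '
        'ORDER BY timestamp', (test_name,)
    ).fetchall()
    conn.close()
    # Summarize: collapse all points from the same day into one median value.
    by_day = {}
    for timestamp, runtime in rows:
        by_day.setdefault(int(timestamp // 86400), []).append(runtime)
    days = sorted(by_day)
    medians = [statistics.median(by_day[day]) for day in days]
    fig, axis = plt.subplots()
    axis.plot(days, medians, marker='o')
    axis.set_title(test_name)
    axis.set_xlabel('day')
    axis.set_ylabel('runtime (seconds)')
    fig.savefig(out_path)
    plt.close(fig)
```

The Jenkins job would loop over every distinct test name in the database and SCP the resulting images to the web-accessible filesystem.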

The sqlite database will also have an Annotations table, which will allow matplotlib to annotate data points with information about changes in performance or release information.
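The Annotations table might look like the following; the `(test_name, timestamp, note)` columns are an assumption about a table that does not exist yet:

```python
import sqlite3


def annotations_for(db_path, test_name):
    """Return (timestamp, note) pairs for one test, oldest first.

    Each pair is intended to be fed to matplotlib's ``Axes.annotate`` so
    releases and known performance changes show up on the chart.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS Annotations '
        '(test_name TEXT, timestamp REAL, note TEXT)'
    )
    conn.commit()
    rows = conn.execute(
        'SELECT timestamp, note FROM Annotations WHERE test_name = ? '
        'ORDER BY timestamp', (test_name,)
    ).fetchall()
    conn.close()
    return rows
```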

Controlling for External Factors over Time

Add things here...

The Tests

Each test will live on its own page. At the top will be a simple description of what the test does, with the graph below the description and the annotations at the bottom. Here is an example (minus the description):

RPM Tests

Fresh sync only of Fedora 21 w/ lazy off from http://archive.linux.duke.edu/fedora/pub/fedora/linux/releases/21/Everything/x86_64/os/

Re-sync only w/ lazy off from http://archive.linux.duke.edu/fedora/pub/fedora/linux/releases/21/Everything/x86_64/os/

Fresh sync only of Fedora 21 w/ lazy on from http://archive.linux.duke.edu/fedora/pub/fedora/linux/releases/21/Everything/x86_64/os/

Re-sync only w/ lazy on from http://archive.linux.duke.edu/fedora/pub/fedora/linux/releases/21/Everything/x86_64/os/

Uploading of 100 RPMs with Pulp's httpd config defaults

Copy of all rpms from the Fedora 21 repo above to an empty repo, without depsolve

Copy of a limited set of rpms from the Fedora 21 repo above to an empty repo, with depsolve

Fetch all rpm ids of a large repo. Katello tries to fetch all ids at once; if that doesn't respond within 2 minutes, it falls back to fetching in chunks of 500.

A related test would take the same large repository, create 100-200 copies of it (thus creating a lot of related links among the content in Mongo), and perform the same fetches.

Fetch all errata ids of a repo (may need to sync RHEL or EPEL for this), using the same logic as for rpm ids.

After fetching all the errata ids of a repo, fetch the errata themselves in chunks of 500 ids.

After fetching all the rpm ids of a repo, fetch the rpms themselves in chunks of 500 ids.
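The chunked fetches in the tests above all share the same splitting step, which could be sketched like this (the `chunked` helper is hypothetical; the actual Pulp/Katello API calls that consume each chunk are omitted):

```python
def chunked(ids, size=500):
    """Split a list of content unit ids into fixed-size chunks.

    Mirrors Katello's fallback of requesting 500 ids at a time when a
    single fetch of everything does not respond quickly enough.
    """
    return [ids[i:i + size] for i in range(0, len(ids), size)]
```

Each chunk would then be passed to a search or fetch call, and the test's runtime would cover the whole sequence of chunked requests.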

Search performance using a simple Criteria filter

Fresh publish runtime of a Fedora 21 repo

Incremental (second) publish runtime of the Fedora 21 repo with no changes

Incremental (second) publish runtime of the Fedora 21 repo after uploading 20 new rpms.