Performance Test Plan
Overall this idea is inspired by reports like these for the Lucene project.
Add performance tests to pulp-smash in a dedicated Python module. They will not be run by default; they will run only when that performance module is run explicitly. Runtime measurement will be built into these tests and reported in seconds.
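One possible way to build the runtime measurement into the tests is sketched below. It is only an illustration: the `TimedTestCase` base class, the `RUNTIMES` dictionary and the example test are hypothetical names, not existing pulp-smash code, and the plan does not say which clock should be used (`time.perf_counter` is assumed here).

```python
import io
import time
import unittest

# Hypothetical registry of measured runtimes in seconds, keyed by test name.
RUNTIMES = {}


class TimedTestCase(unittest.TestCase):
    """Base class that records the wall-clock runtime of each test."""

    def setUp(self):
        self._start = time.perf_counter()

    def tearDown(self):
        # tearDown runs whether the test passed or failed.
        RUNTIMES[self.id()] = time.perf_counter() - self._start


class ExampleSyncTest(TimedTestCase):
    """Stand-in for a real performance test."""

    def test_sync(self):
        time.sleep(0.01)  # placeholder for the operation being measured


def run_example():
    """Run the example test and return a copy of the recorded runtimes."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleSyncTest)
    unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return dict(RUNTIMES)
```

Calling `run_example()` returns a dictionary with one entry whose value is the measured runtime in seconds.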
By default these tests will write their runtime output to stdout as a table.
If given an environment variable (or option?), write the contents to a JSON file instead. The file will be keyed on the test name, and each value is that test's runtime.
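The reporting step could look roughly like the sketch below. The variable name `PULP_SMASH_PERF_JSON` is made up for illustration (the plan has not decided between an environment variable and an option), and it is an assumption that the JSON file replaces the table rather than being written in addition to it.

```python
import json
import os

# Hypothetical variable name; holds the path of the JSON file to write.
JSON_PATH_ENV = "PULP_SMASH_PERF_JSON"


def format_table(runtimes):
    """Return a plain-text table of test names and runtimes in seconds."""
    width = max([len("Test")] + [len(name) for name in runtimes])
    lines = [f"{'Test':<{width}}  {'Runtime (s)':>12}"]
    for name in sorted(runtimes):
        lines.append(f"{name:<{width}}  {runtimes[name]:>12.3f}")
    return "\n".join(lines)


def report(runtimes, environ=os.environ):
    """Write JSON when the variable is set, otherwise print a table."""
    path = environ.get(JSON_PATH_ENV)
    if path:
        with open(path, "w") as handle:
            # Keyed on test name; each value is the runtime in seconds.
            json.dump(runtimes, handle, indent=2, sort_keys=True)
    else:
        print(format_table(runtimes))
```

The `environ` argument exists only so the behaviour can be exercised without changing the real process environment.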
Create a Jenkins job which installs the version of Pulp under test and runs the performance tests with the environment variable set. The install happens on one machine and all tests run on that same machine. If any test does not complete, the Jenkins job takes no further action other than sending an e-mail and reporting in IRC that the performance test did not run.
Longitudinal raw data will be stored outside of Jenkins in a sqlite database. Assuming the tests completed, after the runtime JSON file is produced, the Jenkins job will download the sqlite file via scp and add that run's data to it. The updated sqlite file will then be delivered back via scp to its original location, overwriting the file that was downloaded.
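The step that adds a run's data to the sqlite file might be sketched as follows. The `runtimes` table, its columns and the `add_run` function are assumptions made for illustration; the plan does not define a schema. The scp download and upload around this step are left out.

```python
import json
import sqlite3

# Hypothetical schema: one row per test per run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runtimes (
    run_timestamp TEXT NOT NULL,
    pulp_version  TEXT NOT NULL,
    test_name     TEXT NOT NULL,
    seconds       REAL NOT NULL
)
"""


def add_run(db_path, json_path, run_timestamp, pulp_version):
    """Append one run's runtimes from the JSON file to the database."""
    with open(json_path) as handle:
        runtimes = json.load(handle)
    connection = sqlite3.connect(db_path)
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(SCHEMA)
            connection.executemany(
                "INSERT INTO runtimes VALUES (?, ?, ?, ?)",
                [
                    (run_timestamp, pulp_version, name, seconds)
                    for name, seconds in sorted(runtimes.items())
                ],
            )
    finally:
        connection.close()
    return len(runtimes)
```

Because every run is appended rather than replaced, the file grows into the longitudinal record the charts are built from.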
Using the updated sqlite database, one summarized timeseries chart will be produced with matplotlib for each test in the database. Each chart will apply a summarization (see statistics below) to consolidate several data points into one. The charts will be copied via scp to a web-accessible filesystem so the community can view them.
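As a rough sketch of the charting step, the code below collapses consecutive runs into one point and plots the result. The choice of the median and a bucket of 5 runs is purely an assumption for illustration, not the summarization the plan will finally use, and `summarize` and `plot_test` are hypothetical names.

```python
import statistics


def summarize(points, bucket_size=5):
    """Collapse consecutive runtimes into one median value per bucket."""
    return [
        statistics.median(points[i:i + bucket_size])
        for i in range(0, len(points), bucket_size)
    ]


def plot_test(test_name, points, out_path, bucket_size=5):
    """Write a timeseries chart of the summarized runtimes to out_path."""
    # Imported here so summarize() stays usable without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display is needed
    import matplotlib.pyplot as plt

    summary = summarize(points, bucket_size)
    fig, ax = plt.subplots()
    ax.plot(range(1, len(summary) + 1), summary, marker="o")
    ax.set_title(test_name)
    ax.set_xlabel(f"Bucket of {bucket_size} runs")
    ax.set_ylabel("Runtime (s)")
    fig.savefig(out_path)
    plt.close(fig)
```

A median is less sensitive than a mean to a single slow run, which is one reason it might suit noisy timing data, but any summary statistic could be substituted in `summarize`.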
The sqlite database will have a table for Annotations which will allow matplotlib to annotate data points with information about changes in performance or release information.
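A minimal sketch of what the Annotations table could look like is given below. The column names and the helper functions are hypothetical; the plan only says that such a table will exist.

```python
import sqlite3

# Hypothetical schema: a free-text note attached to one test at one point
# in time, for example a release or a known performance change.
ANNOTATIONS_SCHEMA = """
CREATE TABLE IF NOT EXISTS annotations (
    test_name     TEXT NOT NULL,
    run_timestamp TEXT NOT NULL,
    note          TEXT NOT NULL
)
"""


def add_annotation(connection, test_name, run_timestamp, note):
    """Store one annotation for a test."""
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "INSERT INTO annotations VALUES (?, ?, ?)",
            (test_name, run_timestamp, note),
        )


def annotations_for(connection, test_name):
    """Return (run_timestamp, note) pairs for one test, oldest first."""
    cursor = connection.execute(
        "SELECT run_timestamp, note FROM annotations "
        "WHERE test_name = ? ORDER BY run_timestamp",
        (test_name,),
    )
    return cursor.fetchall()
```

The pairs returned by `annotations_for` are the kind of input a charting step would need to label individual data points.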
Controlling for External Factors over Time
Add things here...
Each test will live on its own page. At the top will be a simple description of what the test does, the graph will be below the description, and the annotations will be at the bottom. Here is an example (minus the description)
Fresh sync only of Fedora 21 w/ lazy off from http://archive.linux.duke.edu/fedora/pub/fedora/linux/releases/21/Everything/x86_64/os/
Re-sync only w/ lazy off from http://archive.linux.duke.edu/fedora/pub/fedora/linux/releases/21/Everything/x86_64/os/
Fresh sync only of Fedora 21 w/ lazy on from http://archive.linux.duke.edu/fedora/pub/fedora/linux/releases/21/Everything/x86_64/os/
Re-sync only w/ lazy on from http://archive.linux.duke.edu/fedora/pub/fedora/linux/releases/21/Everything/x86_64/os/
Uploading of 100 RPMs with Pulp's httpd config defaults
Copy of all rpms from the Fedora 21 repo above to an empty repo without depsolve
Copy of a limited set of rpms from the Fedora 21 repo above to an empty repo with depsolve
Fetch all rpm ids of a large repo. Katello tries to fetch all ids and, if that doesn't respond within 2 minutes, fetches them in chunks of 500
A related test would be to take the same large repository, create 100-200 copies of it (thus creating a lot of related links among the content in Mongo), and do the same fetches
Fetch all errata ids of a repo (May need to sync RHEL or EPEL for this), same logic as rpm ids
After fetching all the errata ids of a repo, fetch all the errata in chunks of 500 ids
After fetching all the rpm ids of a repo, fetch all the rpms in chunks of 500 ids
Search performance using a simple Criteria filter
Fresh publish runtime of a Fedora 21 repo
Incremental (second) publish runtime of the Fedora 21 repo with no changes
Incremental (second) publish runtime of the Fedora 21 repo after uploading 20 new rpms.
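Several of the fetch tests above work in chunks of 500 ids. A test could drive that pattern with something like the sketch below, where `fetch_chunk` is a hypothetical callable standing in for whatever request the test actually makes against Pulp.

```python
def chunked(ids, size=500):
    """Yield consecutive slices of ids, each holding at most size items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_units(ids, fetch_chunk, size=500):
    """Fetch the units for all ids, one chunk at a time.

    fetch_chunk is a hypothetical callable that takes a list of ids and
    returns the matching units.
    """
    units = []
    for chunk in chunked(ids, size):
        units.extend(fetch_chunk(chunk))
    return units
```

The measured runtime for such a test would then cover the whole loop rather than a single request.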