As a plugin writer or user, I have a pipeline performance data collector
The performance of a multi-stage queueing network is much easier to understand and improve with instrumentation that measures the traditional queueing statistics. This ticket creates a feature that gathers that data.
Users can send the collected data to developers. It could also be used to performance test the pipeline nightly and report on its performance over time when running in a resource-controlled environment.
For each item at each stage we'll record the waiting time and the service time. Also upon entry to each queue we'll record the queue length. Finally the inter-arrival time to each queue will be recorded. Formal definitions of these are below:
waiting time - The number of seconds an item was waiting in a specific Queue
service time - The number of seconds an item was being handled by a stage
queue_length - The number of waiting items in the queue, as measured upon ingress of a new item
interarrival_time - The number of seconds since the previous arrival to this Queue
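To make the four definitions above concrete, here is a minimal sketch of how they could be measured around a single queue. `ProfiledQueue` and `timed_service` are hypothetical names used only for illustration, not part of the stages API. The sketch also assumes that queue_length counts the items already waiting, not including the item that is arriving.

```python
import time
from collections import deque


class ProfiledQueue:
    """Hypothetical FIFO queue that records queueing statistics (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = deque()  # entries are (item, arrival_timestamp)
        self._last_arrival = None
        self.queue_lengths = []
        self.interarrival_times = []
        self.waiting_times = []

    def put(self, item):
        now = self._clock()
        # queue_length: items already waiting when the new item arrives
        # (assumption: the arriving item itself is not counted).
        self.queue_lengths.append(len(self._items))
        # interarrival_time: seconds since the previous arrival to this queue.
        if self._last_arrival is not None:
            self.interarrival_times.append(now - self._last_arrival)
        self._last_arrival = now
        self._items.append((item, now))

    def get(self):
        item, arrived = self._items.popleft()
        # waiting time: seconds the item spent in this queue.
        self.waiting_times.append(self._clock() - arrived)
        return item


def timed_service(handler, item, clock=time.monotonic):
    """Run handler(item) and return (result, service time in seconds)."""
    start = clock()
    result = handler(item)
    return result, clock() - start
```

The injectable `clock` argument is only there so the measurements can be checked deterministically. The first arrival to a queue has no interarrival_time, so the sketch records nothing for it.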
The data should be written to a sqlite3 database in the /var/lib/pulp/debug/ directory, with the filename being the UUID of the task it is running inside of. This will create many sqlite3 databases, but it will allow them to be sent around and uploaded easily.
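A rough sketch of the one-database-per-task idea, using only the standard library. The table name, column names, and the helper functions `open_profile_db` and `record_measurement` are assumptions made for illustration. The ticket does not specify a schema.

```python
import os
import sqlite3


def open_profile_db(task_id, debug_dir="/var/lib/pulp/debug/"):
    """Open (or create) the sqlite3 file named after the task UUID."""
    os.makedirs(debug_dir, exist_ok=True)
    conn = sqlite3.connect(os.path.join(debug_dir, str(task_id)))
    # Hypothetical schema: one row per item per stage.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS measurements (
            stage_position INTEGER NOT NULL,
            waiting_time REAL,
            service_time REAL,
            queue_length INTEGER,
            interarrival_time REAL
        )
        """
    )
    conn.commit()
    return conn


def record_measurement(conn, stage_position, waiting_time, service_time,
                       queue_length, interarrival_time):
    """Append one measurement row; interarrival_time may be None."""
    conn.execute(
        "INSERT INTO measurements VALUES (?, ?, ?, ?, ?)",
        (stage_position, waiting_time, service_time, queue_length,
         interarrival_time),
    )
    conn.commit()
```

Because each file is a complete database, a user could attach a single file to a bug report without exporting anything.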
We also need to know the order and types of the stages in use so that the data for each queue and stage can be interpreted. This should be recorded when the pipeline is assembled with create_pipeline(). It also needs to be saved into the db somehow.
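One possible way to save the stage layout, sketched under the assumption that the pipeline is assembled from an ordered list of stage objects. The `stages` table and the `record_stages` helper are hypothetical, and the exact hook inside create_pipeline() is left open.

```python
import sqlite3


def record_stages(conn, stages):
    """Store each stage's position in the pipeline and its class name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stages ("
        "position INTEGER PRIMARY KEY, stage_type TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO stages (position, stage_type) VALUES (?, ?)",
        [(position, type(stage).__name__)
         for position, stage in enumerate(stages)],
    )
    conn.commit()
```

With the position stored, per-stage measurements can be joined back to the stage type that produced them.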
If any tooling is developed it could be cool to add it as a pulp-manager command here: https://github.com/pulp/pulp/tree/master/pulpcore/pulpcore/app/management/commands
Enabling the Feature
This feature can be enabled with
PROFILE_STAGES_API = True. It is disabled by default.