Pulp Automation Services

This is not an exhaustive list. If any services are missing, please add them as needed.

The "automation" VM

Some of our automated services live on the "automation" VM, which currently resides in the OpenStack tenancy provided by Red Hat Central CI. Because it is provided by Central CI, this system is only accessible from the Red Hat corporate network, as is our Jenkins instance. Access to this VM is freely provided as needed to Red Hat Pulp team members.

nodepool

Nodepool is a service created by the OpenStack project, and is responsible for creating Jenkins slave nodes for our Jenkins master. Nodepool periodically inspects the Jenkins job queue and spawns nodes matching the node labels required by jobs in that queue. Once a job completes, nodepool destroys that slave node. In this way, nodepool makes efficient use of the resource quotas in our OpenStack tenancy by ensuring that only nodes actually needed by Jenkins are online at any given time.
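The demand-driven behaviour described above can be illustrated with a small sketch. This is not nodepool's actual implementation: the function name nodes_to_launch and the flat lists of labels are assumptions made purely for illustration. It compares the labels requested by queued jobs with the labels of nodes already online and reports the shortfall per label.

```python
from collections import Counter


def nodes_to_launch(queued_labels, ready_labels):
    """Return how many extra nodes each label needs (hypothetical helper).

    queued_labels: one entry per queued job, naming the label it requires.
    ready_labels:  one entry per node that is already online and idle.
    """
    # Counter subtraction keeps only positive counts, so labels whose
    # demand is already covered by ready nodes drop out of the result.
    shortfall = Counter(queued_labels) - Counter(ready_labels)
    return dict(shortfall)


if __name__ == "__main__":
    queue = ["f24-np", "f24-np", "rhel7-vanilla-np"]
    ready = ["f24-np"]
    print(nodes_to_launch(queue, ready))
```

Labels with no unmet demand are omitted from the result, which mirrors the idea that only nodes actually needed by Jenkins are brought online.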

Instance Types

Replace <dist> in the list below with an identifier that obviously represents a given OS distribution. For example, f24-np would be based off of a Fedora 24 Cloud image, and rhel7-vanilla-np would be based off of a RHEL 7 Cloud image.

Common node labels available to Jenkins jobs:

  • <dist>-vanilla-np: A node type with minimal changes from its base image, useful for most tasks that do not involve installing Pulp on that node.
  • <dist>-np: A node type pre-loaded with the dependencies needed to install any version of Pulp on that node.
  • pulp-smash-np: A node type specifically for installing and running the pulp-smash test suite.
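As a small illustration of the <dist> substitution, the sketch below expands a distribution identifier into the two per-distribution labels listed above. The helper name labels_for is hypothetical and is not part of nodepool or Jenkins.

```python
def labels_for(dist):
    """Expand a distribution identifier into its node labels (hypothetical helper)."""
    # Mirrors the two templates from the list: <dist>-vanilla-np and <dist>-np.
    return ["{0}-vanilla-np".format(dist), "{0}-np".format(dist)]


if __name__ == "__main__":
    print(labels_for("f24"))
    print(labels_for("rhel7"))
```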

Holding Instances

It is occasionally useful, when debugging or reproducing issues, to hold a nodepool instance. Holding an instance prevents nodepool from destroying it, allowing us to log in to the held instance and inspect or interact with that node.

Upon logging in to the Pulp automation VM, you can run the "nodepool" command. To list instances, run "nodepool list". To hold an instance, run "nodepool hold <instance id>". To log in to an instance, ssh to that instance by IP address from the automation VM. The automation VM's ssh client config will ensure that you are able to log in to the node with the correct user ID. Both the instance ID and IP address are listed in the output of "nodepool list".

Furthermore, the instance label and ID are both included in the instance's name, which can be seen in Jenkins itself when viewing the "Build Executor Status" block, found on the left side of Jenkins dashboard pages. The node name template is <label>-<provider>-<id>. Provider is always a single word, so a node named "rhel7-vanilla-np-cios-54482" means the node label is "rhel7-vanilla-np", and the node ID is 54482.
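The node name template can be taken apart mechanically. The following is a minimal sketch, assuming names always follow <label>-<provider>-<id> with a single-word provider and a numeric ID as described above; the function name parse_node_name is hypothetical.

```python
def parse_node_name(name):
    """Split '<label>-<provider>-<id>' into (label, provider, node_id)."""
    # The label may itself contain hyphens, so split from the right:
    # the last field is the ID and the one before it is the provider.
    label, provider, node_id = name.rsplit("-", 2)
    return label, provider, int(node_id)


if __name__ == "__main__":
    print(parse_node_name("rhel7-vanilla-np-cios-54482"))
```

Splitting from the right is what makes the single-word provider rule matter: everything left of the last two fields is treated as the label.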

Rebuilding images

Nodepool rebuilds images nightly for each node label. Nodepool also keeps the previous image for a given node label, so under normal circumstances there will be two images for a label. Older images are deleted at the end of nodepool's nightly image rebuild.

Occasionally, nodepool's automated image build fails. If this happens, we normally do one of two things. We either delete the most recently created image for the affected label and fall back on the previous image, or we rebuild the image.

To delete the most recently created image, run "nodepool image-list" to get the list of images. Find the image(s) matching the affected label, and then delete the most recent one. You can tell which is most recent by looking at the "Age" column and picking the image with the smaller age, or by looking at the "Version" column and picking the image with the higher version.
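The "higher version wins" rule can be sketched as follows. This example does not parse real "nodepool image-list" output; it assumes the rows have already been transcribed into dictionaries, and the keys "id", "label", and "version" as well as the function name newest_image are hypothetical.

```python
def newest_image(images, label):
    """Return the most recent image record for a label, or None if there is none."""
    # Keep only the images built for the affected label.
    candidates = [img for img in images if img["label"] == label]
    if not candidates:
        return None
    # A higher version means a more recently built image.
    return max(candidates, key=lambda img: img["version"])


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    rows = [
        {"id": 10, "label": "f24-np", "version": 1},
        {"id": 11, "label": "f24-np", "version": 2},
        {"id": 12, "label": "rhel7-np", "version": 5},
    ]
    print(newest_image(rows, "f24-np"))
```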

To rebuild an image for a label, run "nodepool image-update <provider> <label>". The provider for an image can be seen in the output of "nodepool image-list".

Once a new image is available (either by deleting the most recent image or building a new one), any current instances running the failing image should be destroyed. It's not very easy to cross-reference images with the instances running them, so the simplest thing to do is delete all instances that appear in "nodepool list" in the "ready" state for the affected label.
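Selecting the instances to delete amounts to a simple filter. The sketch below assumes the rows of "nodepool list" have already been transcribed into dictionaries; the keys "id", "label", and "state" and the function name ready_instance_ids are hypothetical, and the sketch only picks IDs rather than deleting anything.

```python
def ready_instance_ids(nodes, label):
    """Return IDs of instances for a label that are in the 'ready' state."""
    return [
        node["id"]
        for node in nodes
        if node["label"] == label and node["state"] == "ready"
    ]


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    rows = [
        {"id": 54482, "label": "rhel7-vanilla-np", "state": "ready"},
        {"id": 54483, "label": "rhel7-vanilla-np", "state": "building"},
        {"id": 54484, "label": "f24-np", "state": "ready"},
    ]
    print(ready_instance_ids(rows, "rhel7-vanilla-np"))
```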

Updated by semyers over 7 years ago · 2 revisions