XenRT User Guide

Concepts

The XenRT test harness is a set of scripts, tools and infrastructure designed for testing the XenSource line of products. The infrastructure consists of one or more deployment sites, each managed by a controller, containing some number of test servers (computers that will run the tests). One or more sites may be linked together via a centralized job scheduler and results database. Scripts and daemons at each site invoke the harness application to actually run the tests.

The harness application, xrt, is a Python program which will execute a job (a sequence of testcases) on one or more test servers. It can be invoked under job control by the site controller daemon or it can be run by hand from the command line on a controller. It is important to note that xrt runs on the controller, not the test servers - xrt communicates with the servers over the network as necessary.

The harness application includes the framework for ordering, executing, logging, and reporting on testcases. It contains a set of libraries for common test operations including abstractions of major operations performed on the various products supported.

Testcases are Python classes defined in a module hierarchy separate from the main harness application. Testcases can utilise the library code provided by the harness. Testcases may be simple wrappers around external scripts or programs that perform the test.
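
As an illustrative sketch only (the base class name, exception classes and method signatures below are assumptions based on the conventions described in this guide; check the testcase module tree for the real ones), a testcase typically looks something like:

import xenrt

class TCExample(xenrt.TestCase):
    """Illustrative testcase skeleton."""

    def run(self, arglist=None):
        # Main test logic. Harness library code (host/guest operations,
        # logging, result reporting) is available via the xenrt module.
        if not arglist:
            raise xenrt.XRTError("No arguments supplied to TCExample")
        # ... perform the test, raising a failure exception (e.g.
        # xenrt.XRTFailure; exception names are assumptions) if it fails ...

    def postRun(self):
        # Cleanup performed after run() completes, whether it passed or failed.
        pass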

Terminology

CLI: The xenrt command used to interact with the job server
blocking failure: A testcase failure that prevents subsequent dependent tests from running
controller: The main Python program running the tests
site: A self-contained deployment of a controller and a number of test servers
testcase: An individual test covering a particular area of product functionality
sequence: An ordered set of one or more testcases run by a controller
job: The execution of a sequence under the instruction of the job server

Manual use of harness features

Running tests against existing VMs

The harness can be run directly from the command line. For testing existing VMs there are two options:

1. Running only in-VM benchmarks
2. Running in-VM benchmarks and host-initiated functional tests (e.g. reboot, shutdown, ...)

Running benchmarks in a VM

To run a single benchmark testcase in a running VM ("guest") the following command can be used. The VM is specified by IP address.

xrt --guest 10.3.2.1 --testcase benchmarks.micro.TCiometer

This assumes the VM has been set up in a XenRT style. Windows VMs must have:

  • The Python XML-RPC execution daemon (with start-on-boot)
  • Auto login to desktop as Administrator

Linux VMs are expected to have a standard root password ("xensource").

Running VM functional and benchmark tests

Functional tests such as shutdown, reboot, suspend/resume are run via the host so XenRT will need to know where a VM is running. The following example shows how to run the tests on an existing Windows VM on a XenServer host. The VM is called winguest and conforms to the requirements in the section above.

xrt -V --host myhost.testdev.hq.xensource.com -s riowindows.seq --noprepare --no-finally -D 'VERSIONS=winguest' --skip TCWindowsInstall --skip TCDriverInstall

The --noprepare option tells the harness to look for existing VMs on the host. The VERSIONS parameter is a tuple of test VM group (only used for result reporting) and VM name. --no-finally disables the post-run cleanup which would try to uninstall the VM.

Using the centralised job server

Interface and workflow

The primary interfaces to XenRT are the command line tool xenrt and the results display web page.

The system is structured around "jobs". A job is a single run of a test sequence on a particular machine (or set of machines). Jobs can be submitted, monitored and removed using the command line tool.

Job submission

A job is submitted using the xenrt submit CLI command with parameters that control what is tested, how it is tested, constraints on the hardware used for testing, etc. The minimum requirement is to specify a test sequence to run; if none is given, the job defaults to a legacy opensource test which is of little use. See below for details of extra arguments used for different types of testing.

The result of xenrt submit will be a numeric job ID. This should be noted to allow the progress of the job to be monitored.
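
For example, to submit the sequence used earlier with email notification and pause-on-fail enabled (the -s flag for selecting the sequence is an assumption here, mirroring the controller's -s option; --email and --pause-on-fail are described later in this guide):

xenrt submit -s riowindows.seq --email user@example.com --pause-on-fail ALL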

Monitoring jobs

The system will allocate the job to a suitable machine when one becomes available. Even when machines are free, this can take a couple of minutes to happen. The following commands can be used to monitor progress.

xenrt list: show all jobs in the system. Add "-m" to list jobs submitted by the userid running the script. Note that the revision field will be filled in with the actual changeset identifier used if that wasn't specified when the job was submitted.

xenrt status <jobid>: show parameters of the job. This includes submission details and basic starting and completion information.

xenrt showlog <jobid>: show the test by test progress of the job. This shows just the most recent result of each test. Append -v to get a more detailed report with test progress, results and comments. Note that a test only reports results and comments once it has completed.
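
Putting these together, a typical check on a job (using 21500 as an example job ID) might be:

xenrt list -m
xenrt status 21500
xenrt showlog 21500 -v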

Borrowing machines

A machine running under job control may be borrowed for a period of time. When a machine is borrowed the scheduler will not consider it when scheduling new jobs. To borrow a machine use the xenrt CLI tool: xenrt borrow <machine> [-h <hours>]

By default the lease is for 24 hours. To check the status of machines, including lease times, use: xenrt mlist2

To extend a lease just rerun the borrow command. To release a lease before the timeout use: xenrt return <machine>
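
For example, to take a 48-hour lease on a machine, confirm it, and then release it early (the machine name is illustrative):

xenrt borrow xrtmachine01 -h 48
xenrt mlist2
xenrt return xrtmachine01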

The system will not alert the user when a lease expires. Continued use of a machine with an expired lease risks having that machine reclaimed for a test job. Jobs that were submitted with --hold or --holdfail arguments will automatically acquire a lease on completion of the job.

Advanced features

Pause-on-fail

By default XenRT will continue after a failure as long as the failure was not declared as a blocking failure. In some cases, such as installing guests, cleanup actions are performed after the failure to put the host into a reasonable state for running more tests. However, in some situations it is useful to have a test sequence pause when a failure is encountered so that a user can inspect the state of the machine, gather data, triage a problem, etc. To enable this behaviour a sequence can be given any number of "pause-on-fail" arguments. A pause-on-fail argument can be given for individual testcases by specifying the basename of the testcase (after any in-sequence renaming), which will cause only those testcases to pause on failure; alternatively "ALL" can be specified to make any failure cause a pause. The arguments can be specified on the controller or job submit command lines as: --pause-on-fail TCWindowsInstall --pause-on-fail ALL

These options are communicated via the job server as variables POF_TCWindowsInstall and POF_ALL although they are not treated as variables within the controller.

When a test configured to pause-on-fail fails the execution of that testcase enters a paused state; other parallel parts of the sequence are allowed to continue. If the EMAIL variable is set (use --email address@example.com on the controller or submission command line) the controller sends an email notifying the user that intervention is required. The pause happens after the testcase's run function has been run but before postRun.

Interaction with a running sequence is performed with the xenrt CLI tool's interact command, which communicates with the XML-RPC server running in the controller. If the sequence was run manually then the argument to the command is the XML-RPC host:port reported by the controller when it first started. If running under job control then this information was recorded on the job server and the jobid can be used. The state of all running tests can be checked with: xenrt interact <jobid> -l

After performing any manual actions desired, the test sequence can be resumed using the CLI (jobid or host:port usage as above): xenrt interact {<jobid>|<host:port>} -c TCWindowsInstall

or to unpause any paused job(s): xenrt interact {<jobid>|<host:port>} -C

Jobs will automatically unpause themselves after 24 hours unless this auto-unpause behaviour is disabled by running: xenrt interact {<jobid>|<host:port>} -x <testcase>

If auto-unpause is disabled the user must remember to manually unpause the job when the pause is no longer required: xenrt interact {<jobid>|<host:port>} -c <testcase>

In both the manual and job control methods the CLI tool talks directly to the XML-RPC daemon running on the controller. If a firewall on the controller host or elsewhere blocks this traffic then interaction will not be possible.
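
If connectivity is in doubt, a simple reachability check from the machine running the CLI can confirm whether the controller's XML-RPC port is open (both values are placeholders; use the host:port reported by the controller or recorded on the job server):

nc -zv <controller-host> <port>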

Pause-on-fail directives can be added to a running job with (e.g. for TCReboot): xenrt interact 21500 -D CLIOPTIONS/PAUSE_ON_FAIL/TCReboot=yes

or a submitted but not yet running job with (e.g. for TCReboot): xenrt update 21500 POF_TCReboot yes

Provisioning Machines

XenRT can be used to provision pools, hosts and VMs for manual testing. To specify a desired setup it is necessary to create a <prepare> section in the sequence file (see the example fragment after the list below). The prepare section can itself contain:

  • <pool> sections. These can have the optional attributes id and name. If id is not specified it defaults to 0. If name is not specified it defaults to RESOURCE_POOL_x, where x is the value of the id attribute. A <pool> section can contain <host>, <storage>, <bridge> and <vm> sections (see below).
  • <host> sections. These can have the attributes id and alias, which behave in the same way as id and name for pools. A <host> section can also contain the following sections:
    • <storage> Represents an SR and requires the attributes type and name to be specified. For type, nfs, lvmoiscsi and extoiscsi are currently supported.
    • <bridge> Represents a network. Requires a name to be specified.
    • <vm> Represents a virtual machine. Requires the name attribute to be specified and can contain the following nodes:
      • <distro> The OS version to install.
      • <vcpus> The number of vCPUs to allocate to the VM.
      • <memory> Memory, in MB, to allocate to the VM.
      • <storage> The name of the SR to install the VM onto.
      • <arch> The architecture of the VM (for Linux).
      • <network> A network interface. The attribute device is required and bridge can be specified. The interface with device number 0 is taken to be the primary interface.
      • <disk> An additional disk. Takes the attributes device and size. Size is specified in GB.
      • <postinstall> Anything specified in the action attribute will be interpreted as a method in objects.py to run on the VM after installation. For example, installDrivers can be used to install the PV drivers on a VM.
      • <script> The name attribute specifies a previously included script to run on a Windows VM after install.
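
A minimal example fragment, using only the elements described above (the pool, host, SR, bridge and VM names, the distro string and the sizes are all illustrative):

<prepare>
  <pool id="0" name="RESOURCE_POOL_0">
    <host id="0">
      <storage type="nfs" name="nfssr"/>
      <bridge name="testbr0"/>
      <vm name="winguest">
        <distro>ws08-x86</distro>
        <vcpus>2</vcpus>
        <memory>1024</memory>
        <storage>nfssr</storage>
        <network device="0" bridge="testbr0"/>
        <disk device="1" size="8"/>
        <postinstall action="installDrivers"/>
      </vm>
    </host>
  </pool>
</prepare>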

Killing stuck jobs

If a controller running a job is terminated abnormally (e.g. using kill) then the job status needs to be updated to allow the scheduler to reuse the machines that were used by that job: xenrt complete <jobid>