XAPI RRDs

xapi stores persistent performance data in 'round robin databases' (RRDs). Each of these is a fixed-size structure containing data at multiple resolutions. 'Data sources' are sampled every few seconds and the resulting points are added to the highest-resolution RRD. Periodically each high-frequency RRD is 'consolidated' (e.g. averaged) to produce a data point for a lower-frequency RRD.
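As a rough illustration of the consolidation step described above, the following Python 3 sketch collapses a list of high-resolution points into lower-resolution rows. The function name and the handling of a trailing incomplete group are assumptions made for this example; it is not xapi's actual implementation.

```python
def consolidate(points, pdp_per_row, cf="AVERAGE"):
    """Collapse every `pdp_per_row` points into one consolidated point.

    Illustrative only: a trailing, incomplete group is simply dropped here.
    """
    functions = {
        "AVERAGE": lambda xs: sum(xs) / len(xs),
        "MIN": min,
        "MAX": max,
        "LAST": lambda xs: xs[-1],
    }
    fn = functions[cf]
    rows = []
    for i in range(0, len(points) - pdp_per_row + 1, pdp_per_row):
        rows.append(fn(points[i:i + pdp_per_row]))
    return rows
```

For instance, `consolidate(samples, 12)` would turn 5-second samples into one averaged point per minute.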

Jon Ludlam writes:

RRDs are maintained for individual VMs (including dom0) and for the host. Internally, the RRD is updated and maintained by a module similar to (but not actually) rrdtool (http://oss.oetiker.ch/rrdtool/). RRDs are resident on the host on which the VM is running, or on the pool master when the VM is not running. Obtaining the data therefore requires knowing where the VM is running.
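One possible way to work out which host to query is sketched below in Python 3. It assumes `xenapi` is the `.xenapi` proxy of an already logged-in XenAPI session, that a halted VM reports the null reference 'OpaqueRef:NULL' for its resident host, and that there is a single pool object. The function name is made up for this example.

```python
NULL_REF = "OpaqueRef:NULL"  # assumption: value reported when a VM has no resident host

def rrd_host_address(xenapi, vm_uuid):
    """Return the address of the host expected to hold the RRD for `vm_uuid`."""
    vm = xenapi.VM.get_by_uuid(vm_uuid)
    host = xenapi.VM.get_resident_on(vm)
    if host == NULL_REF:
        # The VM is not running, so its RRD should be on the pool master.
        pool = xenapi.pool.get_all()[0]
        host = xenapi.pool.get_master(pool)
    return xenapi.host.get_address(host)
```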

The RRDs can be extracted via an HTTP handler registered at /host_rrd or /vm_rrd. Both require authentication, either by HTTP authentication or by a XenAPI session. Session authentication works by passing the session reference ('OpaqueRef:...') as the query parameter 'session_id'. The vm_rrd handler also requires the uuid of the VM, passed as the query parameter 'uuid'. For example, opening the URI 'http://<server>/host_rrd' in a web browser will prompt for a username and password and then return an XML dump of the RRD. This XML format is documented on the rrdtool webpage, and can be imported into rrdtool itself for further analysis. Example URIs:

http://localhost/host_rrd?session_id=OpaqueRef:abcd....
http://localhost/vm_rrd?uuid=abcd...&session_id=OpaqueRef:cdef...
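A small Python 3 sketch of how such URIs might be assembled is shown here. The helper name `rrd_url` and the placeholder values in the usage note are invented for illustration. The colon is left unescaped only so that the result matches the form of the example URIs above.

```python
from urllib.parse import urlencode

def rrd_url(server, handler, session_ref=None, **query):
    """Build a URI for the /host_rrd or /vm_rrd handler.

    If `session_ref` is omitted, the server is expected to fall back to
    HTTP authentication, as described in the text.
    """
    params = dict(query)
    if session_ref is not None:
        params["session_id"] = session_ref
    qs = urlencode(params, safe=":")
    return "http://%s/%s%s" % (server, handler, "?" + qs if qs else "")
```

For example, `rrd_url("localhost", "vm_rrd", "OpaqueRef:1234", uuid="abcd")` yields a URI of the same shape as the second example, with a placeholder session reference and uuid.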

It's my expectation that this won't be used as often as the following updates URI.

Recent updates to the RRDs can also be queried, so that the entire RRD does not need to be downloaded when you already have most of the data. This is done via another HTTP handler, /rrd_updates, which must again be authenticated using either a session or HTTP authentication. The parameter 'start' is required and is a Unix timestamp (the number of seconds since Jan 1 1970). The handler returns data in an rrdtool 'xport'-style XML format for every VM resident on the host being queried. To show which column in the export belongs to which VM, each 'legend' entry includes the VM's uuid, preceded by the consolidation function of the archive the data came from (e.g. AVERAGE or MIN). To obtain host updates too, use the query parameter 'host=true'. Example URI:

http://localhost/rrd_updates?session_id=...&cf=AVERAGE&start=1000000000&host=true
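The 'start' value is usually derived from the current time. The Python 3 sketch below builds an rrd_updates URI covering the last `seconds_back` seconds. The function and argument names are assumptions for this example, and the `now` argument exists only to make the result reproducible.

```python
import time
from urllib.parse import urlencode

def rrd_updates_url(server, session_ref, seconds_back,
                    cf="AVERAGE", include_host=True, now=None):
    """Build an /rrd_updates URI for the most recent `seconds_back` seconds."""
    now = time.time() if now is None else now
    params = {
        "session_id": session_ref,
        "start": int(now) - seconds_back,  # Unix timestamp, in seconds
        "cf": cf,
    }
    if include_host:
        params["host"] = "true"
    return "http://%s/rrd_updates?%s" % (server, urlencode(params, safe=":"))
```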

Example host rrd (note that a VM rrd is identically structured, but with different data sources):


<?xml version="1.0"?>
<rrd>
  <version>0003</version>
  <step>5</step>
  <lastupdate>1213616574</lastupdate>
  <ds>
    <name>memory_total_kib</name>
    <type>GAUGE</type>
    <minimal_heartbeat>300.0000</minimal_heartbeat>
    <min>0.0</min>
    <max>Infinity</max>
    <last_ds>2070172</last_ds>
    <value>9631315.6300</value>
    <unknown_sec>0</unknown_sec>
  </ds>
  <ds>
   <!-- other dss - the order of the data sources is important
        and defines the ordering of the columns in the archives below -->
  </ds>
  <rra>
    <cf>AVERAGE</cf>
    <pdp_per_row>1</pdp_per_row>
     <params>
      <xff>0.5000</xff>
    </params>
    <cdp_prep> <!-- This is for internal use -->
      <ds>
        <primary_value>0.0</primary_value>
        <secondary_value>0.0</secondary_value>
        <value>0.0</value>
        <unknown_datapoints>0</unknown_datapoints>
      </ds>
      ...other dss - internal use only...
    </cdp_prep>
    <database>
     <row>
        <v>2070172.0000</v>  <!-- columns correspond to the DSs defined above -->
        <v>1756408.0000</v>
        <v>0.0</v>
        <v>0.0</v>
        <v>732.2130</v>
        <v>0.0</v>
        <v>782.9186</v>
        <v>0.0</v>
        <v>647.0431</v>
        <v>0.0</v>
        <v>0.0001</v>
        <v>0.0268</v>
        <v>0.0100</v>
        <v>0.0</v>
        <v>615.1072</v>
     </row>
     ...
    </database>
  </rra>
  ... other archives ...
</rrd>
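A dump like the one above can be read with Python 3's standard XML library. The sketch below pairs each value in a row with the name of its data source, relying on the column ordering described in the comments of the example. The sample document is a made-up, cut-down dump, and the seconds-per-row calculation assumes the rrdtool convention that one row covers step × pdp_per_row seconds.

```python
import xml.etree.ElementTree as ET

def read_rrd_dump(xml_text):
    """Return one dict per <rra>, with each row keyed by data source name."""
    root = ET.fromstring(xml_text)
    step = int(root.findtext("step"))
    # Data source order defines the column order of every <row>.
    names = [ds.findtext("name").strip() for ds in root.findall("ds")]
    archives = []
    for rra in root.findall("rra"):
        rows = []
        for row in rra.find("database").findall("row"):
            values = [float(v.text) for v in row.findall("v")]
            rows.append(dict(zip(names, values)))
        archives.append({
            "cf": rra.findtext("cf").strip(),
            # Assumption: one row spans step * pdp_per_row seconds.
            "seconds_per_row": step * int(rra.findtext("pdp_per_row")),
            "rows": rows,
        })
    return archives

# Made-up sample: two data sources, one archive, one row.
SAMPLE = """<rrd>
  <step>5</step>
  <ds><name>memory_total_kib</name></ds>
  <ds><name>memory_free_kib</name></ds>
  <rra>
    <cf>AVERAGE</cf>
    <pdp_per_row>12</pdp_per_row>
    <database>
      <row><v>2070172.0000</v><v>1756408.0000</v></row>
    </database>
  </rra>
</rrd>"""

archives = read_rrd_dump(SAMPLE)
```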

Example rrd_updates - only 1 VM present, no host updates:


<xport>
  <meta>
    <start>1213578000</start>
    <step>3600</step>
    <end>1213617600</end>
    <rows>12</rows>
    <columns>12</columns>
    <legend>
      <entry>AVERAGE:vm:ecd8d7a0-1be3-4d91-bd0e-4888c0e30ab3:cpu1</entry> <!-- nb - each data source might have multiple entries for different consolidation functions -->
      <entry>AVERAGE:vm:ecd8d7a0-1be3-4d91-bd0e-4888c0e30ab3:cpu0</entry>
      <entry>AVERAGE:vm:ecd8d7a0-1be3-4d91-bd0e-4888c0e30ab3:memory</entry>
      <entry>MIN:vm:ecd8d7a0-1be3-4d91-bd0e-4888c0e30ab3:cpu1</entry>
      <entry>MIN:vm:ecd8d7a0-1be3-4d91-bd0e-4888c0e30ab3:cpu0</entry>
      <entry>MIN:vm:ecd8d7a0-1be3-4d91-bd0e-4888c0e30ab3:memory</entry>
      <entry>MAX:vm:ecd8d7a0-1be3-4d91-bd0e-4888c0e30ab3:cpu1</entry>
      <entry>MAX:vm:ecd8d7a0-1be3-4d91-bd0e-4888c0e30ab3:cpu0</entry>
      <entry>MAX:vm:ecd8d7a0-1be3-4d91-bd0e-4888c0e30ab3:memory</entry>
      <entry>LAST:vm:ecd8d7a0-1be3-4d91-bd0e-4888c0e30ab3:cpu1</entry>
      <entry>LAST:vm:ecd8d7a0-1be3-4d91-bd0e-4888c0e30ab3:cpu0</entry>
      <entry>LAST:vm:ecd8d7a0-1be3-4d91-bd0e-4888c0e30ab3:memory</entry>
    </legend>
  </meta>
  <data>
    <row>
      <t>1213617600</t>
      <v>0.0</v> <!-- once again, the order of the columns is defined by the legend above -->
      <v>0.0282</v>
      <v>209715200.0000</v>
      <v>0.0</v>
      <v>0.0201</v>
      <v>209715200.0000</v>
      <v>0.0</v>
      <v>0.0445</v>
      <v>209715200.0000</v>
      <v>0.0</v>
      <v>0.0243</v>
      <v>209715200.0000</v>
    </row>
   ...
  </data>
</xport>
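The following Python 3 sketch shows one way to turn an rrd_updates response into per-column time series, splitting each legend entry into its consolidation function, 'vm' or 'host', uuid and data source name. The function name and the shortened sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def read_xport(xml_text):
    """Map (cf, vm_or_host, uuid, name) to a list of (timestamp, value) pairs."""
    root = ET.fromstring(xml_text)
    legend = [tuple(entry.text.strip().split(":", 3))
              for entry in root.find("meta/legend").findall("entry")]
    series = {key: [] for key in legend}
    for row in root.find("data").findall("row"):
        timestamp = int(row.findtext("t"))
        # The n-th <v> in a row belongs to the n-th legend entry.
        for key, v in zip(legend, row.findall("v")):
            series[key].append((timestamp, float(v.text)))
    return series

# Shortened, made-up sample with two columns and one row.
SAMPLE = """<xport>
  <meta>
    <legend>
      <entry>AVERAGE:vm:ecd8d7a0-1be3-4d91-bd0e-4888c0e30ab3:cpu0</entry>
      <entry>AVERAGE:vm:ecd8d7a0-1be3-4d91-bd0e-4888c0e30ab3:memory</entry>
    </legend>
  </meta>
  <data>
    <row><t>1213617600</t><v>0.0282</v><v>209715200.0000</v></row>
  </data>
</xport>"""

series = read_xport(SAMPLE)
```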

Here is some sample Python code (written for Python 2) that reads and parses rrd_updates. It uses the local domain socket to log in to xapi, so the code must be executed on the server. To make it work off-box, change the part that obtains the session and the URI being requested:


#!/usr/bin/python
#
# Example code for reading RRDs
# Contact: Jon Ludlam (jonathan.ludlam@eu.citrix.com)
#
# Mostly this script is taken from perfmon, by Alex Zeffert
#
import XenAPI
import urllib
from xml.dom import minidom
from xml.parsers.expat import ExpatError
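import time
# Raised when the <legend> returned by rrd_updates cannot be interpreted
class PerfMonException(Exception):
    pass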
# Per VM dictionary (used by RRDUpdates to look up column numbers by variable names)
class VMReport(dict):
    """Used internally by RRDUpdates"""
    def __init__(self, uuid):
        self.uuid = uuid
# Per Host dictionary (used by RRDUpdates to look up column numbers by variable names)
class HostReport(dict):
    """Used internally by RRDUpdates"""
    def __init__(self, uuid):
        self.uuid = uuid
class RRDUpdates:
    """ Object used to get and parse the output of http://localhost/rrd_updates?...
    """
    def __init__(self):
        # params are what get passed to the CGI executable in the URL
        self.params = dict()
        self.params['start'] = int(time.time()) - 1000 # For demo purposes!
        self.params['host'] = 'true'   # include data for host (as well as for VMs)
        self.params['cf'] = 'AVERAGE'  # consolidation function, each sample averages 12 from the 5 second RRD
        self.params['interval'] = '60'
    def get_nrows(self):
        return self.rows
    def get_vm_list(self):
        return self.vm_reports.keys()
    def get_vm_param_list(self, uuid):
        report = self.vm_reports[uuid]
        if not report:
            return []
        return report.keys()
    def get_vm_data(self, uuid, param, row):
        report = self.vm_reports[uuid]
        col = report[param]
        return self.__lookup_data(col, row)
    def get_host_uuid(self):
        report = self.host_report
        if not report:
            return None
        return report.uuid
    def get_host_param_list(self):
        report = self.host_report
        if not report:
            return []
        return report.keys()
    def get_host_data(self, param, row):
        report = self.host_report
        col = report[param]
        return self.__lookup_data(col, row)
    def get_row_time(self,row):
        return self.__lookup_timestamp(row)
    # extract float from value (<v>) node by col,row
    def __lookup_data(self, col, row):
        # Note: the <row> nodes are in reverse chronological order, and each comprises
        # a timestamp <t> node, followed by self.columns data <v> nodes
        node = self.data_node.childNodes[self.rows - 1 - row].childNodes[col+1]
        return float(node.firstChild.toxml()) # node.firstChild should have nodeType TEXT_NODE
    # extract int from value (<t>) node by row
    def __lookup_timestamp(self, row):
        # Note: the <row> nodes are in reverse chronological order, and each comprises
        # a timestamp <t> node, followed by self.columns data <v> nodes
        node = self.data_node.childNodes[self.rows - 1 - row].childNodes[0]
        return int(node.firstChild.toxml()) # node.firstChild should have nodeType TEXT_NODE
    def refresh(self, session, override_params = None):
        # Start from the defaults, then let the caller's values take precedence
        params = dict(self.params)
        params.update(override_params or {})
        params['session_id'] = session
        paramstr = "&".join(["%s=%s"  % (k,params[k]) for k in params])
        # this is better than urllib.urlopen() as it raises an Exception on http 401 'Unauthorised' error
        # rather than drop into interactive mode
        sock = urllib.URLopener().open("http://localhost/rrd_updates?%s" % paramstr)
        xmlsource = sock.read()
        sock.close()
        xmldoc = minidom.parseString(xmlsource)
        self.__parse_xmldoc(xmldoc)
        # Update the time used on the next run
        self.params['start'] = self.end_time + 1 # avoid retrieving same data twice
    def __parse_xmldoc(self, xmldoc):
        # The 1st node contains meta data (description of the data)
        # The 2nd node contains the data
        self.meta_node = xmldoc.firstChild.childNodes[0]
        self.data_node = xmldoc.firstChild.childNodes[1]
        def lookup_metadata_bytag(name):
            return int (self.meta_node.getElementsByTagName(name)[0].firstChild.toxml())
        # rows = number of samples per variable
        # columns = number of variables
        self.rows = lookup_metadata_bytag('rows')
        self.columns = lookup_metadata_bytag('columns')
        # These indicate the period covered by the data
        self.start_time = lookup_metadata_bytag('start')
        self.step_time  = lookup_metadata_bytag('step')
        self.end_time   = lookup_metadata_bytag('end')
        # the <legend> Node describes the variables
        self.legend = self.meta_node.getElementsByTagName('legend')[0]
        # vm_reports matches uuid to per VM report
        self.vm_reports = {}
        # There is just one host_report and its uuid should not change!
        self.host_report = None
        # Handle each column.  (I.e. each variable)
        for col in range(self.columns):
            self.__handle_col(col)
    def __handle_col(self, col):
        # work out how to interpret col from the legend
        col_meta_data = self.legend.childNodes[col].firstChild.toxml()
        # vm_or_host will be 'vm' or 'host'.  Note that the Control domain counts as a VM!
        (cf, vm_or_host, uuid, param) = col_meta_data.split(':')
        if vm_or_host == 'vm':
            # Create a report for this VM if it doesn't exist
            if uuid not in self.vm_reports:
                self.vm_reports[uuid] = VMReport(uuid)
            # Update the VMReport with the col data and meta data
            vm_report = self.vm_reports[uuid]
            vm_report[param] = col
        elif vm_or_host == 'host':
            # Create a report for the host if it doesn't exist
            if not self.host_report:
                self.host_report = HostReport(uuid)
            elif self.host_report.uuid != uuid:
                raise PerfMonException("Host UUID changed: (was %s, is %s)" % (self.host_report.uuid, uuid))
            # Update the HostReport with the col data and meta data
            self.host_report[param] = col
        else:
            raise PerfMonException("Invalid string in <legend>: %s" % col_meta_data)
def main():
    xapi = XenAPI.xapi_local()
    xapi.login_with_password("","")
    session=xapi._session
    rrd_updates = RRDUpdates()
    rrd_updates.refresh(session,{})
    for uuid in rrd_updates.get_vm_list():
        print "Got values for VM: "+uuid
        for param in rrd_updates.get_vm_param_list(uuid):
            print "param: "+param
            data=""
            for row in range(rrd_updates.get_nrows()):
                data=data+"(%d,%f) " % (rrd_updates.get_row_time(row),
                                        rrd_updates.get_vm_data(uuid,param,row))
            print data
main()

Definitions of counters

Host

memory_total_kib
memory_free_kib
xapi_memory_usage_kib
xapi_free_memory_kib
xapi_live_memory_kib
xapi_allocation_kib
cpuN
loadavg
pif_IF_rx
pif_IF_tx
pif_IF_rx_errors
pif_IF_tx_errors
sr_SR_cache_size
sr_SR_cache_hits
sr_SR_cache_misses

VM

cpuN
memory: Memory currently allocated to the VM (bytes)
memory_target
memory_internal_free: Free memory (KB) reported by the guest agent
vif_IF_tx
vif_IF_rx
vif_IF_tx_errors
vif_IF_rx_errors
vbd_DEV_write
vbd_DEV_read
vbd_DEV_write_latency
vbd_DEV_read_latency
runstate_fullrun
runstate_full_contention
runstate_concurrency_hazard
runstate_blocked
runstate_partial_run
runstate_partial_contention
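Note that the two VM memory counters listed above use different units: 'memory' is in bytes, while 'memory_internal_free' is in KB. The Python 3 sketch below combines them into a fraction of memory in use. It assumes that 1 KB means 1024 bytes, and the function name is made up for this example.

```python
def guest_memory_used_fraction(memory_bytes, memory_internal_free_kb):
    """Fraction of the VM's allocated memory that the guest reports as in use."""
    if memory_bytes <= 0:
        raise ValueError("memory_bytes must be positive")
    free_bytes = memory_internal_free_kb * 1024  # assumption: 1 KB = 1024 bytes
    return (memory_bytes - free_bytes) / float(memory_bytes)
```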

See Also

http://community.citrix.com/display/xs/Using+XenServer+RRDs