XCP Archive/XCP diagnostic messages

From Xen

Revision as of 15:50, 1 May 2013

In the future, we would like to have a diagnostic console that helps us understand the state of an individual host (not a pool). We would like to see:

  1. instantaneous load information, including
    1. disk and network throughput (bytes per sec)
    2. memory usage
    3. CPU usage
    4. number of messages per second across various internal control interfaces
    5. message latency distribution per internal service
  2. the states of "alarms", where an "alarm" is set when one or more of the load metrics crosses some threshold for some period of time. Perhaps we could have 3 states and use red/amber/green (people love dashboards with traffic lights)
  3. the contents of message queues containing JSON control messages
  4. a live stream of messages, filtered with some expression
  5. a live stream of logs, filtered with some expression
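One way to read the alarm idea in item 2 is as a function from a metric's recent samples to a traffic-light state. The sketch below is only an illustration under stated assumptions: the two thresholds (`amber`, `red`), the interpretation of "some period of time" as a count of consecutive samples (`window`), and the names `AlarmState` and `evaluate_alarm` are all hypothetical.

```python
from enum import Enum
from typing import Sequence


class AlarmState(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"


def evaluate_alarm(samples: Sequence[float], amber: float, red: float,
                   window: int) -> AlarmState:
    """Hypothetical rule: a level is reached only if *all* of the last
    `window` samples exceed its threshold, so a single spike does not raise it."""
    if window <= 0:
        raise ValueError("window must be positive")
    if amber > red:
        raise ValueError("amber threshold must not exceed red threshold")
    recent = samples[-window:]
    if len(recent) < window:
        # Too little history to say the threshold was crossed "for some period".
        return AlarmState.GREEN
    if all(s > red for s in recent):
        return AlarmState.RED
    if all(s > amber for s in recent):
        return AlarmState.AMBER
    return AlarmState.GREEN


cpu_percent = [10.0, 85.0, 92.0, 95.0, 97.0]  # made-up CPU usage samples
print(evaluate_alarm(cpu_percent, amber=80.0, red=90.0, window=3))  # AlarmState.RED
print(evaluate_alarm(cpu_percent, amber=80.0, red=90.0, window=4))  # AlarmState.AMBER
```

Requiring the whole window to exceed a threshold is a simple form of debouncing; a real design might also want hysteresis so that an alarm does not flap while a metric hovers around its threshold.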

Ideally the console would be entirely web-based, and the API should be designed to make that easy.

Example messages

The message format is not fixed and can still be adjusted; everything here is just an example.

Load information example

There will be lots of load metrics. It's unlikely that many of them will be displayed at once; rather, they'll be consulted on demand, or perhaps a small set will be compared together.

The load metrics are grouped by the type of thing they refer to:

$ wget http://server/metrics
{
  "SR": {"uri": "/metrics/sr", "description": "storage"},
  "VM":   {"uri": "/metrics/VM", "description": "virtual machine"}
}
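A client can discover which kinds of metrics a host exposes by reading this index once and following each "uri". Below is a minimal Python sketch that turns the example response into a type-to-URI mapping; the function name `metric_types` is hypothetical, and the response body is hard-coded so that it runs without a server.

```python
from __future__ import annotations

import json

# Response body from the example above, hard-coded for illustration.
INDEX_BODY = """
{
  "SR": {"uri": "/metrics/sr", "description": "storage"},
  "VM": {"uri": "/metrics/VM", "description": "virtual machine"}
}
"""


def metric_types(body: str) -> dict[str, str]:
    """Hypothetical helper: map each type name to the URI listing its instances."""
    index = json.loads(body)
    return {name: entry["uri"] for name, entry in index.items()}


print(metric_types(INDEX_BODY))  # {'SR': '/metrics/sr', 'VM': '/metrics/VM'}
```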

Then each type of thing has a number of instances:

$ wget http://server/metrics/VM
[ { "uuid": "foo",
    "metrics": "/metrics/VM/foo" },
  ...
]
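Because each instance entry carries its own metrics URI, a client can index instances by UUID. A minimal Python sketch, assuming every entry is an object with "uuid" and "metrics" keys; the helper name `instance_uris` is hypothetical, and the second VM "bar" is invented purely to show more than one entry.

```python
import json


def instance_uris(body: str) -> dict:
    """Hypothetical helper: map each instance UUID to its metrics-listing URI."""
    return {inst["uuid"]: inst["metrics"] for inst in json.loads(body)}


# "foo" comes from the example above; "bar" is a made-up second VM.
VM_LIST_BODY = """
[ { "uuid": "foo", "metrics": "/metrics/VM/foo" },
  { "uuid": "bar", "metrics": "/metrics/VM/bar" } ]
"""
print(instance_uris(VM_LIST_BODY))  # {'foo': '/metrics/VM/foo', 'bar': '/metrics/VM/bar'}
```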

and each instance has a number of available metrics:

$ wget http://server/metrics/VM/foo
[
  { "name": "throughput",
    "units": "bytes/sec",
    "instantaneous": "/metrics/VM/foo/throughput/instantaneous",
    "history": "/metrics/VM/foo/throughput/history"
  }
]
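On the client side, each entry in this list maps naturally onto a small record type. The sketch below uses the field names from the example; the class `MetricInfo` and the function `parse_metric_list` are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class MetricInfo:
    """One entry of a per-instance metric listing (field names from the example)."""
    name: str
    units: str
    instantaneous: str  # URI that returns the current value
    history: str        # URI that returns past values plus a "next" link


def parse_metric_list(body: str) -> List[MetricInfo]:
    """Hypothetical helper; reads the known keys and ignores any extra ones."""
    return [
        MetricInfo(name=e["name"], units=e["units"],
                   instantaneous=e["instantaneous"], history=e["history"])
        for e in json.loads(body)
    ]


FOO_METRICS_BODY = """
[ { "name": "throughput", "units": "bytes/sec",
    "instantaneous": "/metrics/VM/foo/throughput/instantaneous",
    "history": "/metrics/VM/foo/throughput/history" } ]
"""
for m in parse_metric_list(FOO_METRICS_BODY):
    print(m.name, m.units, m.instantaneous)
```

Since the message format is not fixed, reading the known keys explicitly (rather than `MetricInfo(**e)`) means a server that later adds fields would not break an older client.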

where "instantaneous" can be polled to retrieve a single value:

$ wget http://server/metrics/VM/foo/throughput/instantaneous
15.0
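Since the "instantaneous" URI returns a single JSON number, a client that wants a handful of samples can simply poll it. In the Python sketch below, `fetch` is a hypothetical callable that takes a path and returns the response body; a dictionary stands in for the server so the example runs offline. Against a real host, `fetch` could be something like `lambda p: urllib.request.urlopen("http://server" + p).read().decode()`.

```python
import json
import time
from typing import Callable, List


def poll(fetch: Callable[[str], str], uri: str, count: int,
         interval: float = 1.0) -> List[float]:
    """Poll an "instantaneous" URI `count` times, `interval` seconds apart.

    `fetch` is a hypothetical helper: it takes a path such as
    "/metrics/VM/foo/throughput/instantaneous" and returns the response body.
    """
    values = []
    for i in range(count):
        if i:
            time.sleep(interval)  # wait between requests, not before the first
        # The body is a single JSON number, e.g. "15.0".
        values.append(float(json.loads(fetch(uri))))
    return values


# Stand-in for an HTTP GET so the sketch runs without a server.
INSTANT_URI = "/metrics/VM/foo/throughput/instantaneous"
FAKE_SERVER = {INSTANT_URI: "15.0"}
print(poll(FAKE_SERVER.__getitem__, INSTANT_URI, count=3, interval=0.0))
```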

and "history" can be used to fetch an array of old values, plus a URI which will block until new values are available:

$ wget http://server/metrics/VM/foo/throughput/history
{ "data": [ 1.0, 2.0, 3.0 ],
  "next": "/metrics/VM/foo/throughput/history/5"
}

(where "5" is some kind of identifier for the next batch of data)
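The "next" link turns the history endpoint into a long-poll stream: read the current page, then keep following "next". A Python generator expresses this loop compactly. In the sketch below, `fetch` is again a hypothetical path-to-body callable, the name `follow_history` is invented, and the second page ("/5") is a made-up response that a real server would only return once new data exists.

```python
import json
from itertools import islice
from typing import Callable, Iterator


def follow_history(fetch: Callable[[str], str], uri: str) -> Iterator[float]:
    """Yield every value from a "history" URI, then keep following "next".

    Assumes each response looks like {"data": [...], "next": "<uri>"} as in
    the example, and that fetching "next" blocks until newer values exist.
    """
    while True:
        page = json.loads(fetch(uri))
        yield from (float(v) for v in page["data"])
        uri = page["next"]  # treated as opaque; its meaning is left open


HISTORY_URI = "/metrics/VM/foo/throughput/history"
FAKE_SERVER = {
    HISTORY_URI:
        '{"data": [1.0, 2.0, 3.0], "next": "/metrics/VM/foo/throughput/history/5"}',
    # Hypothetical follow-up page; a real server would block until it is ready.
    "/metrics/VM/foo/throughput/history/5":
        '{"data": [4.0], "next": "/metrics/VM/foo/throughput/history/6"}',
}

# islice stops after 4 values, so the missing "/6" page is never requested.
print(list(islice(follow_history(FAKE_SERVER.__getitem__, HISTORY_URI), 4)))
```

Because each "next" fetch is expected to block, a real client would want a timeout and a retry policy around `fetch`; the generator keeps that concern separate from the code that consumes the values.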

Alarm state example

Message queue example

Live stream of messages / logs example