Revision as of 15:50, 1 May 2013

In future we would like to have a diagnostic console that helps us understand the state of an individual host (not a pool). We would like to see:

instantaneous load information, including
1. disk and network throughput (bytes per sec)
2. memory usage
3. CPU usage
4. number of messages per second across various internal control interfaces
5. message latency distribution per internal service
the states of "alarms", where an "alarm" is set when one or more of the load metrics crosses some threshold for some period of time. Perhaps we could have 3 states and use red/amber/green (people love dashboards with traffic lights)
the contents of message queues containing JSON control messages
a live stream of messages, filtered with some expression
a live stream of logs, filtered with some expression
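The alarm idea in the list above (a load metric crossing a threshold for some period of time, shown as red/amber/green) could be evaluated roughly as follows. This is a minimal Python sketch under our own assumptions: each alarm has two thresholds (amber and red), higher values are worse, and a level is only raised once the metric has stayed at or above its threshold for `hold_secs` seconds. `AlarmRule` and its parameters are hypothetical names, not part of any existing XCP interface.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Alarm(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"


@dataclass
class AlarmRule:
    """Hypothetical alarm rule: a level is raised only after the metric has
    stayed at or above that level's threshold for `hold_secs` seconds."""
    amber_threshold: float
    red_threshold: float
    hold_secs: float
    # Time at which the metric first crossed each threshold (None = currently below it).
    _amber_since: Optional[float] = field(default=None, repr=False)
    _red_since: Optional[float] = field(default=None, repr=False)

    def update(self, now: float, value: float) -> Alarm:
        """Feed one (time, value) sample and return the current alarm state."""
        self._amber_since = self._track(self._amber_since, now, value >= self.amber_threshold)
        self._red_since = self._track(self._red_since, now, value >= self.red_threshold)
        if self._held(self._red_since, now):
            return Alarm.RED
        if self._held(self._amber_since, now):
            return Alarm.AMBER
        return Alarm.GREEN

    @staticmethod
    def _track(since: Optional[float], now: float, above: bool) -> Optional[float]:
        # Remember when a crossing started; forget it as soon as the value drops back.
        if not above:
            return None
        return now if since is None else since

    def _held(self, since: Optional[float], now: float) -> bool:
        # True once the crossing has lasted at least hold_secs.
        return since is not None and now - since >= self.hold_secs
```

For example, with `AlarmRule(amber_threshold=70.0, red_threshold=90.0, hold_secs=10.0)`, a value of 95.0 sampled at t=0, 5 and 10 stays green until t=10 and then turns red. Requiring the crossing to persist keeps short spikes from making the traffic lights flicker.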

Ideally the console would be entirely web-based, and the API should be designed to make that easy.

Example messages

The message format is not fixed; it can still be adjusted. Everything here is just an example.

Load information example

There will be lots of load metrics. It's unlikely that many of these will be displayed at once; rather, they will be consulted on demand, or perhaps a small set will be compared together.

The load metrics are grouped by the type of thing they refer to:

$ wget -qO- http://server/metrics
{
  "SR": {"uri": "/metrics/sr", "description": "storage"},
  "VM": {"uri": "/metrics/VM", "description": "virtual machine"}
}

Then each type of thing has a number of instances:

$ wget -qO- http://server/metrics/VM
[ { "uuid": "foo",
    "metrics": "/metrics/VM/foo" },
  ...
]

and each instance has a number of available metrics:

$ wget -qO- http://server/metrics/VM/foo
[
  { "name": "throughput",
    "units": "bytes/sec",
    "instantaneous": "/metrics/VM/foo/throughput/instantaneous",
    "history": "/metrics/VM/foo/throughput/history"
  }
]
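A console (or a script) could discover every available metric by walking the three levels described above: types, then instances, then metrics. The Python sketch below assumes the response shapes shown in these examples, with each instance entry carrying its metrics URI under a "metrics" key; `walk_metrics` and `http_fetch` are hypothetical helpers, and `http://server` stands for the host being diagnosed. The fetcher is passed in so that the walk can be tried against canned responses without a live host.

```python
import json
import urllib.request
from typing import Any, Callable, Iterator, Tuple

# A fetcher maps a path such as "/metrics" to the decoded JSON body.
Fetch = Callable[[str], Any]


def http_fetch(base_url: str) -> Fetch:
    """Build a fetcher that GETs paths relative to base_url, e.g. "http://server"."""
    def fetch(path: str) -> Any:
        with urllib.request.urlopen(base_url + path) as resp:
            return json.load(resp)
    return fetch


def walk_metrics(fetch: Fetch) -> Iterator[Tuple[str, str, dict]]:
    """Yield (type name, instance uuid, metric descriptor) for every metric."""
    types = fetch("/metrics")  # {"VM": {"uri": "/metrics/VM", ...}, ...}
    for type_name, type_info in types.items():
        # Each type lists its instances: [{"uuid": ..., "metrics": ...}, ...]
        for instance in fetch(type_info["uri"]):
            # Each instance lists its metrics: [{"name": ..., "units": ..., ...}]
            for metric in fetch(instance["metrics"]):
                yield type_name, instance["uuid"], metric
```

For the VM shown here, `walk_metrics(http_fetch("http://server"))` would yield `("VM", "foo", {...})` with the "throughput" descriptor as its third element. Because the walk touches only URIs taken from earlier responses, the server stays free to change its URI layout.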

where "instantaneous" can be polled to retrieve a single value:

$ wget -qO- http://server/metrics/VM/foo/throughput/instantaneous
15.0
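Since an "instantaneous" URI returns a single number, a dashboard would sample it periodically. A minimal polling sketch in Python, assuming a fetcher that returns the decoded JSON body (for example one built on `urllib.request`); `poll` and its parameters are illustrative names, and the clock and sleep functions are injectable only so that the loop can be tested without waiting.

```python
import time
from typing import Callable, List, Tuple


def poll(fetch: Callable[[str], float], uri: str, count: int, interval_secs: float,
         clock: Callable[[], float] = time.monotonic,
         sleep: Callable[[float], None] = time.sleep) -> List[Tuple[float, float]]:
    """Sample an "instantaneous" metric URI `count` times.

    Returns (timestamp, value) pairs; timestamps come from `clock`, which
    defaults to a monotonic clock so wall-clock adjustments do not skew them.
    """
    samples: List[Tuple[float, float]] = []
    for i in range(count):
        if i > 0:
            sleep(interval_secs)  # wait between samples, but not before the first
        samples.append((clock(), float(fetch(uri))))
    return samples
```

Polling suits a single current value; for a continuous series, the "history" interface described next avoids losing values that change between polls.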

and "history" can be used to fetch an array of old values, plus a URI that will block until new values are available:

$ wget -qO- http://server/metrics/VM/foo/throughput/history
{ "data": [ 1.0, 2.0, 3.0 ],
  "next": "/metrics/VM/foo/throughput/history/5"
}

(where "5" is some kind of identifier for the next available data)
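The blocking "next" URI amounts to long polling: a client fetches a page, consumes its "data", then immediately requests "next", which only returns once more values exist. A Python sketch of that loop as a generator, assuming every response has the `{"data": [...], "next": "..."}` shape shown above and that a response without "next" ends the stream (our assumption; the example always includes one). `follow_history` is a hypothetical name.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def follow_history(fetch: Callable[[str], Dict[str, Any]], uri: str) -> Iterator[float]:
    """Yield the values of a "history" URI, then keep following "next".

    Each fetch of a "next" URI is expected to block until new values exist,
    so this generator normally runs until the caller stops iterating.
    """
    next_uri: Optional[str] = uri
    while next_uri:
        page = fetch(next_uri)
        yield from page["data"]
        # Assumption: a response without "next" ends the stream.
        next_uri = page.get("next")
```

With a real host the fetcher should tolerate long waits (no short socket timeout), since each "next" request may block. Long polling needs nothing beyond ordinary HTTP GETs, which fits the goal of a purely web-based console.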

Difference between revisions of "XCP Archive/XCP diagnostic messages"
