Xenopsd

Revision as of 17:09, 20 January 2015 by Dave.scott (talk | contribs) (Note that Xenopsd has been implemented and documented)

Note: Xenopsd has been implemented and is documented here: http://xapi-project.github.io/xenopsd/architecture.html

The xenops daemon

This is an attempt to improve the architecture of the XCP toolstack by splitting *domain management* into a separate service, known as the *xenops daemon*. This should have the following advantages:

  1. it helps untangle the domain management code from the rest of xapi (which will focus on multi-host resource pool management)
  2. it forces us to be explicit about the interface xapi needs, making the future libxl port more predictable and likely to succeed
  3. it forces us to be explicit about the interface xenopsd needs, preventing (for example) the accidental insertion of an expensive RPC to a remote host through an internal xapi interface in a hot codepath. This is currently a very easy mistake to make, and a better architecture would make it harder.

Building from source

The current development build instructions are on the "Building Xenopsd" page.

Design principles

  1. xenopsd should ignore domains that it is not managing. This will allow (e.g.) domains started with 'xl' to co-exist safely
    1. xenopsd could therefore be told to manage one VM, like 'xenvm' (part of XCI)
    2. xenopsd could therefore be told to manage lots of VMs, like 'xapi' currently does
  2. xenopsd should be capable of servicing all guest requests without making remote RPCs. This will allow (e.g.) VMs to reboot in an XCP resource pool even when the master is offline.
  3. xenopsd should not require xapi to be running (although it will require a storage service to be available in order to attach new disks)
  4. xenopsd should have a "simulator" backend as well as a real "xen" backend to allow us to replace the current "XIU" simulation system which doesn't support all operations (e.g. VM migrate)
  5. xenopsd should have a simulator-based test suite to allow us to test the component at build time.
  6. xenopsd should support a level-triggered (rather than edge-triggered) event system, to allow clients (i.e. 'xapi') to resynchronise as quickly as possible. It will be assumed that clients always want to 'diff' their current state against the 'tip' state and that they don't care about the exact sequence of historical changes (those are left for the audit log).
  7. xenopsd should be able to handle operations on managed VMs concurrently (rather than serially) and avoid operations on one VM having dependencies on other VMs (e.g. VM_suspend should not depend on a dom0 VBD_plug)
  8. xenopsd should be aware of driver-domains; there shall be no hardcoded "domid = 0" anywhere in the code
  9. xenopsd should support cancellation for all blocking operations
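
The level-triggered event system in principle 6 can be illustrated with a minimal sketch. It is written in Python for brevity and is not the xenopsd implementation; the class name `Updates` and its methods are hypothetical. The tracker remembers only the most recent generation at which each object changed, so repeated changes to one object coalesce and a client that reconnects only needs the token it last saw.

```python
import threading


class Updates:
    """Level-triggered tracker (hypothetical): stores the latest change per object, not a history."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_id = 1
        self._last_changed = {}  # object id -> generation of its most recent change

    def add(self, obj):
        """Record that obj changed; earlier changes to the same obj are overwritten."""
        with self._cond:
            self._last_changed[obj] = self._next_id
            self._next_id += 1
            self._cond.notify_all()

    def get(self, since, timeout=None):
        """Block until something changed after 'since' (or the timeout expires).

        Returns (objects changed after 'since', token to pass to the next call).
        """
        with self._cond:
            self._cond.wait_for(lambda: self._next_id - 1 > since, timeout)
            changed = sorted(o for o, g in self._last_changed.items() if g > since)
            return changed, self._next_id - 1


updates = Updates()
updates.add("vm1")
updates.add("vm1")  # coalesces with the previous change
updates.add("vm2")
changed, token = updates.get(0, timeout=0)
print(changed, token)
```

The client is expected to re-read the current state of each returned object rather than interpret the event itself, which is what makes resynchronisation cheap.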

Design overview

At a high level, we'll have a JSON-RPC interface whose concrete IDL is created with rpc-light. This will support operations to:

  1. add/modify/remove VM metadata
  2. start/stop/suspend/resume/migrate VMs
  3. blocking poll for events

Operations received through the interface will either be processed immediately (e.g. an event poll) or be queued, with one queue per VM. This ensures that operations on any single VM are handled sequentially, while operations on different VMs can still proceed concurrently. The queue contents will be introspectable and cancellable through the API.

A thread pool with a configurable number of threads will take operations from the per-VM queues and execute them.
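
One way to combine per-VM queues with a shared thread pool is sketched below in Python. This is an illustration under stated assumptions, not the xenopsd code: the class `PerVmScheduler` and its methods are hypothetical, and queue introspection and cancellation are left out. A VM is handed to a worker only when no other worker is already running one of its operations, which keeps operations on one VM in order while different VMs run in parallel.

```python
import queue
import threading
from collections import defaultdict, deque


class PerVmScheduler:
    """Hypothetical sketch: one FIFO per VM, drained by a fixed pool of worker threads."""

    def __init__(self, num_threads):
        self._lock = threading.Lock()
        self._pending = defaultdict(deque)  # vm -> operations not yet started
        self._scheduled = set()             # VMs waiting in _ready or currently being run
        self._ready = queue.Queue()         # VMs that have work and no worker yet
        for _ in range(num_threads):
            threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, vm, operation):
        """Queue a zero-argument callable against a VM."""
        with self._lock:
            self._pending[vm].append(operation)
            if vm not in self._scheduled:
                self._scheduled.add(vm)
                self._ready.put(vm)

    def _worker(self):
        while True:
            vm = self._ready.get()
            with self._lock:
                operation = self._pending[vm].popleft()
            try:
                operation()
            except Exception:
                pass  # a real daemon would record the failure against the task
            finally:
                with self._lock:
                    if self._pending[vm]:
                        self._ready.put(vm)  # more work: hand the VM back to the pool
                    else:
                        self._scheduled.discard(vm)
                self._ready.task_done()

    def wait_idle(self):
        """Block until every submitted operation has finished."""
        self._ready.join()


log = []
scheduler = PerVmScheduler(num_threads=4)
for i in range(5):
    scheduler.submit("vm-a", lambda i=i: log.append(("vm-a", i)))
    scheduler.submit("vm-b", lambda i=i: log.append(("vm-b", i)))
scheduler.wait_idle()
```

After `wait_idle()` returns, the entries for each VM appear in `log` in submission order, although entries for the two VMs may be interleaved.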

Machine-affecting operations (e.g. domain create, invocations of libxenguest and, in the future, libxl) will be through a backend plugin. There will be two plugins initially:

  1. xen: this will use the existing xapi domain management code
  2. simulator: this will pretend to manage domains without actually generating any side-effects.

The backend interface will be as simple as possible. The backend will be selectable at load-time.
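
As a rough illustration of a small backend interface selected at load time, the Python sketch below defines an abstract `Backend` and a side-effect-free simulator. All names here (`Backend`, `SimulatorBackend`, `BACKENDS`, `load_backend`) are hypothetical, and the three operations are chosen only to keep the example short; the real "xen" backend is omitted because it would need the existing domain management code.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical minimal interface that every backend plugin implements."""

    @abstractmethod
    def create(self, vm):
        """Create a domain for the given VM."""

    @abstractmethod
    def destroy(self, vm):
        """Destroy the domain for the given VM."""

    @abstractmethod
    def is_running(self, vm):
        """Return True if the VM currently has a domain."""


class SimulatorBackend(Backend):
    """Pretends to manage domains; only updates in-memory state."""

    def __init__(self):
        self._running = set()

    def create(self, vm):
        self._running.add(vm)

    def destroy(self, vm):
        self._running.discard(vm)

    def is_running(self, vm):
        return vm in self._running


# Registry consulted once at load time; a "xen" entry would be added alongside.
BACKENDS = {"simulator": SimulatorBackend}


def load_backend(name):
    """Instantiate the backend named in the configuration."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None


backend = load_backend("simulator")
backend.create("vm1")
print(backend.is_running("vm1"))  # prints True
```

Because the higher-level logic only sees `Backend`, the same test suite can run against the simulator at build time and against a real backend on a host.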

The "xen" backend will monitor the domains running on a host, and look for:

  1. domain creations
  2. domain shutdowns
  3. xenstore device frontend/backend changes

Any changes on a managed VM will be converted into special messages:

  1. VM_check_state
  2. VBD_check_state

These messages will be put in the appropriate per-VM queue and will cause the higher-level logic to poll the current state of the VM or device (which may have changed since the event was triggered) and then, as appropriate:

  1. initiate reboots
  2. cleanup device state (e.g. call the SMAPI vdi_detach)

The "simulator" backend will support a backdoor "trigger" operation which will allow a test harness to inject events.
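
The interaction between the simulator's "trigger" operation and the check_state messages can be sketched as follows. This is a simplified Python illustration with hypothetical names (`Simulator`, `handle_check_state`, and the state strings); a plain list stands in for the per-VM queue. The key point is that the handler reads the state at the time it runs, not the state at the time the event was raised.

```python
class Simulator:
    """Hypothetical simulator backend whose state a test harness can change directly."""

    def __init__(self):
        self.state = {}   # vm -> "running" | "rebooting" | "halted"
        self.queue = []   # stand-in for the per-VM operation queues

    def trigger(self, vm, new_state):
        """Backdoor for tests: change the simulated state and raise a check_state message."""
        self.state[vm] = new_state
        self.queue.append(("VM_check_state", vm))


def handle_check_state(sim, vm):
    """Poll the *current* state of the VM and decide what to do about it."""
    current = sim.state.get(vm)
    if current == "rebooting":
        sim.state[vm] = "running"   # initiate the reboot
        return "reboot"
    if current == "halted":
        del sim.state[vm]           # clean up device state exactly once
        return "cleanup"
    return "none"


sim = Simulator()
sim.trigger("vm1", "rebooting")
sim.trigger("vm1", "halted")        # state changes again before the queue is drained
actions = [handle_check_state(sim, vm) for _, vm in sim.queue]
print(actions)  # prints ['cleanup', 'none']
```

Both queued messages observe the final "halted" state, so the stale reboot request is never acted on and the second message finds nothing left to do.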

[Figure: Xenopsd-internals.svg]

Interfacing with xapi

The XCP toolstack, 'xapi', will defer to 'xenopsd' for all VM lifecycle operations. We shall have the following principles:

  1. when VM.resident_on is set, "dynamic" VM properties (e.g. guest agent version) shall be owned by the specified host and updated on receipt of events from xenopsd
  2. when a "static" VM property is set (e.g. name_label), xapi shall, on receipt of events from the master xapi, reset the VM metadata in xenopsd so the property change takes effect on next reboot
  3. on any event error (whether event from xenopsd or event from xapi), the code shall resynchronise from scratch

The XenAPI VM.start operation shall look as follows:

  1. on the master:
    1. acquire resources (e.g. choose a host with enough capacity)
    2. set VM.scheduled_to_be_resident_on to the specified host
    3. forward the XenAPI call
  2. on the host:
    1. import the latest version of the metadata into xenopsd (so it doesn't depend on the pool db)
    2. call VM.start on xenopsd
    3. set VM.resident_on
    4. insert a barrier into the xenopsd event stream, to ensure that all generated events are processed before the XenAPI call returns.
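
The host-side half of VM.start, including the barrier, can be sketched in Python as below. This is an illustration only: `EventStream`, `vm_start_on_host` and the dictionaries standing in for the xenopsd metadata store and the pool database are all hypothetical, and events are drained inline here, whereas a real implementation would presumably have a separate event-processing thread and wait for it to reach the barrier.

```python
import itertools
from collections import deque


class EventStream:
    """Hypothetical ordered stream of xenopsd events that also carries barriers."""

    def __init__(self):
        self._events = deque()
        self._barrier_ids = itertools.count(1)

    def emit(self, vm):
        self._events.append(("vm", vm))

    def inject_barrier(self):
        barrier_id = next(self._barrier_ids)
        self._events.append(("barrier", barrier_id))
        return barrier_id

    def pop(self):
        return self._events.popleft()


def vm_start_on_host(vm, metadata, xenopsd_metadata, stream, db):
    """Steps 2.1 to 2.4 of VM.start on the chosen host (sketch)."""
    xenopsd_metadata[vm] = dict(metadata)   # 1. import the latest metadata
    stream.emit(vm)                         # 2. stand-in for VM.start, which generates events
    db[vm] = {"resident_on": "this-host"}   # 3. set VM.resident_on
    barrier = stream.inject_barrier()       # 4. mark the point we must catch up to
    processed = []
    while True:
        kind, payload = stream.pop()
        if kind == "barrier" and payload == barrier:
            break                           # everything before the barrier is now processed
        processed.append(payload)
    return processed


stream = EventStream()
xenopsd_metadata, db = {}, {}
processed = vm_start_on_host("vm1", {"name_label": "test"}, xenopsd_metadata, stream, db)
print(processed)  # prints ['vm1']
```

Because the barrier is appended after the events generated by the start, reaching it guarantees that those events have been handled before the call returns.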


Code status

The stable version of xenopsd was merged into XCP 1.6 and lives here:

https://github.com/xen-org/xen-api/tree/master

An experimental development version lives here:

https://github.com/djs55/xenopsd/tree/master