Live-Updating Xen

Current State

Merged upstream

Multiboot2 support (i.e. relocation support) merged in kexec-tools v2.0.20

Posted upstream, in review

Early cleanups and fixes (v1): https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg00000.html

TODO: For early `vmap()` we want to make it officially OK to free boot-allocated pages with `free_xenheap_pages()` and even `free_domheap_pages()`. This involves fixing the esoteric corner cases in which that currently (rarely) doesn't quite work. The plan is to merge the `PGC_allocated` bit into the `PGC_state` bits, giving 3 bits that can encode 8 states, of which 6 are currently valid: { inuse, offlining, broken_offlining, offline, broken, free }. All-zeroes would then mean 'never touched by the heap', with inuse moving to 1, so that `free_xenheap_pages()` and `free_domheap_pages()` can check for that state and call `init_heap_pages()` instead of `free_heap_pages()` where necessary. That leaves one spare state for future use. (Varad)
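As a rough model of the plan above, the 3-bit state field and the free-path check might look like the following Python sketch. Only "all-zeroes means never touched by the heap" and "inuse moves to 1" come from the TODO; the remaining numeric values, the placement of the field in the low bits, and the names `PageState`, `page_state` and `free_path` are illustrative assumptions, not Xen's actual definitions.

```python
from enum import IntEnum

STATE_BITS = 3                      # PGC_allocated merged into PGC_state
STATE_MASK = (1 << STATE_BITS) - 1  # assumption: field modelled in the low bits


class PageState(IntEnum):
    """Hypothetical encoding; only UNTOUCHED=0 and INUSE=1 are from the plan."""
    UNTOUCHED = 0         # never touched by the heap (boot-allocated)
    INUSE = 1
    OFFLINING = 2         # values 2..6 are arbitrary illustrative choices
    BROKEN_OFFLINING = 3
    OFFLINE = 4
    BROKEN = 5
    FREE = 6
    # value 7 is the one spare state


def page_state(count_info: int) -> PageState:
    """Extract the state field from a modelled count_info word."""
    return PageState(count_info & STATE_MASK)


def free_path(count_info: int) -> str:
    """Name the allocator entry point the free functions would dispatch to."""
    if page_state(count_info) == PageState.UNTOUCHED:
        return "init_heap_pages"   # page was never handed to the heap
    return "free_heap_pages"       # normal free of a heap-managed page
```

With this encoding, a boot-allocated page is recognisable purely from its state bits, so no separate flag is needed to decide how to free it.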

Posted as RFC

Physical memory management over kexec. Handover protocol documentation (potentially out-of-date PDF version)
Management of live update data stream passing domains' state from Running Xen to Target Xen.
Definition of state record format based on migration stream record format.
Reservation of domain-owned pages in Target Xen as heap allocator starts up.
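To illustrate what "based on migration stream record format" could mean in practice, here is a minimal Python sketch of the record framing used by the libxc migration stream: a 32-bit type, a 32-bit body length, the body, and zero padding to an 8-octet boundary. Whether the live update stream reuses exactly this framing is an assumption, as are the little-endian byte order (chosen for an x86 host) and the record type numbers in the usage line.

```python
import struct

# Record header: type, body_length. Little-endian is an assumption (x86 host).
HEADER = struct.Struct("<II")


def pack_record(rec_type: int, body: bytes) -> bytes:
    """Frame one record: header, body, then zero padding to a multiple of 8."""
    pad = -len(body) % 8
    return HEADER.pack(rec_type, len(body)) + body + b"\0" * pad


def unpack_records(stream: bytes):
    """Yield (type, body) for each record in a byte string of framed records."""
    off = 0
    while off < len(stream):
        rec_type, length = HEADER.unpack_from(stream, off)
        off += HEADER.size
        body = stream[off:off + length]
        off += length + (-length % 8)   # skip the body and its padding
        yield rec_type, body


# Hypothetical record types, used only to exercise the framing.
stream = pack_record(0x1001, b"domain") + pack_record(0x1002, b"")
```

Because every record carries its own length, a reader can skip record types it does not understand, which is useful when the Running Xen and Target Xen differ in version.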

In development hacks

PV domain save/restore over kexec with certain caveats.

<dwmw2_gone>    [root@localhost ~]# xl info | grep cc_compile_date
<dwmw2_gone>    cc_compile_date        : Wed Jan 22 21:10:38 GMT 2020
<dwmw2_gone>    [root@localhost ~]# KEXEC_LIVE_UPATE=1  ./kexec-tools/build/sbin/kexec xen2 --append="console=vga,com1 crashkernel=128M<4G no-real-mode insert_l1d_flush=0 dom0_max_vcpus=1 liveupdate=128M@2936M:0xb7800000"  --mem-min=0xaf800000 -t multiboot2-x86 -f
<dwmw2_gone>    can't get linerar framebuffer address
<dwmw2_gone>    kexec failed: Invalid argument
<dwmw2_gone>    [root@localhost ~]# xl info | grep cc_compile_date
<dwmw2_gone>    cc_compile_date        : Wed Jan 22 21:45:36 GMT 2020
<dwmw2_gone>    Wheee. Really must fix that -EINVAL :)
<andyhhp>       is that a kexec reload actually preserving dom0?
<dwmw2_gone>    yep
<andyhhp>       ship it :)
<dwmw2_gone>    a carefully configured dom0 with 2l event channels, one vcpu
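The `liveupdate=128M@2936M:0xb7800000` option in the log above can be picked apart with a short Python sketch. Reading it as `size@base` is an assumption made by analogy with other size/address options; the meaning of the value after the colon is not documented here, so the sketch returns it uninterpreted. The helper names `parse_size` and `parse_liveupdate` are hypothetical.

```python
import re

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text: str) -> int:
    """Parse a decimal or 0x-prefixed number with an optional K/M/G suffix."""
    m = re.fullmatch(r"(0[xX][0-9a-fA-F]+|\d+)([KMG]?)", text)
    if not m:
        raise ValueError(f"bad size: {text!r}")
    return int(m.group(1), 0) * _UNITS[m.group(2)]


def parse_liveupdate(value: str):
    """Split 'size@base[:extra]' into integers; 'extra' is left uninterpreted."""
    region, _, extra = value.partition(":")
    size, sep, base = region.partition("@")
    if not sep:
        raise ValueError(f"expected size@base, got {value!r}")
    return parse_size(size), parse_size(base), (parse_size(extra) if extra else None)


size, base, extra = parse_liveupdate("128M@2936M:0xb7800000")
```

Purely as an observation on the numbers in the log: 2936M is the same address as 0xb7800000, and the `--mem-min=0xaf800000` value equals that base minus the 128M size.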

Being worked on

Pass M2P over (dwmw2)
Refactor internal LU_DOMAIN_INFO record to post upstream (dwmw2)
Refactor internal page list record into a single uint64_t per contiguous range of MFNs of the same type (dwmw2)
Continue fixing PV Dom0 (Julien / Varad)
- Support more than one vCPU (Varad)
- FIFO event channels (Varad)
Refactor internal PV save/restore (once it's all working perfectly) for posting upstream; especially all the hacks through domain creation (TBD)
Save/restore HVM domains (Julien)
Upstreaming Guest transparent HVM migration support (Paul)

kexec-tools `--live-update` support including memory layout based on `KEXEC_RANGE_MA_LIVEUPDATE` and `liveupdate=` command line. (Varad)
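The page list refactoring mentioned above (one uint64_t per contiguous range of MFNs of the same type) could be modelled as in the Python sketch below. The bit layout is invented for illustration (40 bits of first MFN, 20 bits of count minus one, 4 bits of type) and is not the format used in the actual patches; `pack_range`, `unpack_range` and `coalesce` are hypothetical names.

```python
# Invented layout: [63..60] type | [59..40] count-1 | [39..0] first MFN
MFN_BITS, COUNT_BITS, TYPE_BITS = 40, 20, 4


def pack_range(first_mfn: int, count: int, page_type: int) -> int:
    """Pack one contiguous same-type MFN range into a 64-bit word."""
    assert 0 <= first_mfn < (1 << MFN_BITS)
    assert 1 <= count <= (1 << COUNT_BITS)
    assert 0 <= page_type < (1 << TYPE_BITS)
    return (first_mfn
            | ((count - 1) << MFN_BITS)
            | (page_type << (MFN_BITS + COUNT_BITS)))


def unpack_range(word: int):
    """Inverse of pack_range: return (first_mfn, count, page_type)."""
    first = word & ((1 << MFN_BITS) - 1)
    count = ((word >> MFN_BITS) & ((1 << COUNT_BITS) - 1)) + 1
    return first, count, word >> (MFN_BITS + COUNT_BITS)


def coalesce(pages):
    """Turn (mfn, type) pairs into packed words, one per contiguous run."""
    words, run = [], None           # run = [first_mfn, count, type]
    for mfn, ptype in sorted(pages):
        extends = (run is not None and ptype == run[2]
                   and mfn == run[0] + run[1]
                   and run[1] < (1 << COUNT_BITS))
        if extends:
            run[1] += 1
        else:
            if run is not None:
                words.append(pack_range(*run))
            run = [mfn, 1, ptype]
    if run is not None:
        words.append(pack_range(*run))
    return words


pages = [(0x100, 1), (0x101, 1), (0x102, 2), (0x103, 2), (0x200, 1)]
words = coalesce(pages)
```

Five pages collapse to three words here; for a guest whose memory is mostly contiguous, the saving over one entry per page would be far larger.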

Initial proof-of-concept with patches from Varad's tree (link below) - no kexec involved:

Boot xen with domkill_leakguest cmdline param.
Save a PV domain's state, leaving guest memory in RAM:
# xl save -s domU domU.img

Restore domain state reusing magic mfns. The shared_info page contents are preserved:
# xl restore -T domU.img <l3tab_mfn> <l2tab_mfn> <shared_info_mfn>

TODO: Restore console, reconstruct guest pagetables from shared_info.

Development trees

TODO

This list will move to the JIRA instance

Devel milestone: PV domU persists across domain destroy/create
Dom0 persists across kexec
HVM guests persist across kexec
PV guests persist across kexec
One guest persists across kexec
Multiple guests persist across kexec
Guests exercise workloads
Update to same Xen binary as the Target Xen
Update to a Xen binary with a minor change, like a new printk
Update to a Xen binary with a fix for an XSA
Update to a new minor version
Update to a new major version

More information

Design Session Notes from Xen Summit 2019

Brief project overview:
- We want to build Xen Live-update
- early prototyping phase
- IDEA: change running hypervisor to new one without guest disruptions
- Reasons:
  - Security - we might need updated versions for vulnerability mitigation
  - Development cycle acceleration - fast switch to a new hypervisor during dev
  - Maintainability - reduce version diversity in the fleet
- We are currently eyeing a combination of guest transparent live migration and kexec into a new Xen build
- For more details: Live-Update talk

Terminology:
- Running Xen -> The xen running on the host before update (Source)
- Target Xen -> The xen we are updating *to*

Design discussions:

Live-update ties into multiple other projects currently done in the Xen-project:
- Secret free Xen: reduce the footprint of guest relevant data in Xen
  - less state we might have to handle in the live update case
- dom0less: bootstrap domains without the involvement of dom0
  - this might come in handy to at least setup and continue dom0 on target xen
  - If we have this, it might also enable us to de-serialize the state for other guest domains in Xen and not have to wait for dom0 to do this

We want to just keep domain and hardware state
- Xen itself is supposed to be exchanged completely
- We have to keep around the IOMMU page tables and do not touch them
  - this might also come in handy for some newer UEFI boot related issues?
  - We might have to go and reinject certain interrupts
- do we need to disaggregate xenheap and domheap here?
  - We are currently trying to avoid this

A key stepping stone for Live-update is guest transparent live migration
- This means we are using a well defined ABI for saving/restoring domain state
  - We do only rely on domain state and no internal xen state
- The idea is to migrate the guest not from one machine to another (in space) but on the same machine from one hypervisor to another (in time)
- In addition we want to keep as much as possible in memory unchanged and feed this back to the target domain in order to save time
- This means we will need additional info on those memory areas and have to be super careful not to stomp over them while starting the target xen
- for live migration: domid is a problem in this case
  - randomize and pray does not work on smaller fleets
  - this is not a problem for live-update
  - BUT: as a community we should make this restriction go away
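The "do not stomp over preserved memory" requirement above boils down to subtracting the reserved ranges from the boot memory map before any of it reaches the allocator. A minimal Python sketch of that subtraction follows; ranges are half-open `(start_mfn, end_mfn)` pairs, and the function name and data representation are hypothetical, not taken from the Xen patches.

```python
def subtract_reserved(free_ranges, reserved):
    """Return free_ranges with every reserved range carved out.

    All ranges are half-open (start_mfn, end_mfn) tuples.
    """
    result = []
    reserved = sorted(reserved)
    for start, end in sorted(free_ranges):
        cur = start
        for r_start, r_end in reserved:
            if r_end <= cur or r_start >= end:
                continue                       # no overlap with what is left
            if r_start > cur:
                result.append((cur, r_start))  # usable gap before the hole
            cur = max(cur, r_end)              # resume after the reserved hole
            if cur >= end:
                break
        if cur < end:
            result.append((cur, end))
    return result


# Two preserved guest regions inside one boot memory range.
usable = subtract_reserved([(0, 100)], [(10, 20), (50, 60)])
```

Only the ranges in `usable` would be handed to the heap allocator as the Target Xen starts up; the reserved ranges stay untouched until their owning domains are restored.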

Exchanging the Hypervisor using kexec
- We have patches merged in upstream kexec-tools that enable multiboot2 for Xen
- We can now load the target xen binary into the crashdump region so as not to stomp over any valuable data we might need later
- But using the crashdump region for this has drawbacks when it comes to debugging and we might want to think about this later
  - What happens when live-update goes wrong?
  - Option: Increase Crashdump region size and partition it or have a separate reserved live-update region to load the target xen into
  - Separate region or partitioned region is not a priority for V1 but should be on the road map for future versions

Who serializes and deserializes domain state?
- dom0: This should work fine, but who does this for dom0 itself?
- Xen: This will need some more work, but might be covered mostly by the dom0less effort on the arm side
  - this will need some work for x86, but Stefano does not consider this a lot of work
- This would mean: serialize domain state into multiboot module and set domains up after kexecing xen in the dom0less manner
  - make multiboot module general enough so we can tag it as boot/resume/create/etc.
    - this will also enable us to do per-guest feature enablement
    - finer-grained than specifying on cmdline
    - cmdline stuff is mostly broken, needs to be fixed for nested either way
    - domain create flags are a mess

Live update instead of crashdump?
- Can we use such capabilities to recover from a crash by "restarting" xen on a crash?
  - live updating into (the same) xen on crash
- crashing is a good mechanism because it happens if something is really broken and most likely not recoverable
- Live update should be a conscious process and not something you do as a reaction to a crash
  - something is really broken if we crash
  - we should not proactively restart xen on crash
    - we might run into crash loops
- maybe this can be done in the future, but it is not changing anything for the design
  - if anybody wants to wire this up once live update is there, that should not be too hard
  - then you want to think about: scattering the domains to multiple other hosts to not keep them on broken machines

We should use this opportunity to clean up certain parts of the code base:
- interface for domain information is a mess
  - HVM and PV have some shared data but completely different ways of accessing it

Volume of patches:
- Live update: still developing, we do not know yet
- guest transparent live migration:
  - We have roughly 100 patches over time
  - we believe most of this just has to be cleaned up/squashed and will land us at a reasonable, much lower number
  - this also needs 2-3 dom0 kernel patches

Summary of action items:
- coordinate with dom0less effort on what we can use and contribute there
- fix the domid clash problem
- Decision on usage of crash kernel area
- fix live migration patch set to include as yet unsupported backends
  - clean up the patch set
  - upstream it

Longer term vision:
- Have a tiny hypervisor between Guest and Xen that handles the common cases
  - this enables (almost) zero downtime for the guest
  - the tiny hypervisor will maintain the guest while the underlying xen is kexecing into new build

Somebody someday will want to get rid of the long tail of old xen versions in a fleet
- live patch old running versions with live update capability?
- crashdumping into a new hypervisor?
  - "crazy idea" but this will likely come up at some point
