Difference between revisions of "Live-Updating Xen"

Revision as of 09:03, 29 January 2020

Live-Updating Xen

Current State

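Merged upstream

  • Multiboot2 support (i.e. relocation support) merged in kexec-tools v2.0.20: https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/log/?h=v2.0.20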

Posted upstream for review

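  • Physical memory management over kexec. Handover protocol documentation: https://xenbits.xen.org/gitweb/?p=people/dwmw2/xen.git;a=blob;f=docs/specs/live-update-handover.pandoc;hb=refs/heads/lu-master (potentially out-of-date PDF version: http://david.woodhou.se/live-update-handover.pdf)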
  • Management of live update data stream passing domains' state from Running Xen to Target Xen.
  • Definition of state record format based on migration stream record format.
  • Reservation of domain-owned pages in Target Xen as heap allocator starts up.
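To make the "state record format based on migration stream record format" item more concrete, here is a minimal sketch of that style of framing. It assumes the header layout of the Xen migration stream (a 32-bit type, a 32-bit body length, and a body padded to an 8-octet boundary). The record type values, the little-endian choice and the function names are placeholders invented for this example; the actual live update records are defined in the posted patches and may differ.

```python
import struct

# Header in the style of the Xen migration stream: 32-bit type, 32-bit body
# length. Little-endian is an assumption made for this sketch.
RECORD_HEADER = struct.Struct("<II")
ALIGN = 8  # bodies are zero-padded to an 8-octet boundary

# Hypothetical record types, for illustration only.
REC_TYPE_END = 0x00000000
REC_TYPE_EXAMPLE_DOMAIN = 0x00000001


def pack_record(rec_type: int, body: bytes) -> bytes:
    """Frame one record: header, body, then zero padding up to ALIGN."""
    padding = -len(body) % ALIGN
    return RECORD_HEADER.pack(rec_type, len(body)) + body + b"\0" * padding


def unpack_records(stream: bytes):
    """Yield (type, body) pairs until an END record or the end of the data."""
    offset = 0
    while offset < len(stream):
        rec_type, length = RECORD_HEADER.unpack_from(stream, offset)
        offset += RECORD_HEADER.size
        body = stream[offset:offset + length]
        if len(body) != length:
            raise ValueError("truncated record body")
        offset += length + (-length % ALIGN)  # skip body and its padding
        yield rec_type, body
        if rec_type == REC_TYPE_END:
            return


stream = pack_record(REC_TYPE_EXAMPLE_DOMAIN, b"abc") + pack_record(REC_TYPE_END, b"")
records = list(unpack_records(stream))
```

A length-prefixed, padded framing like this lets a reader skip record types it does not understand, which is useful when Running Xen and Target Xen are different builds.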

In-development hacks

  • PV domain save/restore over kexec with certain caveats.
<dwmw2_gone>    [root@localhost ~]# xl info | grep cc_compile_date
<dwmw2_gone>    cc_compile_date        : Wed Jan 22 21:10:38 GMT 2020
<dwmw2_gone>    [root@localhost ~]# KEXEC_LIVE_UPATE=1  ./kexec-tools/build/sbin/kexec xen2 --append="console=vga,com1 crashkernel=128M<4G no-real-mode insert_l1d_flush=0 dom0_max_vcpus=1 liveupdate=128M@2936M:0xb7800000"  --mem-min=0xaf800000 -t multiboot2-x86 -f
<dwmw2_gone>    can't get linerar framebuffer address
<dwmw2_gone>    kexec failed: Invalid argument
<dwmw2_gone>    [root@localhost ~]# xl info | grep cc_compile_date
<dwmw2_gone>    cc_compile_date        : Wed Jan 22 21:45:36 GMT 2020
<dwmw2_gone>    Wheee. Really must fix that -EINVAL :)
<andyhhp>       is that a kexec reload actually preserving dom0?
<dwmw2_gone>    yep
<andyhhp>       ship it :)
<dwmw2_gone>    a carefully configured dom0 with 2l event channels, one vcpu
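The numbers on the kexec command line in the log above can be checked with a little arithmetic. The sketch below parses the liveupdate= value as SIZE@START with an optional :ADDR suffix and compares it with --mem-min. This is only an observation about the values shown: the meaning of the suffix is not documented here, and the parser and its function names are written for this example, not taken from Xen or kexec-tools.

```python
import re

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text: str) -> int:
    """Parse a decimal or 0x-prefixed number with an optional K/M/G suffix."""
    match = re.fullmatch(r"(0x[0-9a-fA-F]+|\d+)([KMG]?)", text)
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    return int(match.group(1), 0) * UNITS[match.group(2)]


def parse_liveupdate(value: str):
    """Split 'SIZE@START[:ADDR]' into integers (ADDR is None when absent)."""
    region, _, extra = value.partition(":")
    size, _, start = region.partition("@")
    return parse_size(size), parse_size(start), parse_size(extra) if extra else None


size, start, extra = parse_liveupdate("128M@2936M:0xb7800000")
mem_min = 0xAF800000  # the --mem-min value from the same command line

print(hex(size), hex(start), hex(extra))
print(start - mem_min == size)
```

With these values, 2936M is the same address as 0xb7800000, and --mem-min sits exactly 128M below the start of the live update region.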


Initial proof-of-concept with patches from Varad's tree (link below) - no kexec involved:

Boot Xen with the domkill_leakguest command-line parameter.
Save the PV domain state, leaving guest memory in RAM:
# xl save -s domU domU.img

Restore the domain state, reusing the magic MFNs. The shared_info page contents are preserved:
# xl restore -T domU.img <l3tab_mfn> <l2tab_mfn> <shared_info_mfn>

TODO: Restore console, reconstruct guest pagetables from shared_info.

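Development trees

  • http://git.infradead.org/users/dwmw2/xen.git/shortlog/refs/heads/bootcleanup
  • https://xenbits.xen.org/gitweb/?p=people/dwmw2/xen.git;a=shortlog;h=refs/heads/lu-master
  • https://github.com/varadgautam/xen/tree/liveupdate-devel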

TODO

This list will move to the JIRA instance

  • Devel milestone: PV domU persists across domain destroy/create
  • Dom0 persists across kexec
  • HVM guests persist across kexec
  • PV guests persist across kexec
  • One guest persists across kexec
  • Multiple guests persist across kexec
  • Guests exercise workloads
  • Update to the same Xen binary (Target Xen identical to Running Xen)
  • Update to a Xen binary with a minor change, like a new printk
  • Update to a Xen binary with a fix for an XSA
  • Update to a new minor version
  • Update to a new major version

More information

Design Session Notes from Xen Summit 2019

  • Brief project overview:
    • We want to build Xen Live-update
    • early prototyping phase
    • IDEA: change running hypervisor to new one without guest disruptions
    • Reasons:
      • Security - we might need an updated version for vulnerability mitigation
      • Development cycle acceleration - fast switch to a new hypervisor during development
      • Maintainability - reduce version diversity in the fleet
    • We are currently eyeing a combination of guest transparent live migration and kexec into a new Xen build
    • For more details: Live-Update talk
  • Terminology:
    • Running Xen -> The xen running on the host before update (Source)
    • Target Xen -> The xen we are updating *to*
  • Design discussions:
  • Live-update ties into multiple other projects currently done in the Xen-project:
    • Secret free Xen: reduce the footprint of guest relevant data in Xen
      • less state we might have to handle in the live update case
    • dom0less: bootstrap domains without the involvement of dom0
      • this might come in handy to at least setup and continue dom0 on target xen
      • If we have this, it might also enable us to de-serialize the state for other guest domains in Xen without having to wait for dom0 to do it
  • We want to just keep domain and hardware state
    • Xen itself is supposed to be exchanged completely
    • We have to keep around the IOMMU page tables and do not touch them
      • this might also come in handy for some newer UEFI boot related issues?
      • We might have to go and reinject certain interrupts
    • do we need to dis-aggregate xenheap and domheap here?
      • We are currently trying to avoid this
  • A key stepping stone for Live-update is guest transparent live migration
    • This means we are using a well defined ABI for saving/restoring domain state
      • We do only rely on domain state and no internal xen state
    • The idea is to migrate the guest not from one machine to another (in space) but on the same machine from one hypervisor to another (in time)
    • In addition we want to keep as much as possible in memory unchanged and feed this back to the target domain in order to save time
    • This means we will need additional info on those memory areas and have to be super careful not to stomp over them while starting the target xen
    • for live migration: domid is a problem in this case
      • randomize and pray does not work on smaller fleets
      • this is not a problem for live-update
      • BUT: as a community we should make this restriction go away
  • Exchanging the Hypervisor using kexec
    • We have patches on upstream kexec-tools merged that enable multiboot2 for Xen
    • We can now load the target Xen binary into the crashdump region so as not to stomp over any valuable data we might need later
    • But using the crashdump region for this has drawbacks when it comes to debugging and we might want to think about this later
      • What happens when live-update goes wrong?
      • Option: Increase Crashdump region size and partition it or have a separate reserved live-update region to load the target xen into
      • A separate or partitioned region is not a priority for V1 but should be on the road map for future versions
  • Who serializes and deserializes domain state?
    • dom0: This should work fine, but who does this for dom0 itself?
    • Xen: This will need some more work, but might be covered mostly by the dom0less effort on the Arm side
      • this will need some work for x86, but Stefano does not consider this a lot of work
    • This would mean: serialize domain state into multiboot module and set domains up after kexecing xen in the dom0less manner
      • make multiboot module general enough so we can tag it as boot/resume/create/etc.
        • this will also enable us to do per-guest feature enablement
        • finer-grained than specifying on the cmdline
        • cmdline handling is mostly broken and needs to be fixed for nested either way
        • domain create flags are a mess
  • Live update instead of crashdump?
    • Can we use such capabilities to recover from a crash by "restarting" Xen on a crash?
      • live updating into (the same) xen on crash
    • crashing is a good mechanism because it happens if something is really broken and most likely not recoverable
    • Live update should be a conscious process and not something you do as a reaction to a crash
      • something is really broken if we crash
      • we should not proactively restart xen on crash
        • we might run into crash loops
    • maybe this can be done in the future, but it is not changing anything for the design
      • if anybody wants to wire this up once live update is there, that should not be too hard
      • then you want to think about: scattering the domains to multiple other hosts to not keep them on broken machines
  • We should use this opportunity to clean up certain parts of the code base:
    • interface for domain information is a mess
      • HVM and PV have some shared data but completely different ways of accessing it
  • Volume of patches:
    • Live update: still developing, we do not know yet
    • guest transparent live migration:
      • We have roughly 100 patches over time
      • we believe most of this just has to be cleaned up/squashed, which will land us at a reasonable, much lower number
      • this also needs 2-3 dom0 kernel patches
  • Summary of action items:
    • coordinate with dom0less effort on what we can use and contribute there
    • fix the domid clash problem
    • Decision on usage of crash kernel area
    • fix the live migration patch set to include as-yet unsupported backends
      • clean up the patch set
      • upstream it
  • Longer term vision:
    • Have a tiny hypervisor between Guest and Xen that handles the common cases
      • this enables (almost) zero downtime for the guest
      • the tiny hypervisor will maintain the guest while the underlying xen is kexecing into new build
  • Somebody someday will want to get rid of the long tail of old xen versions in a fleet
    • live patch old running versions with live update capability?
    • crashdumping into a new hypervisor?
      • "crazy idea" but this will likely come up at some point
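The notes above stress that Target Xen must not stomp over memory preserved for guests while it starts up. As a rough illustration of that idea, the sketch below subtracts a set of reserved ranges from the free RAM ranges before anything is handed to an allocator. The range representation (half-open page frame ranges) and the function name are assumptions made for this example; this is not how the Xen boot allocator is implemented.

```python
def subtract_ranges(free, reserved):
    """Return the parts of `free` that no `reserved` range covers.

    Ranges are half-open (start, end) tuples of page frame numbers.
    """
    usable = []
    reserved = sorted(reserved)
    for start, end in sorted(free):
        cursor = start
        for r_start, r_end in reserved:
            if r_end <= cursor:
                continue  # reserved range lies entirely below the cursor
            if r_start >= end:
                break  # reserved range lies entirely above this free range
            if r_start > cursor:
                usable.append((cursor, r_start))  # gap before the reserved range
            cursor = max(cursor, r_end)
            if cursor >= end:
                break
        if cursor < end:
            usable.append((cursor, end))
    return usable


# Hypothetical layout: one free region with two guest-owned ranges inside it.
free_ram = [(0, 100)]
guest_owned = [(10, 20), (50, 60)]
usable = subtract_ranges(free_ram, guest_owned)
```

Only the ranges in `usable` would be given to the allocator at boot; the guest-owned pages stay untouched until the domains are restored.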