Understanding the Virtualization Spectrum


If you know anything about Xen Project, you have probably heard terms like PV, HVM, PVHVM, and PVH. Initially, these terms can be very confusing. But, with a little explanation, it is possible to understand what all these terms mean without suffering an aneurysm.

Full Virtualization

In the early days of virtualization (at least in the x86 world), the assumption was that you needed your hypervisor to provide a virtual machine that was functionally nearly identical to a real machine. This included the following aspects:

  • Disk and network devices
  • Interrupts and timers
  • Emulated platform: motherboard, device buses, BIOS
  • “Legacy” boot: i.e., starting in 16-bit mode and bootstrapping up to 64-bit mode
  • Privileged instructions
  • Pagetables (memory access)

In the early days of x86 virtualization, all of this needed to be virtualized: disk and network devices needed to be emulated, as did interrupts and timers, the motherboard and PCI buses, and so on. Guests needed to start in 16-bit mode and run a BIOS which loaded the guest kernel, which (again) ran in 16-bit mode, then bootstrapped its way up to 32-bit mode, and possibly then to 64-bit mode. All privileged instructions executed by the guest kernel needed to be emulated somehow; and the pagetables needed to be emulated in software.

This mode, where every aspect of the virtual machine must be functionally identical to real hardware, is called "fully virtualized mode."

Xen Project and Paravirtualization

Unfortunately, particularly for x86, virtualizing privileged instructions is very complicated. Many instructions for x86 behave differently in kernel and user mode without generating a trap, meaning that your options for running kernel code were to do full software emulation (incredibly slow) or binary translation (incredibly complicated, and still very slow).

The key question of the original Xen Project research project at Cambridge University was, “What if instead of trying to fool the guest kernel into thinking it’s running on real hardware, you just let the guest know that it was running in a virtual machine, and changed the interface you provide to make it easier to implement?” To answer that question, they started from the ground up designing a new interface designed for virtualization. Working together with researchers at both the Intel and Microsoft labs, they took both Linux and Windows XP, and ripped out anything that was slow or difficult to virtualize, replacing it with calls into the hypervisor (hypercalls) or other virtualization-friendly techniques. (The Windows XP port to Xen 1.0, as you might imagine, never left Microsoft Research; but it was benchmarked in the original paper.)

The result was impressive — by getting rid of all the difficult legacy interfaces, they were able to make a fast, very lightweight hypervisor in under 70,000 lines of code.

They called this technique of changing the interface to make it easy to virtualize paravirtualization (PV). In a paravirtualized VM, guests run with fully paravirtualized disk and network interfaces; interrupts and timers are paravirtualized; there is no emulated motherboard or device bus; guests boot directly into the kernel in the mode the kernel wishes to run in (32-bit or 64-bit), without needing to start in 16-bit mode or go through a BIOS; all privileged instructions are replaced with paravirtualized equivalents (hypercalls); and access to the page tables is paravirtualized as well.

Xen Project and Full Virtualization

In early versions of Xen Project, paravirtualization was the only mode available. Although Windows XP had been ported to the Xen Project platform, it was pretty clear that such a port was never going to see the light of day outside Microsoft Research. This meant, essentially, that only open-source operating systems were going to be able to run on our hypervisor.

At the same time the Xen Project team was coming up with paravirtualization, the engineers at Intel and AMD were working to try to make full virtualization easier. The result was something we now call HVM — which stands for “hardware virtual machine”. Rather than needing to do software emulation or binary translation, the HVM extensions do what might be called “hardware emulation”.

Technically speaking, HVM refers to a set of extensions that make it much simpler to virtualize one component: the processor. To run a fully virtualized guest, many other components still need to be virtualized. To accomplish this, the Xen Project integrated qemu to emulate disk, network, motherboard, and PCI devices; wrote the shadow code, to virtualize the pagetables; wrote emulated interrupt controllers in the hypervisor; and integrated ROMBIOS to provide a virtual BIOS to the guest.

Even though the HVM extensions are only one component of making a fully virtualized VM, the “fully virtualized” mode in the hypervisor was called HVM mode, distinguishing it from PV mode. This usage spread throughout the toolstack and into the user interface; to this day, users generally speak of running a VM in “PV mode” or in “HVM mode”.

From Poles to a Spectrum

Xen Project may have started with two polar modes, but it has since developed a spectrum of modes between the end points of Paravirtualization and Full Virtualization.

This spectrum has evolved over time. It arose when people began to realize that both poles were deficient when it came to performance. Over time, the project's engineers developed enhancements aimed at improving overall performance. These enhancements borrow concepts from the opposite pole to create hybrids which come closer and closer to the optimal mode.

Enhancements

Enhancements to HVM

HVM with PV drivers

Once you have a fully-virtualized system, the first thing you notice is that the interface you provide for network and disks — that is, emulating a full PCI card with MMIO registers and so on — is really unnecessarily complicated. Because nearly all modern kernels have ways to load third-party device drivers, it’s a fairly obvious step to create disk and network drivers that can use the paravirtualized interfaces. Running in this mode can be called fully virtualized with PV drivers. It's an incremental step toward higher performance.

PVHVM

But fully virtualized mode, even with PV drivers, has a number of things that are unnecessarily inefficient. One example is the interrupt controllers: fully virtualized mode provides the guest kernel with emulated interrupt controllers (APICs and IOAPICs). Each instruction that interacts with the APIC requires a trip up into Xen Project and a software instruction decode; and each interrupt delivered requires several of these emulations.

As it turns out, many of the paravirtualized interfaces for interrupts, timers, and so on are actually available for guests running in HVM mode; they just need to be turned on and used. The paravirtualized interfaces use memory pages shared with Xen Project, and are streamlined to minimize traps into the hypervisor.

So Stefano Stabellini wrote some patches for the Linux kernel that allowed Linux, when it detects that it’s running in HVM mode under Xen Project, to switch from using the emulated interrupt controllers and timers to the paravirtualized interrupts and timers. This new feature he called PVHVM, because although it runs in HVM mode, it uses the PV interfaces extensively.
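From user space inside a Linux guest, the outcome of this detection can often be inspected through sysfs. The following is a minimal sketch, assuming a kernel that exposes `/sys/hypervisor/type` and (on newer kernels) `/sys/hypervisor/guest_type`. The function names are illustrative and not part of any Xen Project tool, and the sketch only reads what the kernel reports; it does not reproduce the kernel's own detection logic.

```python
from pathlib import Path
from typing import Optional


def read_first_line(path: Path) -> Optional[str]:
    """Return the stripped first line of a file, or None if it cannot be read."""
    try:
        return path.read_text().splitlines()[0].strip()
    except (OSError, IndexError):
        return None


def xen_guest_type(sysfs_root: Path = Path("/sys/hypervisor")) -> Optional[str]:
    """Report the guest type as seen from inside a Linux guest.

    Returns None when the guest does not appear to run under Xen,
    and "unknown" when it does but the kernel exposes no guest_type file
    (an assumption: older kernels may lack it).
    """
    if read_first_line(sysfs_root / "type") != "xen":
        return None
    return read_first_line(sysfs_root / "guest_type") or "unknown"
```

Calling `xen_guest_type()` with no arguments reads the real sysfs tree; passing a different directory makes the function easy to exercise outside a guest. Note that a guest using the PVHVM feature still boots in HVM mode, so it would be expected to report an HVM guest type rather than a separate one.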

Enhancements to PV

PVH

A lot of the choices Xen Project made when designing a PV interface were made before HVM extensions were available. Nearly all hardware now has HVM extensions available, and nearly all also include hardware-assisted pagetable virtualization. What if we could run a fully PV guest — one that had no emulated motherboard, BIOS, or anything like that — but used the HVM extensions to make the PV MMU unnecessary, as well as to speed up system calls in 64-bit mode?

This is exactly what PVH is. It’s a fully PV kernel mode, running with paravirtualized disk and network, paravirtualized interrupts and timers, no emulated devices of any kind (and thus no qemu), no BIOS or legacy boot — but instead of requiring PV MMU, it uses the HVM hardware extensions to virtualize the pagetables, as well as system calls and other privileged operations.

We fully expect PVH to have the best characteristics of all the possible mode & feature combinations — a simple, fast, secure interface, low memory overhead, while taking full advantage of the hardware. If HVM had been available at the time the Xen Project Hypervisor was designed, PVH is probably the mode we would have chosen to use. In fact, in the new ARM port, it is the primary mode that guests will operate in.

Once PVH is well-established (perhaps five years or so after it’s introduced), we will probably consider removing non-PVH support from the Linux kernel, making maintenance of Xen Project support for Linux much simpler. The hypervisor will probably continue to support older kernels for some time after that. However, rest assured that none of this will be done without consideration of the community.

Given the number of other things in the fully virtualized – paravirtualized spectrum, finding a descriptive name has been difficult. The developers have more or less settled on “PVH” (mainly PV, but with a little bit of HVM), but it has in the past been called other things, including “PV in an HVM container” (or just “HVM containers”), and “Hybrid mode”.

What to Choose?

The first step is to identify which pole (PV or HVM) to start from. The rules of thumb for choosing are:

  • PV: Use this if the PVH feature is supported; it boots the best type of hybrid. You may also need to choose this if the processor does not support HVM (e.g., as indicated by the EC2 instance type matrix), in which case you will end up with a fully PV mode instance.
  • HVM: Until PVH is ready for production work, this is the preferred mode. Xen Project and the guest kernel are likely to support a number of enhancements which use Xen Project's PV code paths, so the booted instance will be a hybrid. The degree to which the hybrid approaches optimal function will depend on the number of enhancements supported in the hardware. In HVM, later hardware means support for more enhancements; so if you want the best results, use more recent hardware.
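The two rules of thumb above can be sketched as a small function. This is an illustration of the prose rules only, not a transcription of any flowchart; the function name, parameter names, and returned labels are hypothetical.

```python
from typing import Tuple


def choose_boot_mode(pvh_supported: bool,
                     cpu_has_hvm: bool,
                     kernel_has_pvhvm: bool) -> Tuple[str, str]:
    """Return (boot mode, feature to enable) following the rules of thumb.

    All three inputs are assumptions the caller must establish separately:
    whether PVH is usable, whether the processor offers HVM extensions,
    and whether the guest kernel supports the PVHVM feature.
    """
    if pvh_supported:
        # PVH boots as a PV guest but relies on the HVM hardware extensions.
        return ("PV", "PVH")
    if cpu_has_hvm:
        # Preferred until PVH is ready: HVM, made as hybrid as the kernel allows.
        return ("HVM", "PVHVM" if kernel_has_pvhvm else "PV drivers")
    # No hardware support at all: only a fully paravirtualized guest remains.
    return ("PV", "none")
```

For example, `choose_boot_mode(False, True, True)` yields `("HVM", "PVHVM")`, which matches the situation where PVH is unavailable but the hardware and the guest kernel support the hybrid HVM path.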

The flowchart below, devised by Brendan Gregg, shows the method for choosing the optimal performance selection for most situations. While it is possible that some people may have edge cases which may require a slightly different selection, most people will find their way to the top of the performance heap just by following this simple logic flow:

Xen-mode-flow.png

PVHVM and PVH can be thought of as features and not modes. The choices are PV or HVM, which describes how the instance boots, and your choice may be influenced by the features available. Currently, on EC2, PVH is not available, but PVHVM is: so the best choice for Linux (in general) would be booting an "HVM" instance with PVHVM enabled (e.g., setting CONFIG_XEN_PVHVM).
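To check whether a given kernel was built with CONFIG_XEN_PVHVM, one can scan its build configuration. The sketch below assumes the usual Linux kernel config text format (`OPTION=y`, `OPTION=m`, or `# OPTION is not set`); on many distributions that text lives in a file such as `/boot/config-<version>`, though the location is an assumption and varies. The helper name is hypothetical, and the sample text is made up for illustration.

```python
import re


def kernel_option_enabled(config_text: str, option: str) -> bool:
    """Return True if `option` is set to y (built in) or m (module)."""
    # Anchor on the full option name so CONFIG_XEN does not match CONFIG_XEN_PVHVM.
    pattern = re.compile(rf"^{re.escape(option)}=(y|m)\s*$", re.MULTILINE)
    return pattern.search(config_text) is not None


# Made-up sample in the kernel config format, for illustration only.
SAMPLE_CONFIG = """\
CONFIG_XEN=y
CONFIG_XEN_PVHVM=y
# CONFIG_XEN_DEBUG_FS is not set
"""
```

With the sample above, `kernel_option_enabled(SAMPLE_CONFIG, "CONFIG_XEN_PVHVM")` is true, while the commented-out option is reported as disabled.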

The Paravirtualization Spectrum

So to summarize: There are a number of things that can be either virtualized or paravirtualized when creating a VM; these include:

  • Disk and network devices
  • Interrupts and timers
  • Emulated platform: motherboard, device buses, BIOS, legacy boot
  • Privileged instructions and pagetables (memory access)

Each of these can be fully virtualized or paravirtualized independently. This leads to a spectrum of virtualization modes, summarized in the table below (created by Lars Kurth, modified by Brendan Gregg):

Xen-colors.png

The first three of these will all be classified as “HVM mode”, and the last two as “PV mode” for historical reasons. PVH is the latest refinement of PV mode, which we expect to be a sweet spot between full virtualization and paravirtualization: it combines the best advantages of Xen Project’s PV mode with full utilization of hardware support.
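The same spectrum can be written down as a small lookup table. The sketch below is derived from the descriptions in this article rather than transcribed from the image, and it simplifies one point: early HVM guests used software shadow pagetables, whereas the table treats privileged instructions and pagetables together as hardware-virtualized. All names in the code are illustrative.

```python
from enum import Enum


class How(Enum):
    EMULATED = "virtualized in software"
    HARDWARE = "virtualized using the HVM hardware extensions"
    PARAVIRT = "paravirtualized"


# Column order for every row of SPECTRUM below.
COMPONENTS = (
    "disk and network",
    "interrupts and timers",
    "emulated platform and legacy boot",
    "privileged instructions and pagetables",
)

# Rows run from the fully virtualized pole to the fully paravirtualized pole.
SPECTRUM = {
    "HVM": (How.EMULATED, How.EMULATED, How.EMULATED, How.HARDWARE),
    "HVM with PV drivers": (How.PARAVIRT, How.EMULATED, How.EMULATED, How.HARDWARE),
    "PVHVM": (How.PARAVIRT, How.PARAVIRT, How.EMULATED, How.HARDWARE),
    "PVH": (How.PARAVIRT, How.PARAVIRT, How.PARAVIRT, How.HARDWARE),
    "PV": (How.PARAVIRT, How.PARAVIRT, How.PARAVIRT, How.PARAVIRT),
}


def boot_mode(name: str) -> str:
    """Classify a row as the article does: first three HVM mode, last two PV mode."""
    return "HVM mode" if list(SPECTRUM).index(name) < 3 else "PV mode"


def describe(name: str) -> dict:
    """Map each component to how it is handled for the given row."""
    return dict(zip(COMPONENTS, (how.value for how in SPECTRUM[name])))
```

Reading the rows top to bottom shows one more component moving to the paravirtualized column at each step, which is the sense in which the modes form a spectrum rather than two isolated choices.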

Hopefully this has given you an insight into what the various modes are, how they came about, and what the advantages and disadvantages of each are.

References