Xen PCI Passthrough

Revision as of 20:36, 10 November 2011 by Lars.kurth (talk | contribs) (passing multiple PCI devices)



Xen version 3.0, released in 2005, was the first version to support PCI passthrough. You can use PCI passthru to assign a PCI device (NIC, disk controller, HBA, USB controller, firewire controller, soundcard, etc.) to a virtual machine guest, giving it full and direct access to the PCI device.

The guest VM (domU) needs to have a driver for the actual PCI device, just like you'd use the PCI device on baremetal (without Xen).

Xen PCI passthru requirements

Required software for Xen PCI passthrough:

  • Xen hypervisor v3.0 or newer.
  • Dom0 kernel must have pciback driver (xen-pciback in pvops dom0 kernels).

See below for additional requirements depending which type of Xen PCI passthru you're using (PV or HVM/IOMMU).

Xen PCI passthru to a PV (paravirtual) guest

Traditional Xen PV PCI passthru to a PV domU gives the guest full control of the PCI device, including DMA access. This can be insecure and unstable if the guest is malicious or has buggy drivers. The advantage of this PV PCI passthru method is that it has been available for years in Xen, and it doesn't require any special hardware or chipset: no hardware IOMMU (VT-d) support is needed.

Additional requirements for Xen PCI passthru to PV domU:

  • PV domU kernel needs to have the Xen PCI frontend driver loaded for PCI passthru to work! This driver is called xen-pcifront in pvops kernels.
  • You need to add additional kernel options to the domU kernel cmdline to enable Xen PCI passthru, see below.

You can also use the safer and more secure hardware IOMMU (VT-d) PCI passthru for PV guests, see below for that.

For DMA access to work for PCI devices in a PV guest, the swiotlb must be enabled in the domU (guest VM) kernel. For some explanation about the swiotlb see http://lwn.net/Articles/91870/

Add this option to your domU (guest VM) kernel boot options (cmdline arguments) list :


swiotlb=force

NOTE: This option is only required for older PV guest kernel versions (linux-2.6.18-xen, RHEL5, SLES10/11, Debian etch/lenny).

Upstream kernel.org (pvops) Linux domU kernels require just this option:


iommu=soft
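To confirm that a guest actually booted with one of these options, you can inspect its kernel command line. The following is a minimal sketch only; the function name dma_option_in_cmdline is made up for this example, and the sample command line is illustrative.

```shell
# Report which software-IOMMU option (if any) a kernel command line carries.
# Usage: dma_option_in_cmdline "<cmdline string>"
dma_option_in_cmdline() {
    found=""
    # Unquoted $1 is split on whitespace into individual options.
    for word in $1; do
        case "$word" in
            swiotlb=force|iommu=soft) found="$found $word" ;;
        esac
    done
    # Strip the leading space; prints an empty line when nothing matched.
    echo "${found# }"
}

# Example with an illustrative command line:
dma_option_in_cmdline "root=/dev/xvda1 ro iommu=soft console=hvc0"
```

Inside a running PV guest the real command line is available in /proc/cmdline, so the call would be: dma_option_in_cmdline "$(cat /proc/cmdline)"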

Xen PCI passthru to an HVM (fully virtualized) guest

Requirements for Xen PCI passthru to HVM guest:

  • Hardware IOMMU (Intel VT-d or AMD IOMMU) is required from the CPU/motherboard/chipset/BIOS.

Note that IOMMU/VT-d is a separate, additional feature, distinct from the normal CPU virtualization extensions (VMX/SVM) required for running Xen HVM guests! Intel VT-x or AMD-V provide the VMX or SVM CPU flags (CPU virtualization extensions), but these flags don't give you IOMMU/VT-d support! IOMMU support is not yet available in many chipsets (as of the beginning of the year 2010).

To verify you have IOMMU support enabled:

  • Check if IOMMU (Intel VT-d or AMD IOMMU) is enabled in the system BIOS. Some BIOSes call this feature "IO virtualization" or "Directed IO". After changing settings in the BIOS, power the machine off completely, unplug the power cord, leave it without power for a while, and then restart the system. Some systems are known not to actually enable the IOMMU until they have been powered off completely!
  • If running Xen 3.4.x (or older version) you need to add iommu=1 flag (or vtd=1 in even older versions) for Xen hypervisor (xen.gz) to grub.conf and reboot.
  • Xen 4.0.0 and newer versions enable IOMMU support as a default if supported by the hardware and BIOS, no additional boot flags required for the hypervisor.
  • Read the Xen hypervisor boot messages with "xm dmesg" and check if "IO virtualization" gets enabled.
  • Unfortunately there are many buggy BIOSes causing Xen to disable IO virtualization because of errors in the BIOS DMAR/ACPI tables. Xen tries to workaround these bugs in the BIOS, but sometimes it's not possible. Please report all the details about your hardware and software to xen-devel mailinglist if IO virtualization gets disabled due to buggy BIOS. Also see below for troubleshooting tips.
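The "xm dmesg" check from the list above can be scripted. This is a rough sketch, not an official tool: the function name iommu_status is hypothetical, and the exact wording of the hypervisor message is an assumption, so the pattern accepts both the "virtualisation" and "virtualization" spellings.

```shell
# Classify hypervisor boot messages read from stdin.
# Prints "enabled", "disabled" or "unknown".
iommu_status() {
    # Keep the last matching line; the message wording is an assumption.
    line=$(grep -i -E 'I/?O virtuali[sz]ation' | tail -n 1)
    case "$line" in
        *[Ee]nabled*)  echo enabled ;;
        *[Dd]isabled*) echo disabled ;;
        *)             echo unknown ;;
    esac
}

# Example with an illustrative log line:
printf '%s\n' '(XEN) I/O virtualisation enabled' | iommu_status
```

On a real dom0 you would feed it the hypervisor log: xm dmesg | iommu_status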

Please see the VTdHowTo wiki page for more information about PCI passthru to Xen HVM guest.

Xen HVM guest doesn't need to have a special kernel or special pcifront drivers in it for the PCI passthru to work.

Xen PCI passthru usage

  1. Make sure dom0 has Xen pciback driver available and loaded.
  2. Boot into Xen dom0 and run "lspci" to find out the PCI device ID (BDF notation) of the PCI device to passthru.
  3. Edit grub.conf and configure dom0 kernel options to hide the PCI device ID (BDF) from dom0. This can be done using the "guestdev=", "pciback.hide=" or "xen-pciback.hide=" methods, depending which dom0 kernel you're using. See below for more information about different kernels. Example for pvops dom0 kernel (xen-pciback):
module       /boot/vmlinuz-2.6.32.10 root=/dev/sda1 ro nomodeset xen-pciback.hide=(08:05.0)
     Or to hide multiple devices from dom0:
module       /boot/vmlinuz-2.6.32.10 root=/dev/sda1 ro nomodeset xen-pciback.hide=(01:00.0)(00:02.0)
     Note that you can also do the device hiding dynamically (without reboot) using the xen-pciback sysfs interface, see below for more information about that.
  4. Reboot the system.
  5. Run "xm pci-list-assignable-devices" and verify the PCI device is available for passthru. Important: make sure the device shows up in the "xm pci-list-assignable-devices" list! Don't continue before you've gotten that properly working.
  6. Edit the "/etc/xen/<guest>" cfgfile and add the following line to enable PCI passthru:
pci = [ '01:00.0' ]
     Or if you want to passthru multiple PCI devices (all have to be hidden from dom0):
pci = [ '01:00.0', '00:02.0' ]
  7. If you're going to PCI passthru to a PV guest, make sure the *guest* kernel cmdline has the "swiotlb=force" or "iommu=soft" parameter, depending on the guest kernel (see above for more information).
  8. Start the guest with "xm create".
  9. You can verify the PCI passthru status with: "xm pci-list <guest>"
  10. If your guest is a PV guest, make sure it has the Xen PCI frontend driver loaded, and then verify with "lspci" that the guest kernel can see the passthru device.
  11. If your guest is an HVM guest, use "lspci" in the guest to verify the passthru device is visible.
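The hide option for dom0 and the pci line for the guest config must name the same devices, so it can help to generate both from one list. A small sketch follows; the helper names hide_option and pci_config_line are invented for this example, and the option name assumes a pvops dom0 kernel (xen-pciback).

```shell
# Build the dom0 kernel option that hides the given devices.
hide_option() {
    out="xen-pciback.hide="
    for bdf in "$@"; do
        out="$out($bdf)"
    done
    echo "$out"
}

# Build the matching pci = [...] line for the guest config file.
pci_config_line() {
    list=""
    for bdf in "$@"; do
        # Add ", " between entries, but not before the first one.
        list="${list:+$list, }'$bdf'"
    done
    echo "pci = [ $list ]"
}

hide_option 01:00.0 00:02.0
pci_config_line 01:00.0 00:02.0
```

The first line of output goes on the dom0 kernel line in grub.conf, the second into the guest cfgfile.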

Xen PCI passthru hotplug and hot-unplug

You can also do PCI passthru online to an already running Xen guest. The PCI device must be ready for passthru before doing the hotplug, i.e. it has to show up in the "xm pci-list-assignable-devices" list. This hotplug method works for both PV and HVM guests. Commands for Xen guest PCI hotplug/hot-unplug are:

  • hotplug
xm pci-attach <guest> <pci device> <guest virtual slot number>
  • hot-unplug
xm pci-detach <guest> <pci device> <guest virtual slot number>

Xen dom0 pciback driver backend modes

List of Xen pciback modes that you can set in the kernel configuration (.config file) in xen/stable-2.6.32.x kernel:

  • CONFIG_XEN_PCIDEV_BACKEND_PASS=y means the PCI device gets the same PCI ID in the guest as in dom0.
  • CONFIG_XEN_PCIDEV_BACKEND_VPCI=y means the PCI device gets a virtual PCI ID in the guest, not the same PCI ID as in dom0.

Note that in upstream Linux 3.1.0 and later versions you can set PASS/VPCI as a module/driver option when loading the driver!

You can use the following on dom0 Linux kernel command line in grub.conf (if xen-pciback is built-in to the kernel):

xen-pciback.passthrough=1


or the following if loading xen-pciback driver as a module:

modprobe xen-pciback passthrough=1


In Linux 3.1+ this gives the same behaviour as the earlier CONFIG_XEN_PCIDEV_BACKEND_PASS .config option.

Xen PCI passthru limitations

When the guest has PCI passthru devices in use, operations like save/restore/migration are not possible. You have to detach (unplug) the passthru device before save, restore or live migration is possible.

Xen VGA graphics adapter passthru

Please see the XenVGAPassthrough wiki page for more information about VGA graphics card passthru.

Can I use hardware IOMMU (VT-d) passthru also for PV guests?

Yes, you can use the safer and more secure hardware IOMMU (VT-d) passthru also for PV guests, instead of the normal Xen PV PCI passthru. As a default Xen uses the normal (non-IOMMU) PV passthru for PV guests.

You need to add the "iommu=pv" boot option for the Xen hypervisor (xen.gz) in grub.conf and reboot. After rebooting verify from "xm dmesg" that IO virtualization for PV guests gets enabled. Naturally you must have hardware IOMMU (Intel VT-d or AMD IOMMU) support for this to work.

From Xen 4.0.1 onwards, iommu=pv is not necessary, as it is turned on by default as long as VT-d is present at boot.

I'm using pvops dom0 kernel and "pciback.hide" doesn't seem to do anything!

pvops dom0 kernels renamed the xen related driver-modules in Dec 2009 to have xen- prefix in them, so pciback driver is now called xen-pciback. So please use "xen-pciback.hide" option instead of just "pciback.hide".

Is it possible to hide the PCI devices dynamically (on the run) from dom0, so that I don't have to reboot the machine?

Yes, it's possible with pvops dom0 kernel xen pci backend sysfs interface. Please see this email for instructions: http://lists.xensource.com/archives/html/xen-devel/2010-03/msg00448.html
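As a rough illustration of what the sysfs procedure looks like, the sketch below prints the writes involved without executing them. Treat it as an assumption-laden outline rather than the procedure from the linked email: the function name pciback_assign_cmds is hypothetical, the driver directory is assumed to be /sys/bus/pci/drivers/pciback, and a missing PCI domain is assumed to be 0000.

```shell
# Print (do not execute) the sysfs writes that hand a device to pciback.
# $1 is a BDF such as 08:05.0 or 0000:08:05.0.
pciback_assign_cmds() {
    case "$1" in
        *:*:*) dev="$1" ;;          # domain already present
        *)     dev="0000:$1" ;;     # assumed default domain
    esac
    drv=/sys/bus/pci/drivers/pciback   # assumed driver directory name
    # 1. Detach the device from whatever dom0 driver currently owns it.
    echo "echo $dev > /sys/bus/pci/devices/$dev/driver/unbind"
    # 2. Tell pciback about the slot, then 3. bind the device to it.
    echo "echo $dev > $drv/new_slot"
    echo "echo $dev > $drv/bind"
}

pciback_assign_cmds 08:05.0
```

The printed commands would have to be run as root in dom0, and the unbind step only applies when the device is currently bound to a driver. Afterwards the device should show up in "xm pci-list-assignable-devices".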

I get "non-page-aligned MMIO BAR" error when trying to start the guest

If using linux-2.6.18-xen, add these options to grub.conf for the 2.6.18.8 dom0 kernel which should fix the alignment:


guestdev=01:00.0,01:02.0 reassign_resources

Replace "01:00.0" and "01:02.0" with the actual PCI devices you want to passthru. Note the "," to separate the entries.

A change in Apr 2009 in linux-2.6.18-xen (http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/a3ad7a5f2dcd) altered the syntax. The earlier/old syntax for linux-2.6.18-xen is:


pciback.permissive pciback.hide=(01:00.0)(02:01.0) reassigndev

If you're using Linux 2.6.31 or newer pvops dom0 kernel then there's no guestdev/reassign_resources, but instead you use:


xen-pciback.permissive xen-pciback.hide=(08:05.0)(09:06.1) pci=resource_alignment=08:05.0;09:06.1

If you're using Linux 2.6.31 or newer dom0 kernel based on the Novell/SLES/OpenSuse Xenlinux forward-ported patches, then you use this syntax:


pciback.permissive pciback.hide=(00:1d.7)(00:1a.0)(00:1a.1)(00:1a.7)(00:1b.0) pci=resource_alignment=00:1a.7;00:1d.7

Note the ";" to separate multiple PCI ID entries for "pci=resource_alignment".

If using GRUB2, and using resource_alignment for multiple devices, you need to wrap the resource_alignment with single quotes like this:

'pci=resource_alignment=00:1a.7;00:1d.7'

Otherwise GRUB2 will parse the line wrong and you won't get any resource_alignment! For more info see: http://lists.xensource.com/archives/html/xen-users/2011-09/msg00360.html .
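Because the separator differs between options ("," for guestdev, ";" for resource_alignment) and GRUB2 needs the quoting, generating the option can avoid typos. A minimal sketch, with the helper name resource_alignment_option made up for this example:

```shell
# Join BDFs with ";" and wrap the whole option in single quotes for GRUB2.
resource_alignment_option() {
    joined=""
    for bdf in "$@"; do
        # Add ";" between entries, but not before the first one.
        joined="${joined:+$joined;}$bdf"
    done
    echo "'pci=resource_alignment=$joined'"
}

resource_alignment_option 00:1a.7 00:1d.7
```

The output is meant to be pasted as-is, quotes included, onto the dom0 kernel line of the GRUB2 configuration.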

I get "Error: pci: 0000:02:06.0 must be co-assigned to the same guest with 0000:02:05.0" error when trying to start the guest

This error usually happens when you're trying to passthru only a single function from a multi-function device (for example a dual-port nic), or only one of the devices behind the same PCI bridge. This is not allowed by the Intel VT-d specification. Please see this email for the explanation of this issue: http://lists.xensource.com/archives/html/xen-devel/2010-01/msg00870.html and the patch implementing these FLR methods: http://xenbits.xensource.com/xen-unstable.hg?rev/e61978c24d84

If you want to manually override this in Xen 4.0.0 or newer you can specify "pci-passthrough-strict-check no" in /etc/xen/xend-config.sxp, and after restarting xend passthru code won't give this error anymore. In some (many?) cases PCI passthru can work after this change.

If the PCI device is a single-function device, you can also move it to a different PCI slot to workaround the issue.

With Xen 3.4.x and 3.3.x versions you can apply a "disable FLR" patch to workaround this issue: http://lists.xensource.com/archives/html/xen-devel/2008-10/binAofZNDKlrU.bin and discussion about it here http://lists.xensource.com/archives/html/xen-devel/2008-10/msg00280.html

Xen 4.0.0 says IO virtualization is disabled, how can I enable more verbose logging to find out why it gets disabled?

Add "iommu=verbose" option for Xen hypervisor (xen.gz) in grub.conf and reboot. After rebooting read "xm dmesg" log (or set up a serial console). As a default Xen 4.0.0 is not verbose about IOMMU initialization and related ACPI DMAR table parsing.

Does upstream kernel.org Linux 2.6.3x kernel work as PV guest (domU) kernel for PCI passthru usage?

Yes. Starting with kernel.org Linux 2.6.37 Xen PV domU PCI passthrough is supported out-of-the-box, ie. xen-pcifront driver is included in the standard kernel!

There are also multiple additional git trees and branches available with the pvops Xen PCI Frontend driver included.

Jeremy's xen.git in git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git has Xen pcifront driver at least in the following branches:

  • xen/stable-2.6.31.x
  • xen/stable-2.6.32.x

Konrad's xen.git in git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git has Xen pcifront driver in the following branches (note these branches are only suitable for domU use):

  • pv/merge.2.6.33
  • pv/merge.2.6.34
  • devel/merge.2.6.35-rc3

You can get the branches like this:


$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git linux-domU
$ cd linux-domU
$ git checkout -b temp  origin/pv/merge.2.6.34

Bugs:

  • [Fixed with Xen c/s 23428 "libxl: Add 'e820_host' option to config file"] Guest limited to 3GB. If you try to pass in more than that the PCI devices do not show up.
  • [Fixed with Xen c/s 23249 "pv-grub: Fix for incorrect dom->p2m_host[] list initialization"] Starting the DomU using pvgrub with 'iommu=soft swiotlb=force' breaks pvgrub.
  • [Fixed] Passing in PCI device with function != 0 and they do not show up in DomU.
  • [Fixed] 32-bit DomU with 64-bit Dom0 fails. e100 driver can't map its PCI BARs.

See http://lists.xensource.com/archives/html/xen-devel/2010-04/msg00285.html for Apr 2010 update of pcifront/swiotlb patches.

See http://lists.xensource.com/archives/html/xen-devel/2010-04/msg01065.html for Apr 2010 updated pcifront/pciback patches with MSI/MSI-X support.

Which Xen PV guest (domU) kernels have Xen pcifront driver included, required for PCI passthru?

Here's a list of the kernels that have the Xen pcifront driver included. This list might not be complete.

  • kernel.org Linux 2.6.37 and newer versions
  • linux-2.6.18-xen from xen.org
  • Pvops kernels from Jeremy's or Konrad's git trees (see above).
  • RHEL5 / CentOS 5 kernel-xen 2.6.18
  • SLES 10 / SLES 11
  • OpenSUSE 11.x
  • Novell/OpenSuse forward-ported Xenlinux patches, rebased by Andrew Lyon for 2.6.29, 2.6.31, 2.6.32, 2.6.33, etc
  • Debian etch 2.6.18-6-xen, lenny 2.6.26-2-xen, squeeze 2.6.32-xen.

See XenKernelFeatures wiki page for more information about supported features in Xen kernels.

My hardware/motherboard does have an IOMMU included, but Xen doesn't enable hardware assisted IO virtualization!

Unfortunately many motherboards ship with broken BIOSes (for example incorrect ACPI DMAR, DRHD or RMRR tables) that cause Xen to disable IO virtualization as a security measure, or to prevent crashes from happening later on.

You can check if Xen enabled IO virtualization by running the "xm dmesg" command and reading through the log. There's a line about IO virtualization telling if it's enabled or disabled. You need Xen 3.4 or newer for IOMMU (VT-d) to work.

If IO virtualization gets disabled, but it's available on your hardware, you should try these steps to troubleshoot it:

  • Check the BIOS version installed, and check the vendor's support site for BIOS updates. Install the latest BIOS/firmware updates.
  • Enable "IOMMU", "IO virtualization" or "VT-d" in the BIOS and power-off, then restart the machine.
  • Set "iommu=verbose" boot option for Xen hypervisor (xen.gz) in grub.conf, if running Xen 4.0.0 or newer.
  • Read Xen hypervisor boot messages from "xm dmesg" to see if IO virtualization is enabled or disabled.
  • If Xen complains about broken BIOS, let the motherboard/system vendor know about it.
  • Intel developers also want to know about broken IOMMU/VT-d BIOS implementations, see this email: http://lists.xensource.com/archives/html/xen-devel/2010-01/msg00841.html, so let them know all the details about your hardware and software if you have broken BIOS.
  • Upgrade to Xen hypervisor 4.0.0 or later, since this version added many workarounds for buggy BIOSes.

What is PCI device ID BDF notation?

Please see BDFNotation wiki page for more information.
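In short, a BDF has the form bus:device.function in hexadecimal (for example 06:00.2), optionally prefixed with a PCI domain (0000:06:00.2). The sketch below splits such a string into its fields; the function name parse_bdf is invented for this example, and treating a missing domain as 0000 is an assumption.

```shell
# Split a BDF such as 06:00.2 or 0000:06:00.2 into its fields.
parse_bdf() {
    bdf="$1"
    func="${bdf##*.}"        # text after the last "."
    rest="${bdf%.*}"         # text before the last "."
    dev="${rest##*:}"        # device (slot) number
    rest="${rest%:*}"        # drop ":device"
    bus="${rest##*:}"        # bus number
    case "$rest" in
        *:*) domain="${rest%%:*}" ;;
        *)   domain="0000" ;;    # assumed default when omitted
    esac
    echo "domain=$domain bus=$bus device=$dev function=$func"
}

parse_bdf 06:00.2
```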

I really would like to do Xen PCI passthru to an HVM guest but I don't have IOMMU/VT-d capable hardware.. is there another way?

Well.. yes. This is totally unsupported, and it only works for the first started guest. Oh, and you're totally on your own with this patch: http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00018.html . Good luck :)

How can I check if PCI device supports FLR (Function Level Reset) ?

Run "lspci -vv" (in dom0) and check if the device has "FLReset+" in the DevCap field.
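The same check can be scripted for a single device. This is only a sketch: the function name has_flr is hypothetical, and the sample input line is shortened for illustration rather than copied from real lspci output.

```shell
# Read "lspci -vv" output for one device on stdin and report FLR support.
has_flr() {
    # -F treats the pattern as a fixed string, so "+" is matched literally.
    if grep -q -F 'FLReset+'; then
        echo yes
    else
        echo no
    fi
}

# Example with an illustrative, shortened capability line:
printf '%s\n' 'DevCap: MaxPayload 256 bytes, FLReset+' | has_flr
```

In dom0 you would pipe in the real output for the device in question, for example: lspci -vv -s 01:00.0 | has_flr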

pci-stub?

pci-stub can be used only with Xen HVM guest PCI passthru, so it's recommended to use pciback instead, which works for both PV and HVM guests.

passing multiple PCI devices

When passing through PCI devices rather than PCIe devices, it is necessary to include all the sub-devices before PCI passthrough works. E.g.

lspci

  • 06:00.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 62)
  • 06:00.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 62)
  • 06:00.2 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 65)
  • 06:03.0 FireWire (IEEE 1394): Agere Systems FW322/323 (rev 70)

To pass 06:00.0 through 06:00.2 to a domU, add the following option to the dom0 kernel line in /boot/grub/grub.cfg: xen-pciback.hide=(06:00.0)(06:00.1)(06:00.2)(06:00.3)(06:03.0)

When (06:03.0) is left out, the PCI passthrough won't work!!

In the /etc/xen/abc.cfg file, the following line is fine: pci = ['06:00.0', '06:00.1', '06:00.2']

The above example is used for Xen 4.0.1 with Debian Squeeze as dom0.
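To avoid missing a function of a multi-function device, the list can be pulled from lspci output. A small sketch follows, with the helper name sibling_functions invented for this example and the device descriptions shortened. Note that it only finds functions sharing the same bus:device part; it does not detect other devices behind the same bridge, such as 06:03.0 above, which still have to be added by hand.

```shell
# From "lspci" output on stdin, print every BDF that shares the
# bus:device part of $1, i.e. all functions of the same device.
sibling_functions() {
    slot="${1%.*}"    # 06:00.0 -> 06:00
    awk -v slot="$slot" 'index($1, slot ".") == 1 { print $1 }'
}

# Example with shortened, illustrative lspci lines:
printf '%s\n' \
    '06:00.0 USB Controller: VIA UHCI' \
    '06:00.1 USB Controller: VIA UHCI' \
    '06:00.2 USB Controller: VIA USB 2.0' \
    '06:03.0 FireWire (IEEE 1394): Agere FW322/323' |
    sibling_functions 06:00.0
```

In dom0 the real call would be: lspci | sibling_functions 06:00.0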