VTd HowTo


To Do:

This page is outdated. In particular the hardware sections need to be updated.


Please also check the XenPCIpassthrough wiki page for more general information about Xen PCI passthrough usage.

VT-d Pass-Through is a technique to give a domU exclusive access to a PCI function using the IOMMU provided by VT-d. It is primarily targeted at HVM (fully virtualised) guests because PV (paravirtualized) pass-through does not require VT-d (although it may be utilized too).

Xen 4.1 xl tools notes

  • Only devices with FLR (Function Level Reset) capabilities are supported.
  • Passing through a PCI card without FLR capability will result in an error.

To check whether your PCI devices support FLR, see the entry on this wiki titled "How can I check if PCI device supports FLR (Function Level Reset)?"

If the output contains "FLReset-", your PCI device does not support FLR. If the output contains "FLReset+", it does.
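The check can be scripted along the following lines. This is only a sketch: flr_status is a hypothetical helper (not part of Xen or pciutils), and it assumes the capability text comes from lspci -vv limited to a single function, which usually needs root to print the capability list.

```shell
# flr_status: classify a device from `lspci -vv` output read on stdin.
# Hypothetical helper for illustration; not part of Xen or pciutils.
flr_status() {
    caps=$(cat)
    case "$caps" in
        *FLReset+*) echo "FLR supported" ;;
        *FLReset-*) echo "FLR not supported" ;;
        *)          echo "FLR capability not reported" ;;
    esac
}

# Example (run as root so that lspci can read the capability list):
#   lspci -vv -s 01:00.0 | flr_status
```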

At the time of writing (June 2012), very few PCI devices support FLR.

Supported OSes

  • Dom0 (Host) OS: PAE, 64-bit
  • DomU (Guest) OS: 32-bit, PAE, 64-bit

Tested Dom0 OSes

  • 64-bit host: 32/PAE/64 Linux/XP/Win2003/Vista guests
  • PAE host: 32/PAE Linux/XP/Win2003/Vista guests

VT-d Enabled Systems

Note that in addition to the motherboard chipset and BIOS, your CPU must also support IOMMU I/O virtualization (VT-d). Check your CPU vendor's spec sheets for more information.

CPUs known to work (with a motherboard from below list and a proper BIOS):

(See Intel's reference list of VT-d compatible processors.)

  • Intel Core2Duo (with VT-x)
  • Intel Core2Quad
  • Intel Core i7 (see link to Intel reference list above - most "K" versions don't support VT-d)
  • Intel Core i5 (vPro Brand)
  • Intel Core i5, i7 Haswell - see link to Intel reference list above - most non-K versions support VT-d
  • Intel Xeon E3-1245 V2
  • AMD FX-8120 / FX-8150

For VT-d enabling work on Xen, we have been using development systems based on the following Intel motherboards:

  • DQ35MP
  • DQ35JO
  • Notes on VT-d compatibility:
    • VT-d is enabled on the following chipsets:
      • Intel Q35 (desktop / workstation)
      • Intel Q45 (desktop / workstation)
      • Intel X58 (desktop / workstation)
      • Intel X79
      • Intel 55x0 (server)
      • Intel 3450 (workstation / server)
      • Intel Z87, H87, Q87 and B85 (information provided by ASRock support)
  • The following chipsets have VT-d capability in theory, but most OEMs (such as Asus and Gigabyte) do not have it enabled on boards based on these:
    • Intel X38 (desktop / workstation)
    • Intel X48 (desktop / workstation)
    • Intel 32x0 (server)
  • For Intel Desktop Boards, these have VT-d support enabled:
    • Intel DQ35JO
    • Intel DQ35MP
    • Intel DX38BT, DX48BT2 (BIOS 1554 or 1782 required, VT-d is known to not work in other versions; iommu=workaround_bios_bug required)
    • Intel DQ45CB (BIOS 0061 required, previous versions are known to cause problems)
    • Intel DQ45EK
    • Intel DX58SO
    • Intel DQ67SW
  • For ASUS Desktop Boards, these have VT-d support enabled, but Asus does NOT support Linux, so you are on your own with any Linux or Xen issues like broken BIOSes:
    • ASUS P5E-VM DO (Intel Q35 chipset) requires IGD to be enabled (otherwise DMAR-table becomes corrupted)
    • ASUS P6T Deluxe (Intel X58 chipset) requires (currently non-public) BIOS update to correct DMAR-table issue
    • ASUS P6T documentation claims support, but does not work due to DMAR-table issue
    • ASUS P6T6 WS Revolution (Intel X58 chipset) incorrect bios DMAR-table
    • ASUS Sabertooth X58 (Intel X58 chipset) incorrect bios DMAR-table (2nd RMRR structure is incorrect)
    • ASUS Sabertooth X79 (with BIOS release 1203 and Marvell SATA controller disabled)
    • ASUS Z8NA-D6 (dual processor nehalem board) works
    • ASUS P8Z77-M PRO does not support VT-d, even though this motherboard can operate with Intel CPUs that do support it, such as the i5-3570 (confirmed by an Asus engineer in e-mail correspondence, Aug 3, 17)
  • Most server boards based on the Tylersburg chipset (55x0), Cougar Point (C20x), Panther Point (C216) and a few boards based on 32x0 should have working VT-d. Known examples are:
    • Intel server board: S3210SHLX (BIOS >R0044 required.)
    • Supermicro server mainboard: X8DT3-F
    • Supermicro server 5026T-TB: [1] Only USB 1.1 works.
    • Supermicro server: X9SAE/X9SAE-V (BIOS >= 2.0a required.)
  • These motherboards are known to have broken BIOS preventing IO virtualization (VT-d IOMMU) from working:
    • Supermicro X7SB4 (with official BIOS 1.2a) has broken ACPI DMAR table with zero length entries. BIOS version 1.3 Beta fixes the problem.
    • Samsung X460 laptop: BIOS doesn't provide DMAR table so VT-d cannot be used.
  • There are workarounds for disabled IOMMUs on the 55x0 chipset here.
  • As far as we know, the following OEM systems also have VT-d enabled. Feel free to add others as they become available.
  • VT-d compatible ASRock motherboards (Socket 1155, 2011, 1150), probably all Z68 and Z77 boards
    • ASRock H77 Pro4-M
    • ASRock Z68 Extreme4 Gen3
    • ASRock Z68 Professional Gen3
    • ASRock Z77 Pro3
    • ASRock Z77 Pro4
    • ASRock Z77 Extreme4
    • ASRock Z77 Extreme6
    • ASRock Z77 Professional
    • ASRock X79 Extreme9 (socket 2011)
    • all boards based on Z87, H87, Q87 and B85 (information provided by ASRock support)
    • no information on cheaper Haswell chipsets (H81 etc), no information on feature set as of June 2013.

For Z77 Extreme4/Extreme6 users: ASRock changed how the VT-d setting can be modified in the BIOS. This affects BIOS versions >2.50 (for Extreme4, see [2]) and >2.30 (for Extreme6, see [3]). These BIOS versions do not disable VT-d support; they only remove the user's ability to modify the setting, so VT-d is always enabled for non-K processors and always disabled for K-series processors.

  • Gigabyte Motherboards (Socket 1155) suggested by manufacturer on inquiry
    • GA-X79-UD5 (rev. 1.0)
    • GA-X79-UD7 (rev. 1.0)
    • GA-Z77X-D3H (rev. 1.0)
  • MSI Motherboards (Socket 1155) suggested by manufacturer on inquiry
    • Z68A-GD80-G3
    • Z77A-GD80
    • Z77A-GD65
    • Z77A-GD55
    • Z77A-G45
    • Z77A-G43

AMD desktop chipsets with IOMMU support

  • AMD 890FX chipset supports IOMMU. Other 890 chipsets don't have IOMMU support!
  • AMD 990FX, 990X and 970 chipsets support IOMMU.
  • Even when the chipset supports IOMMU, the BIOS must provide an ACPI IVRS table to enable its use. Actual support therefore depends on the motherboard manufacturer. At the time of writing, all motherboards seem to have a (beta) BIOS available that supports the IOMMU. A thread with user experiences can also be found at this forum: http://forums.tweaktown.com/f69/ga-890fxa-ud5-iommu-bios-switch-39801/
  • Motherboards with a BIOS supporting the IOMMU (as reported by users):
    • ASUS Crosshair IV (reported working by Jens Krehbiel-Gräther)
    • ASUS Crosshair V Formula (reported working by Pavel Matěja)
    • ASUS F2A85-V PRO
    • ASUS M4A89TD Pro/USB3 (reported working by Jens Krehbiel-Gräther)
    • Asrock 890FX Deluxe3 (reported working by Jens Krehbiel-Gräther)
    • Biostar TA890FXE (from bios version 89FAD629.BST reported working by Joop Boonen, Konrad Rzeszutek Wilk)
    • Gigabyte GA-970A-UD3 (Bios F7)
    • Asrock released bios updates supporting IOMMU for all motherboards with A55 or A75 chipset (see discussion)
  • Motherboards with a beta-bios available from tech-support that supports the IOMMU:
    • Gigabyte GA-890FXA-UD5
    • Gigabyte GA-890FXA-UD7
    • MSI 890FXA-GD70 (from beta-bios 1.75 reported working by Sander Eikelenboom)

AMD server (opteron) chipsets with IOMMU support

  • AMD SR5690 / SR5670 (Tyan S8212)

Caveat on Conventional PCI Device Pass-Through

  • The VT-d specification states that all conventional PCI devices behind a PCIe-to-PCI bridge have to be assigned to the same domain.
  • PCIe devices do not have this restriction.

Notes on Compiling Source Code

It is not the intention of this document to describe in detail how to compile and install Xen from source. Rather, the intention of this section is to highlight points of particular note when compiling from source for use with VT-d.

Dom0 Kernel

A prerequisite for binding devices to pciback, which is discussed in more detail below, is that pciback is compiled statically into or as a module for the dom0 kernel. This means that in the kernel config CONFIG_XEN_PCIDEV_BACKEND=y should be set to statically compile pciback into the kernel or CONFIG_XEN_PCIDEV_BACKEND=m should be set to compile pciback as a module.

For a pv-ops dom0, you can use pci-stub to hide devices. Set CONFIG_PCI_STUB=y to build it into the kernel or CONFIG_PCI_STUB=m to build it as a module.
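To see how an existing dom0 kernel was configured, something like the following sketch may help. check_passthrough_config is a hypothetical helper, and the default config path is an assumption (many distributions install the kernel config as /boot/config-<version>, but not all do).

```shell
# check_passthrough_config: print the pciback / pci-stub settings found in a
# kernel config file. Hypothetical helper; the default path is an assumption.
check_passthrough_config() {
    cfg=${1:-/boot/config-$(uname -r)}
    grep -E '^CONFIG_(XEN_PCIDEV_BACKEND|PCI_STUB)=[ym]$' "$cfg" \
        || echo "neither pciback nor pci-stub is enabled in $cfg"
}

# Example:
#   check_passthrough_config /boot/config-2.6.32.9
```

A value of y means the driver is built in, and m means it is built as a module.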

Device Model

The device model, also known as qemu-dm and qemu-xen, is often downloaded automatically during compilation and is found in the tools/ioemu-dir/ subdirectory of the Xen tree. When building it, it is important that the development headers and libraries for libpci are installed. Otherwise qemu-dm will be built without pass-through support.

If the device model has been compiled without pass-through support, the following error will show up in xend.log at run-time when pass-through is attempted:

ERROR (XendDomainInfo:581) Device model didn't tell the vslots for PCI device

VT-d boot parameter: iommu

VT-d is disabled by default. To enable it, pass the 'iommu' parameter on the Xen command line. It accepts the following values:

  • off|no|false|disable: Disable IOMMU (default)
  • pv: Enable IOMMU for PV domains
  • no-pv: Disable IOMMU for PV domains (default)
  • force|required: Don't boot unless IOMMU is enabled
  • workaround_bios_bug: Work around some BIOS issues so that VT-d can still be enabled; security is not guaranteed
  • pass-through: Enable VT-d DMA pass-through (no DMA translation for Dom0)
  • no-snoop: Disable VT-d Snoop Control
  • no-qinval: Disable VT-d Queued Invalidation
  • no-intremap: Disable VT-d Interrupt Remapping
  • verbose: In Xen 4.0.0 and newer, enable verbose logging while enabling IOMMU and parsing ACPI DMAR tables

Usually, 'iommu=1' is all that is needed to enable VT-d. Most VT-d features (DMA remapping, snoop control, queued invalidation and interrupt remapping) are then enabled by default if they are available. You can use 'no-xxx' to disable a feature; for example, 'iommu=no-snoop' disables snoop control.

When an RMRR address range is not in reserved memory (a BIOS issue), 'iommu_inclusive_mapping=1' can be used to work around it.

A typical grub configuration looks like this:


title Xen-Linux (2.6.18-xen)
        root (hd0,0)
        kernel /boot/xen.gz iommu=1
        module /boot/vmlinuz-2.6.18.8-xen root=LABEL=/
        module /boot/initrd-2.6.18-xen.img
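After rebooting with the new entry, the hypervisor boot log can be checked to see whether the IOMMU actually came up. The sketch below is an assumption-laden example: iommu_state is a hypothetical helper, and the "I/O virtualisation" message wording is assumed from common Xen releases and may differ in yours.

```shell
# iommu_state: read the hypervisor boot log on stdin and print the line that
# reports whether the IOMMU was enabled. Hypothetical helper; the message
# wording is an assumption and may vary between Xen versions.
iommu_state() {
    grep -i 'I/O virtualisation' || echo "no IOMMU status line found"
}

# Example:
#   xm dmesg | iommu_state
```

Booting with 'iommu=verbose' (Xen 4.0.0 and newer) should give more detail in the same log if the status line reports that the IOMMU is disabled.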

Binding Devices to pciback

In order to pass through devices using VT-d, they need to be bound to pciback to ensure that they are not bound to another dom0 driver and are thus free for use by pass-through. It is possible to view this as hiding the device from Dom0.

Binding at Boot-Time with "old-style" Xen 2.6.18 dom0 Linux kernel

If pciback is statically compiled into the kernel, then perhaps the simplest way to make PCI devices available for VT-d pass-through is to use the pciback.hide kernel parameter, as illustrated in the following grub configuration snippet.


title Xen-Linux (2.6.18-xen)
        root (hd0,0)
        kernel /boot/xen.gz iommu=1
        module /boot/vmlinuz-2.6.18.8-xen root=LABEL=/ ro pciback.hide=(01:00.0)(00:02.0)
        module /boot/initrd-2.6.18-xen.img

The arguments to pciback.hide are PCI functions in BDF notation (see the BDFNotation wiki page), each enclosed in brackets. The output of the lspci command may be useful when selecting PCI functions to hide.
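When several functions need hiding, the argument string can be assembled mechanically. The following is a small sketch; hide_arg is a hypothetical helper used only for illustration.

```shell
# hide_arg: build a pciback.hide kernel argument from a list of BDFs.
# Hypothetical helper for illustration only.
hide_arg() {
    arg="pciback.hide="
    for bdf in "$@"; do
        arg="$arg($bdf)"
    done
    echo "$arg"
}

# Example:
#   hide_arg 01:00.0 00:02.0
#   prints pciback.hide=(01:00.0)(00:02.0)
```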

Binding at Boot-Time with pvops dom0 Linux kernel (2.6.31, 2.6.32, and newer)

title Xen-Linux (2.6.32.9)
        root (hd0,0)
        kernel /boot/xen.gz iommu=1
        module /boot/vmlinuz-2.6.32.9 root=LABEL=/ ro xen-pciback.hide=(01:00.0)(00:02.0)
        module /boot/initrd-2.6.32.9.img

The pciback driver name was changed from "pciback" to "xen-pciback" in December 2009 in pvops kernels.

Binding at Run-Time

As discussed in Assign_hardware_to_DomU_with_PCIBack_as_module, it is possible to bind a device to pciback after dom0 has booted, even if it has been bound to another driver in the meantime. This method can be used both when pciback is statically compiled into the dom0 kernel and when it has been compiled as a module.
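A minimal sketch of run-time binding through sysfs follows. It assumes that the pciback driver directory is /sys/bus/pci/drivers/pciback and that it exposes the new_slot and bind attributes; check the referenced page and your kernel before relying on it. bind_to_pciback and SYSFS_ROOT are hypothetical names, and SYSFS_ROOT exists only so the function can be tried against a scratch directory.

```shell
# bind_to_pciback: rebind one PCI function to pciback at run time.
# Hypothetical helper; the sysfs layout is assumed, not taken from this page.
bind_to_pciback() {
    bdf=$1                        # extended BDF, e.g. 0000:01:00.0
    root=${SYSFS_ROOT:-/sys}      # override only for dry runs
    dev="$root/bus/pci/devices/$bdf"
    # Release the device from its current driver, if it has one.
    if [ -e "$dev/driver" ]; then
        echo "$bdf" > "$dev/driver/unbind"
    fi
    # Tell pciback about the slot, then bind the device to it.
    echo "$bdf" > "$root/bus/pci/drivers/pciback/new_slot"
    echo "$bdf" > "$root/bus/pci/drivers/pciback/bind"
}

# Example (as root in dom0):
#   bind_to_pciback 0000:01:00.0
```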

Binding at Module Insertion Time

As also discussed in Assign_hardware_to_DomU_with_PCIBack_as_module, it is possible to bind a device to pciback at the time that the pciback module is inserted. This technique is only useful if pciback has been compiled as a module.

Binding Devices to pci-stub

With a pv-ops dom0, pci-stub can also be used to hide devices for assignment.

Note: Binding devices with pci-stub only works for PCI passthrough to HVM guests! Hiding with xen-pciback as mentioned above works for both HVM and PV guests and is preferred.

For example, to hide PCI device 01:00.0:

  • Run lspci -n.
  • Locate the entry for device 01:00.0 and note down its vendor and device ID (8086:10b9 in the output below).

...
01:00.0 0200: 8086:10b9 (rev 06)
...

  • Then use the following commands to hide it:


echo "8086 10b9" > /sys/bus/pci/drivers/pci-stub/new_id
echo "0000:01:00.0" > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
echo "0000:01:00.0" > /sys/bus/pci/drivers/pci-stub/bind
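The ID lookup step can be automated. The sketch below uses a hypothetical helper, stub_id, which reads lspci -n output and prints the vendor and device ID in the "vendor device" form written to new_id above. It assumes the three-column lspci -n layout shown in the example output.

```shell
# stub_id: read `lspci -n` output on stdin and print "<vendor> <device>" for
# the BDF given as $1. Hypothetical helper for illustration only.
stub_id() {
    awk -v bdf="$1" '$1 == bdf { split($3, id, ":"); print id[1], id[2] }'
}

# Example:
#   lspci -n | stub_id 01:00.0
#   echo "$(lspci -n | stub_id 01:00.0)" > /sys/bus/pci/drivers/pci-stub/new_id
```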

Viewing Devices that are Available for Pass-Through

xm's pci-list-assignable-devices can be used to list devices that are available for pass-through. That is, devices that have been hidden from dom0 by pciback.

[root@vt-vtd ~]# xm pci-list-assignable-devices
0000:01:00.0
0000:00:02.0

The devices are shown in extended BDF notation. For more information, see the BDFNotation wiki page.

Implementation Details

xm finds all the devices owned by pciback, checks that they have a proper FLR method, checks that they have page-aligned MMIO BARs, and checks whether they have already been assigned. Finally, it prints out the assignable devices. xm assumes that integrated devices (those whose bus number is 0) have a proper FLR method.

Availability of this Feature

Boot-Time VT-d Device Pass-Through

As the name suggests, boot-time pass-through occurs at HVM domU (guest) boot-time. It is also referred to as static VT-d device pass-through; however, this name is confusing, as devices attached using this method may subsequently be hot-unplugged. To configure static pass-through for a domain, add a "pci" line to the configuration file for the domain. For example, the following may be added to /etc/xen/hvm.conf:

pci = [ '01:00.0', '00:02.0' ]

After updating the configuration file, start the HVM guest and use "lspci" to verify that the pass-through device is present in the guest. "ifconfig" or "ip addr show" may be used to see if an IP address has been assigned to NIC devices.

Virtual Slot Designation

As of Xen 3.4.0 it is possible to specify the virtual slot that is used by boot-time VT-d device pass-through. This is optional and, when done, is specified on a per-device basis by appending @SLOT_NUMBER to the BDF. For example, the following assigns 00:02.0 to virtual slot 7 while leaving Xen to choose the slot for 01:00.0:

pci = [ '01:00.0', '00:02.0@7' ]
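For guests with many devices, the line can be generated rather than typed. The following sketch uses a hypothetical helper, pci_line, which simply quotes each argument; it does not validate the BDF or the slot number.

```shell
# pci_line: build a domain configuration "pci" line from BDFs, each with an
# optional @SLOT_NUMBER suffix. Hypothetical helper for illustration only.
pci_line() {
    list=""
    for dev in "$@"; do
        list="${list:+$list, }'$dev'"
    done
    echo "pci = [ $list ]"
}

# Example:
#   pci_line 01:00.0 00:02.0@7
#   prints pci = [ '01:00.0', '00:02.0@7' ]
```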

Viewing Devices That Have Been Passed-Through

xm's pci-list command can be used to view the VT-d devices that have been passed-through to a domain. For example, below the devices attached to the HVM domain named HVMDomainVtd are shown.

[root@vt-vtd ~]# xm pci-list HVMDomainVtd
VSlt domain   bus   slot   func
0x6  0x0      0x01  0x00   0x0
0x7  0x0      0x00  0x02   0x0

Note that the domain field in the output above refers to the PCI domain not the Xen domain. PCI domains are a physical property of the host, as are PCI buses.
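If the hexadecimal columns are awkward to read, they can be folded back into extended BDF notation. This is a sketch only: pci_list_to_bdf is a hypothetical helper and assumes the five-column layout shown above.

```shell
# pci_list_to_bdf: read `xm pci-list` output on stdin and print each device
# in extended BDF notation together with its virtual slot.
# Hypothetical helper; assumes the column layout shown above.
pci_list_to_bdf() {
    while read -r vslt dom bus slot func; do
        case "$vslt" in
            # Data rows start with a hexadecimal virtual slot; skip the header.
            0x*) printf '%04x:%02x:%02x.%x vslot=%d\n' \
                     "$dom" "$bus" "$slot" "$func" "$vslt" ;;
        esac
    done
}

# Example:
#   xm pci-list HVMDomainVtd | pci_list_to_bdf
```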

VT-d Device Hot-Plug

VT-d device hot-plug involves attaching a device to an HVM domain once it has booted.

xm's pci-attach command is used to perform hot-plug. The device to be attached is specified using BDF notation (see the BDFNotation wiki page) and, optionally, the desired virtual slot. If the virtual slot is omitted then a free one will be used. For example, the following command inserts the physical device 0:2:0.0 into the HVM domain called HVMDomainVtd at virtual slot 7:

[root@vt-vtd ~]# xm pci-attach HVMDomainVtd 0:2:0.0 7

As described above, xm pci-list can be used to verify that the device is attached.

VT-d Device Hot-Unplug

VT-d hot-unplug refers to detaching a pass-through device from a running HVM domain. The device may have been attached using boot-time VT-d device pass-through or VT-d device hot-plug.

xm's pci-detach command is used to perform hot-unplug. The device to be detached is specified using BDF notation (see the BDFNotation wiki page). The following command removes the device 0:2:0.0 from the domain named HVMDomainVtd.

[root@vt-vtd ~]# xm pci-detach HVMDomainVtd 0:2:0.0

As previously shown, pci-list can be used to show the devices that are attached to a domain and thus confirm that 0:2:0.0 has been detached as requested.

VT-d Device Hot-Plug/Unplug Usage Model

  • Live Migration: VT-d pass-through devices break live migration because the state of a physical device cannot be saved and restored. Hot-unplugging all VT-d devices before live migration overcomes this problem.
  • Device Switching: VT-d hot-plug can be used to dynamically switch a physical device between different HVM guests without shutting them down.
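The device switching case can be sketched with the two xm commands described earlier. switch_device and the XM variable are hypothetical; XM exists only so that the sequence can be printed (for example with XM="echo xm") instead of executed. The sketch does not wait for the source guest to release the device, which may be needed in practice.

```shell
# switch_device: move a pass-through device from one running HVM guest to
# another. Hypothetical helper; set XM="echo xm" for a dry run.
switch_device() {
    src=$1    # guest that currently owns the device
    dst=$2    # guest that should receive it
    bdf=$3    # device in BDF notation, e.g. 0:2:0.0
    # Attach to the destination only if the detach succeeded.
    ${XM:-xm} pci-detach "$src" "$bdf" && ${XM:-xm} pci-attach "$dst" "$bdf"
}

# Example (GuestA and GuestB are placeholder domain names):
#   switch_device GuestA GuestB 0:2:0.0
```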

Virtual Slots

Up until Xen 3.4.0, slots 6 and 7 were reserved in HVM guests for VT-d pass-through. This meant that:

  • Only two pass-through devices could be attached
  • Attached devices would always be in slot 6 or 7
  • Slots 6 and 7 could not be used for IOEMU devices

Xen 3.4.0 removes the reservation system. Any slot which isn't in use may be used for pass-through or IOEMU devices. Given that HVM domains have one virtual PCI bus, that a PCI bus has 32 slots for devices, and that a domain typically has a minimum of 3 IOEMU devices, 28 will be available for pass-through or extra IOEMU devices.

Enabling MSI/MSI-X for Assigned Devices

  • As of Xen 3.4.0 MSI/MSI-X is always on.
  • In Xen 3.3.0, MSI/MSI-X could be enabled using the Xen boot parameter "msi=1", as in the following grub configuration line:
kernel xen.gz msi=1
  • Prior to Xen 3.3.0, MSI/MSI-X could be enabled using the Xen boot parameter "msi_irq_enable=1", as in the following grub configuration line:
kernel xen.gz msi_irq_enable=1

Single-Function and Multi-Function Pass-Through

A PCI device may have up to 8 functions. A device with only one function is called a single-function device and a device with more than one function is called a multi-function device. USB controllers are a common example of multi-function devices. For example, the following USB controller is device 1a on bus 0 and has 3 functions: 0, 1 and 7.

[root@vt-vtd ~]# lspci -s 00:1a
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
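To list the function numbers of a device without reading the output by eye, a short filter is enough. device_functions below is a hypothetical helper; it assumes each input line starts with a BDF, as in the lspci output above.

```shell
# device_functions: read `lspci -s <bus>:<slot>` output on stdin and print
# the function numbers that are present, separated by spaces.
# Hypothetical helper for illustration only.
device_functions() {
    awk '{
        n = split($1, part, "[.]")      # "00:1a.7" -> "00:1a", "7"
        printf "%s%s", sep, part[n]
        sep = " "
    } END { print "" }'
}

# Example:
#   lspci -s 00:1a | device_functions
```

More than one number in the output indicates a multi-function device.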

As of Xen 3.4.1, and in all prior versions of Xen that supported VT-d pass-through, only single-function VT-d pass-through devices are supported in HVM domains. This means that if a function of a multi-function device is passed through to an HVM domain, it will appear as function zero of a single-function device in the HVM domain, regardless of its physical function number.

Single-function devices always have function zero present (and no other functions), and thus the function number does not change when they are passed through.

The following table shows that regardless of the physical function number, the virtual function number will be zero.

                 Physical (dom0)
Single-Function  00:02.0
Multi-Function   00:1a.1

Support for passing-through multi-function devices as multi-function devices has been merged into xen-unstable and should appear in 4.0. A back-port to 3.4.1 is available as per the announcement email.

This feature is accessed using extensions to BDF notation that allow a multi-function device to be specified as a single unit. For details please see the BDFNotation wiki page. Note that the back-port to 3.4.1 does not support the explicit mapping of virtual function numbers (the = syntax) as this is considered too green at this time.