Difference between revisions of "Windows PV Drivers Presentation"

From Xen

Revision as of 14:10, 10 February 2015

Notes:

Slide 0

Hi, I’m Paul Durrant. I’m a principal engineer in the XenServer group at Citrix and I’m project lead for the XenProject Windows PV Drivers.

Slide 1

In this presentation I’m going to be giving an overview of the drivers.

We’ll start with the origins of the drivers, and the journey from the original XenServer-specific closed-source ‘Legacy’ drivers, through the open source XenServer drivers (dubbed the ‘Standard’ drivers in Citrix and available on GitHub), to the current generic XenProject drivers, the source of which is now hosted on Xenbits.

I’ll then move on to the way that functionality is broken down into interfaces, how they are provided and consumed, and how compatibility is maintained as they evolve.

And finally I’ll give a brief overview of what you need to do to build and install the drivers, and contribute to the project.

Slide 2

To start with I need to introduce some Windows driver terminology and some conventions I’ve used in the diagrams in this presentation.

Windows devices are organized into a tree, or a set of trees, rooted at what’s called a Physical Device Object or PDO. In my view of the world trees grow downwards, so I put PDOs at the top.

Normally a PDO just represents a piece of hardware, which is not that useful unless you have some code to talk to it. That code is called a Function Driver and when a function driver attaches to a PDO it creates a corresponding Function Device Object or FDO.

Unlike some operating systems, such as Linux, Windows has a concept of demand-loading drivers. Hence function drivers do not contain code to discover their hardware. Instead they are part of a package described by what is called an INF file. In that INF file there are entries to tell Windows what PDO ‘names’ a particular function driver will ‘bind’ to. So, as Windows builds its device tree, it can look at the names of newly created PDOs and determine which Function Drivers to load.

A Function Driver can also be what’s called a Bus Driver. That means that, having created its FDO, it can also create PDOs. For example, the root PCI driver binds to a PDO created by the ACPI driver (which is parsing the DSDT). It will create an FDO to bind to that, enumerate the root bus (using PCI config cycles) and create a PDO for each unique bus/device/function that it finds.
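To make the INF binding step concrete, here is a minimal user-mode sketch in C of the lookup Windows effectively performs when a new PDO appears. It is a model only, not WDK code. The table contents are illustrative assumptions: `ACPI\PNP0A03` is the usual ID for a PCI root bus, the second entry is a guess at what a Xen platform device entry might look like, and `find_function_driver` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One INF-style entry: "this function driver binds to this PDO name". */
struct inf_entry {
    const char *pdo_name;   /* name the PDO advertises */
    const char *driver;     /* function driver to load for it */
};

/* Illustrative table; real entries live in each package's INF file. */
static const struct inf_entry inf_table[] = {
    { "ACPI\\PNP0A03",          "pci"    },
    { "PCI\\VEN_5853&DEV_0001", "xenbus" },   /* assumed ID string */
};

/* Return the driver that binds to pdo_name, or NULL if no installed
 * package claims it (the device then stays without a function driver). */
const char *find_function_driver(const char *pdo_name)
{
    assert(pdo_name != NULL);
    for (size_t i = 0; i < sizeof(inf_table) / sizeof(inf_table[0]); i++) {
        if (strcmp(inf_table[i].pdo_name, pdo_name) == 0)
            return inf_table[i].driver;
    }
    return NULL;
}
```

The point of the sketch is that the driver never goes looking for hardware: the bus driver above it creates a named PDO, and the name alone decides which function driver gets loaded.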

Slide 3

The first set of drivers we’ll mention are the closed source ‘Legacy’ drivers.

Before XenServer 6.1 was released, these were the only PV drivers and they were getting pretty long in the tooth. I believe they were written for Windows 2000 support on the first version of XenServer (or possibly even XenEnterprise?) to support HVM guests.

They are still used in XenServer today, but only for Windows Server 2003 (and XP before it went EOL).

Citrix have never provided source for these drivers, and that is mainly because there is code in them that is of unknown origin. Also, there is less and less point in doing so as time goes by. Server 2003 will be EOL this year (2015), at which point these drivers will finally be consigned to history.

Slide 4

To give you an idea of why Citrix made these drivers ‘Legacy’ and replaced them with a new set for Vista onwards, let’s take a look at the structure of the driver packages and how they (just about) hang together…

The first thing you’ll notice is there are essentially two ‘root’ PDOs. The one on the right is the Xen Platform PCI device, created by QEMU, and a key part of any HVM guest running on pretty much any Xen distribution. The one on the left, however, is synthesized by a driver installer package.

The main virtual bus driver is called XENEVTCHN (don’t know why) and that, along with the export driver XENUTIL (an export driver is like a kernel DLL), is where most of the code that talks to Xen lives. XENEVTCHN is the ultimate parent of the PV network devices, but not the storage devices. Those are dealt with by XENVBD, which binds directly to the PCI device, but uses code in XENUTIL to co-ordinate with XENEVTCHN.

The XENVBD package also installs a filter driver, SCSIFILT. The reason for this driver is that (because it needs to work on versions of Windows older than Vista) XENVBD uses a very old storage driver API in Windows called SCSIPORT, and SCSIPORT has very poor locking semantics and only a single request queue for an HBA. This makes it very slow. SCSIFILT is designed to sit between the generic Windows DISK driver and the XENVBD and intercept storage requests. Being a filter driver it’s not bound by any logo requirement to use a standard Windows storage API and so it bypasses the whole SCSIPORT queuing and locking framework and talks directly to the PV backends, which is a lot faster.

Back over on the left, you can see the XENNET driver for PV network devices, but in between that and XENEVTCHN is another driver, XENVIF. Because the legacy drivers used to be used for versions of Windows all the way from Windows 2000 through to 7 and Server 2008R2, they actually had to have two distinct versions of the XENNET driver. Between releasing Server 2003 and Vista, Microsoft changed the NDIS API in an incompatible way, so anyone writing Windows network drivers needed to fork their code. Server 2003 and before use NDIS version 5.x and Vista onwards uses version 6.x.

The original code had both these flavours of XENNET but there was a lot of code duplicated between them and when bugs cropped up it was easy to end up applying a fix to one driver that really should be applied to both. I therefore re-wrote the drivers, moving all the common code into a driver called XENVIF which I also made the parent of all XENNETs, to allow for dynamic interface discovery which is something we’ll come onto later.

Slide 5

So this rather complex structure causes some problems…

SCSIFILT, whilst working round the deficiencies of SCSIPORT, causes some problems. There are utilities which directly open storage devices (SCSIPORT allows this) and send read and write requests. Those requests, because they did not come from the DISK driver, bypass SCSIFILT and thus XENVBD has to have a very odd ‘loopback’ path where it injects the requests into the storage stack as if they did come from the DISK driver to allow them to be intercepted by SCSIFILT. Also, because there are some circumstances where SCSIFILT is not loaded (e.g. if a disk is disabled in Device Manager) both XENVBD and SCSIFILT must have code to deal with the PV state modes, for purposes of VBD unplug… which is more code duplication.

Cross-package linkage dependencies (generally to XENUTIL) are a massive problem. There never was a defined ABI and so it was very easy for packages to become binary incompatible leading to very odd BSODs during upgrade. Really there is no safe way to upgrade legacy drivers… it is best to remove the old set before adding the new set. But, that requires two reboots.

The two root nodes also cause a big problem. Initialization of the PV interfaces to Xen needs to be done before either XENEVTCHN or XENVBD can fully function, but you never know which one is going to come up first, and worse… a resource rebalance (something Windows may need to do to redistribute interrupts, for example) means either one can be unloaded and reloaded at any time. This makes the initialization code very complicated, non-obvious and fragile.

Finally, the use of a synthetic root node completely precludes deployment via Windows Update as those nodes can only be created by a driver installer.

Windows Update deployment has always been a goal for Citrix and so that final point is really a showstopper for these drivers.

Slide 6

So, what did we do… We wrote some new drivers, which are now dubbed the ‘standard’ drivers.

Now, it so happened that around the time these drivers were getting towards being fully functional Microsoft changed the landscape and said that all drivers for the then new Windows 8 and Server 2012 release had to be built with the new WDK and the oldest version of Windows supported by that WDK is Vista. So, it was decided that the new set of drivers would only support Vista onwards and older OS would continue with the legacy drivers.

Slide 7

This is the structure of the standard drivers…

As you can see, it’s a bit simpler than the structure of the legacy drivers.

It’s basically a single tree structure with the only complex part surrounding the root node. The new parent bus driver XENBUS binds to the PCI device (which you’ll note has a new ID… more on that later), but makes use of an export driver called XEN. Then there’s a filter driver called XENFILT which actually sits not only between the Xen platform PCI device’s PDO and XENBUS but also between all PCI PDOs and their function drivers.

The reason for the presence of XENFILT is that it allows us to execute code before QEMU emulated devices are exposed to the Windows PnP subsystem and hence we can make sure that emulated device unplug occurs early enough in boot such that Windows does not see devices disappearing when the unplug occurs.

The use of the XEN export driver also gives us a useful hook. Its DllInitialize() routine is called only at boot time, allowing us to perform Xen operations which only need doing after initial domain creation and do not need to be, or should not be, repeated on domain resume (i.e. after a suspend or migrate).

Slide 8

This new set of drivers addressed the major shortcomings…

Because the drivers only support Vista onwards, XENVBD could use the newer and much better performing STORPORT storage API and thus SCSIFILT was no longer needed, which reduced complexity.

Because there are no cross package link dependencies, the installation ordering issues and binary compatibility issues during upgrade were solved.

And crucially, because of the single PCI enumeration root node the drivers no longer require an installer and this allows them to be deployed via Windows Update.

Slide 9

But there were still some problems…

That new device ID… The idea of changing it from the standard Xen platform ID was…

When drivers are posted to Windows Update you can only control their deployment by OS version and physical device name. So, if we were to post drivers that deployed on the standard Xen platform physical device, anyone anywhere in the world with a Windows HVM guest (not just on XenServer but on AWS, for instance) would suddenly start getting drivers from Windows Update! The standard drivers are also completely incompatible with the legacy drivers: installing them before removing legacy drivers leads to instant BSOD. Unsurprisingly, we did not want this to happen.

The big problem with this new device ID is that, to use standard drivers you need to have a PCI device with the new ID and that made upgrade from legacy to standard somewhat complex. Another problem though was that the new device ID required changes to QEMU and the host toolstack, which was unacceptable to the upstream community. Another way was needed.

There was also a second problem. Use of interface discovery removed the cross package link dependencies and made the load ordering of drivers flexible, but the interface compatibility check is an exact match, which means drivers must still be upgraded together otherwise you may get a non-functional system… and that’s a bit of a problem if you are getting drivers from Windows Update and you upgrade your XENNET driver first and then find the new one is not compatible with your XENVIF. How are you going to get the new version of XENVIF without a network? This required some more thought.

Slide 10

Now, in 2013 XenServer went fully open source and this included the standard PV driver source (which, as I said, went onto GitHub).

However, there was a desire to make the Windows drivers even more open such that they would work on most Xen installations.

I therefore proposed, in mid 2014, that the Linux Foundation adopt the PV drivers as a sub-project of the Xen Project. The advisory board agreed to this in June and there is now a project front-page on xenproject.org, source repositories on xenbits and even publicly available binaries courtesy of a build VM hosted by Rackspace.

I’m the project lead, chief maintainer and committer. My Citrix colleagues Ben and Owen are also committers and maintainers.

Slide 11

Citrix plan to use these upstream drivers in the next version of XenServer for all versions of Windows (since XP is already gone and Server 2003 will be gone by mid 2015).

Like the standard drivers, they will be built for XenServer with the Windows 8 WDK and VS 2012, although they can be built with the 8.1 WDK and VS 2013. The reason we don’t plan to update our toolchain for XenServer is that the 8.1 WDK doesn’t support any OS prior to Windows 7, and Vista and Server 2008 are still in support.

Also, the Xen Project drivers have addressed the device ID and compatibility problems of the standard drivers. We’ll come to how in a moment…

Slide 12

The structure of the drivers is basically identical to the standard drivers…

The crucial differences are in how we handle binding to the PCI device and the details of how interface discovery is managed…

Slide 13

The new XENBUS can now bind to 3 different devices…

Depending on where your VM comes from you will have one of the two devices on the left. However, for Windows Update purposes in a XenServer VM you may also end up with the new device on the right. (The C000 device ID is reserved for XenServer for the purposes of Windows Update in a header in the main Xen source repository.)

When XENBUS installs, there’s a module called a co-installer (that is part of the package) that runs just before the driver binds to the PDO and again just after. This module can be used to control how XENBUS behaves…

If the Windows Update device (on the right) is present then the XENBUS instance bound to that will be ‘active’ and the other instance bound to the device on the left will not be. However, if the Windows Update device is not present then, when XENBUS binds to one of the devices on the left, that will be active.

Only the active instance of XENBUS will talk to XEN, and only the active instance of XENBUS will enumerate child PDOs. Those PDOs will carry the device ID of the PCI device in their name. This makes all PDOs that Citrix will target for Windows Update distinct from those in, say, an AWS VM but still allows the drivers to be installed into a generic Xen VM without needing toolstack or QEMU modifications.
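The selection rule can be sketched as a small user-mode C model. This is an illustration under stated assumptions, not the XENBUS implementation: only the C000 device ID comes from the notes above, while `active_device_id`, `child_pdo_name` and the PDO name layout are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define WINDOWS_UPDATE_DEVICE_ID 0xC000u  /* reserved for XenServer, per the notes */

/* Pick the device ID whose XENBUS instance should be active.
 * ids[] lists the PCI device IDs XENBUS has bound to in this VM.
 * Returns 0 if the list is empty. */
unsigned active_device_id(const unsigned *ids, size_t count)
{
    unsigned fallback = 0;

    assert(ids != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (ids[i] == WINDOWS_UPDATE_DEVICE_ID)
            return ids[i];      /* the Windows Update device always wins */
        if (fallback == 0)
            fallback = ids[i];  /* otherwise the first platform device seen */
    }
    return fallback;
}

/* Build a child PDO name that embeds the active device ID.
 * The "XENBUS\DEV_xxxx&CLASS_name" layout is hypothetical. */
int child_pdo_name(char *buf, size_t len, unsigned device_id,
                   const char *class_name)
{
    assert(buf != NULL && class_name != NULL);
    return snprintf(buf, len, "XENBUS\\DEV_%04X&CLASS_%s",
                    device_id, class_name);
}
```

Given a VM that exposes both a platform device and the C000 device, the model picks C000 and every child name carries it. Given a generic VM with only a platform device (0x0001 is used as a stand-in ID in testing, which is an assumption), the same package still works but produces differently named children, which is what keeps Windows Update targeting separate.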

Slide 14

Dynamic interface discovery works using an IRP_MJ_PNP minor number that Microsoft dedicate for that purpose: IRP_MN_QUERY_INTERFACE.

A child driver will pass an IRP_MN_QUERY_INTERFACE IRP to its parent, identifying the interface it wants to get hold of with a GUID and version number. The IRP references a data buffer and, if the parent recognises the GUID, it fills the buffer with a jump table and some context information and completes the IRP with a success code. If the parent doesn’t recognise the GUID it could fail the IRP, but generally it will pass it on up to its parent. Hence a driver can usually get hold of an interface implemented by any of its ancestors (or filters thereof). For example, XENNET makes use of an interface provided by XENBUS, even though XENVIF sits between the two.
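The way a query travels up the stack can be modelled in plain C. This is a user-mode sketch, not WDK code: there is no real IRP here, GUIDs are plain strings, and `struct driver`, `struct interface` and `query_interface` are hypothetical names. It also assumes an exact version match and treats any mismatch as "not recognised, pass it up", which is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An interface a driver provides: GUID (a string here) plus version. */
struct interface {
    const char *guid;
    unsigned    version;
    const void *jump_table;   /* would hold function pointers and context */
};

/* A driver in the stack, with the interfaces it implements. */
struct driver {
    const char             *name;
    const struct driver    *parent;   /* NULL at the root */
    const struct interface *ifaces;
    size_t                  iface_count;
};

/* Model of sending IRP_MN_QUERY_INTERFACE to `parent`: each driver either
 * answers the query or passes it up to its own parent. Returns the driver
 * that provided the interface and fills *out, or returns NULL (and sets
 * *out to NULL) when no ancestor recognises the request. */
const struct driver *query_interface(const struct driver *parent,
                                     const char *guid, unsigned version,
                                     const struct interface **out)
{
    assert(guid != NULL && out != NULL);
    for (const struct driver *d = parent; d != NULL; d = d->parent) {
        for (size_t i = 0; i < d->iface_count; i++) {
            if (strcmp(d->ifaces[i].guid, guid) == 0 &&
                d->ifaces[i].version == version) {
                *out = &d->ifaces[i];
                return d;
            }
        }
        /* Not recognised at this level: the query moves up one parent. */
    }
    *out = NULL;
    return NULL;
}
```

With XENBUS at the root providing an interface and XENVIF as its child providing nothing under that GUID, a query sent to XENVIF on behalf of XENNET is answered by XENBUS. Asking for a version the provider does not offer fails outright in this model.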

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22