Xen Roadmap/4.4

Latest revision as of 18:17, 10 February 2014

Rather than try to predict precisely what will make it into what release (which was something of a disaster last release), I'm just going to borrow a term from the Agile world and call all uncompleted features the "Backlog". I'll still track who is doing what, and when we get close, what state things seem to be in.

As mentioned in another e-mail, we'll also be working on improving the regression tester. Feel free to join us.

And as always, if you are working on a feature / bug that you want tracked, please respond to this e-mail.

Timeline

As discussed elsewhere, I am proposing a 6-month release cycle. Xen 4.3 was released on 9 July. That would give us a release on 9 January 2014. This is fairly soon after the Christmas season, so I propose to make the estimated release date later, on 21 January, giving an extra 12 days to make up for the holiday season:

  • Feature freeze: 18 October 2013
  • Code freezing point: 18 November 2013
  • First RCs: 6 December 2013 <<== WE ARE HERE
  • Release: When It's Ready 2014. (Probably mid-to-end of February.)

Feedback on the estimated dates is welcome.

Last updated: 10 February

Exception guidelines for after the code freeze

We have now reached the code freeze, so we need to be thinking carefully about every patch accepted. Below is a brief overview of the criteria the maintainers and the release coordinator will use to make decisions; you can help us out a lot by including your own analysis with any attached patch. Remember that the more conservative you are in your own analysis, the less strict we need to be in our analysis.

Our goals for the release are, in this order:

  1. A bug-free release
  2. An awesome release
  3. An on-time release

Accepting any patch at this point may fix some bugs or enable some functionality, but has a risk of introducing other bugs (breaking other functionality). That bug may be found before the release (threatening #3), or it may not be found until after the release (threatening #1).

The "expected value" of a risk is how bad the risk is times the probability of that risk.
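Applied to the two ways a buggy patch can hurt us (found before the release, or only after it), that definition can be written out as follows. The symbols are introduced here purely for illustration:

```latex
\mathbb{E}[\text{cost}] = p_{\text{bug}} \left( q \, C_{\text{slip}} + (1 - q) \, C_{\text{ship}} \right)
```

Here $p_{\text{bug}}$ is the probability that the patch contains a bug, $q$ is the probability that such a bug is found before the release, $C_{\text{slip}}$ is how bad it is when the bug is caught first and delays us (threatening goal #3), and $C_{\text{ship}}$ is how bad it is when the bug ships (threatening goal #1).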

So when considering a bug fix, five questions need to be asked:

  1. What functionality is being fixed / enabled by this patch?
  2. If there was a bug in this patch, what functionality might be broken?
  3. What is the probability that this patch has a bug?
  4. If the patch had a bug, what is the probability it would be found before the release?
  5. Given the above benefit and risk, is this patch worth it?

When asking #2, I think we need to consider the 95th-percentile outcome, weighted by probability. It is always conceivable that a minor change will cause a cascading series of failures leading to global thermonuclear war and the annihilation of the human race; but if we always thought like that we'd never do anything at all.

When considering #3, consider things like the complexity of the patch, the complexity of the underlying code, and reviewer confidence.

You can find a short example of this kind of analysis here: http://marc.info/?l=xen-devel&m=138617980623176

Completed

* Event channel scalability (FIFO event channels)

* Non-udev scripts for driver domains (non-Linux driver domains)

* Multi-vector PCI MSI (Hypervisor side)

* Improved Spice support on libxl
 - Added Spice vdagent support
 - Added Spice clipboard sharing support
 - Spice usbredirection support for upstream qemu 

* PVH domU (experimental only)

* pvgrub2 checked into grub upstream

* ARM64 guest

* Guest EFI booting (tianocore)

* kexec

* Testing: Xen on ARM

* Update to SeaBIOS 1.7.3.1

* Update to qemu 1.6.2

* SWIOTLB (in Linux 3.13)

* Disk: indirect descriptors (in 3.11)

* Reworked ocaml bindings 

Resolved since last update

* qemu-* parses "008" as octal in USB bus.addr format

* Claim mode and PoD

* Disable IOMMU if no southbridge

* osstest windows-install failures

* libxl / libvirt races

Open

* Win2k3 SP2 RTC infinite loops
   > Regression introduced late in Xen-4.3 development
   owner: andrew.cooper@citrix
   status: patches posted, undergoing review.

* PVH regression

* dirty vram / IOMMU bug
 > http://bugs.xenproject.org/xen/bug/38
 status: Patch posted

* RHEL 7 pygrub patches
 > http://bugs.xenproject.org/xen/bug/39
 status: Wait for 4.4.1?

* credit2 runqueues
 > http://bugs.xenproject.org/xen/bug/36

* RHEL 5.x ocaml build bug
  status: patch posted

* libxl / xl does not handle failure of remote qemu gracefully
  > Related to http://bugs.xenproject.org/xen/bug/30
  > Easiest way to reproduce: 
  >  - set "vncunused=0" and do a local migrate
  >  - The "remote" qemu will fail because the vnc port is in use
  > The failure isn't the problem, but everything being stuck afterwards is
 Ian J investigating

* qemu memory leak?
  > http://lists.xen.org/archives/html/xen-users/2013-03/msg00276.html

(Open, not for 4.4)

* qemu-upstream not freeing pirq 
 > http://www.gossamer-threads.com/lists/xen/devel/281498
 > http://marc.info/?l=xen-devel&m=137265766424502
 status: patches posted; latest patches need testing
 they have not been tested because of the other passthrough issues.

 Not a blocker.

* Race in PV shutdown between tool detection and shutdown watch
 > http://www.gossamer-threads.com/lists/xen/devel/282467
 > Nothing to do with ACPI
 status: Patches posted, need more work, will be stalled for some time
 The fix is to the Linux side of things.
 Not a blocker.

* xl does not support specifying virtual function for passthrough device
 > http://bugs.xenproject.org/xen/bug/22
 Too much work to be a blocker.

* xl does not handle migrate interruption gracefully
  > If you start a localhost migrate, and press "Ctrl-C" in the middle,
  > you get two hung domains
 Ian J investigated -- can of worms, too big to be a blocker for 4.4

* HPET interrupt stack overflow (when using hpet_broadcast mode and MSI-capable HPETs)
  owner: andyh@citrix
  status: patches posted, undergoing review iteration.
  > andyhhp: I have more work to do on the HPET series
  > andyhhp: no way it is going to be ready or safe for 4.4

* PCI hole resize support hvmloader/qemu-traditional/qemu-upstream with PCI/GPU passthrough
  > http://bugs.xenproject.org/xen/bug/28
  > http://lists.xen.org/archives/html/xen-devel/2013-05/msg02813.html
  > Where Stefano writes:
  > 2) for Xen 4.4 rework the two patches above and improve
  > i440fx_update_pci_mem_hole: resizing the pci_hole subregion is not
  > enough, it also needs to be able to resize the system memory region
  > (xen.ram) to make room for the bigger pci_hole

  status: not going to be fixed for 4.4 either. Created bug #28.

Backlog

Testing coverage

* new libxl w/ previous versions of xl
 @IanJ

* Host S3 suspend
 @bguthro, @dariof

* Default [example] XSM policy
 @Stefano to ask Daniel D	

* Storage driver domains
 @roger

* HVM pci passthrough
 @anthony

* PV pci passthrough
 @konrad (or @george if he gets to it first)

* Network driver domains
 @George

* Nested virt?
 @intel (chased by George)

* Fix SRIOV test (chase intel)
 @ianj

* Fix bisector to e-mail blame-worthy parties
 @ianj
 
* Fix xl shutdown 
  @ianj

* stub domains
  @anthony

* performance benchmarks
  @dario

Meta-items (composed of other items)

* Meta: PVIO NUMA improvements
 - NUMA affinity for vcpus (4.4 possible)
 - PV guest NUMA interface (4.4 possible)
 - Sensible dom0 NUMA layout 
 - Toolstack pinning backend thread / virq to appropriate d0 vcpu
 - NUMA-aware ballooning

* xend still in tree (x)
 - xl list -l on a dom0-only system
 - xl list -l doesn't contain tty console port
 - xl Alternate transport support for migration*
 - xl PVSCSI support
 - xl PVUSB support

Big ticket items

* PVH dom0 (w/ Linux) 
  blocker
  owner: mukesh@oracle, george@citrix
  status (Linux): Acked, waiting for ABI to be nailed down
  status (Xen): v6 posted; no longer considered a blocker

* libvirt/libxl integration (external)
 - owner: jfehlig@suse, dario@citrix
 - patches posted (should be released before 4.4)
  - migration
  - PCI pass-through
 - In progress
  - integration w/ libvirt's lock manager
  - improved concurrency

Missed the feature freeze

* libxl network buffering support for Remus
   @shriram
   status: patches posted
   prognosis: fair

* xencrashd
   owner: don@verizon
   status: v2 posted
  > http://lists.xen.org/archives/html/xen-devel/2013-11/msg02569.html

* ARM Live Migration Support
  owner: Jaeyong Yoo <jaeyong.yoo@samsung.com>
  status: Not for 4.4

* soft affinity for vcpus (was NUMA affinity for vcpus)
    owner: Dario
    status: v2 posted

* PV guest NUMA interface
    owner: Elena 
    status: v3 posted

* xl USB pass-through for HVM guests using Qemu USB emulation
  owner: George
  status: v6 patch series posted

* Sensible dom0 NUMA layout

* Toolstack pinning backend thread / virq to appropriate d0 vcpu

* NUMA Memory migration 
  owner: dario@citrix
  status: In progress

* NUMA-aware ballooning
   owner: Li Yechen
   status: in progress

* xl migrate transport improvements
 owner: None
 > See discussion here: http://bugs.xenproject.org/xen/bug/19
 - Option to connect over a plain TCP socket rather than ssh
 - xl migrate-receive suitable for running in inetd
 - option for above to redirect log output somewhere useful
 - Documentation for setting up alternate transports

* HVM guest NUMA
  owner: Matt Wilson@amazon
  status: in progress (?)

* qemu-upstream stubdom, Linux
   owner: anthony@citrix
   status: in progress
   prognosis: ?
   qemu-upstream needs a more fully-featured libc than exists in
   mini-os.  Either work on a minimalist Linux-based stubdom with
   glibc, or port one of the BSD libcs to mini-os.

* qemu-upstream stubdom, BSD libc
  prognosis: ?
  owner: ianj@citrix

* Network performance improvements
  owner: wei@citrix

* Xen EFI feature: Xen can boot from grub.efi
 owner: Daniel Kiper
 status: in progress

* Default to credit2
 status: Probably not for 4.4
 - cpu pinning
 - NUMA affinity
 - cpu "reservation"

* xenperf
  prognosis: Deferred to 4.5
  Owner: Boris Ostrovsky
  status: v2 patches posted

* Nested virtualization on Intel

* Nested virtualization on AMD

* Multi-vector PCI MSI (upstream Linux)
  owner: boris@oracle

* xl: passing more defaults in configuration in xl.conf
  owner: ?
  There are a number of options for which it might be useful to pass a
  default in xl.conf.  For example, if we could have a default
  "backend" parameter for vifs, then it would be easy to switch back
  and forth between a backend in a driver domain and a backend in dom0.

* xl PVUSB pass-through for PV guests
* xl PVUSB pass-through for HVM guests
  owner: George
  status: ?
  xm/xend supports PVUSB pass-through to guests with PVUSB drivers (both PV and HVM guests).
  - port the xm/xend functionality to xl.
  - this PVUSB feature does not require support or emulation from Qemu.
  - upstream the Linux frontend/backend drivers. Current work-in-progress versions are in Konrad's git tree.
  - James Harper's GPLPV drivers for Windows include PVUSB frontend drivers.

* Xen EFI feature: pvops dom0 able to make use of EFI run-time services (external)
 owner: Daniel Kiper
 status: Just begun

Clean-ups


* MAC address changes on reboot if not specified in config file
  > Needs a robust way to "add" to the config

* ACPI WAET table vs RTC emulation mode
  owner: jan@suse
  prognosis: ?
 > An overly simplified fix was posted a while ago
 > (http://lists.xenproject.org/archives/html/xen-devel/2013-07/msg00122.html),
 > but Tim's objection is rather valid. I can't, however, estimate
 > if/when I would find time to learn what tools side changes are
 > necessary to accommodate a new HVM param, and hence this is
 > currently stalled.  The current solution (as of 3fa7fb8b ["x86/HVM:
 > RTC code must be in line with WAET flags passed by hvmloader"])
 > isn't desirable to be kept for 4.4.

* Polish up xenbugtool
  owner: wei.liu2@citrix.com

* Sort out better memory / ballooning / dom0 autoballooning thing
 > Don't forget NUMA angle
 - Inaccurate / incomplete info from HV

* Implement Xen hypervisor dmesg log entry timestamps
 > https://xenorg.uservoice.com/forums/172169-xen-development/suggestions/3924048-implement-xen-hypervisor-dmesg-log-entry-timestamp
 > Request seems to be for a shorter stamp (seconds-only, rather than full date)

* Make network driver domains easier to set up / more useful
 - Make it easy to make a device assignable (in discussion)
 - Automatically start/shutdown (xendomains?)
 - Pause booting of other domains until network driver domain is up (necessary?)

* libxl: More fine-grained control over when to pass through a device
 > Some IOMMUs are secure; some are merely functional, some are not present.
 > Allow the administrator to set the default

* qxl
  > http://bugs.xenproject.org/xen/bug/11
  - Uninitialized struct element in qemu
  - Revert 5479961 to re-enable qxl in xl,libxl
  - Option in Xen top-level to enable qxl support in qemu tree
  - Fix sse2 MMIO issue
   - make word size arbitrary

* libxl config file
  > "non-xl toolstacks which use libxl could specify configuration
  > options for some things.... things like locations of binaries come
  > to mind; maybe so that distros could package up libxl and say
  > where things were, and other programs could link against it..."
  > "There are some settings that you'd want to be host wide for any libxl
  > using toolstacks sharing a host (e.g. xl and xapi). Default
  > vif-scripts and policy WRT selecting disk backends are two which
  > spring to mind.  Probably a great deal of xl.conf actually belongs
  > in libxl.conf"

* libxl: Don't use RAW format for "URL"-based qdisks (e.g., rbd:rbd/foo.img)
  - Figure out whether to use a generic URL or have a specific type for each one
  - Check existence of disk file for all RAW

* acpi-related xenstore entries not propagated on migrate
 > http://www.gossamer-threads.com/lists/xen/devel/282466
 > Only used by hvmloader; only a clean-up, not a bug.


Wishlist / someday


* Make storage migration possible
  owner: ?
  status: none
  There needs to be a way, either via command-line or via some hooks,
  that someone can build a "storage migration" feature on top of libxl
  or xl.

* Full-VM snapshotting
  owner: ?
  status: none
  Have a way of coordinating the taking and restoring of VM memory and
  disk snapshots.  This would involve some investigation into the best
  way to accomplish this.

* VM Cloning
  owner: ?
  status: none
  Again, a way of coordinating the memory and disk aspects.  Research
  into the best way to do this would probably go along with the
  snapshotting feature.

* xl vm-{export,import}
  owner: ?
  status: none
  Allow xl to import and export VMs to other formats; particularly
  ovf, perhaps the XenServer format, or more.
  
* Memory: Replace PoD with paging mechanism
  owner: george@citrix
  status: none

* PV audio (audio for stubdom qemu)
  owner: stefano.panella@citrix
  status: ?

* Wait queues for mm
 > Needed for more advanced paging schemes
  owner: ?
  status: Draft posted Feb 2012; more work to do.

* V4V: Inter-domain communication
  owner (Xen): dominic.curran@citrix.com
  status (Xen): patches submitted
  owner (Linux driver):  stefano.panella@citrix
  status (Linux driver): in progress

* Serial console improvements
  owner: ?
  status: Stalled (see below)
  - xHCI debug port (needs hardware)
  - Firewire (needs hardware)