Design Sessions 2018

Revision as of 10:41, 9 August 2019 by Lars.kurth (talk | contribs) (Moved from main page)

2018: Developer and Design Summit

To Do:

Session hosts, please send notes to xen-devel@ and update the wiki or send the notes to community dot manager at xenproject dot org.

Architecture

A year ago Wei presented two projects about reworking x86 Xen. A lot has happened since then. This session aims to give a quick update on progress and asks stakeholders for suggestions and opinions on future development.
Previously, when a PCI device was passed through, QEMU could only run in privileged mode. This design lets QEMU always run in de-privileged mode.
  • changes to Xen, mainly in libdevicemodel, to add the DM-ops for passing through a PCI device in xen-domid-restrict mode.
  • changes to libxl, to pass the PCI config fd to QEMU.
  • changes to QEMU, to read the configuration and avoid reading from /dev/mem directly.
  • changes to the toolstack, to allow QEMU to read PCI info from sysfs.
Further discussion is needed about the reading-from-/dev/mem part:
  • which devices / OSes perform this operation (reading from /dev/mem)?
  • can mmapping /sys/bus/pci/devices/0000:XX:XX.X/resourceX replace reading from /dev/mem?
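The second question can be made concrete with a small sketch. It is only an illustration in Python (QEMU itself is written in C), and the helper names `pci_resource_path` and `map_bar` are hypothetical, not part of Xen, libxl or QEMU. It assumes a Linux host where the kernel exposes each memory BAR as a `resourceN` file under sysfs and where the caller has permission to open it.

```python
import mmap
import os


def pci_resource_path(domain, bus, dev, func, bar,
                      sysfs_root="/sys/bus/pci/devices"):
    """Build the sysfs path of BAR number `bar` for one PCI function.

    Hypothetical helper: the layout follows the usual Linux convention
    DDDD:BB:DD.F/resourceN.
    """
    return (f"{sysfs_root}/{domain:04x}:{bus:02x}:{dev:02x}.{func:x}"
            f"/resource{bar}")


def map_bar(path, length=None):
    """Map a BAR through its sysfs resource file instead of /dev/mem.

    Hypothetical helper. If `length` is omitted, the size reported for
    the file is used as the mapping length.
    """
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        size = length if length is not None else os.fstat(fd).st_size
        # The mmap object keeps its own duplicate of the descriptor,
        # so closing fd afterwards is safe.
        return mmap.mmap(fd, size, mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)
```

With this approach a de-privileged process would only need an already opened descriptor for the specific resource file rather than access to all of physical memory. Whether every device and guest OS can be served this way is exactly the open question listed above.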
  • (NO NOTES) Resource mapping, PV-IOMMU and page ownership in Xen (Paul Durrant, Citrix)
Comment by host: I don’t think anyone took notes at my session. I got what I needed out of it though… and it’s probably not of particular interest to anyone who was not there.
The recent series to add direct resource mapping into Xen highlighted areas where the current status quo of PV domains being able to map any page assigned to them is problematic from a security PoV. There are pages that constitute a resource, which should probably be accounted to a domain without that domain having the privilege to map the resource. The current scheme does not allow for this. Thus it would be useful to discuss ideas on how we might improve the situation.
Page ownership also creates problems with PV-IOMMU when dealing with grant mapped foreign pages.

Intel Specific

Slides are available here
Intel Processor Trace (Intel PT) is a hardware feature that records information about software execution with minimal impact on system execution. Existing hardware makes it hard to enable Intel PT in a guest because implementing a shadow ToPA is very complex. The Intel PT VMX improvements treat PT output addresses as Guest Physical Addresses (GPAs) and translate them using EPT, which simplifies Intel PT virtualization for use by guest software. This session is intended as a deep-dive introduction to Intel Processor Trace and a design discussion of the SYSTEM mode implementation, Intel PT introspection, the new qualification of Intel PT output, nested virtualization, live migration and so on.
To better support HPC, Intel has launched a product, code named Knights Mill, which supports up to 288 logical CPUs and a high-bandwidth on-package memory called MCDRAM. We have been working on enabling Xen for building HPC clouds. One main task is to raise the maximum number of vCPUs in an HVM guest to 288. Although we have sent out several versions of patches for this purpose, not all problems have been revealed and discussed. In this design session, we want to discuss these problems and reach an agreement on how to deal with them.
To Do:

Update this once the new version of the NVDIMM DOC is available.

Non-Volatile Dual In-line Memory Module or NVDIMM is a type of memory device that can provide persistent storage and retain data across power cycles/failures. This discussion is about the design to support NVDIMM in Xen.
The Notes will be included in an updated version of the NVDIMM DOC. Slides are available here
Software Guard Extensions (SGX) is an Intel security feature that has been present in Intel processors since the Skylake generation. With existing HW/SW solutions, the hypervisor does not protect tenants against the cloud provider, and thus against the supplied operating system and hardware. Intel SGX addresses this by using an enclave, a protected portion of a userspace application whose code and data cannot be accessed directly from outside by any software, including privileged software such as the BIOS and the VMM. This session is intended as a deep-dive introduction to SGX and a design discussion of adding SGX virtualization to Xen. We will start with the SGX deep dive and then go into the SGX virtualization design, from high-level design to details such as EPC management/virtualization, CPUID handling, interaction with VMX, live migration support, etc.

Embedded and Safety

  • [TODO Dom0less and static partitioning] (Stefano Stabellini, XILINX)
Running Xen without Dom0
  • [TODO A Strawman Plan to Make Safety Certification for Xen Easier] (Lars Kurth, Xen Project)
The Plan
Hypervisors were once seen as purely cloud and server technologies, but have slowly seeped into the embedded space. This is in particular true for the Xen Project, which is being used by a number of vendors to build automotive stacks.
However, to be successful in automotive (as well as other future market segments where Xen could be useful), the project needs to be easily certifiable. To facilitate this, we have developed a straw-man plan, which focusses on the following topics:
  • Reducing code size significantly using Kconfig
  • Coding standards
  • An RTOS based Dom0, or dom0-less Hypervisor
  • Etc.
In this session, we will share the high-level plan, with the goal to identify any collaborators and get community feedback. The session will also touch briefly on longer term challenges.
Feedback received
We will also share feedback from others so far, such as feedback from Genivi AMM, Platform Security Summit, Linaro and others.
Status Update
How much progress have we made
There is increasing interest in sharing the GPU between multiple domains. This is an open session to discuss the possibility of supporting different GPUs (Mali, PowerVR, ...) with Xen.

Working Practice, Process, ...

Using Docker containers to provide "official" build environments with known dependencies that can be used to build Xen and build all of its components. Using GitLab to build every commit to help catch regressions early. Looking to discuss how to best do this and the end goal with some time frames to make this happen.
Could we automate some tests for submitted series to the ML?
  • [TODO (Automated?) Performance Testing in Virtualization] (Dario Faggioli, Suse)
Detecting performance regressions, and identifying what causes them, is particularly hard, in virtualization. In fact, what benchmarks shall be used? In what kind of VMs do we run them? How many VMs, and how large? All equally large? Same benchmarks in all VMs? Also, what do we want to measure: virtualization overhead? The impact of a change/feature, or of a particular configuration of the hypervisor, the host OS or the guest OS? Or maybe we want to compare different virtualization solutions?
Also, with so many moving parts, automation is a must, but may also be problematic. E.g., hosts and VMs need to be provisioned, and benchmarks need to run concurrently in the VMs.
And what about comparing different runs, reaching statistical significance...
This session goes over these challenges, explains what is being done, both within SUSE and in the community, and tries to envision how to improve things.
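As an illustration of the "comparing different runs" problem, the sketch below computes Welch's t-statistic and its approximate degrees of freedom for two sets of benchmark results. This is one common technique, chosen here as an assumption; the abstract does not say which method is used at SUSE or in the community. The function name `welch_t` is hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test.

    `a` and `b` are sequences of benchmark results (e.g. seconds per
    run) from two configurations; each needs at least two samples and
    at least one of them must have non-zero variance.
    """
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    n_a, n_b = len(a), len(b)

    # Squared standard error of the difference of the two means.
    se2 = var_a / n_a + var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se2)

    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((var_a / n_a) ** 2 / (n_a - 1)
                     + (var_b / n_b) ** 2 / (n_b - 1))
    return t, df
```

A large absolute value of t relative to the critical value for df suggests that the difference between the two runs is unlikely to be noise. Looking up that critical value, and deciding how many runs are enough, is left out of this sketch.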
Release Cadence: 2 years ago, we moved to a 6-monthly release cadence. The idea was to help companies get features into Xen in a more predictable way. This appears not to have worked. At the same time, the number of releases is creating problems for the security team and some downstreams. I wanted to collect views to kick-start an e-mail discussion.
Security Process: See https://lists.xenproject.org/archives/html/xen-devel/2018-05/msg01127.html
Other changes that may be worth highlighting ...

Performance

Xen domU create/destroy and device hotplug rely on the xenwatch kernel thread to run the xenwatch event callback function for each update to a subscribed xenstore node. If any event callback function hangs, it stalls the single xenwatch thread and blocks further domU create/destroy or device hotplug operations. This talk presents how Xenwatch Multithreading can address the xenwatch stall issue. In addition to the default xenwatch thread, dom0 creates a kernel thread for each domU to handle that domU's xenwatch events. Therefore, domU create/destroy and device hotplug remain possible even when a specific per-domU xenwatch thread is stalled. The talk first discusses the limitations of the single-threaded xenwatch design with some case studies, then explains the basics of paravirtual drivers, and finally presents the challenges, design and implementation of xenwatch multithreading.
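The idea of one default thread plus one thread per domU can be modelled in a few lines of user-space Python. This is a toy model under stated assumptions, not the kernel implementation: the class `XenwatchDispatcher` and its methods are hypothetical names, and real xenwatch events are delivered by xenstore rather than passed in as Python callables.

```python
import queue
import threading


class XenwatchDispatcher:
    """Toy model: a default worker plus one worker per domU."""

    def __init__(self):
        self._lock = threading.Lock()
        self._per_domain = {}
        self._default = self._start_worker()

    def _start_worker(self):
        # Each worker owns a queue and drains it in its own thread.
        q = queue.Queue()
        threading.Thread(target=self._run, args=(q,), daemon=True).start()
        return q

    @staticmethod
    def _run(q):
        while True:
            callback = q.get()
            if callback is None:  # sentinel: stop this worker
                break
            callback()

    def add_domain(self, domid):
        """Create the dedicated worker when a domU appears."""
        with self._lock:
            self._per_domain[domid] = self._start_worker()

    def remove_domain(self, domid):
        """Stop the dedicated worker when a domU goes away."""
        with self._lock:
            q = self._per_domain.pop(domid, None)
        if q is not None:
            q.put(None)

    def dispatch(self, callback, domid=None):
        """Queue an event callback on the domU's worker, if it has one,
        otherwise on the default worker."""
        with self._lock:
            q = self._per_domain.get(domid, self._default)
        q.put(callback)
```

In this model, a callback that never returns on the worker for one domain only blocks later events for that same domain. Events for other domains, and events handled by the default worker, continue to be processed, which mirrors the property described in the abstract.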

Security

  • [TODO Silo mode for extra defence in depth] (Xin Li, Citrix)
workloads, with an expectation of no cross communication. Therefore, the default in Xen of allowing arbitrary communication is an unnecessary set of attack surfaces. We'd like to support, by default, rather more restrictions in use cases like this.
  • [TODO Panopticon: See no secrets, leak no secrets] (George Dunlap, Citrix)
This is a follow-on from the Spectre/Meltdown issues, where it would be a very good idea to get rid of the Directmap/etc, and we should think about doing per-domain heaps/etc. to reduce the quantity of "non-relevant" data mapped in context, to reduce the risk of data leakage.
  • [TODO What is OpenXT and the Xen Security Community Doing - this was primarily about measured boot and Win10 support] (Lars Kurth, Xen Project & Rich Persaud, OpenXT)
Paul Durrant and Lars Kurth were at https://www.platformsecuritysummit.com/ this year. Lars is happy to walk those who are interested through the highlights, expected contributions, etc. from the event and answer any questions that you may have.

Other

  • [TODO USB pass-through on Xenserver] (Xin Li, Citrix)
Previously, a user could only pass through a whole USB controller (as a PCI device) via the command line. This feature will allow users to pass through different physical USB devices to different VMs. The current solution is based on QEMU. To support all guest OSes (both HVM and PV), an alternative is to implement PVUSB. To use PVUSB, we need usbfront in the guest OS (Windows and Linux) and usbback in dom0. There used to be PVUSB frontend/backend drivers in SLES11, but they were later removed. So there is now no Linux kernel support for PVUSB (neither usbfront nor usbback). There is no Windows usbfront for PVUSB either. We'd like to raise this topic and discuss:
  • a comparison of our phase 1 solution (QEMU-based USB passthrough) with PVUSB;
  • the issues with the previously existing PVUSB solution (SLES11);
  • the plan to implement PVUSB and address the issues above.
  • [TODO From Hobbyist to Maintainer, Why and How] (Wei Liu, Citrix)
Open source projects like Xen and the Linux kernel have become the cornerstones of our modern infrastructure. In this session Wei is going to explain why one, as a software engineer, would want to invest in building up technical competence and soft skills to ultimately become a maintainer in those established projects, how this can help personal career goals and business development, and finally what is involved in becoming a maintainer.
  • [TODO Unikraft: Design and Use Cases] (Florian Schmidt, NEC)
We can discuss the architecture of unikraft, and collect suggestions from the community. Let's also collect use cases that people use Mini-OS for, to see what functionality is still needed to eventually replace Mini-OS with unikraft.