Difference between revisions of "Xen Project Schedulers"

From Xen
(RTDS is still marked experimental in xen/common/sched_rt.c (HEAD))
 
(10 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
__TOC__
 
__TOC__
 
= Overview =
 
= Overview =
The Xen Project Hypervisor supports several different schedulers with different properties.
+
The Xen Project Hypervisor supports several different virtual CPU schedulers, with different properties.
   
  +
The job of a hypervisor's scheduler is to decide, among all the vCPUs of all the virtual machines, which ones should run on the host's physical CPUs (pCPUs) at any given time.
Different schedulers can be assigned to
 
* an entire host
 
* a [[Cpupools_Howto|pool]] of physical CPU’s on a host (VMs need to be assigned to a pool or pinned to a CPU)
 
   
  +
Xen can also have several schedulers ''active'' at the same time, each on a disjoint group of pCPUs (see [[Cpupools_Howto|cpupool]]).
Scheduler parameters can be modified per
 
* an entire host
 
* a CPU pool
 
* A Virtual Machine
 
   
  +
[[File:Sched2.jpg|none|400px]]
<gallery>
 
File:Sched1.jpg
 
File:Sched2.jpg
 
File:Sched3.jpg
 
</gallery>
 
   
  +
In this case, each pool has its own scheduler. Even if two pools use the ''same'' scheduler, they run two completely separate and isolated '''instances''' of the same scheduling algorithm.
== Schedulers in Xen 4.5 and beyond ==
 
  +
''Legend:''
 
  +
The user interacts with and affects the behaviour of the scheduler by:
* {{Tick}} likely in 4.6
 
  +
* checking or changing a scheduler's global parameters,
* {{HalfDone}} possible in 4.6
 
  +
* checking or changing a VM's scheduling parameters.
<br>
 
  +
  +
[[File:Sched3.jpg|none|400px]]
  +
  +
= Currently Available Schedulers =
  +
  +
== The Credit Scheduler ==
  +
  +
[[Credit Scheduler|Credit]] is a general purpose, weighted fair share scheduler. It was the default scheduler until Xen 4.12, when Credit2 became the default.
  +
  +
== The Credit2 Scheduler ==
  +
  +
[[Credit2 Scheduler Development|Credit2]] is the evolution of Credit. It is more scalable and handles latency-sensitive workloads better, while still being based on a general purpose, weighted fair share scheduling algorithm.
  +
  +
== The RTDS Scheduler ==
  +
  +
[[RTDS-Based-Scheduler|RTDS]] is a real-time scheduler, aimed at supporting real-time workloads in the cloud, as well as embedded and mobile virtualization use cases.
  +
  +
== The ARINC653 Scheduler ==
  +
  +
[[ARINC653 Scheduler|ARINC653]] is an embedded (automotive and avionics) real-time scheduler.
  +
  +
= Use cases and Support Status =
   
 
{|class="prettytable" style="text-align: left;" valign="top"
 
{|class="prettytable" style="text-align: left;" valign="top"
 
!style="width: 15%;"|Scheduler
 
!style="width: 15%;"|Scheduler
 
!style="width: 30%;"|Use-cases
 
!style="width: 30%;"|Use-cases
!style="width: 15%;"|Xen 4.5
+
!style="width: 15%;"|Xen < 4.7
!style="width: 30%;"|Plans for 4.6+
+
!style="width: 15%;"|Xen 4.8
  +
!style="width: 15%;"|Xen 4.9
  +
!style="width: 15%;"|Xen 4.12
 
|-
 
|-
 
|[[Credit_Scheduler|Credit]]
 
|[[Credit_Scheduler|Credit]]
 
|General Purpose
 
|General Purpose
  +
|{{Tick}} Supported<br>{{Tick}} '''Default'''
  +
|Supported<br>'''Default'''
 
|Supported<br>'''Default'''
 
|Supported<br>'''Default'''
 
|Supported
 
|Supported
 
|-
 
|-
|[[Credit2_Scheduler_Development|Credit 2]]
+
|[[Credit2_Scheduler_Development|Credit2]]
 
|General Purpose<br>
 
|General Purpose<br>
Optimized for lower latency, high VM density
+
Optimized for low latency, scalability, high VM density
|Experimental
+
|{{Tick}} Experimental
|{{Tick}} Supported<br>{{HalfDone}} '''Default'''
+
|{{Tick}} Supported
  +
|Supported
  +
|Supported<br>'''Default'''
 
|-
 
|-
 
|[[RTDS-Based-Scheduler|RTDS]]
 
|[[RTDS-Based-Scheduler|RTDS]]
|Soft & Firm Real-time<br>Multicore<br>Embedded, Automotive, Graphics & Gaming in the Cloud, Low Latency Workloads
+
|Soft & Firm Real-time<br>Embedded, mobile & automotive<br>Graphics & Gaming in the Cloud
  +
|{{Tick}} Experimental
  +
|{{Tick}} Improved xl support<br>Experimental
  +
|Experimental
 
|Experimental
 
|Experimental
|{{Tick}} Hardening<br>{{Tick}} Optimization<br>{{Tick}} Better XL support<br>{{Tick}} <1μs granularity<br>{{HalfDone}} Supported
 
 
|-
 
|-
 
|[[ARINC653_Scheduler|ARINC 653]]
 
|[[ARINC653_Scheduler|ARINC 653]]
|Hard Real-time <br>Single core<br>Avionics, Drones, Medical
+
|Hard Real-time <br>Avionics, Drones, Medical
  +
|[https://lists.xenproject.org/archives/html/xen-devel/2015-06/msg00972.html Supported?]
|Supported<br>Compile time
 
  +
|?
|{{Tick}} No change
 
  +
|?
  +
|?
 
|}
 
|}
   
  +
= Historical Xen Schedulers =
== Also See ==
 
* '''sched''' heading in [http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html Xen unstable boot options] (for other versions of Xen, see [[:Category:ManPage]])
 
* [[XL]] scheduler commands under '''SCHEDULER SUBCOMMANDS''' in [http://xenbits.xen.org/docs/unstable/man/xl.1.html#scheduler_subcommands XL unstable man pages] (for other versions of Xen, see [[:Category:ManPage]])
 
* [[Credit Scheduler]]
 
* [[Credit2 Scheduler Development]]
 
* [[RTDS-Based-Scheduler]]
 
* [[ARINC653 Scheduler]]
 
* [[:Category:Scheduler]]
 
* [[:Category:Resource Management]]
 
* [[:Category:Performance]]
 
   
  +
== simple Earliest Deadline First (sEDF) ==
= History of Xen Schedulers =
 
   
  +
Quoting from the sEDF documentation (no longer in-tree), "this scheduler provides weighted CPU sharing in an intuitive way and uses real-time
This content was originally compiled by [http://jacobmathai.blogspot.com Jacob Mathai].
 
  +
algorithms to ensure time guarantees."
   
  +
The real-time algorithm used was [http://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling Earliest Deadline First (EDF)], modified so that it could also be used as a general purpose scheduler. It could work in both work conserving and non-work conserving modes.
== 1. Borrowed Virtual Time (Xen 2.0/3.0) ==
 
   
  +
It was introduced in Xen 3.0 and was the default for a while. The scheduler was never properly adapted to SMP systems and multi-vCPU VMs: both worked, but behavior and performance were suboptimal and unreliable. It was eventually removed in Xen 4.6.
<pre><nowiki>
 
sched=bvt
 
Global Parameters
 
ctx_allow - The context switch allowance is similar to the ''quantum'' in traditional schedulers.
 
It is the minimum time that a scheduled domain will be allowed to run before being preempted.
 
   
  +
== Borrowed Virtual Time (BVT)==
Per-domain parameters
 
mcuadv - the MCU (Minimum Charging Unit) advance determines the proportional share of the CPU
 
that a domain receives. It is set inversely proportionally to a domain's sharing weight.
 
warp - the amount of `virtual time' the domain is allowed to warp backwards
 
warpl - the warp limit is the maximum time a domain can run warped for
 
warpu - the unwarp requirement is the minimum time a domain must run unwarped for before it can warp again
 
</nowiki></pre>
 
   
  +
A ''virtual time'' based, fair-share, general purpose scheduler used in Xen 2.0 and 3.0. Domains' shares of CPU time were determined by their weights. What is traditionally called the ''quantum'', or ''timeslice'', was known there as the '''context switch allowance''', and was configurable. It was SMP enabled, but lacked a non-work conserving mode.
== 2. Atropos (Xen 2.0) ==
 
   
  +
== Atropos ==
<pre><nowiki>
 
sched=atropos
 
Atropos is a soft real time scheduler. It provides guarantees about absolute shares of the CPU,
 
with a facility for sharing slack CPU time on a best-effort basis. It can provide timeliness
 
guarantees for latency-sensitive domains.
 
   
  +
A soft real-time scheduler, capable of providing guarantees on absolute shares of CPU time, and of letting domains use the ''slack'' on a best-effort basis. As is always the case with real-time schedulers, CPU slices were only really guaranteed in the absence of CPU over-commitment.
Every domain has an associated period and slice. The domain should receive `slice' nanoseconds
 
every `period' nanoseconds. This allows the administrator to configure both the absolute share
 
of the CPU a domain receives and the frequency with which it is scheduled.
 
   
  +
It was in use in Xen 2.0.
Note: don't over-commit the CPU when using Atropos (i.e. don't reserve more CPU than is
 
available -- the utilization should be kept to slightly less than 100% in order to ensure predictable
 
behavior).
 
   
  +
== Round Robin ==
Per-domain parameters :
 
period - The regular time interval during which a domain is guaranteed to receive its allocation of CPU time.
 
slice - The length of time per period that a domain is guaranteed to run for (in the absence of voluntary yielding of the CPU).
 
latency - The latency hint is used to control how soon after waking up a domain it should be scheduled.
 
xtratime - This is a boolean flag that specifies whether a domain should be allowed a share of the system slack time.
 
</nowiki></pre>
 
   
  +
It was... well... [https://en.wikipedia.org/wiki/Round-robin_scheduling Round Robin]! It was there as a simple demonstration of Xen's internal scheduler API, not for real production use.
== 3. Round Robin (Xen 2.0) ==
 
   
  +
== Also See ==
<pre><nowiki>
 
sched=rrobin
 
The round robin scheduler is included as a simple demonstration of Xen's internal scheduler
 
API. It is not intended for production use.
 
 
Global Parameters
 
rr_slice - The maximum time each domain runs before the next scheduling decision is made.
 
</nowiki></pre>
 
 
== 4. sEDF scheduler (Xen 3.0) ==
 
 
<pre><nowiki>
 
sched=sedf
 
(from docs/misc/sedf_scheduler_mini-HOWTO.txt)
 
This scheduler provides weighted CPU sharing in an intuitive way and uses realtime-algorithms
 
to ensure time guarantees.
 
 
Per-domain parameters
 
use "xm sched-sedf <dom-id> <period> <slice> <latency-hint> <extra> <weight>"
 
-period/slice are the normal EDF scheduling parameters in nanosecs
 
-latency-hint is the scaled period in case the domain is doing heavy I/O
 
(unused by the currently compiled version)
 
-extra is a flag (0/1), which controls whether the domain can run in extra-time
 
-weight is mutually exclusive with period/slice and specifies another way of setting a domains cpu slice
 
See wikipedia for a short intro to EDF:
 
http://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling
 
</nowiki></pre>
 
   
  +
* '''sched=''' boot parameter in [http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html Xen unstable boot options]
== 5. ARINC 653 (Xen 4.0) ==
 
  +
* [[XL|xl]] [http://xenbits.xen.org/docs/unstable/man/xl.1.html#SCHEDULER-SUBCOMMANDS scheduler subcommands]
 
  +
* [[:Category:Scheduler]]
<pre><nowiki>
 
  +
* [[:Category:Resource Management]]
sched=arinc653
 
  +
* [[:Category:Performance]]
The arinc653 scheduler follows the ARINC 653 specification for scheduling, giving each partition (domain) a
 
fixed, dedicated time slot for execution.
 
 
Note: Current implementation does not support multicore, so 'maxcpus=1' must be set at boot.
 
</nowiki></pre>
 
 
= System Calls and Scheduling =
 
 
<pre><nowiki>
 
Some Scheduling System Calls
 
/schedule.c
 
SCHEDOP_yield
 
SCHEDOP_block
 
SCHEDOP_shutdown
 
*nice( )
 
getpriority( )
 
setpriority( )
 
sched_getscheduler( )
 
sched_setscheduler( )
 
sched_getparam( )
 
sched_setparam( )
 
sched_yield( )
 
sched_get_priority_min( )

sched_get_priority_max( )
 
sched_rr_get_interval( )
 
</nowiki></pre>
 
 
A related wiki topic on Real Time Applications & [[Preemption]] .
 
   
 
[[Category:Xen]]
 
[[Category:Xen]]

Latest revision as of 04:30, 7 December 2019

Overview

The Xen Project Hypervisor supports several different virtual CPU schedulers, with different properties.

The job of a hypervisor's scheduler is to decide, among all the vCPUs of all the virtual machines, which ones should run on the host's physical CPUs (pCPUs) at any given time.

Xen can also have several schedulers active at the same time, each on a disjoint group of pCPUs (see cpupool).

Sched2.jpg

In this case, each pool has its own scheduler. Even if two pools use the same scheduler, they run two completely separate and isolated instances of the same scheduling algorithm.

The user interacts with and affects the behaviour of the scheduler by:

  • checking or changing a scheduler's global parameters,
  • checking or changing a VM's scheduling parameters.
Sched3.jpg
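
As a rough illustration of the two kinds of interaction listed above, the sketch below assembles xl command lines for the Credit scheduler. It only builds argument lists and does not run anything. The helper names and the example domain and pool names are hypothetical; the options used (-d, -w, -c, -s, -p, -t) are those documented for xl sched-credit, but they should be checked against the xl man page for the Xen version in use.

```python
def credit_domain_cmd(domain, weight=None, cap=None):
    """Build an xl command that shows or sets one VM's Credit parameters.

    With only a domain, xl prints the current values; with a weight
    and/or cap, it changes them.
    """
    cmd = ["xl", "sched-credit", "-d", str(domain)]
    if weight is not None:
        cmd += ["-w", str(weight)]
    if cap is not None:
        cmd += ["-c", str(cap)]
    return cmd


def credit_global_cmd(cpupool=None, tslice_ms=None):
    """Build an xl command that shows or sets scheduler-wide Credit parameters."""
    cmd = ["xl", "sched-credit", "-s"]
    if cpupool is not None:
        cmd += ["-p", str(cpupool)]      # restrict to one cpupool's instance
    if tslice_ms is not None:
        cmd += ["-t", str(tslice_ms)]    # timeslice, in milliseconds
    return cmd


# Hypothetical names: "guest1" is a made-up domain, "Pool-0" a pool name.
print(" ".join(credit_domain_cmd("guest1", weight=512)))
print(" ".join(credit_global_cmd(cpupool="Pool-0", tslice_ms=10)))
```

Keeping the commands as argument lists makes them easy to pass to a process launcher later without shell quoting issues.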

Currently Available Schedulers

The Credit Scheduler

Credit is a general purpose, weighted fair share scheduler. It was the default scheduler until Xen 4.12, when Credit2 became the default.
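
To make "weighted fair share" concrete, here is a minimal sketch of the underlying arithmetic: when all VMs compete for CPU, each one is entitled to a fraction of CPU time proportional to its weight. This is only the proportional-share idea, not Credit's actual accounting logic, and the VM names and weights are made up for the example.

```python
def fair_shares(weights):
    """Return each VM's entitled fraction of contended CPU time.

    weights: dict mapping a VM name to its (positive) scheduling weight.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one positive weight is required")
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical VMs: "db" has twice the weight of the other two.
shares = fair_shares({"web": 256, "db": 512, "batch": 256})
for name, share in shares.items():
    print(f"{name}: {share:.0%} of contended CPU time")
```

A VM with twice the weight of another gets twice the CPU time under contention; when the host is not fully loaded, a work conserving scheduler lets VMs use more than their share.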

The Credit2 Scheduler

Credit2 is the evolution of Credit. It is more scalable and handles latency-sensitive workloads better, while still being based on a general purpose, weighted fair share scheduling algorithm.

The RTDS Scheduler

RTDS is a real-time scheduler, aimed at supporting real-time workloads in the cloud, as well as embedded and mobile virtualization use cases.

The ARINC653 Scheduler

ARINC653 is an embedded (automotive and avionics) real-time scheduler.

Use cases and Support Status

Scheduler | Use-cases | Xen < 4.7 | Xen 4.8 | Xen 4.9 | Xen 4.12
Credit | General Purpose | Supported, Default | Supported, Default | Supported, Default | Supported
Credit2 | General Purpose; optimized for low latency, scalability, high VM density | Experimental | Supported | Supported | Supported, Default
RTDS | Soft & Firm Real-time; Embedded, mobile & automotive; Graphics & Gaming in the Cloud | Experimental | Improved xl support, Experimental | Experimental | Experimental
ARINC 653 | Hard Real-time; Avionics, Drones, Medical | Supported? | ? | ? | ?

Historical Xen Schedulers

simple Earliest Deadline First (sEDF)

Quoting from the sEDF documentation (no longer in-tree), "this scheduler provides weighted CPU sharing in an intuitive way and uses real-time algorithms to ensure time guarantees."

The real-time algorithm used was Earliest Deadline First (EDF), modified so that it could also be used as a general purpose scheduler. It could work in both work conserving and non-work conserving modes.
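
The core EDF rule can be sketched as a toy, single-pCPU simulation: at every tick, run the vCPU whose current deadline is earliest among those that still have budget. This is a simplified illustration of plain EDF with a period and a slice per vCPU, not a reconstruction of sEDF itself; the function name, the tick-based time model and the example values are assumptions made for the sketch.

```python
def edf_schedule(vcpus, ticks):
    """Simulate plain EDF on one pCPU, one decision per time tick.

    vcpus: dict mapping a name to (period, slice), both in ticks.
    Returns a list with the name that ran at each tick (None when idle).
    """
    remaining = {name: 0 for name in vcpus}   # budget left in this period
    deadline = {name: 0 for name in vcpus}    # end of the current period
    timeline = []
    for now in range(ticks):
        for name, (period, slice_) in vcpus.items():
            if now % period == 0:
                # A new period starts: replenish the budget (any unused
                # budget is discarded) and move the deadline forward.
                remaining[name] = slice_
                deadline[name] = now + period
        runnable = [name for name in vcpus if remaining[name] > 0]
        if not runnable:
            timeline.append(None)             # nothing to run: pCPU idles
            continue
        # Earliest deadline first; ties are broken by name for determinism.
        chosen = min(runnable, key=lambda name: (deadline[name], name))
        remaining[chosen] -= 1
        timeline.append(chosen)
    return timeline


# Two hypothetical vCPUs: "a" needs 1 tick every 4, "b" needs 3 every 6.
timeline = edf_schedule({"a": (4, 1), "b": (6, 3)}, ticks=12)
print(timeline)
```

In this sketch the pCPU idles once every budget is exhausted, which corresponds to a non-work conserving mode; a work conserving variant would hand the idle ticks to some vCPU as extra time.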

It was introduced in Xen 3.0 and was the default for a while. The scheduler was never properly adapted to SMP systems and multi-vCPU VMs: both worked, but behavior and performance were suboptimal and unreliable. It was eventually removed in Xen 4.6.

Borrowed Virtual Time (BVT)

A virtual time based, fair-share, general purpose scheduler used in Xen 2.0 and 3.0. Domains' shares of CPU time were determined by their weights. What is traditionally called the quantum, or timeslice, was known there as the context switch allowance, and was configurable. It was SMP enabled, but lacked a non-work conserving mode.

Atropos

A soft real-time scheduler, capable of providing guarantees on absolute shares of CPU time, and of letting domains use the slack on a best-effort basis. As is always the case with real-time schedulers, CPU slices were only really guaranteed in the absence of CPU over-commitment.

It was in use in Xen 2.0.
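
The over-commitment condition mentioned above can be checked with simple arithmetic: if each domain reserves a slice of CPU time every period, the reserved fractions must not add up to more than the available CPU. The sketch below assumes integer slice and period values in the same time unit; the function names and example reservations are illustrative only, and on more than one pCPU this check is a necessary condition rather than a full schedulability test.

```python
from fractions import Fraction


def total_utilization(reservations):
    """Sum of slice/period over all reservations, as an exact fraction.

    reservations: iterable of (slice, period) integer pairs, same time unit.
    """
    return sum(Fraction(slice_, period) for slice_, period in reservations)


def is_overcommitted(reservations, pcpus=1):
    """True when the reserved CPU time exceeds what `pcpus` CPUs can offer."""
    return total_utilization(reservations) > pcpus


# Hypothetical reservations: 2 every 10, 5 every 20, 30 every 100.
reservations = [(2, 10), (5, 20), (30, 100)]
print(total_utilization(reservations))                 # exact total, as a fraction
print(is_overcommitted(reservations))                  # fits on one pCPU?
print(is_overcommitted(reservations + [(40, 100)]))    # after one more reservation
```

Exact fractions avoid floating-point rounding when the total sits right at the limit.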

Round Robin

It was... well... Round Robin! It was there as a simple demonstration of Xen's internal scheduler API, not for real production use.

Also See