https://wiki.xenproject.org/api.php?action=feedcontributions&user=Franciozzy&feedformat=atomXen - User contributions [en]2024-03-28T10:36:09ZUser contributionsMediaWiki 1.31.3https://wiki.xenproject.org/index.php?title=XAPI_HVM_Platform_Keys&diff=6441XAPI HVM Platform Keys2013-02-13T14:09:17Z<p>Franciozzy: Placeholder and very initial draft content.</p>
<hr />
<div>== HVM Platform Keys ==<br />
<br />
Xapi currently applies certain defaults for the hardware it emulates for HVM guests. Some of these are configurable through the VM record, while others are not. Configured values are translated into command-line options for qemu-dm; in other cases no option is added, falling back to the qemu-dm defaults.<br />
<br />
This page lists which platform keys are currently considered and their state.<br />
<br />
{| class="wikitable"<br />
|-<br />
! width="10%" | Feature<br />
! width="10%" | Current Status<br />
! width="15%" | Desired Status<br />
! width="65%" | Observations<br />
|-<br />
| Parallel Ports<br />
| Ignored<br />
| Configurable (in "platform:parallel")<br />
| Xapi doesn't instruct qemu-dm at all, leaving it to the default (emulate LPT1). Using "-parallel none" causes qemu-dm not to emulate any parallel ports.<br />
|-<br />
| Serial Ports || Configurable (in "other-config:hvm_serial") || Configurable (in "platform:serial") || Xapi passes "-serial pty" by default. <br />
|-<br />
|}</div>Franciozzyhttps://wiki.xenproject.org/index.php?title=File:Xom_linux.jpg&diff=2769File:Xom linux.jpg2012-03-01T16:20:30Z<p>Franciozzy: This is a picture of my Macbook after booting a successful Linux Squeeze installation.</p>
<hr />
<div>== Summary ==<br />
This is a picture of my MacBook after successfully booting a Debian Squeeze installation.<br />
== Licensing: ==<br />
{{PD}}</div>Franciozzyhttps://wiki.xenproject.org/index.php?title=File:Xom_part.jpg&diff=2765File:Xom part.jpg2012-02-29T22:18:48Z<p>Franciozzy: This is a picture of my MacBook during the installation of Debian Squeeze, suggesting a partition scheme for running Xen.</p>
<hr />
<div>== Summary ==<br />
This is a picture of my MacBook during the installation of Debian Squeeze, suggesting a partition scheme for running Xen.<br />
== Licensing: ==<br />
{{PD}}</div>Franciozzyhttps://wiki.xenproject.org/index.php?title=File:Xom_refit_cd.jpg&diff=2759File:Xom refit cd.jpg2012-02-29T15:51:36Z<p>Franciozzy: This is a picture of my MacBook after booting rEFIt and having a Debian Squeeze installation CD in the drive.</p>
<hr />
<div>== Summary ==<br />
This is a picture of my MacBook after booting rEFIt and having a Debian Squeeze installation CD in the drive.<br />
== Licensing: ==<br />
{{PD}}</div>Franciozzyhttps://wiki.xenproject.org/index.php?title=File:Xom_refit.jpg&diff=2758File:Xom refit.jpg2012-02-29T15:32:30Z<p>Franciozzy: This is a picture of my MacBook after installing rEFIt on the EFI partition and rebooting for the first time.</p>
<hr />
<div>== Summary ==<br />
This is a picture of my MacBook after installing rEFIt on the EFI partition and rebooting for the first time.<br />
== Licensing: ==<br />
{{PD}}</div>Franciozzyhttps://wiki.xenproject.org/index.php?title=File:Xom_systemprofiler.png&diff=2756File:Xom systemprofiler.png2012-02-29T14:50:48Z<p>Franciozzy: This is a screenshot of the System Profiler on OSX. Showing details of the hardware we used in the Xen On MacBook tutorial.</p>
<hr />
<div>== Summary ==<br />
This is a screenshot of the System Profiler on OSX, showing details of the hardware we used in the Xen On MacBook tutorial.<br />
== Licensing: ==<br />
{{PD}}</div>Franciozzyhttps://wiki.xenproject.org/index.php?title=File:Xom_diskutil.png&diff=2755File:Xom diskutil.png2012-02-29T14:49:39Z<p>Franciozzy: This is a screenshot of the output of OSX's diskutil, showing the default partitioning scheme prior to Linux installation (as a preparation to run Xen).</p>
<hr />
<div>== Summary ==<br />
This is a screenshot of the output of OSX's diskutil, showing the default partitioning scheme prior to Linux installation (as a preparation to run Xen).<br />
== Licensing: ==<br />
{{PD}}</div>Franciozzyhttps://wiki.xenproject.org/index.php?title=File:Xom_macbook.jpg&diff=2754File:Xom macbook.jpg2012-02-29T14:47:35Z<p>Franciozzy: This is a photograph of my MacBook, to be used on the Xen On MacBook tutorial.</p>
<hr />
<div>== Summary ==<br />
This is a photograph of my MacBook, to be used on the Xen On MacBook tutorial.<br />
== Licensing: ==<br />
{{PD}}</div>Franciozzyhttps://wiki.xenproject.org/index.php?title=Network_Throughput_and_Performance_Guide&diff=2593Network Throughput and Performance Guide2012-02-10T12:15:20Z<p>Franciozzy: /* Open vSwitch */</p>
<hr />
<div><!-- MoinMoin name: Network_Throughput_Guide --><br />
<!-- Comment: Added installation steps for network throughput measurement tools, and linked to the general network configuration guide. --><br />
<!-- WikiMedia name: Network Throughput Guide --><br />
<!-- Page revision: 00000020 --><br />
<!-- Original date: Fri Oct 7 10:29:59 2011 (1317983399000000) --><br />
<br />
<!-- ! TOC here --><br />
<br />
== Introduction ==<br />
<br />
Setting up an efficient network in the world of virtual machines can be a<br />
daunting task. Hopefully, this guide will be of some help, and allow you to make<br />
good use of your network resources.<br />
<br />
The guide applies to XCP 1.0 and later, and to [[XenServer]] 5.6 FP1 and later. Much<br />
of it is applicable to earlier versions, too.<br />
<br />
For a general guide on [[XenServer]] network configurations, see<br />
[http://support.citrix.com/article/CTX129320 Designing XenServer Network Configurations].<br />
<br />
== Contributing ==<br />
<br />
If you would like to contribute to this guide, please submit your feedback to<br />
[mailto:rok.strnisa@citrix.com Rok Strniša], or get an account and edit the page yourself.<br />
<br />
If you would like to be notified about updates to this guide, please create an account and add this page to your watchlist ("Watch").<br />
<br />
== Scenarios ==<br />
<br />
There are many possible scenarios where network throughput can be relevant. The major ones that we have identified are:<br />
* '''dom0 throughput''' The traffic is sent/received directly by <tt>dom0</tt>.<br />
* '''single-VM throughput''' The traffic is sent/received by a single VM.<br />
* '''multi-VM throughput''' The traffic is sent/received by multiple VMs, concurrently. Here, we are interested in aggregate network throughput.<br />
* '''single-VCPU VM throughput''' The traffic is sent/received by single-VCPU VMs.<br />
* '''single-VCPU single-TCP-thread VM throughput''' The traffic is sent/received by a single TCP thread in single-VCPU VMs.<br />
* '''multi-VCPU VM throughput''' The traffic is sent/received by multi-VCPU VMs.<br />
* '''network throughput for storage''' The traffic sent/received originates from/is stored on a storage device.<br />
<br />
== Technical Overview ==<br />
<br />
Sending network traffic to and from a VM is a fairly complex process. The figure<br />
below applies to PV guests, and to HVM guests with PV drivers.<br />
<br />
[[File:Network_Throughput_Guide.png]]<br />
<br />
Therefore, when a process in a VM, e.g. a VM with <tt>domID</tt> equal to <tt>X</tt>, wants to<br />
send a network packet, the following occurs:<br />
# A process in the VM generates a network packet '''P''', and sends it to a VM's virtual network interface (VIF), e.g. <tt>ethY_n</tt> for some network <tt>Y</tt> and some connection <tt>n</tt>.<br />
# The driver for that VIF, <tt>netfront</tt> driver, then shares the memory page (which contains the packet '''P''') with the backend domain by establishing a new grant entry. A grant reference is part of the request pushed onto the transmit shared ring (<tt>Tx Ring</tt>).<br />
# <tt>netfront</tt> then notifies, via an event channel (not on the diagram), one of the <tt>netback</tt> threads in <tt>dom0</tt> (the one responsible for <tt>ethY_n</tt>) where in the shared pages the packet '''P''' is stored. ([[XenStore]] is used to set up the initial connection between the front-end and the back-end, deciding which event channel to use and where the shared rings are.)<br />
# <tt>netback</tt> (in <tt>dom0</tt>) fetches '''P''', processes it, and forwards it to <tt>vifX.Y_n</tt>;<br />
# The packet is then handed to the back-end network stack, where it is treated according to its configuration just like any other packet arriving on a network device.<br />
<br />
When a VM is to receive a packet, the process is almost the reverse of the<br />
above. The key difference is that on receive there is a copy being made: it<br />
happens in <tt>dom0</tt>, and is a copy from back-end owned memory into a <tt>Tx Buf</tt>, which the guest has granted to the back-end domain. The grant references to these buffers are in the request on the <tt>Rx Ring</tt> (not <tt>Tx Ring</tt>).<br />
<br />
== Symptoms, probable causes, and advice ==<br />
<br />
There are many potential bottlenecks. Here is a list of symptoms (and associated<br />
probable causes and advice):<br />
<br />
* I/O is extremely slow on my Hardware Virtualised Machine (HVM), e.g. a Windows VM.<br />
** '''Verifying the symptom''': Compare the results of an I/O speed test on the problem VM and a healthy VM; they should be at least an order of magnitude different.<br />
** '''Probable cause''': The HVM does not have PV drivers installed.<br />
** '''Background''': With PV drivers, an HVM can make direct use of some of the underlying hardware, leading to better performance.<br />
** '''Recommendation''': Install PV drivers.<br />
<br />
* VM's VCPU is fully utilised.<br />
** '''Verifying the symptom''': Run <tt>xentop</tt> in <tt>dom0</tt> --- this should give a fairly good estimate of aggregate usage for all VCPUs of a VM; pressing '''V''' reveals how many seconds were spent in which VM's VCPU. Running VCPU measurement tools inside the VM ''does not'' give reliable results; they can only be used to find rough relative usage between applications in a VM.<br />
** '''Background''': When a VM sends or receives network traffic, it needs to do some basic packet processing.<br />
** '''Probable cause''': There is too much traffic for that VCPU to handle.<br />
*** '''Recommendation 1''': Try enabling NIC offloading --- see Tweaks (below) on how to do this.<br />
*** '''Recommendation 2''': Try running the application that does the sending/receiving of network traffic with multiple threads. This will give the OS a chance to distribute the workload over all available VCPUs.<br />
<br />
* HVM VM's first (and possibly only) VCPU is fully utilised.<br />
** '''Verifying the symptom''': ''Same as above.''<br />
** '''Background''': Currently, only a VM's first VCPU can handle interrupt requests.<br />
** '''Probable cause''': The VM is receiving too many packets for its current setup.<br />
*** '''Recommendation 1''': If the VM has multiple VCPUs, try to associate application processing with non-first VCPUs.<br />
*** '''Recommendation 2''': Use more (1 VCPU) VMs to handle receive traffic, and a workload balancer in front of them.<br />
*** '''Recommendation 3''': If the VM has multiple VCPUs and there's no definite need for it to have multiple VCPUs, create multiple 1-VCPU VMs instead (see '''Recommendation 2''').<br />
** '''Plans for improvement:''' Underlying architecture needs to be improved so that VM's non-first VCPUs can process interrupt requests.<br />
<br />
* In <tt>dom0</tt>, a high percentage of a single VCPU is spent processing system interrupts.<br />
** '''Verifying the symptom''': Run <tt>top</tt> in <tt>dom0</tt>, then press <tt>z</tt> (for colours) and <tt>1</tt> (to show VCPU breakdown). Check if there is a high value for <tt>si</tt> for a single VCPU.<br />
** '''Background''': When packets are sent to a VM on a host, its <tt>dom0</tt> needs to process interrupt requests associated with the interrupt queues that correspond to the device the packets arrived on.<br />
** '''Probable cause''': <tt>dom0</tt> is set up to process all interrupt requests for a specific device on a specific <tt>dom0</tt> VCPU.<br />
*** '''Recommendation 1''': Check in <tt>/proc/interrupts</tt> whether your device exposes multiple interrupt queues. If the device supports this feature, make sure that it is enabled.<br />
*** '''Recommendation 2''': If the device supports multiple interrupt queues, distribute the processing of them either automatically (by using <tt>irqbalance</tt> daemon), or manually (by setting <tt>/proc/irq/<irq-no>/smp_affinity</tt>) to all (or a subset of) <tt>dom0</tt> VCPUs.<br />
*** '''Recommendation 3''': Otherwise, make sure that an otherwise relatively-idle <tt>dom0</tt> VCPU is set to process the interrupt queue (by manually setting the appropriate <tt>/proc/irq/<irq-no>/smp_affinity</tt>).<br />
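As a sketch of the manual affinity setting mentioned above: the value written to <tt>smp_affinity</tt> is a hexadecimal bitmask selecting which <tt>dom0</tt> VCPUs may service the IRQ. The IRQ number (1272) and VCPU (2) below are purely illustrative:<br />
<br />
```shell
# Compute an smp_affinity bitmask that pins an interrupt queue to one
# dom0 VCPU. CPU 2 and IRQ 1272 are hypothetical examples.
cpu=2
mask=$(printf "%x" $((1 << cpu)))   # CPU 2 -> bitmask 0x4
echo "$mask"
# As root in dom0 you would then apply it:
# echo "$mask" > /proc/irq/1272/smp_affinity
```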
<br />
* In <tt>dom0</tt>, a VCPU is fully occupied with a <tt>netback</tt> process.<br />
** '''Verifying the symptom''': Run <tt>top</tt> in <tt>dom0</tt>. Check if there is a <tt>netback</tt> process, which appears to be taking almost 100%. Then, run <tt>xentop</tt> in <tt>dom0</tt>, and check VCPU usage for <tt>dom0</tt>: if it reads about 120% +/- 20% when there is no other significant process in <tt>dom0</tt>, then there's a high chance that you have confirmed the symptom.<br />
** '''Background''': When packets are sent from or to a VM on a host, the packets are processed by a <tt>netback</tt> process, which is <tt>dom0</tt>'s side of VM network driver (VM's side is called <tt>netfront</tt>).<br />
** '''General Recommendation''': Try enabling NIC offloading --- see Tweaks (below) on how to do this.<br />
** '''Possible cause 1''': VMs' VIFs are not correctly distributed over the available <tt>netback</tt> threads.<br />
*** '''Recommendation''': Read the [http://support.citrix.com/article/CTX127970 related KB article].<br />
** '''Possible cause 2''': Too much traffic is being sent over a single VIF.<br />
*** '''Recommendation''': Create another VIF for the corresponding VM, and set up the application(s) within the VM to send/receive traffic over both VIFs. Since each VIF should be associated with a different <tt>netback</tt> process (each of which is linked to a different <tt>dom0</tt> VCPU), this should remove the associated <tt>dom0</tt> bottleneck. If every <tt>dom0</tt> <tt>netback</tt> thread is taking 100% of a <tt>dom0</tt> VCPU, increase the number of <tt>dom0</tt> VCPUs and <tt>netback</tt> threads first --- see Tweaks (below) on how to do this.<br />
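As a rough sketch of the recommendation above, a second VIF can be added with the <tt>xe</tt> CLI. This dry-runs (prints) the commands rather than executing them; all UUIDs are placeholders you must substitute, and the free device index (here 1) depends on the VM:<br />
<br />
```shell
# Sketch, assuming the XAPI xe CLI in dom0; UUIDs are placeholders.
# Adds a second VIF so traffic can be spread over two netback threads.
vm_uuid="<vm-uuid>"
net_uuid="<network-uuid>"
echo "xe vif-create vm-uuid=$vm_uuid network-uuid=$net_uuid device=1"
echo "xe vif-plug uuid=<new-vif-uuid>"
```
<br />
Inside the guest, the new interface then needs its own IP configuration, and the application(s) must be told to use both interfaces.<br />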
<br />
* There is a VCPU bottleneck either in a <tt>dom0</tt> or in a VM, and I have control over both the sending and the receiving side of the network connection.<br />
** '''Verifying the symptom''': (See notes about <tt>xentop</tt> and <tt>top</tt> above.)<br />
** '''Background''': (Roughly) Each packet generates an interrupt request, and each interrupt request requires some VCPU capacity.<br />
** '''Recommendation''': Enable Jumbo Frames (see Tweaks (below) for more information) for the whole connection. This should decrease the number of interrupts, and therefore decrease the load on the associated VCPUs (for a specific amount of network traffic).<br />
<br />
* There is obviously no VCPU bottleneck either in a <tt>dom0</tt> or in a VM --- why is the framework not making use of the spare capacity?<br />
** '''Verifying the symptom''': (See notes about <tt>xentop</tt> and <tt>top</tt> above.)<br />
** '''Background''': There are ''many'' factors involved when doing network performance, and many more when using virtual machines.<br />
** '''Possible cause 1''': Part of the connection has reached its physical throughput limit.<br />
*** '''Recommendation 1''': Verify that all network components in the connection path physically support the desired network throughput.<br />
*** '''Recommendation 2''': If a physical limit has been reached for the connection, add another network path, set up appropriate PIFs and VIFs, and configure the application(s) to use both/all paths.<br />
** '''Possible cause 2''': Some parts of the software associated with network processing might not be completely parallelisable, or the hardware cannot make use of its parallelisation capabilities if the software doesn't follow certain patterns of behaviour.<br />
*** '''Recommendation 1''': Set up the application used for sending or receiving network traffic to use multiple threads. Experiment with the number of threads.<br />
*** '''Recommendation 2''': Experiment with the TCP parameters, e.g. window size and message size --- see Tweaks (below) for recommended values.<br />
*** '''Recommendation 3''': If IOMMU is enabled on your system, try disabling it. See Tweaks for a section on how to disable IOMMU.<br />
*** '''Recommendation 4''': Try switching the network backend. See the Tweaks section on how to do that.<br />
<br />
== Making throughput measurements ==<br />
<br />
When making throughput measurements, it is a good idea to start with a simple<br />
environment. For example, if testing VM-level receive throughput, try sending<br />
traffic from a bare-metal (Linux) host to VM(s) on another (XCP/XenServer) host,<br />
and vice-versa when testing VM-level transmit throughput. Transmitting traffic<br />
is less demanding on the resources, and is therefore expected to produce<br />
substantially better results.<br />
<br />
The following sub-sections provide more information about how to use some of the<br />
more common network performance tools.<br />
<br />
=== Iperf 2.0.5 ===<br />
<br />
==== Installation ====<br />
<br />
===== Linux =====<br />
Make sure the following packages are installed on your system: <tt>gcc</tt>, <tt>g++</tt>,<br />
<tt>make</tt>, and <tt>subversion</tt>.<br />
<br />
Iperf can be installed from Iperf's SVN repository:<br />
<br />
<pre><br />
svn co https://iperf.svn.sourceforge.net/svnroot/iperf iperf<br />
cd iperf/trunk<br />
./configure<br />
make<br />
make install<br />
cd<br />
iperf --version # should mention pthreads<br />
</pre><br />
<br />
You might also be able to install it via a package manager, e.g.:<br />
<br />
<pre><br />
apt-get install iperf<br />
</pre><br />
<br />
When using the <tt>yum</tt> package manager, you can install it via [http://wiki.centos.org/AdditionalResources/Repositories/RPMForge RPMForge].<br />
<br />
===== Windows =====<br />
<br />
You can use the following executable: [http://downloads.xen.org/Wiki/Network_Throughput_Guide/iperf.exe iperf.exe]<br />
<br />
Note that we are ''not'' the authors of the above executable. Please use your<br />
anti-virus software to scan the file before using it.<br />
<br />
==== Usage ====<br />
<br />
We recommend the following usage of <tt>iperf</tt>:<br />
* make sure that the firewall is disabled or allows <tt>iperf</tt> traffic;<br />
* tell <tt>iperf</tt> what units to report the results in, e.g. by using <tt>-f m</tt> --- if not set explicitly, <tt>iperf</tt> will change units based on the result;<br />
* an <tt>iperf</tt> test should last at least 20 seconds, e.g. <tt>-t 20</tt>;<br />
* experiment with multiple communication threads, e.g. <tt>-P 4</tt>;<br />
* repeat a test in a specific context at least 5 times, calculating an average, and making notes of any anomalies;<br />
* experiment with TCP window size and buffer size settings --- using <tt>-w 256K -l 256K</tt> for both the receiver and the sender worked well for us;<br />
* use a shell/batch script to start multiple <tt>iperf</tt> processes simultaneously (if required), and possibly to automate the whole testing process.<br />
* when running <tt>iperf</tt> on a Windows VM:<br />
** run it in non-daemon mode on the receiver, since daemon mode sometimes (it is still unclear exactly when) creates a service. Having an <tt>iperf</tt> service is undesirable, since one cannot as easily control which VCPU it executes on, and with what priority. Also, you cannot have multiple receivers with a service running (in case you wanted to experiment with them);<br />
** run <tt>iperf</tt> with "realtime" priority, and on a non-first VCPU (if you are executing on a multi-VCPU VM), for reasons explained in the section above.<br />
<br />
Here are the simplest commands to execute on the receiver, and then the sender:<br />
<br />
<pre><br />
# on receiver<br />
iperf -s -f m -w 256K -l 256K<br />
<br />
# on sender<br />
iperf -c <receiver-IP> -f m -w 256K -l 256K -t 20<br />
</pre><br />
<br />
To measure aggregate receive throughput of multiple VMs where the data is sent<br />
from a single source (e.g., a different physical machine), use:<br />
<br />
<pre><br />
#!/bin/bash<br />
<br />
VMS=$1<br />
THREADS=$2<br />
TIME=$3<br />
TMP=`mktemp`<br />
<br />
for i in `seq $VMS`; do<br />
VM_IP="192.168.1.$i" # use your IP scheme here<br />
echo "Starting iperf for $VM_IP ..."<br />
iperf -c $VM_IP -w 256K -l 256K -t $TIME -f m -P $THREADS | grep -o "[0-9]\+ Mbits/sec" | tail -1 | awk -vn=$i '{print n, $1}' >> $TMP & # tail -1 keeps only the [SUM] line when -P > 1<br />
done<br />
<br />
sleep $((TIME + 3))<br />
cat $TMP | sort<br />
cat $TMP | awk '{sum+=$2}END{print "Total: ", sum, "Mbits/sec"}'<br />
rm -f $TMP<br />
</pre><br />
<br />
=== Netperf 2.5.0 ===<br />
<br />
<tt>netperf</tt>'s <tt>TCP_STREAM</tt> test also tends to give reliable results. However, since<br />
this version (the only version we recommend using) does not automatically<br />
parallelise over the available VCPUs, such parallelisation needs to be done<br />
manually in order to make better use of the available VCPU capacity.<br />
<br />
==== Installation ====<br />
<br />
===== Linux =====<br />
Make sure the following packages are installed on your system: <tt>gcc</tt>, <tt>g++</tt>,<br />
<tt>make</tt>, and <tt>wget</tt>.<br />
<br />
Then run the following commands:<br />
<br />
<pre><br />
wget ftp://ftp.netperf.org/netperf/netperf-2.5.0.tar.gz<br />
tar xzf netperf-2.5.0.tar.gz<br />
cd netperf-2.5.0<br />
./configure<br />
make<br />
make check<br />
make install<br />
</pre><br />
<br />
The receiver side can then be started manually with <tt>netserver</tt>, or you can<br />
configure it as a service:<br />
<br />
<pre><br />
# these commands may differ depending on your OS<br />
echo "netperf 12865/tcp" >> /etc/services<br />
echo "netperf stream tcp nowait root /usr/local/bin/netserver netserver" >> /etc/inetd.conf<br />
/etc/init.d/openbsd-inetd restart<br />
</pre><br />
<br />
===== Windows =====<br />
<br />
You can use the following executables: <br />
* [http://downloads.xen.org/Wiki/Network_Throughput_Guide/netserver.exe netserver.exe]<br />
* [http://downloads.xen.org/Wiki/Network_Throughput_Guide/netclient.exe netclient.exe]<br />
<br />
Note that we are ''not'' the authors of the above executables. Please use your<br />
anti-virus software to scan the files before using them.<br />
<br />
==== Usage ====<br />
<br />
Here, we describe the usage of the Linux version of Netperf. The syntax for the<br />
Windows version is sometimes different; please see <tt>netclient.exe -h</tt> for more<br />
information.<br />
<br />
With <tt>netperf</tt> installed on both sides, the following script can be used on<br />
either side to determine network throughput for transmitting traffic:<br />
<br />
<pre><br />
#!/bin/bash<br />
<br />
THREADS=$1<br />
TIME=$2<br />
DST=$3<br />
TMP=`mktemp`<br />
<br />
for i in `seq $THREADS`; do<br />
netperf -H $DST -t TCP_STREAM -P 0 -c -l $TIME >> $TMP &<br />
done<br />
<br />
sleep $((TIME + 3))<br />
cat $TMP | awk '{sum+=$5}END{print sum}'<br />
rm $TMP<br />
</pre><br />
<br />
=== NTttcp (Windows only) ===<br />
<br />
The program can be installed by running this installer: [http://downloads.xen.org/Wiki/Network_Throughput_Guide/NTttcp.msi NTttcp.msi]<br />
<br />
Note that we are ''not'' the authors of the above installer. Please use your<br />
anti-virus software to scan the file before using it.<br />
<br />
After completing the installation, go to the installation directory, and make<br />
two copies of <tt>ntttcp.exe</tt>:<br />
* <tt>ntttcpr.exe</tt> --- use for receiving traffic<br />
* <tt>ntttcps.exe</tt> --- use for sending traffic<br />
<br />
For usage guidelines, please refer to the guide in the installation directory.<br />
<br />
== Diagnostic tools ==<br />
<br />
There are many diagnostic tools one can use:<br />
* Performance tab in VM's Task Manager;<br />
* Performance tab for the VM in [[XenCenter]];<br />
* Performance tab for the VM's host in [[XenCenter]];<br />
* <tt>top</tt> (with '''z''' and '''1''' pressed) in VM's host's <tt>dom0</tt>; and,<br />
* <tt>xentop</tt> in VM's host's <tt>dom0</tt>.<br />
<br />
It is sometimes also worth observing <tt>/proc/interrupts</tt> in <tt>dom0</tt>, as well as<br />
<tt>/proc/irq/<irqno>/smp_affinity</tt>.<br />
<br />
== Recommended configurations ==<br />
<br />
When reading this section, please see the Tweaks below it for reference.<br />
<br />
=== CPU bottleneck ===<br />
<br />
All network throughput tests were, in the end, bottlenecked by VCPU<br />
capacity. This means that machines with better physical CPUs are expected to<br />
achieve higher network throughputs for both <tt>dom0</tt> and VM tests.<br />
<br />
=== Number of VM pairs and threads ===<br />
<br />
If one is interested in achieving a high aggregate network throughput of VMs on<br />
a host, it is crucial to consider both ''the number of VM pairs'' and ''the<br />
number of network transmitting/receiving threads in each VM''. Ideal values for<br />
these numbers vary from OS to OS due to different networking stack<br />
implementations, so some experimentation is recommended --- finding a good<br />
balance can have a drastic effect on network performance (mainly due to better<br />
VCPU utilisation). Our research shows that 8 pairs with 2 <tt>iperf</tt> threads per<br />
pair works well for Debian-based Linux, while 4 pairs with 8 <tt>iperf</tt> threads per<br />
pair works well for Windows 7.<br />
<br />
=== Allocation of NICs over <tt>netback</tt> threads ===<br />
<br />
All results above assume equal distribution of used NICs over available<br />
<tt>netback</tt> threads, which may not always be possible --- see<br />
[http://support.citrix.com/article/CTX127970 a KB article] for more<br />
information. For VM network throughput, it is important to get as close as<br />
possible to equal distribution in order to make efficient use of the available<br />
VCPUs.<br />
<br />
=== Using irqbalance ===<br />
<br />
The <tt>irqbalance</tt> daemon is enabled by default. It has been observed that this<br />
daemon can improve VM network performance by about 16% --- note that this is<br />
much less than the potential gain of getting the other points described in<br />
this section right. The reason why <tt>irqbalance</tt> can help is that it distributes<br />
the processing of <tt>dom0</tt>-level interrupts across all available <tt>dom0</tt> VCPUs, not<br />
just the first one.<br />
<br />
=== Optimising Windows VMs (and other HVM guests) ===<br />
<br />
It appears that Xen currently feeds all interrupts for a guest to the guest's<br />
first VCPU, i.e. <tt>VCPU0</tt>. Initial observations show that more CPU cycles are<br />
spent processing the interrupt requests than actually processing the received<br />
data (assuming there is no disk I/O, which is slow). This means that, on a<br />
Windows VM with 2 VCPUs, all processing of the received data should be done on<br />
the second VCPU, i.e. <tt>VCPU1</tt>: ''Task Manager > Processes > Select Process > Set<br />
CPU affinity > 1'' --- in this case, <tt>VCPU0</tt> will be fully used, whereas <tt>VCPU1</tt><br />
will probably have some spare cycles. While this is acceptable, it is more<br />
efficient to use 2 guests (1 VCPU each), which makes full use of both<br />
VCPUs. Therefore, to avoid this bottleneck altogether, one should probably use<br />
"<tt><number of host CPUs> - 4</tt>" VMs, each with 1 VCPU, and combine their<br />
capabilities with a [[NetScaler]] Appliance.<br />
<br />
=== Offloading some network processing to NICs ===<br />
<br />
Network offloading is not officially supported, since there are known issues<br />
with some drivers. That said, if your NIC supports offloading, try to use it,<br />
especially ''Generic Receive Offload'' (GRO). However, please verify carefully<br />
that it works for your NIC+driver before using it in a production environment.<br />
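A sketch of how offloads are toggled with <tt>ethtool</tt> follows; the NIC name (<tt>eth0</tt>) and the feature list are assumptions, and the commands are dry-run (printed) rather than executed, since each feature must be validated against your NIC+driver first:<br />
<br />
```shell
# Sketch: candidate offloads to try on a dom0 NIC (assumption: eth0).
nic=eth0
for feat in gro tso gso; do
  echo "ethtool -K $nic $feat on"   # run as root in dom0 to apply
done
# Query the current offload settings (as root in dom0):
# ethtool -k $nic
```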
<br />
If performing mainly <tt>dom0</tt>-to-<tt>dom0</tt> network traffic, turning on GRO setting<br />
for the NICs involved can be highly beneficial when combined with the<br />
<tt>irqbalance</tt> daemon (see above). This configuration can easily be combined with<br />
Open vSwitch (the default option), since the performance is either equal or<br />
faster than with a Linux Bridge. Turning on the ''Large Receive Offload'' (LRO)<br />
setting tends to, in general, decrease <tt>dom0</tt> network throughput.<br />
<br />
Our initial test results indicate that turning on either of the two offload<br />
settings (GRO or LRO) in dom0 can give mixed results based on the context. Feel<br />
free to experiment and let us know your findings.<br />
<br />
=== Jumbo frames ===<br />
<br />
Note that jumbo frames for the connection from A to B only work when every part<br />
of the connection supports (and has enabled) MTU 9000. See the Tweaks section<br />
below for information on how to enable this in some contexts.<br />
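One quick end-to-end check, assuming Linux endpoints, is a do-not-fragment ping with a jumbo-sized payload; if any hop lacks MTU 9000, the ping fails. The peer address is a placeholder:<br />
<br />
```shell
# 8972 = MTU 9000 minus 20 bytes (IP header) and 8 bytes (ICMP header).
payload=$((9000 - 20 - 8))
echo "$payload"
# From a Linux guest or dom0 (placeholder peer address):
# ping -M do -s $payload -c 3 <peer-ip>
```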
<br />
We have observed network performance gains for VM-to-VM traffic (where VMs are<br />
on different hosts). Where the VMs were Linux PV guests, we were able to enable<br />
GRO offloading in hosts' dom0, which provided a further speedup.<br />
<br />
=== Open vSwitch ===<br />
<br />
In the various tests that we performed, we observed no statistically significant<br />
difference in network performance for dom0-to-dom0 traffic. We observed from 3%<br />
(Linux PV guests, no <tt>irqbalance</tt>) to about 10% (Windows HVM guests, with<br />
<tt>irqbalance</tt>) worse performance for VM-to-VM traffic.<br />
<br />
=== TCP settings ===<br />
<br />
Our experiments show that tweaking TCP settings inside the <tt>VM</tt>(s) can lead to<br />
substantial network performance improvements. The main reason for this is that<br />
most systems are still by default configured to work well on 100Mb/s or 1Gb/s,<br />
not 10Gb/s, NICs. The Tweaks section below contains a section about the<br />
recommended TCP settings for a VM.<br />
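As an illustration of the kind of change involved, the sketch below prints commonly used Linux buffer-limit values for 10Gb/s paths. These numbers are illustrative assumptions, not the guide's official recommendations --- consult the Tweaks section for those:<br />
<br />
```shell
# Illustrative TCP buffer limits for a Linux VM (values are assumptions).
tcp_conf=$(cat <<'EOF'
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
EOF
)
echo "$tcp_conf"
# Append the above to /etc/sysctl.conf inside the VM, then load: sysctl -p
```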
<br />
=== Using SR-IOV ===<br />
<br />
SR-IOV is currently a double-edged sword. This section explains what SR-IOV is,<br />
what its downsides are, and what its benefits are.<br />
<br />
Single Root I/O Virtualisation (SR-IOV) is a PCI device virtualisation<br />
technology that allows a single PCI device to appear as multiple PCI devices on<br />
the physical PCI bus. The actual physical device is known as a Physical Function<br />
(PF) while the others are known as Virtual Functions (VF). The purpose of this<br />
is for the hypervisor to directly assign one or more of these VFs to a Virtual<br />
Machine (VM) using SR-IOV technology: the guest can then use the VF as any other<br />
directly assigned PCI device. Assigning one or more VFs to a VM allows the VM to<br />
directly exploit the hardware. When configured, each VM behaves as though it is<br />
using the NIC directly, reducing processing overhead and improving performance.<br />
<br />
SR-IOV can be used only with architectures that support IOMMU and NICs that<br />
support SR-IOV; there could be further compatibility constraints by the<br />
architecture or the NIC. Please contact support or ask on forums about<br />
recommended/officially supported configurations.<br />
<br />
If a VM has an SR-IOV VF assigned, features that require VM mobility, such as<br />
Live Migration, Workload Balancing, Rolling Pool Upgrade, High Availability and<br />
Disaster Recovery, are not possible, because the VM is directly tied to the VF<br />
of the physical SR-IOV-enabled NIC. In addition, VM network traffic sent via an<br />
SR-IOV VF bypasses the vSwitch, so it is not possible to create ACLs or view<br />
QoS for that traffic.<br />
<br />
Our experiments show that a single-VCPU VM using SR-IOV on a modern system,<br />
with the usual NIC offloading features enabled, can saturate (or nearly<br />
saturate) a 10Gbps connection when ''receiving traffic''. Furthermore, the<br />
impact on <tt>dom0</tt> is negligible.<br />
<br />
The Tweaks section below contains a section about how to enable SR-IOV.<br />
<br />
== Tweaks ==<br />
<br />
=== Automatic IRQ Balancing in Dom0 ===<br />
<br />
<tt>irqbalance</tt> is enabled by default.<br />
<br />
If the IRQ balancing service is installed but not running, you can start it by<br />
running:<br />
<br />
<pre><br />
service irqbalance start<br />
</pre><br />
<br />
Otherwise, you need to install it first with:<br />
<br />
<pre><br />
yum --disablerepo=citrix --enablerepo=base,updates install -y irqbalance<br />
</pre><br />
<br />
=== Manual IRQ Balancing in Dom0 ===<br />
<br />
While <tt>irqbalance</tt> does the job in most situations, manual IRQ balancing can<br />
prove better in others. Given a <tt>dom0</tt> with 4 VCPUs, the following script<br />
disables <tt>irqbalance</tt>, and evenly distributes specific interrupt queues<br />
(1272--1279) among the available VCPUs:<br />
<br />
<pre><br />
service irqbalance stop<br />
for i in `seq 0 7`; do<br />
  queue=$((1272 + i))<br />
  aff=$((1 << i % 4))<br />
  printf "%x" $aff > /proc/irq/$queue/smp_affinity<br />
done<br />
</pre><br />
<br />
To find out how many <tt>dom0</tt> VCPUs a host has, use <tt>cat /proc/cpuinfo</tt>. To find<br />
out which interrupt queues correspond to which interface, use <tt>cat<br />
/proc/interrupts</tt>.<br />
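<br />
To sanity-check the distribution before touching <tt>/proc</tt>, you can print the<br />
mask each queue would receive; this is a sketch mirroring the loop above (the<br />
queue numbers 1272--1279 are just the example's values):<br />
<br />
```shell
# Print the hex affinity mask each of 8 queues gets when spread
# over 4 dom0 VCPUs; the mask is a bitmap of allowed VCPUs.
for i in `seq 0 7`; do
  queue=$((1272 + i))
  aff=$((1 << i % 4))
  printf "IRQ %d -> smp_affinity %x\n" $queue $aff
done
```

The masks cycle through 1, 2, 4 and 8, i.e. VCPU0 to VCPU3.<br />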
<br />
=== Changing the Number of Dom0 VCPUs ===<br />
<br />
To check the current number of <tt>dom0</tt> VCPUs, run <tt>cat /proc/cpuinfo</tt>.<br />
<br />
The desired number of <tt>dom0</tt> VCPUs can be set in <tt>/etc/sysconfig/unplug-vcpus</tt>.<br />
<br />
For the change to take effect, either restart the host or (only when the number<br />
of VCPUs in <tt>dom0</tt> is being decreased) run:<br />
<br />
<pre><br />
/etc/init.d/unplug-vcpus start<br />
</pre><br />
<br />
=== Changing the Number of Netback Threads in Dom0 ===<br />
<br />
By default, the number of netback threads in <tt>dom0</tt> equals<br />
<tt>min(4, <number_of_vcpus_in_dom0>)</tt>. Therefore, increasing the number of <tt>dom0</tt><br />
VCPUs above 4 will, by default, not increase the number of netback threads.<br />
<br />
To increase the maximum number of netback threads to, for example, 12, add<br />
<tt>xen-netback.netback_max_groups=12</tt> to <tt>/boot/extlinux.conf</tt> in the section<br />
labelled <tt>xe-serial</tt>, just after the assignment <tt>console=hvc0</tt>.<br />
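<br />
As an illustration, the relevant entry might then look like this (the exact<br />
paths and neighbouring options will differ on your host; everything in square<br />
brackets is a placeholder, only the added <tt>netback_max_groups</tt> parameter and<br />
its position matter):<br />
<br />
<pre><br />
label xe-serial<br />
  kernel mboot.c32<br />
  append /boot/xen.gz [xen options] --- /boot/vmlinuz-2.6-xen [kernel options] console=hvc0 xen-netback.netback_max_groups=12 --- /boot/initrd-2.6-xen.img<br />
</pre><br />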
<br />
=== Enabling NIC Offloading ===<br />
<br />
Please see the "Offloading some network processing to NICs" section above.<br />
<br />
You can use <tt>ethtool</tt> to enable/disable NIC offloading.<br />
<br />
<pre><br />
ETH=eth6 # the connection for which you want to enable offloading<br />
ethtool -k $ETH # check what is currently enabled/disabled<br />
ethtool -K $ETH gro on # enable GRO<br />
</pre><br />
<br />
Note that changing offload settings directly via <tt>ethtool</tt> will not persist the<br />
configuration across host reboots; to do that, use the <tt>other-config</tt> parameter<br />
of the <tt>xe</tt> command:<br />
<br />
<pre><br />
xe pif-param-set uuid=<pif_uuid> other-config:ethtool-gro=on<br />
</pre><br />
<br />
=== Enabling Jumbo Frames ===<br />
<br />
Suppose <tt>eth6</tt> and <tt>xenbr6</tt> are the device and the bridge corresponding to the<br />
10Gb/s connection used.<br />
<br />
Shut down user domains:<br />
<br />
<pre><br />
VMs=$(xe vm-list is-control-domain=false params=uuid --minimal | sed 's/,/ /g')<br />
for uuid in $VMs; do xe vm-shutdown uuid=$uuid; done<br />
</pre><br />
<br />
Set network MTU to 9000, and re-plug relevant PIFs:<br />
<br />
<pre><br />
net_uuid=`xe network-list bridge=xenbr6 params=uuid --minimal`<br />
xe network-param-set uuid=$net_uuid MTU=9000<br />
PIFs=$(xe pif-list network-uuid=$net_uuid --minimal | sed 's/,/ /g')<br />
for uuid in $PIFs; do xe pif-unplug uuid=$uuid; xe pif-plug uuid=$uuid; done<br />
</pre><br />
<br />
Start user domains (you might want to make sure that VMs are started one after<br />
another to avoid potential VIF static allocation problems):<br />
<br />
<pre><br />
VMs=$(xe vm-list is-control-domain=false params=uuid --minimal | sed 's/,/ /g')<br />
for uuid in $VMs; do xe vm-start uuid=$uuid; done<br />
</pre><br />
<br />
Set up the connections you will use inside the user domains to use MTU 9000. For<br />
Linux VMs, this is done with:<br />
<br />
<pre><br />
ETH=eth1 # the user domain connection you are concerned with<br />
ifconfig $ETH mtu 9000 up<br />
</pre><br />
<br />
Verifying:<br />
<br />
<pre><br />
xe vif-list network-uuid=$net_uuid params=MTU --minimal<br />
</pre><br />
<br />
=== Linux TCP parameter settings ===<br />
<br />
==== Default in Dom0 ====<br />
<br />
<pre><br />
ETH=eth6 # the connection you are concerned with<br />
sysctl -w net.core.rmem_max=131071<br />
sysctl -w net.core.wmem_max=131071<br />
sysctl -w net.ipv4.tcp_rmem="4096 87380 3080192"<br />
sysctl -w net.ipv4.tcp_wmem="4096 16384 3080192"<br />
sysctl -w net.core.netdev_max_backlog=1000<br />
sysctl -w net.ipv4.tcp_congestion_control=reno<br />
ifconfig $ETH txqueuelen 1000<br />
ethtool -K $ETH gro off<br />
sysctl -w net.ipv4.tcp_timestamps=1<br />
sysctl -w net.ipv4.tcp_sack=1<br />
sysctl -w net.ipv4.tcp_fin_timeout=60<br />
</pre><br />
<br />
==== Default for a Demo Etch Linux VM ====<br />
<br />
<pre><br />
ETH=eth1 # the connection you are concerned with<br />
sysctl -w net.core.rmem_max=109568<br />
sysctl -w net.core.wmem_max=109568<br />
sysctl -w net.ipv4.tcp_rmem="4096 87380 262144"<br />
sysctl -w net.ipv4.tcp_wmem="4096 16384 262144"<br />
sysctl -w net.core.netdev_max_backlog=1000<br />
sysctl -w net.ipv4.tcp_congestion_control=bic<br />
ifconfig $ETH txqueuelen 1000<br />
ethtool -K $ETH gso off<br />
sysctl -w net.ipv4.tcp_timestamps=1<br />
sysctl -w net.ipv4.tcp_sack=1<br />
sysctl -w net.ipv4.tcp_fin_timeout=60<br />
</pre><br />
<br />
==== Recommended TCP settings for Dom0 ====<br />
<br />
Changing these settings is only relevant if you want to optimise network<br />
connections for which one of the end-points is <tt>dom0</tt> (not a user domain). The<br />
settings recommended for a user domain (VM) work well for <tt>dom0</tt> as well.<br />
<br />
==== Recommended TCP settings for a VM ====<br />
<br />
<pre><br />
Bandwidth Delay Product (BDP) = Round Trip Time (RTT) * Theoretical Bandwidth Limit<br />
</pre><br />
<br />
For example, if RTT = 100ms = .1s, and theoretical bandwidth is 10Gbit/s, then:<br />
<br />
<pre><br />
BDP = (.1s) * (10 * 10^9 bit/s) = 10^9 bit = 1 Gbit ~= 2^30 bit = 2^27 B = 134217728 B<br />
</pre><br />
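<br />
The same arithmetic can be checked with a short shell snippet (using the<br />
example's values of 100ms RTT and 10Gbit/s; note that the text above rounds<br />
10^9 bit up to 2^30 bit before converting to bytes):<br />
<br />
```shell
RTT_MS=100                            # round trip time in milliseconds
BW_BPS=10000000000                    # theoretical bandwidth: 10 Gbit/s
BDP_BITS=$((BW_BPS / 1000 * RTT_MS))  # bits in flight
BDP_BYTES=$((BDP_BITS / 8))
echo $BDP_BYTES                       # 125000000 B, rounded up to 2^27 B = 134217728 B above
```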
<br />
<pre><br />
ETH=eth6<br />
# ESSENTIAL (large benefit)<br />
sysctl -w net.core.rmem_max=134217728 # BDP<br />
sysctl -w net.core.wmem_max=134217728 # BDP<br />
sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728" # _ _ BDP<br />
sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728" # _ _ BDP<br />
sysctl -w net.core.netdev_max_backlog=300000<br />
modprobe tcp_cubic<br />
sysctl -w net.ipv4.tcp_congestion_control=cubic<br />
ifconfig $ETH txqueuelen 300000<br />
# OPTIONAL (small benefit)<br />
ethtool -K $ETH gso on<br />
sysctl -w net.ipv4.tcp_sack=0 # for reliable networks only<br />
sysctl -w net.ipv4.tcp_fin_timeout=15 # claim resources sooner<br />
sysctl -w net.ipv4.tcp_timestamps=0 # does not work with GRO on in dom0<br />
</pre><br />
<br />
Checking existing settings:<br />
<br />
<pre><br />
ETH=eth6<br />
sysctl net.core.rmem_max<br />
sysctl net.core.wmem_max<br />
sysctl net.ipv4.tcp_rmem<br />
sysctl net.ipv4.tcp_wmem<br />
sysctl net.core.netdev_max_backlog<br />
sysctl net.ipv4.tcp_congestion_control<br />
ifconfig $ETH | grep -o "txqueuelen:[0-9]\+"<br />
ethtool -k $ETH 2> /dev/null | grep "generic.segmentation.offload"<br />
sysctl net.ipv4.tcp_timestamps<br />
sysctl net.ipv4.tcp_sack<br />
sysctl net.ipv4.tcp_fin_timeout<br />
</pre><br />
<br />
=== Pinning a VM to specific CPUs ===<br />
<br />
While this does not necessarily improve performance (it can easily make<br />
performance worse, in fact), it is useful when debugging CPU usage of a VM. To<br />
assign a VM to CPUs 3 and 4, run the following in <tt>dom0</tt>:<br />
<br />
<pre><br />
xe vm-param-set uuid=<vm-uuid> VCPUs-params:mask=3,4<br />
</pre><br />
<br />
=== Switching between Linux Bridge and Open vSwitch ===<br />
<br />
To see what network backend you are currently using, run in <tt>dom0</tt>:<br />
<br />
<pre><br />
cat /etc/xensource/network.conf<br />
</pre><br />
<br />
To switch to using the Linux Bridge network backend, run in <tt>dom0</tt>:<br />
<br />
<pre><br />
xe-switch-network-backend bridge<br />
</pre><br />
<br />
To switch to using [[Open vSwitch]] network backend, run in <tt>dom0</tt>:<br />
<br />
<pre><br />
xe-switch-network-backend openvswitch<br />
</pre><br />
<br />
=== Enabling/disabling IOMMU ===<br />
<br />
This is, in fact, not a tweak, but a requirement when using SR-IOV (see below).<br />
<br />
Some versions of Xen have the IOMMU enabled by default. If it is disabled, you<br />
can enable it by editing <tt>/boot/extlinux.conf</tt> and adding <tt>iommu=1</tt> to the Xen<br />
parameters (i.e. just before the first <tt>---</tt> of your active configuration). If<br />
it is enabled by default, you can disable it with <tt>iommu=0</tt> instead.<br />
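<br />
For illustration, with <tt>iommu=1</tt> added the active entry might look like this<br />
(the paths and other options in square brackets are placeholders; only the<br />
position of <tt>iommu=1</tt>, before the first <tt>---</tt>, matters):<br />
<br />
<pre><br />
append /boot/xen.gz [existing xen options] iommu=1 --- /boot/vmlinuz-2.6-xen [kernel options] --- /boot/initrd-2.6-xen.img<br />
</pre><br />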
<br />
=== Enabling SR-IOV ===<br />
<br />
Make sure that IOMMU support is enabled in the version of Xen that you are<br />
running (see the section above).<br />
<br />
In <tt>dom0</tt>, use <tt>lspci</tt> to display a list of Virtual Functions (VFs). For<br />
example,<br />
<br />
<pre><br />
07:10.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)<br />
</pre><br />
<br />
In the example above, <tt>07:10.0</tt> is the <tt>bus:device.function</tt> address of the VF.<br />
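<br />
As a sketch, the address can be extracted from such a line with <tt>awk</tt> (a<br />
sample lspci line is hard-coded below; on a live host you would pipe real<br />
output instead, e.g. <tt>lspci | grep -i "Virtual Function"</tt>):<br />
<br />
```shell
# Extract the bus:device.function address (first field) from a
# sample lspci line describing a Virtual Function.
line="07:10.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)"
bdf=$(echo "$line" | awk '{print $1}')
echo "$bdf"   # 07:10.0
```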
<br />
Assign a free (non-assigned) VF to the target VM by running:<br />
<br />
<pre><br />
xe vm-param-set other-config:pci=0/0000:<bus:device.function> uuid=<vm-uuid><br />
</pre><br />
<br />
(Re-)Start the VM, and install the appropriate VF driver (inside your VM) for<br />
your specific NIC.<br />
<br />
You can assign multiple VFs to a single VM; however, the same VF cannot be<br />
shared across multiple VMs.<br />
<br />
== Acknowledgements ==<br />
<br />
While this guide was mostly written by Rok Strniša, it could not have been<br />
nearly as good without the help and advice from many of his colleagues,<br />
including (in alphabetic order) Alex Zeffertt, Dave Scott, George Dunlap,<br />
Ian Campbell, James Bulpin, Jonathan Davies, Lawrence Simpson, Marcus Granado,<br />
Mike Bursell, Paul Durrant, Rob Hoes, Sally Neale, and Simon Rowe.<br />
<br />
[[Category:XCP]]<br />
[[Category:Tutorial]]<br />
[[Category:Users]]<br />
[[Category:Developers]]<br />
[[Category:Performance]]</div>