COLO - Coarse Grain Lock Stepping


Revision as of 08:18, 15 November 2016

COLO, or Coarse Grain Lock Stepping, is a high-availability solution that builds on top of Remus. Remus was prepared for use with COLO in Xen 4.5. The COLO Manager component is part of Xen 4.7, while other components will eventually be part of QEMU.

Background

The COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop Service) project is a high-availability solution. The primary VM (PVM) and the secondary VM (SVM) run in parallel. They receive the same requests from the client and generate responses in parallel too. If the response packets from the PVM and the SVM are identical, they are released immediately. Otherwise, an on-demand VM checkpoint is conducted. The idea was presented at Xen Summit 2012 and 2013, and in an academic paper at SOCC 2013.
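
The release-or-checkpoint rule above can be pictured with a minimal shell sketch. The function name colo_decide and the idea of passing responses as strings are assumptions made for illustration only; the real comparison is done on network packets by the COLO Proxy.

```shell
#!/bin/sh
# Hypothetical illustration of the decision rule, not COLO code.
# $1: response produced by the PVM, $2: response produced by the SVM.
colo_decide() {
    if [ "$1" = "$2" ]; then
        echo release      # identical responses: release the packet immediately
    else
        echo checkpoint   # responses differ: conduct an on-demand checkpoint
    fi
}
```

For example, colo_decide "HTTP 200" "HTTP 200" prints release, while two different strings print checkpoint.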

Components

  • COLO Manager:
COLO Checkpoint/Failover Controller
Modifies the save/restore flow to implement continuous migration, so that the state of the VM on the Secondary side
always stays consistent with the VM on the Primary side.
When the primary VM writes data to its image, the COLO disk manager captures the data
and sends it to the secondary VM, which keeps the content of the secondary VM's image consistent with
the content of the primary VM's image.
  • COLO Proxy:
A module is needed to compare the packets returned by the Primary VM and the Secondary VM
and to decide, according to a set of rules, whether to start a checkpoint. It is a Linux kernel module
for the host.

Current Status

COLO (based on xm) has been in development for over three years, and a paper was published in 2013. Since Xen has deprecated xm in favour of xl, we are now implementing COLO on xl. The overall status of COLO:


Note that the COLO Block and COLO Proxy components are implemented in QEMU. The necessary patches have been posted for review, but are not yet released in QEMU. If you want to follow the progress of these patches, check out
  • For COLO Block: [1] & [2]
  • For COLO Proxy: [3]


Requirements

Hardware requirements

At least one directly connected NIC is needed to forward network requests from the client to the secondary VM. The directly connected NIC must not be used for any other purpose. If your guest has more than one NIC, you need a directly connected NIC for each guest NIC. If you don't have enough directly connected NICs, you can use VLANs.

Dom0 requirements

  1. Kernel with dom0 support
  2. If your host OS has OEM-released Xen tools installed, uninstall them
  3. Kernel modules:
       nf_conntrack
       nf_conntrack_ipv4
       nf_nat
  4. libnl-tools >= 3.0
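
A quick way to see which of the modules listed above are not currently loaded is sketched below. The function name missing_modules is hypothetical, and the check only reads a /proc/modules-style file, so modules compiled into the kernel will be reported as missing even though the support is present.

```shell
#!/bin/sh
# Hypothetical helper: print each required module that is not listed in
# a /proc/modules-style file (default: /proc/modules).
# Caveat: built-in modules never appear in /proc/modules, so a printed
# name is a hint to investigate, not proof that support is absent.
missing_modules() (
    modules_file=${1:-/proc/modules}
    for mod in nf_conntrack nf_conntrack_ipv4 nf_nat; do
        # The first field of each /proc/modules line is the module name.
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$mod"
        fi
    done
)
```

Running missing_modules with no argument on Dom0 prints nothing when all three modules are loaded.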

Guest requirements

Only HVM guests (without PV extensions) are supported for now. If you want to use an OEM-released guest OS, please use SUSE. Red Hat and Ubuntu are not supported for now because we have not found a way to disable PV extensions in them. If you want to use Red Hat or Ubuntu, you need to build a recent kernel that supports the xen_nopv parameter.
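
Inside a guest with a self-built kernel, one way to confirm that xen_nopv was actually passed at boot is to look at the kernel command line. The sketch below uses a hypothetical helper name, has_xen_nopv, and only checks that the parameter is present; it does not prove that the running kernel honours it.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given kernel command line
# contains xen_nopv as a separate word.
has_xen_nopv() {
    case " $1 " in
        *" xen_nopv "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

In the guest it could be used as: has_xen_nopv "$(cat /proc/cmdline)" && echo "xen_nopv is set"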

Setup COLO environment

Network link topology

=================================normal ======================================
                                +--------+
                                |client  |
         master                 +----+---+                    slave
-------------------------+           |            + -------------------------+
   PVM                   |           +            |                          |
+-------+         +----[eth0]-----[switch]-----[eth0]---------+              |
|guest  |     +---+-+    |                        |       +---+-+            |
|     [tap0]--+ br0 |    |                        |       | br0 |            |
|       |     +-----+  [eth1]-----[forward]----[eth1]--+  +-----+     SVM    |
+-------+                |                        |    |            +-------+|
                         |                        |    |  +-----+   | guest ||
                       [eth2]---[checkpoint]---[eth2]  +--+br1  |-[tap0]    ||
                         |                        |       +-----+   |       ||
                         |                        |                 +-------+|
-------------------------+                        +--------------------------+
e.g.
master:
br0: 192.168.0.33
eth1: 192.168.1.33
eth2: 192.168.2.33

slave:
br0: 192.168.0.88
br1: no ip address
eth1: 192.168.1.88
eth2: 192.168.2.88
===========================after failover=====================================
                                +--------+
                                |client  |
    master (dead)               +----+---+                 slave (alive)
-------------------------+           |            ---------------------------+
  PVM                    |           +            |                          |
+-------+         +----[eth0]-----[switch]-----[eth0]-------+                |
|guest  |     +---+-+    |                        |     +---+-+              |
|     [tap0]--+ br0 |    |                        |     | br0 +--+           |
|       |     +-----+  [eth1]-----[forward]----[eth1]   +-----+  |     SVM   |
+-------+                |                        |              |  +-------+|
                         |                        |     +-----+  |  | guest ||
                       [eth2]---[checkpoint]---[eth2]   |br1  |  +[tap0]    ||
                         |                        |     +-----+     |       ||
                         |                        |                 +-------+|
-------------------------+                        +--------------------------+

Prepare the test environment

On both Primary/Secondary hosts:

  • Check out the necessary repositories:
# cd ~
# git clone https://github.com/Pating/xen
# git clone https://github.com/Pating/qemu
# git clone https://github.com/Pating/iptables
# git clone https://github.com/Pating/colo-proxy
# git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  • Prepare host kernel
The colo-proxy kernel module needs a matching patch applied to the Linux kernel.
# cd ~/colo-proxy/
# git checkout 405527cb
# cd ~/linux/
# git checkout v4.0
# patch -p1 < ~/colo-proxy/colo-patch-for-kernel.patch
Then compile the kernel with the configuration options listed under Dom0 requirements, install the new kernel, and reboot.
  • Proxy module
The proxy module is used to compare network packets.
# cd ~/colo-proxy/
# make
# make install
  • Modified iptables
We have added a new rule to the iptables command.
# cd ~/iptables
# git checkout 8efcddaf
# ./autogen.sh && ./configure
# make && make install
  • Build and install xen (standard xen branch also works)
# cd ~/xen
# git checkout -b changlox/colo_v14_full
# ./autogen.sh; ./configure --enable-debug
# make dist-xen; make install-xen
# make dist-tools; make install-tools
  • Build qemu
# cd ~/qemu
# git checkout -b colo-xen-v2
# cd ~/xen/tools/qemu-xen-dir
# ./configure --enable-xen --target-list=x86_64-softmmu \
              --extra-cflags="-I$HOME/xen/tools/include -I$HOME/xen/tools/libxc -I$HOME/xen/tools/xenstore" \
              --extra-ldflags="-L$HOME/xen/tools/libxc -L$HOME/xen/tools/xenstore"
# make -j$(grep -c processor /proc/cpuinfo)

Note: You must use the QEMU that we provide; qemu-xen and qemu-xen-traditional are not supported.


On Primary host:

  • guest config
Add "xen_platform_pci=0" and the disk/vif configuration below to the guest config file.
disk = [ 'format=raw,devtype=disk,access=w,backendtype=qdisk,vdev=hda,colo,colo-host=192.168.2.88,colo-port=9001,colo-export=qdisk1,active-disk=/mnt/ramfs/active_disk.img,hidden-disk=/mnt/ramfs/hidden_disk.img,target=/home/changlox/suse-64hvm.img']
vif = [ 'mac=00:16:4f:00:00:11, bridge=br0, model=e1000, forwarddev=eth1' ]
  • Copy the guest's disk image from Primary to Secondary, and make sure its absolute path is the same on both hosts

Note: colo-host is the secondary host's IP, colo-port is the secondary host's NBD server port, and forwarddev is the directly connected NIC.
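
To double-check the values in a disk line before starting the guest, the key=value options can be pulled out with standard tools. This is only a sketch: disk_opt is a hypothetical helper, and it assumes that no option value itself contains a comma.

```shell
#!/bin/sh
# Hypothetical helper: print the value of option $2 (for example
# colo-host) from the comma-separated disk specification in $1.
# Assumes values contain no commas; flags without a value (such as
# the bare "colo" keyword) print nothing.
disk_opt() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^[[:space:]]*$2=//p"
}
```

With the disk line above stored in $spec, disk_opt "$spec" colo-host prints 192.168.2.88 and disk_opt "$spec" colo-port prints 9001.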

Run COLO

On both Primary/Secondary hosts:

# service xencommons start

On Secondary host: execute the following script:

#! /bin/bash

modprobe xt_SECCOLO

active_disk=/mnt/ramfs/active_disk.img
hidden_disk=/mnt/ramfs/hidden_disk.img
local_img=~/suse-64hvm.img
tmp_disk_size=$(./qemu-colo/qemu-img info $local_img | grep 'virtual size' | awk '{print $3}')

function create_image()
{
    ~/qemu/qemu-img create -f qcow2 $1 $tmp_disk_size 
}

function prepare_temp_images()
{
    grep -q "^none /mnt/ramfs ramfs" /proc/mounts
    if [[ $? -ne 0 ]]; then
        mount -t ramfs none /mnt/ramfs/ -o size=2G
    fi

    if [[ ! -e $active_disk ]]; then
        create_image $active_disk      
    fi

    if [[ ! -e $hidden_disk ]]; then
        create_image $hidden_disk
    fi
}

prepare_temp_images

Note: It is recommended to put the active disk and hidden disk on a ramdisk.
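
The script sizes the temporary images from the third field of the "virtual size" line printed by qemu-img info. The sketch below isolates that step so it can be tried on sample output first. The function name virtual_size is hypothetical, and the sample line in the comment is an assumed output format; other qemu-img versions may print the size differently, so check the result before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read `qemu-img info` output on stdin and print
# the third whitespace-separated field of the "virtual size" line.
# Assumed input line format: "virtual size: 20G (21474836480 bytes)"
virtual_size() {
    awk '/virtual size/ { print $3; exit }'
}
```

Usage on the Secondary host could look like: ~/qemu/qemu-img info ~/suse-64hvm.img | virtual_size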

On Primary host:

# modprobe nf_conntrack_ipv4
# modprobe xt_PMYCOLO sec_dev=eth1
# xl create -p <domconfig>
# xl remus -c -u <dom> 192.168.2.88
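
The four commands above can be wrapped so that a failing step stops the sequence. The sketch below is an assumption-laden convenience wrapper, not part of COLO: the names run and start_colo_primary and the DRY_RUN variable are made up for illustration, and the forward device is fixed to eth1 as in the example.

```shell
#!/bin/sh
# Hypothetical wrapper around the Primary-side commands shown above.

# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# start_colo_primary <domconfig> <dom> <secondary-ip>
# Each step runs only if the previous one succeeded.
start_colo_primary() {
    run modprobe nf_conntrack_ipv4 &&
    run modprobe xt_PMYCOLO sec_dev=eth1 &&
    run xl create -p "$1" &&
    run xl remus -c -u "$2" "$3"
}
```

Setting DRY_RUN=1 before calling start_colo_primary prints the commands without executing them, which is a cheap way to review the arguments first.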

Known problems

  • Secondary vm may crash due to triple fault.

Note: this problem does not happen every time. If it occurs, run COLO again.

Troubleshooting

  • If an error occurs when starting COLO, you can do the following:
  1. Make sure all the kernel modules that Dom0 needs are available on both sides.
  2. Make sure you have followed all the instructions in this README.
  3. Try rebooting both the primary and secondary hosts.
  4. If you still have problems, collect the error logs and contact Wen Congyang (wency@cn.fujitsu.com), Xie Changlong (xiecl.fnst@cn.fujitsu.com) or Yang Hongyang (imhy.yang@gmail.com) for help.
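
For the last step, the logs can be bundled into a single archive before sending them. This is a minimal sketch: collect_logs is a hypothetical helper, and /var/log/xen is assumed to be where the xl and device model logs are written on your hosts; adjust the path if your installation logs elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: pack a log directory into a gzip-compressed tar.
# collect_logs [log-dir] [output-file]
# Defaults are assumptions: /var/log/xen and ./colo-logs.tar.gz
collect_logs() (
    log_dir=${1:-/var/log/xen}
    out=${2:-colo-logs.tar.gz}
    # -C keeps only the last path component inside the archive.
    tar -czf "$out" -C "$(dirname "$log_dir")" "$(basename "$log_dir")"
)
```

Run it on both the primary and the secondary host, since the relevant error may appear on either side.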

Example

If you use SLES11.3, you can get the detailed steps from the wiki: Setup COLO on SLES11 SP3

An example guest config:

builder='hvm'
memory='1024'
vcpus=2
cpus=['2','3']

name='hvm_nopv_colo'
device_model_version='qemu-xen'
device_model_override='/home/changlox/qemu/x86_64-softmmu/qemu-system-x86_64'

disk = [ 'format=raw,devtype=disk,access=w,backendtype=qdisk,vdev=hda,colo,colo-host=192.168.2.88,colo-port=9001,colo-export=qdisk1,active-disk=/mnt/ramfs/active_disk.img,hidden-disk=/mnt/ramfs/hidden_disk.img,target=/home/changlox/suse-64hvm.img']
vif = [ 'mac=00:16:4f:00:00:11, bridge=br0, model=e1000, forwarddev=eth1' ]

#-----------------------------------------------------------------------------
# boot on floppy (a), hard disk (c), Network (n) or CD-ROM (d) 
# default: hard disk, cd-rom, floppy

boot='c'
sdl=0
vnc=1
vnclisten=''
stdvga = 0 
serial='pty'
apic=1
acpi=1
pae=1
extid=0
keymap='en-us'
localtime=1
hpet=1
usbdevice='tablet'
xen_platform_pci = 0 

Man Pages

Links

For more information see: