Difference between revisions of "Dm-thin for local storage"

From Xen

Revision as of 13:53, 14 July 2014

The Storage Manager (SM) currently supports 2 kinds of local storage:

  1. .vhd files on an ext3 filesystem on an LVM LV on a local disk
  2. vhd-format data written directly to LVM LVs on a local disk
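The two layouts above can be sketched as follows. This is illustrative only: the volume group name, LV names and sizes are assumptions, not SM's actual naming scheme, and the commands need root.

```shell
# Layout 1: a .vhd file on an ext3 filesystem on an LVM LV on a local disk
lvcreate -L 10G -n sr-ext VG_LOCAL           # carve an LV out of the local VG
mkfs.ext3 /dev/VG_LOCAL/sr-ext               # put an ext3 filesystem on it
mount /dev/VG_LOCAL/sr-ext /mnt/sr
vhd-util create -n /mnt/sr/disk.vhd -s 1024  # a 1 GiB .vhd file on the filesystem

# Layout 2: vhd-format data written directly to an LVM LV
lvcreate -L 2G -n vdi-1 VG_LOCAL
vhd-util create -n /dev/VG_LOCAL/vdi-1 -s 1024  # vhd metadata + data straight onto the LV
```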

We can also directly import and export .vhd-format data using HTTP PUT and GET operations, see Disk import/export.
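A minimal sketch of such an import and export with curl. The host name, VDI UUID and endpoint paths here are placeholders; the real URIs and session handling are described on the Disk import/export page.

```shell
# HTTP PUT: import .vhd data into a VDI (hypothetical endpoint and uuid)
curl -T disk.vhd "http://xenserver/import_vdi?vdi=<uuid>&session_id=<ref>"

# HTTP GET: export a VDI as .vhd data (hypothetical endpoint and uuid)
curl -o disk.vhd "http://xenserver/export_vdi?vdi=<uuid>&session_id=<ref>"
```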

In all cases the data path uses "blktap" (the kernel module) and "tapdisk" (the user-space process). This means that:

  1. constant maintenance is required because blktap is an out-of-tree kernel module
  2. every I/O request incurs extra latency due to kernelspace/userspace transitions, a big problem on fast flash devices (PCIe)
  3. we only support vhd, and not vmdk or qcow2 (and in future direct access to object stores?)

Analysis

We currently use the vhd format and blktap/tapdisk implementation for 2 distinct purposes:

  1. as a convenient, reasonably efficient, standard format for sharing images such as templates
  2. as a means of implementing thin provisioning on the data path, where blocks are allocated on demand and storage is over-provisioned

If, instead of using the vhd format and blktap/tapdisk everywhere, we:

  1. use a tool (e.g. qemu-img) which reads and writes vhd, qcow2 and vmdk, and which can be mounted as a block device on an unmodified kernel (e.g. via NBD)
  2. use device-mapper modules to provide thin provisioning and low-latency access to the data
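Both pieces of this proposal can be sketched with standard tooling. The image file name, VG name and sizes are assumptions; the commands need root plus the `nbd` and dm-thin kernel modules. LVM's thin pool support is used here as the front end to the device-mapper thin target.

```shell
# (1) Expose an image in any qemu-supported format as a block device via NBD,
#     with no out-of-tree kernel module:
modprobe nbd
qemu-nbd --connect=/dev/nbd0 template.qcow2   # /dev/nbd0 is now the image

# (2) Thin provisioning in-kernel via device-mapper (dm-thin):
lvcreate --type thin-pool -L 100G -n pool VG_LOCAL        # pool of physical space
lvcreate --type thin -V 50G --thinpool pool -n vdi-1 VG_LOCAL
# /dev/VG_LOCAL/vdi-1 allocates blocks from the pool on demand; creating more
# thin volumes whose virtual sizes sum past 100G over-provisions the pool.
```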

then we

  1. avoid the blktap kernel module maintenance
  2. reduce the common-case I/O request latency by keeping it all in-kernel
  3. extend the number of formats we support, and make it easier to support direct object store access in future.