Compressing VM Exports

Note: this has been implemented and is documented in the XenAPI reference

Background

There are two kinds of VM exports in XCP:

  1. metadata only
  2. full VM: metadata plus disk blocks

An export is a tar archive in which the metadata is always the first file, named "ova.xml". The metadata is stored as XML and includes several existing version numbers:

 product_version
 build_number
 xapi_major
 xapi_minor
 export_vsn
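
The layout described above (metadata first, under the name "ova.xml") can be illustrated with a minimal sketch. It builds a tiny stand-in archive in memory and reads the metadata back. The helper names, the placeholder XML, and the disk member name "disk/00000000" are assumptions made for this example only; they do not reproduce the real export contents.

```python
import io
import tarfile


def make_sample_export(metadata, disk_block):
    """Build a small tar archive with ova.xml as its first member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # "disk/00000000" is a hypothetical member name for this sketch.
        for name, data in (("ova.xml", metadata), ("disk/00000000", disk_block)):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_metadata(archive_bytes):
    """Return the bytes of ova.xml, insisting that it is the first member."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as tar:
        member = tar.next()
        if member is None or member.name != "ova.xml":
            raise ValueError("first member is not ova.xml")
        return tar.extractfile(member).read()


# Placeholder metadata; the real ova.xml schema is not shown here.
sample = make_sample_export(b"<placeholder/>", bytes(1024))
print(read_metadata(sample))
```

Because the metadata comes first, a reader can inspect the version numbers before touching any disk blocks.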

The CLI commands involved are:

 xe vm-export vm=... filename=...
 xe vm-import filename=...

Exports and imports can also be performed over HTTP, using GET and PUT respectively.

Problem statement

The "full VM" exports can be very large since they contain raw disk blocks. Their size makes them difficult to store and distribute over the network.

Proposal

The disk blocks in a "full VM" export often contain very little user data and compress very well with gzip. I propose to:

  1. add a --compress option to the CLI which will cause VM exports to be compressed via gzip
  2. on import, auto-detect whether an export requires decompression and, if so, decompress it first.
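
As a rough illustration of why this helps, the sketch below gzip-compresses a synthetic "disk" that is mostly zero blocks, streaming it in chunks so that the whole image is never copied in one piece. The function name `gzip_stream`, the chunk size, and the synthetic data are assumptions made for this example; they are not taken from the xapi implementation.

```python
import gzip
import io
import shutil

CHUNK = 64 * 1024  # arbitrary copy buffer size chosen for this sketch


def gzip_stream(src, dst):
    """Copy everything from src to dst, gzip-compressing it on the way."""
    with gzip.GzipFile(fileobj=dst, mode="wb") as gz:
        shutil.copyfileobj(src, gz, CHUNK)


# Synthetic "disk": a little user data followed by 1 MiB of zero blocks.
disk = b"some user data" + bytes(1024 * 1024)
compressed = io.BytesIO()
gzip_stream(io.BytesIO(disk), compressed)
print(len(disk), "bytes ->", len(compressed.getvalue()), "bytes")
```

Real disks will compress less dramatically than this mostly-empty example, depending on how much user data they hold.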

Auto-detection algorithm

Approximately:

  1. Read the first 512 bytes (the length of a tar header).
  2. Check whether these bytes form a well-formed tar header that refers to an ova.xml file.
    1. If so: treat the export as uncompressed.
    2. If not: open a pipe to zcat and, in a background thread, write the first 512 bytes followed by the rest of the file. Read from the output of zcat and look for a tar header there.
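
The steps above can be sketched as follows. This is not the xapi code: it swaps the zcat pipe and background thread for Python's in-process gzip module, and it assumes a blocking stream whose `read(512)` returns a full block unless the input ends. The names `is_ova_header`, `Prefixed` and `open_export` are hypothetical.

```python
import gzip
import io
import tarfile

TAR_BLOCK = 512  # length of one tar header


def is_ova_header(block):
    """True if block is a well-formed tar header naming ova.xml."""
    try:
        # frombuf verifies the header checksum and raises on malformed input.
        info = tarfile.TarInfo.frombuf(block, tarfile.ENCODING, "surrogateescape")
    except tarfile.TarError:
        return False
    return info.name == "ova.xml"


class Prefixed(io.RawIOBase):
    """Replay already-consumed bytes, then continue with the stream."""

    def __init__(self, head, stream):
        self._head = head
        self._stream = stream

    def readable(self):
        return True

    def readinto(self, buf):
        if self._head:
            data, self._head = self._head[:len(buf)], self._head[len(buf):]
        else:
            data = self._stream.read(len(buf)) or b""
        buf[:len(data)] = data
        return len(data)


def open_export(stream):
    """Return a reader that yields the uncompressed tar bytes of an export."""
    head = stream.read(TAR_BLOCK)
    replay = io.BufferedReader(Prefixed(head, stream))
    if is_ova_header(head):
        return replay  # already a plain tar export
    # Otherwise feed the same bytes through gzip and check again.
    unzipped = gzip.GzipFile(fileobj=replay, mode="rb")
    try:
        first = unzipped.read(TAR_BLOCK)
    except (OSError, EOFError) as exc:
        raise ValueError("neither a tar export nor a gzip stream") from exc
    if not is_ova_header(first):
        raise ValueError("decompressed data does not start with ova.xml")
    return io.BufferedReader(Prefixed(first, unzipped))
```

Replaying the first 512 bytes matters because the input may be a non-seekable stream, such as an HTTP PUT body, so the detector cannot simply rewind.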

Version number discussion

The export format has two layers:

  1. the outermost ("envelope"?) layer, which currently consists of a tar archive with an XML file containing version numbers
  2. the innermost ("payload"?) layer, which currently consists of the rest of the VM metadata and optional disk blocks

The changes anticipated in the future all concern the innermost layer and hence are covered by the existing version numbers. For example, I anticipate:

  1. adding new VM metadata
  2. changing the disk block format e.g. to vhd

Adding another version number to cover the outermost layer seems like overkill because:

  1. it is simple enough to tell the difference between tar and tar.gz automatically
  2. no future changes to this layer are anticipated
  3. if we change the outermost layer then we'll simply extend the auto-detect code.
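
To illustrate the first point, the two formats can also be told apart from their leading bytes alone. The sketch below is an alternative to the tar-header check in the auto-detection algorithm, not a description of it: it relies on the gzip magic number (0x1f 0x8b, per RFC 1952) and on the fact that a tar header begins with the NUL-terminated member name. The function name `classify` is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"      # first two bytes of every gzip member
OVA_NAME = b"ova.xml\x00"     # a tar header starts with the member name


def classify(head):
    """Guess the envelope format from the first bytes of an export."""
    if head[:2] == GZIP_MAGIC:
        return "gzip"
    if head[:len(OVA_NAME)] == OVA_NAME:
        return "tar"
    return "unknown"


print(classify(gzip.compress(b"anything")))
print(classify(OVA_NAME + bytes(504)))
```

An uncompressed export can never be mistaken for gzip by this test, since its first bytes spell "ova.xml".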

Implications

Old (uncompressed) exports will continue to import on new servers, thanks to the auto-detection.

New compressed exports will not import on old servers. The import will fail with a generic "IMPORT_ERROR" exception (probably containing the text 'Failure "int_of_string"'). A simple workaround is to decompress the file first (for example with gunzip) and then import it.
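
The workaround can be scripted where gunzip is not to hand. The following is a minimal sketch; the function name `decompress_export` and both paths are assumptions for this example.

```python
import gzip
import shutil


def decompress_export(src_path, dst_path):
    """Write the gunzipped contents of src_path to dst_path."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Stream in chunks so large exports are never held in memory.
        shutil.copyfileobj(src, dst)
```

The resulting file can then be imported on the old server with xe vm-import filename=... as usual.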