
QEMU Process Space Mirroring

Memory Layout of Our Design Architecture


Figure 1: Idea for process mirroring

Figure 1 shows the memory layout in our design. Our architecture occupies the highest virtual addresses. The QEMU process has its own mirrored area. The native and the mirrored virtual areas are mapped to different physical memory so that data is physically replicated. The figure is a simplified layout; the actual memory layout is more complicated. For example, a multi-threaded program has more stack areas, and the dynamic link area is not shown.

Design of Process Space Mirror

In our previous work, we successfully partitioned the memory layout into a reliable zone, a normal zone, and a DMA zone. Based on this work, we can place the original memory in the normal zone and the mirror memory in the DMA zone. The key idea is that when the memory management module (MMU) allocates memory space for a process, it also creates the mirror space for the same process. We hope to guarantee write synchronization using a binary translation technique.

Initialization Workflow

According to Figure 1, the mirror space is created as follows:

  • Step 1: When the QEMU process is initialized, a block of physical memory is reserved as a mirror area in the reliable zone. The size of this area is the same as the size of the QEMU memory space (this can be implemented with malloc(), kmalloc(), mmap(), etc.).
  • Step 2: We modify the MMU to intercept the page-table-related operations of a process. When a native page table is created, the related mirror page table is also created.
  • Step 3: When the process writes to the native space, binary translation replicates the write instruction as a mirror write instruction, which writes the redundant data to the mirror space.
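
As a rough user-space illustration of Steps 1 and 3, the sketch below reserves a mirror buffer of the same size as a native buffer and replicates every write. It is only a toy model: `MirroredRegion` is a hypothetical name, the anonymous mappings from Python's `mmap` module stand in for the real zone-aware allocation, and Step 2 (mirror page tables) is not modeled at all.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE


class MirroredRegion:
    """Toy model: a native buffer plus an equally sized mirror buffer."""

    def __init__(self, num_pages):
        self.size = num_pages * PAGE_SIZE
        # Step 1: reserve a mirror area with the same size as the native area.
        # Anonymous mappings are an assumption standing in for zone placement.
        self.native = mmap.mmap(-1, self.size)
        self.mirror = mmap.mmap(-1, self.size)

    def write(self, offset, data):
        # Step 3: each native write is followed by the same write to the mirror.
        end = offset + len(data)
        self.native[offset:end] = data
        self.mirror[offset:end] = data

    def consistent(self):
        # True when native and mirror contents are byte-for-byte identical.
        return self.native[:] == self.mirror[:]

    def close(self):
        self.native.close()
        self.mirror.close()


region = MirroredRegion(num_pages=2)
region.write(144, (4).to_bytes(8, "little"))
print(region.consistent())
region.close()
```

In the real design the second write would be emitted by binary translation rather than by a wrapper method; the wrapper only shows the invariant that must hold after every write.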

Recovery Workflow

When a memory failure occurs, the error detection mechanism (e.g., ECC, healthlog and stresslog) notifies the host by raising a machine check exception (MCE). We can modify the MCE mechanism so that it does not restart the whole system. Instead, the system quickly and effectively retrieves the corrupted data using the following steps:

  • A new page is allocated in the normal zone to hold the recovered data.
  • The new page is remapped: the corrupted PTE is rewritten so that it maps to the newly allocated page.
  • Data is copied from mva (mirror virtual address) to nva (native virtual address). After error recovery, the program continues to execute.
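
The three recovery steps can be sketched with a toy page-table model. Everything here is hypothetical (`ToyAddressSpace`, the dictionary-based page tables, the frame numbering); it only shows the order of operations: allocate, remap, then copy from the mirror.

```python
PAGE_SIZE = 4096


class ToyAddressSpace:
    """Hypothetical model of native pages backed by mirror copies."""

    def __init__(self):
        self.frames = {}       # frame number -> bytearray (a physical page)
        self.native_pte = {}   # virtual page number -> native frame number
        self.mirror_pte = {}   # virtual page number -> mirror frame number
        self.next_frame = 0

    def alloc_frame(self):
        frame = self.next_frame
        self.next_frame += 1
        self.frames[frame] = bytearray(PAGE_SIZE)
        return frame

    def map_page(self, vpn):
        # Native and mirror pages are always backed by different frames.
        self.native_pte[vpn] = self.alloc_frame()
        self.mirror_pte[vpn] = self.alloc_frame()

    def write(self, vpn, offset, data):
        # Replicate the write into both the native and the mirror frame.
        for pte in (self.native_pte, self.mirror_pte):
            self.frames[pte[vpn]][offset:offset + len(data)] = data

    def read(self, vpn, offset, length):
        frame = self.frames[self.native_pte[vpn]]
        return bytes(frame[offset:offset + length])

    def recover(self, vpn):
        new_frame = self.alloc_frame()       # 1. allocate a new page
        self.native_pte[vpn] = new_frame     # 2. rewrite the corrupted PTE
        mirror = self.frames[self.mirror_pte[vpn]]
        self.frames[new_frame][:] = mirror   # 3. copy data from mva to nva


space = ToyAddressSpace()
space.map_page(0)
space.write(0, 144, b"\x04")
bad_frame = space.native_pte[0]
space.frames[bad_frame][144] = 0xFF          # simulate a memory error
space.recover(0)
print(space.read(0, 144, 1))
```

The corrupted frame is simply abandoned in this sketch; how the real system retires the faulty page is not covered here.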

Mirror Code Generator with Code Translation


Figure 2: Code generator to generate mirror code

To achieve Step 3 of the initialization workflow, we hope to design a code generator that generates, on the fly, all mirror code corresponding to the native QEMU code. For example, when the compiler generates the native instruction movq $4, 144(%rdi) for the QEMU process memory, we hope the code generator generates movq $4, offset+144(%rdi) for the mirror memory at the same time.

Therefore, the code generator must guarantee data synchronization between native and mirror memory while the QEMU process is running. As shown in Figure 2, when an instruction changes data in native memory, the code generator must capture it. Mirror code must then be emitted to update the corresponding data in mirror memory. Basically, we can modify the GCC procedure to perform static code translation, analyzing the assembly source files before the assembler (GAS) handles them. All work starts by identifying the memory write instructions among the native instructions according to their addressing modes. The QEMU native memory addresses can be retrieved from the native instructions. The code generator then calculates the mirror memory addresses to determine where to replicate the data. Finally, the mirror instructions are inserted into the source files, which are then assembled and linked into the executable.
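
A minimal sketch of the static translation pass described above, assuming AT&T-syntax assembly text as input. It handles only `mov` instructions whose destination has the form `disp(%base)`; other addressing modes and other memory-writing instructions are ignored. The function name `add_mirror_writes` and the symbol `MIRROR_OFFSET` are hypothetical placeholders for the offset between native and mirror memory.

```python
import re

# Matches instructions such as `movq $4, 144(%rdi)`: a mov with a
# `disp(%base)` destination. This is the only addressing mode handled.
WRITE_RE = re.compile(
    r"^(?P<indent>\s*)(?P<op>mov[bwlq])\s+(?P<src>[^,]+),\s*"
    r"(?P<disp>-?\d+)?\((?P<base>%\w+)\)\s*$"
)


def add_mirror_writes(asm_lines, offset_symbol="MIRROR_OFFSET"):
    """Return asm_lines with a mirror write appended after each native write."""
    out = []
    for line in asm_lines:
        out.append(line)
        m = WRITE_RE.match(line)
        if m is None:
            continue  # not a memory write this sketch recognizes
        disp = int(m.group("disp") or 0)
        # Mirror address = offset + native displacement, same base register.
        mirror_addr = f"{offset_symbol}{disp:+d}({m.group('base')})"
        src = m.group("src").strip()
        out.append(f"{m.group('indent')}{m.group('op')} {src}, {mirror_addr}")
    return out


native = [
    "    movq $4, 144(%rdi)",
    "    movq %rax, %rbx",
    "    ret",
]
for line in add_mirror_writes(native):
    print(line)
```

Running the pass on the assembly text keeps it independent of the compiler internals, which matches the idea of analyzing the source files before GAS handles them. A complete generator would also need to cover indexed and RIP-relative addressing as well as instructions that write memory implicitly.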

Mirror VM for QEMU process

Still thinking about this ......



Checkpoint Setting

I'm still reading papers on checkpoint setting and hope to find some basic ideas about checkpointing.