building HVM Xen guests

On my Xen systems I’ve run pretty much 99% of my Linux guests paravirtualized (PV). Mostly this was because I’m lazy. Setting up a PV guest is super simple. No need for partitions, boot loaders or any of that complicated stuff. Setting up a PV Linux guest is generally as simple as setting up a chroot. You don’t even need to install a kernel.

There’s been a lot of work over the past 5+ years to add stuff to processors and Xen to make the PV extensions to Linux unnecessary. After checking out a presentation by Stefano Stabilini a few weeks back I decided I’m long overdue for some HVM learning. Since performance of HVM guests is now better than PV for most cases it’s well worth the effort.

This post will serve as my documentation for setting up HVM Linux guests. My goal was to get an HVM Linux installed using typical Linux tools and methods like LVM and chroots. I explicitly was trying to avoid using RDP or anything that isn’t a command-line utility. I wasn’t completely successful at this but hopefully I’ll figure it out in the next few days and post an update.

Disks and Partitions

Like every good Linux user LVMs are my friend. I’d love a more flexible disk backend (something that could be sparsely populated) but blktap2 is pretty much unmaintained these days. I’ll stop before I fall down that rabbit hole but long story short, I’m using LVMs to back my guests.

There’s a million ways to partition a disk. Generally my VMs are single-purpose and simple so a simple partitioning scheme is all I need. I haven’t bothered with extended partitions as I only need 3. The layout I’m using is best described by the output of sfdisk:

# partition table of /dev/mapper/myvg-hvmdisk
unit: sectors

/dev/mapper/myvg-hvmdisk1 : start=     2048, size=  2097152, Id=83
/dev/mapper/myvg-hvmdisk2 : start=  2099200, size=  2097152, Id=82
/dev/mapper/myvg-hvmdisk3 : start=  4196352, size= 16775168, Id=83
/dev/mapper/myvg-hvmdisk4 : start=        0, size=        0, Id= 0

That’s 3 partitions, the first for /boot, the second for swap and the third for the rootfs. Pretty simple. Once the partition table is written to the LVM volume we need to get the kernel to read the new partition table to create devices for these partitions. This can be done with either the partprobe command or kpartx. I went with kpartx:

$ kpartx -a /dev/mapper/myvg-hvmdisk

After this you’ll have the necessary device nodes for all of your partitions. If you use kpartx as I have these device files will have a digit appended to them like the output of sfdisk above. If you use partprobe they’ll have the letter ‘p’ and a digit for the partition number. Other than that I don’t know that there’s a difference between the two methods.

Then get the kernel to refresh the links in /dev/disk/by-uuid (we’ll use these later):

$ udevadm trigger

Now we can set up the filesystems we need:

$ mkfs.ext2 /dev/mapper/myvg-hvmdisk1
$ mkswap /dev/mapper/myvg-hvmdisk2
$ mkfs.ext4 /dev/mapper/myvg-hvmdisk3

Install Linux

Installing Linux on these partitions is just like setting up any other chroot. First step is mounting everything. The following script fragment

# mount VM disks (partitions in new LV)
if [ ! -d /media/hdd0 ]; then mkdir /media/hdd0; fi
mount /dev/mapper/myvg-hvmdisk3 /media/hdd0
if [ ! -d /media/hdd0/boot ]; then mkdir /media/hdd0/boot; fi
mount /dev/mapper/myvg-hvmdisk1 /media/hdd0/boot

# bind dev/proc/sys/tmpfs file systems from the host
if [ ! -d /media/hdd0/proc ]; then mkdir /media/hdd0/proc; fi
mount --bind /proc /media/hdd0/proc
if [ ! -d /media/hdd0/sys ]; then mkdir /media/hdd0/sys; fi
mount --bind /sys /media/hdd0/sys
if [ ! -d /media/hdd0/dev ]; then mkdir /media/hdd0/dev; fi
mount --bind /dev /media/hdd0/dev
if [ ! -d /media/hdd0/run ]; then mkdir /media/hdd0/run; fi
mount --bind /run /media/hdd0/run
if [ ! -d /media/hdd0/run/lock ]; then mkdir /media/hdd0/run/lock; fi
mount --bind /run/lock /media/hdd0/run/lock
if [ ! -d /media/hdd0/dev/pts ]; then mkdir /media/hdd0/dev/pts; fi
mount --bind /dev/pts /media/hdd0/dev/pts

Now that all of the mounts are in place we can debootstrap an install into the chroot:

$ sudo debootstrap wheezy /media/hdd0/ http://http.debian.net/debian/

We can then chroot to the mountpoint for our new VMs rootfs and put on the finishing touches:

$ chroot /media/hdd0

Bootloader

Unlike a PV guest, you’ll need a bootloader to get your HVM up and running. A first step in getting the bootloader installed is figuring out which disk will be mounted and where. This requires setting up your fstab file.

At this point we start to run into some awkward differences between our chroot and what our guest VM will look like once it’s booted. Our chroot reflects the device layout of the host on which we’re building the VM. This means that the device names for these disks will be different once the VM boots. On our host they’re all under the LVM /dev/mapper/myvg-hvmdisk and once the VM boots they’ll be something like /dev/xvda.

The easiest way to deal with this is to set our fstab up using UUIDs. This would look something like this:

# / was on /dev/xvda3 during installation
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /               ext4    errors=remount-ro 0       1
# /boot was on /dev/xvda1 during installation
UUID=yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy /boot           ext2    defaults        0       2
# swap was on /dev/xvda2 during installation
UUID=zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz none            swap    sw              0       0

By using UUIDs we can make our fstab accurate even in our chroot.

After this we need to set up the /etc/mtab file needed by lots of Linux utilities. I found that when installing Grub2 I needed this file in place and accurate.

Some data I’ve found on the web says to just copy or link the mtab file from the host into the chroot but this is wrong. If a utility consults this file to find the device file that’s mounted as the rootfs it will find the device holding the rootfs for the host, not the device that contains the rootfs for our chroot.

The way I made this file was to copy it off of the host where I’m building the guest VM and then modify it for the guest. Again I’m using UUIDs to identify the disks / partitions for the rootfs and /boot to keep from having data specific to the host platform leak into the guest. My final /etc/mtab looks like this:

rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=10240k,nr_inodes=253371,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=203892k,mode=755 0 0
/dev/disk/by-uuid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx / ext4 rw,relatime,errors=remount-ro,user_xattr,barrier=1,data=ordered 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
tmpfs /run/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=617480k 0 0
/dev/disk/by-uuid/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy /boot ext2 rw,relatime,errors=continue,user_xattr,acl 0 0

Finally we need to install both a kernel and the grub2 bootloader:

$ apt-get install linux-image-amd64 grub2

Installing Grub2 is a pain. All of the additional disks kicking around in my host confused the hell out of the grub installer scripts. I was given the option to install grub on a number of these disks and none were the one I wanted to install it on.

In the end I had to select the option to not install grub on any disk and fall back to installing it by hand:

$ grub-install --force --no-floppy --boot-directory=/boot /dev/disk/by-uuid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

And then generate the grub config file:

update-grub

If all goes well the grub boot loader should now be installed on your disk and you should have a grub config file in your chroot /boot directory.

Final Fixups

Finally you’ll need to log into the VM. If you’re confident it will boot without you having to do any debugging then you can just configure the ssh server to start up and throw a public key in the root homedir. If you’re like me something will go wrong and you’ll need some boot logs to help you debug. I like enabling the serial emulation provided by qemu for this purpose. It’ll also allow you to login over serial which is convenient.

This is pretty standard stuff. No paravirtual console through the xen console driver. The qemu emulated serial console will show up at ttyS0 like any physical serial hardware. You can enable serial interaction with grub by adding the following fragment to /etc/default/grub:

GRUB_TERMINAL_INPUT=serial
GRUB_TERMINAL_OUTPUT=serial
GRUB_SERIAL_COMMAND="serial --speed=38400 --unit=0 --word=8 --parity=no --stop=1"

To get your kernel to log to the serial console as well set the GRUB_CMDLINE_LINUX variable thusly:

GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,38400n8"

Finally to get init to start a getty with a login prompt on the console add the following to your /etc/inittab:

T0:23:respawn:/sbin/getty -L ttyS0 38400 vt100

Stefano Stabilini has done another good write-up on the details of using both the PV and the emulated serial console here: http://xenbits.xen.org/docs/4.2-testing/misc/console.txt. Give it a read for the gory details.

Once this is all done you need to exit the chroot, unmount all of those bind mounts and then unmount your boot and rootfs from the chroot directory. Once we have a VM config file created this VM should be bootable.

VM config

Then we need a configuration file for our VM. This is what my generic HVM template looks like. I’ve disabled all graphical stuff: sdl=0, stdvga=0, and vnc=0, enabled the emulated serial console: serial='pty' and set xen_platform_pci=1 so that my VM can use PV drivers.

The other stuff is standard for HVM guests and stuff like memory, name, and uuid that should be customized for your specific installation. Things like uuid and the mac address for your virtual NIC should be unique. There are websites out there that will generate these values. Xen has it’s own prefix for MAC addresses so use a generator to make a proper one.

builder = "hvm"
memory = "2048"
name = "myvm"
uuid = "uuuuuuuu-uuuu-uuuu-uuuu-uuuuuuuuuuuu"
vcpus = 1
cpus = '0-7'
pae=1
acpi=1
apic=1
boot='c'
xen_platform_pci=1
sdl=0
vnc=0
vnclisten='0.0.0.0'
stdvga=0
serial='pty'

disk = [
    '/dev/ssdraid1/wwwhome,raw,xvda,rw'
]
vif = [
    'mac=XX:XX:XX:XX:XX:XX,model=e1000',
]

Boot

Booting this VM is just like booting any PV guest:

xl create -c /etc/xen/vms/myvm.cfg

I’ve included the -c option to attach to the VMs serial console and ideally we’d be able to see grub and the kernel dump a bunch of data as the system boots.

TODO

I’ve tested these instructions twice now on a Debian Wheezy system with Xen 4.3.1 installed from source. Both times Grub installs successfully but fails to boot. After enabling VNC for the VM and connecting with a viewer it’s apparent that the VM hangs when SEABIOS tries to kick off grub.

As a work-around both times I’ve booted the VM from a Debian rescue ISO, setup a chroot much like in these instructions (the disk is now /dev/xvda though) and re-installed Grub. This does the trick and rebooting the VM from the disk now works. So I can only conclude that either something from my instructions w/r to installing Grub is wrong but I think that’s unlikely as they’re confirmed from numerous other “install grub in a chroot” instructions on the web.

The source of the problem is speculation at this point. Part of me wants to dump the first 2M of my disk both after installing it using these instructions and then again after fixing it with the rescue CD. Now that I think about it the version of Grub installed in my chroot is probably a different version than the one on the rescue CD so that could have something to do with it.

Really though, I’ll probably just install syslinux and see if that works first. My experiences with Grub have generally been bad any time I try to do something out of the ordinary. It’s incredibly complicated and generally I just want something simple like syslinux to kick off a very simple VM.

I’ll post an update once I’ve got to the bottom of this mystery. Stay tuned.