Author Archive

The Appliance Builder That Isn’t

November 10th, 2009 @ 2:19 am UTC

I’ve said it before – there are a lot of appliance builders out there. With virtualization and the cloud being the hot ticket items of the day, everybody wants to try their hand at writing the software to provision those VMs.

Unfortunately, they all seem to suck. At least, the Debian/Ubuntu ones do. I haven’t found a VM or appliance builder application that I like, mostly because they all seem to be bad knock-offs of the actual debian-installer or ubuntu-installer.

The appliance builder I want has four key features:

  1. It should run unattended.

    This one is kind of obvious, but rules out options like just running the debian-installer by hand and answering the questions as they come up. I do a lot of repetitive installs, and it’s important that I can hand my appliance builder a pre-crafted config file and get a customized, but totally unattended install.

  2. It should run trivially in a virtual environment, and seamlessly support multiple hypervisors.

    All of the appliance builders that anybody uses, or at least the ones I’ve attempted to use (VMBuilder and xen-create-image), run in the hypervisor. This is anywhere from an inconvenience to an actual security threat.

    I want to be able to offer users a high degree of customizability, but my users are generally untrusted, and you simply can’t allow any flexibility when the appliance installer runs as root on your hypervisor. You certainly can’t allow your users to install packages out of their own apt repositories, including PPAs – a targeted attacker can easily break out of the chroot they’re put into when their package installs, and any package can include code that runs as root. Even if you don’t allow your users to customize appliances, the principle of least privilege says you shouldn’t be running the installs as root when you can run them as not-root, and you pretty clearly can.

    Therefore, being able to run the appliance builder in a VM is an absolute must, regardless of the performance hit. We were able to adapt xen-create-image to do this for Invirt, but it wasn’t pretty, it took a lot of shoehorning, and it’s still pretty fragile.

    Not only do I want to be able to install my appliances in a guest, but I also want to be able to run that guest under various virtualization environments. Many of my deployments are still heavily dependent on Xen. I have other deployments using KVM. Ideally, I’d like my appliance builder to work fairly transparently with multiple virtualization environments, although it’s probably OK for me if the resulting appliance image only works with the particular hypervisor that created it.

  3. It should use the distribution’s installer mechanism instead of jerry-rigging its own.

    All of the appliance building applications I know of use their own installation code. For Debian/Ubuntu installers, this means running debootstrap and then frobbing the output. Even kiwi, the software behind the very shiny SUSE Studio, effectively starts by unpacking a list of RPMs by hand.

    There’s a lot of complexity in the Debian/Ubuntu installers. When you try to duplicate it, you will get it wrong. The resulting system will not be equivalent to the same system installed using a CD. I’ve certainly seen cases before where an installer-built image was different than an appliance-builder-built image, and it’s incredibly frustrating. Maybe this is something that could be fixed by actively developing the appliance builder (Ubuntu’s VMBuilder seems to be getting help from the ubuntu-installer developers), but it inherently seems like a waste of time to have this kind of code duplication.

  4. It should have a layer of abstraction that keeps me from repeating myself.

    Simply booting the debian-installer or ubuntu-installer with a preseed file would certainly address the first three points. However, the preseed file needed simply to get an unattended Ubuntu install with no other bells and whistles is more than 20 lines long. Even if I have a template I can copy around, it’s gross from a DRY perspective.

    I want my appliance builder to be configured through a config format that abstracts that away. I only want to specify that which can’t be reasonably guessed, not everything that I might want to have a say about.
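
A minimal sketch of what that abstraction layer could look like, assuming a hypothetical helper named emit_preseed that takes only a hostname (and optionally a mirror) and fills in everything else with defaults. The preseed keys are ones that appear in the stock debian-installer example preseed, but the list here is deliberately partial and is not a complete, tested unattended configuration:

```shell
#!/bin/sh
# emit_preseed HOSTNAME [MIRROR]
# Hypothetical helper: prints a partial preseed file on stdout.
# Only the hostname is required; the mirror falls back to a default.
emit_preseed() {
    hostname=$1
    mirror=${2:-archive.ubuntu.com}   # assumed default mirror
    cat <<EOF
d-i debian-installer/locale string en_US
d-i netcfg/get_hostname string $hostname
d-i mirror/country string manual
d-i mirror/http/hostname string $mirror
d-i mirror/http/directory string /ubuntu
d-i partman-auto/method string regular
d-i partman/confirm boolean true
d-i finish-install/reboot_in_progress note
EOF
}

# Example: only the hostname is specified; everything else is guessed.
emit_preseed web01
```

The point is the shape of the interface rather than the particular defaults: the caller states the one thing that can’t be guessed, and the boilerplate lives in one place.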

All of the virtualization projects I’m involved in right now – Invirt, Virtigo, and some smaller personal projects – could really benefit from this kind of infrastructure piece, which means I’m likely to attempt to write it if it doesn’t exist. And as far as I know, this kind of appliance building application doesn’t exist for Debian and Ubuntu, at the very least. I’ll admit that I know almost nothing about other Linux distributions. Do any of them get this more right?

Installing GRUB onto a Disk Image

August 4th, 2009 @ 6:22 pm UTC

As part of my summer internship, I needed to write an installer for VMs. For various reasons, I wasn’t able to use the multitude of VM installers already out there, but one thing I noticed is that most of them don’t actually install a bootloader. They create a /boot/grub/menu.lst, but never run grub-install.

Turns out this is because it’s hard to do. grub-install is very complicated and seems to be pretty explicitly designed for the case of running in an installer environment, where all of the disks and block devices are laid out the same way as they will be the next time you boot. When you’re installing from a host into a loop mount or something, that’s definitely not the case.

In trying to make this work, I discovered a few core issues:

  • grub-install assumes that the block device you’re installing onto “looks like” the sort of device you’d normally install GRUB onto (i.e. is named like a hard disk or floppy – hda, sda, fd0, etc.)
  • grub-install uses df to determine the block device a given file or directory’s filesystem is on. That works really poorly when you’re already chrooting into your loop mount.

If you read my wording carefully, you might see where I’m going with this. In order to get grub-install to work, I needed to convince it that it’s installing onto a hard drive, and I needed to run it outside of the loop mount.
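
The df lookup from the second bullet can be approximated like this. It is a sketch of the idea only, not grub-install’s actual code, and fs_device is a name made up for illustration:

```shell
#!/bin/sh
# fs_device PATH
# Prints the device that df reports for the filesystem containing PATH
# (the first column of the second line of POSIX-format df output).
fs_device() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Run from the host against a mount point, this prints whatever device
# is mounted there.
fs_device /
```

Whatever name comes back is what has to look like a real disk partition, which is why the lookup needs to happen somewhere the device nodes and mounts are visible under the names you expect.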

The former is obviously a bit more challenging, and to accomplish that, I used the device-mapper to create a node named something like /dev/mapper/hda.

I’ve only tested this on an Ubuntu Jaunty host so far, so I can’t guarantee that it works on Debian or even other Ubuntu versions, but I think it should. I’d love to hear if you have good or bad experiences on other Linux versions.

Here’s roughly how it works (you’ve probably performed some of these steps already in the process of running an installer):

  1. Loop mount your partitioned disk image:
    mathias:~ evan$ sudo losetup --show --find disk.img
    /dev/loop0
  2. To set up the device map, you’ll need the major and minor numbers of the loop device, and the size (in bytes) of the disk. The latter is easiest to get from the disk image file, instead of from the loop device. In the output below, the numbers you want are 7, 0 (major, minor) and 10737418240 (size):
    mathias:~ evan$ ls -l /dev/loop0
    brw-rw---- 1 root disk 7, 0 2009-07-18 11:27 /dev/loop0
    mathias:~ evan$ ls -l disk.img
    -rw-r--r-- 1 evan evan 10737418240 2009-08-04 15:28 disk.img
  3. Create a device-mapper node. Any name of the form hd[a-z], sd[a-z], or vd[a-z] will work. Others might as well. The size of the disk should be converted to 512-byte sectors, and the device numbers for the loop device should be in the form major:minor. This will create a new device node in /dev/mapper:
    mathias:~ evan$ echo '0 20971520 linear 7:0 0' | sudo dmsetup create hda
    mathias:~ evan$ ls -l /dev/mapper/hda
    brw-rw---- 1 root disk 252, 4 2009-08-04 15:36 /dev/mapper/hda
    
  4. Use kpartx to create device-mapper nodes for the partitions on the disk image:
    mathias:~ evan$ sudo kpartx -a /dev/mapper/hda
    mathias:~ evan$ ls -l /dev/mapper/hda*
    brw-rw---- 1 root disk 252, 4 2009-08-04 15:36 /dev/mapper/hda
    brw-rw---- 1 root disk 252, 5 2009-08-04 15:38 /dev/mapper/hda1
    brw-rw---- 1 root disk 252, 6 2009-08-04 15:38 /dev/mapper/hda2
  5. Mount the root partition onto a tempdir (note: this is not a loop mount, because the kernel already thinks this is a real block device):
    mathias:~ evan$ mktemp -d
    /tmp/tmp.MPUXeJWqpn
    mathias:~ evan$ sudo mount /dev/mapper/hda1 /tmp/tmp.MPUXeJWqpn
  6. Create a fake device.map for grub-install to use (yeah, this is a bad use of tee, but I’m trying to be clear about what I’m doing):
    mathias:~ evan$ echo '(hd0) /dev/mapper/hda' | sudo tee /tmp/tmp.MPUXeJWqpn/boot/grub/device.map
    (hd0) /dev/mapper/hda
  7. And now, for the grand finale, actually install GRUB from outside the chroot:
    mathias:~ evan$ sudo grub-install --root-directory=/tmp/tmp.MPUXeJWqpn /dev/mapper/hda
    grub-probe: error: no mapping exists for `hda1'
    [: 494: =: unexpected operator
    Installing GRUB to /dev/mapper/hda as (hd0)...
    Installation finished. No error reported.
    This is the contents of the device map /tmp/tmp.MPUXeJWqpn/boot/grub/device.map.
    Check if this is correct or not. If any of the lines is incorrect,
    fix it and re-run the script `grub-install'.
    
    (hd0) /dev/mapper/hda

    (You don’t need to worry about those two errors at the beginning of the output – it’s some logic specialized for XFS filesystems)

  8. Clean up the mess you made:
    mathias:~ evan$ sudo umount /tmp/tmp.MPUXeJWqpn
    mathias:~ evan$ sudo rm -rf /tmp/tmp.MPUXeJWqpn
    mathias:~ evan$ sudo kpartx -d /dev/mapper/hda
    mathias:~ evan$ sudo dmsetup remove hda
    mathias:~ evan$ sudo losetup -d /dev/loop0
  9. Finally, examine your disk image, and see that it definitely has GRUB installed:
    mathias:~ evan$ file disk.img
    disk.img: x86 boot sector; GRand Unified Bootloader, stage1 version 0x3, 1st sector stage2 0x884009; partition 1: ID=0x83, active, starthead 0, startsector 1, 18876374 sectors; partition 2: ID=0x82, starthead 254, startsector 18876375, 2088450 sectors

And there you have it! You will, of course, still need to write out GRUB’s menu.lst through some other means (such as Debian/Ubuntu’s update-grub).
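
Steps 1 through 4 above can be sketched as a small script. The arithmetic is the part worth spelling out: the image is 10737418240 bytes, and 10737418240 / 512 = 20971520 sectors, which is the length in the dmsetup table. The function names dm_linear_table and map_image_as_disk are made up for illustration; map_image_as_disk needs root, assumes GNU stat (which prints device numbers in hex), and has only the same testing caveats as the manual steps:

```shell
#!/bin/sh
# dm_linear_table SIZE_BYTES MAJOR MINOR
# Prints a device-mapper "linear" table covering the whole device,
# with the length converted from bytes to 512-byte sectors.
dm_linear_table() {
    size_bytes=$1
    major=$2
    minor=$3
    sectors=$((size_bytes / 512))
    printf '0 %s linear %s:%s 0\n' "$sectors" "$major" "$minor"
}

# map_image_as_disk IMAGE NAME
# Hypothetical wrapper for steps 1-4; must be run as root.
map_image_as_disk() {
    image=$1
    name=$2
    # Step 1: attach the image to the first free loop device.
    loopdev=$(losetup --show --find "$image") || return 1
    # Step 2: image size in bytes, and the loop device's numbers.
    # GNU stat prints major/minor in hex, so convert to decimal.
    size_bytes=$(stat -c %s "$image")
    major=$(printf '%d' "0x$(stat -c %t "$loopdev")")
    minor=$(printf '%d' "0x$(stat -c %T "$loopdev")")
    # Step 3: create /dev/mapper/NAME on top of the loop device.
    dm_linear_table "$size_bytes" "$major" "$minor" | dmsetup create "$name"
    # Step 4: create /dev/mapper/NAME1, NAME2, ... for the partitions.
    kpartx -a "/dev/mapper/$name"
}

# The table used in step 3 of the walkthrough:
dm_linear_table 10737418240 7 0
```

Called as root with map_image_as_disk disk.img hda, this should leave you at the point where step 5 picks up. Mounting, writing device.map, running grub-install, and cleaning up are left as in the walkthrough.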

Another Summer in California

July 16th, 2009 @ 8:39 pm UTC

I haven’t updated in a while. That seems to be the norm. I may stop commenting on it every time I don’t update for 6 months. It turns out that daily updates on my life aren’t usually that interesting, so it really seems better to have a lot of news to share at once.

But now that I’ve just finished my mid-term evaluations for my summer job, it seems like a great time to say what I’m doing this year!

This summer, I’m working at Google, in Corporate Engineering, which is totally less glamorous than Engineering proper, but because most of Corp Eng uses more traditional technologies, it’s also more directly applicable to the world outside of Google.

I can thank Tim Abbott for referring me for this job. Since I also blame him for dragging me into SIPB in the first place (which subsequently absorbed my life), I figure that puts us at about even.

Working for Google really is a lot more fun than I expected it to be. I think a lot of that is that (and this is the great irony, really), in spite of working for the biggest web company in the world, I’m not doing web development. The environment is very casual, my supervisor is cool, the food is great (I totally underestimated how much the food contributes to making things pleasant). Before coming here, I don’t think I would have considered trying to get a full-time job here, but I’m definitely giving it serious consideration now.

As for what I’m doing, I can actually talk about that. I’m spending the summer writing a new type of test framework. Because of how traditional software testing works, it can be incredibly difficult to test software when it wasn’t written from the ground up to be testable. This is a real problem for a lot of common open-source projects. And even if the code is kind of testable, it frequently is so thoroughly stubbed and mocked and neutered that it’s hard to draw any conclusions about the functionality of the code from whether it passes the tests. The moral is that there is just no substitute for actually running code in situ.

So that’s what I’m trying to do – make it easier to run code in situ. The goal of my summer project is to let developers build the entire environment under test in a series of virtual machines. These machines have the OS installed on demand, and are network isolated from the host and the outside world. At the end of the test, the resulting state of the VMs is destroyed, so tests can be arbitrarily destructive. They’re repeatable and safe, even if the test itself isn’t trusted. Most importantly, they let you come pretty arbitrarily close to running your code live.

The project is open source (GPL2), and currently being hosted on code.google.com. I think this is way better than working on something internal to Google. I can point people at what I’ve actually done, I can try to get it picked up outside of here (as well as within), and I can keep contributing once I’ve left (which I plan to do, because I’m convinced that it’s worth spending time on).

The one catch is the name. For various reasons, we’re grouping my project under Debmarshal, which was originally designed for managing repositories of Debian packages. But both my supervisor and I want to switch this to another name…we just don’t have one to switch to. So this is where you, my two readers, come in – what’s a good name for a virtualization-based testing framework? It needs to be something relatively unused. I tried the common trick for virtualization projects: taking a word with “vert” in it and replacing that with “virt”, but all the reasonable names seem to be taken (divirt, convirt, revirt, Invirt, and even IntroVirt). So…any suggestions? Come up with something we use and I’ll buy you lunch the next time I see you, or something.

So that’s work. I’m living right in the same area as last summer (within a few mile radius of Google), although more in the Palo Alto area than Mountain View. I’m living with friends from school, which always trumps random roommates off of Craigslist. We have a giant house, with a pool. It’s a pretty good setup.

And…looks like it’s dinner time here, so I’m out.

No Need for Virtualization

May 1st, 2009 @ 7:57 pm UTC

Today Duncan Keefe, Senior Manager in Apple’s Information Systems and Technology, presented on campus about how Apple’s IT department functions.

Somewhat understandably, there was a lot of sales pitch for Mac OS X Server incorporated in the talk, not to mention a lot about how Apple’s IT department is as awesome as the rest of Apple (which certainly seemed to be true based on the numbers we saw today). But some of the discussion on how to effectively communicate with your userbase would have been interesting for anyone who works in support, and there were a couple of interesting technical tidbits in there as well, including one in particular that still has me excited.

For example, did you know that Apple’s IT infrastructure is 71% based on open-source solutions? While I know as well as anyone that a lot of pieces of OS X itself are open source, they’re making use of a lot of enterprise-grade systems like SAP, which I thought would offset that number more.

Or another interesting fact: after migrating large parts of their infrastructure to Mac OS X Server and Xserves instead of Solaris or AIX or other systems, the sysadmin to server ratio for OS X servers is 1:276. To compare that to the organization I know, SIPB has about 30 or 40 servers in its machine room, and there are about 20 people who have access to the machine room, not counting people without physical access who maintain servers on XVM – or people who just don’t have physical access. Now granted, Apple’s number is for maintaining a network with completely homogeneous hardware and operating systems, and our tiny farm of servers probably runs more services per server than theirs, but the idea of a single person being able to run the entire SIPB machine room is…stunning.

But the truly interesting thing that Duncan mentioned was in response to someone’s question about virtualization. He responded that Apple currently doesn’t use virtualization for their IT infrastructure. Instead, they developed an in-house app that allows them to dynamically shuffle services around their servers based on the resources those servers need. Apparently their average server utilization is 60%.

And that is the dream of the cloud – by providing an environment large enough to contain your entire enterprise, you can smooth out what would otherwise be debilitating spikes in individual services. And in particular, this is the perfect answer to server virtualization.

If you look at your average, non-virtualized, single-purpose server, it’s probably at about 10% resource utilization, which makes it hard to justify buying a new server for each individual application. Virtualization is often touted as the solution to this problem – you run a bunch of single-purpose virtual machines on a single physical host. You can take advantage of features like guest migration to balance load dynamically. But if a service doesn’t need to exist in a completely independent instance of the operating system, you’re probably losing on the operating system overhead, in terms of disk space and RAM and probably processor usage as well. I’m willing to guess that the cost could be as much as 5% or 10%, which matters when you have hundreds of systems.

By dynamically shuffling applications without the extra overhead of full OS virtualization, you can take advantage of the economies of scale without that overhead. Which is just awesome. And the 60% average utilization? Also amazing. It’s just about the perfect number: high enough that your servers aren’t twiddling their thumbs, but low enough that any one server should be able to handle a sudden spike.
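
To put rough numbers on that, here is a back-of-the-envelope sketch. The 10% and 60% utilization figures and the 10% overhead guess come from above; the fleet of 100 single-purpose servers is an assumption picked only to make the arithmetic concrete, and servers_needed is a made-up helper:

```shell
#!/bin/sh
# servers_needed COUNT UTIL_PCT TARGET_PCT OVERHEAD_PCT
# How many hosts are needed to carry the load of COUNT servers that
# each run at UTIL_PCT, if the consolidated hosts run at TARGET_PCT
# and every workload costs OVERHEAD_PCT extra. Rounds up.
servers_needed() {
    count=$1
    util=$2
    target=$3
    overhead=$4
    load=$((count * util * (100 + overhead)))   # load, scaled by 100*100
    denom=$((target * 100))
    echo $(((load + denom - 1) / denom))        # integer ceiling
}

servers_needed 100 10 60 0    # shuffling services, no per-guest OS overhead
servers_needed 100 10 60 10   # same load with a guessed 10% overhead
```

With these inputs the first call prints 17 and the second prints 19: 100 servers at 10% is ten servers’ worth of work, which fits on 17 hosts at 60%, while a 10% overhead pushes that to 19. Two hosts out of a hundred doesn’t sound like much, but it scales with the size of the fleet.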

I’ve been kind of excited about this idea all day, although I can’t really think of a scenario that I could apply the concept to. MIT’s infrastructure is too heterogeneous in terms of both hardware and operating system to benefit, and all of the servers I maintain for SIPB are too specialized, or too heavily used already to benefit, or run services that are un-migratable – or some combination thereof.

But it’s fun to think about what a system like that would take to implement – you’d have to be sure to never assume that a service lived on a fixed IP address. How often do you re-balance services? Unfortunately, I was a little too busy dragging my jaw across the floor to actually ask any interesting questions while Duncan was there.

Anyway, that was my exciting tech story for today.

Wireless Bootloading

January 28th, 2009 @ 3:14 pm UTC

I've been meaning to come up with a way to flash microcontrollers over the air for a while – almost even had a use for it for a project recently. This is an awesome trick, and it looks like they did a good job of making it robust, too.

Spam, Eggs, Bacon, Sausage, and Spam

January 14th, 2009 @ 12:01 pm UTC

Last year I taught SIPB’s IAP Introduction to Python class. It was a really good experience, so I decided to do it again this year, and last night was the first class. But with Mystery Hunt coming up and other things having my focus, I didn’t really start prepping for the class until the day of.

Apparently Python is trendy these days! I was in a room that seats 65 or so people, and there were lots of people sitting in the aisles.

In spite of being somewhat intimidated by the large audience, and struggling to hit my stride during the beginning of the talk, I think I did a much better job this year once I actually started covering linguistic constructs. In spite of not spending a lot of time preparing this year, I had good notes for the first lecture from last year. But more importantly, I felt like I knew Python better, which really made the difference. I thought that my discussion was more focused, more concise, and more clear.

The room was an impressive mix of skill levels – several people who had never programmed and a few people who had clearly been doing it for years. Both groups asked very good questions as well.

Hopefully I’ll get to spend more time preparing for the next class, because my notes from last year get worse as I get further into the class.

Culinary Accomplishments

January 12th, 2009 @ 1:42 am UTC

Two accomplishments for today:

First, I made tortilla soup for dinner, using Mom’s recipe. It turned out really well, although it was very, very filling.

Second, for the first time, I actually made something that could be said to resemble latte art. Ever since I got a cappuccino machine at the apartment, I’ve been trying to pour latte art, and I finally pulled it off tonight. It looked very similar to this random image I found on the internet, although the leaves were curling around the edge of the cup, and there weren’t quite as many of them.

Unfortunately, I didn’t take a picture. It was on the first cup of coffee I made tonight, and I was convinced that I could do it again, only better – a few more leaves, a little more bunched together. Of course, the other two cups were nowhere close. Maybe next time?

Hmm…it’s cold out. Also snow…

January 10th, 2009 @ 10:05 pm UTC

Hmm…it’s cold out. Also snowing. Maybe I should head home soon

Sourcing Information

January 9th, 2009 @ 6:44 pm UTC

I’ve been having a fun time with the idea of pulling information from other sources into my website, and I think this morning I was able to put together a toolkit that makes it fairly easy going forward.

Just about all of the information sources I might want to pull from provide RSS feeds, so I started looking at Yahoo Pipes as a way to filter, adjust, and otherwise correct feeds before they’re published here. It turns out that Pipes is far more powerful than I expected, and it’s fun to use, too. It does have the same kind of feel as piping data through a bunch of Unix commands, but it also has a bit of a functional programming feel to it too (although I tend to regard anything that has map and reduce as functional-like).

In any case, my first creation is a pipe to process an RSS feed from github. I use github as a hosting service for open source or otherwise public source code that I’m working on, assuming that I’m the one that gets to make that decision.

I decided that including all events from github would be a bit overwhelming, so instead I’m only showing events where I push changes. I then re-write the title of the post to be a little shorter. You can see the pipe at http://pipes.yahoo.com/ebroder/githubpush. I’ve generalized it a bit so that you can substitute any username.
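
Since I compared Pipes to a Unix pipeline, here is roughly the same filter-then-rewrite idea as an actual one. This is not how the pipe itself is built, and the title format ("USER pushed to BRANCH at USER/REPO") is an assumption for illustration rather than a guarantee about what github’s feed contains; push_titles is a made-up name:

```shell
#!/bin/sh
# push_titles
# Reads feed entry titles on stdin, one per line. Keeps only push
# events and rewrites each title to a shorter "REPO: push to BRANCH".
# Assumes titles look like "USER pushed to BRANCH at USER/REPO".
push_titles() {
    grep ' pushed to ' |
        sed -E 's/^[^ ]+ pushed to ([^ ]+) at (.+)$/\2: push to \1/'
}

# Example input with one push event and one event that gets dropped.
printf '%s\n' \
    'ebroder pushed to master at ebroder/example' \
    'ebroder started watching someone/project' |
    push_titles
```

The grep plays the role of the filter module and the sed the role of the title rewrite.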

Finally, I use WP-o-Matic to pull the RSS entries that Yahoo Pipes outputs into this site.

I’m looking forward to poking at some other sites with Yahoo Pipes and seeing how much I can collate into this site.

Oh – a few other details. First, I’m categorizing all entries based on their source. So far, that means “posts”, “github”, or “twitter”. If you only want to see original content on this site, just go to the posts category page.

Second, with the new evil plan to write more, but also restrict access more, it doesn’t make sense any more to crosspost all of my entries publicly to LiveJournal. LJ folks, you’ll just have to deal with visiting a site outside your little circle.

Excited about the Palm Prē – …

January 9th, 2009 @ 2:56 am UTC

Excited about the Palm Prē – I can finally get a new phone now!