Occasionally the test framework would fail with a timeout because a virtual machine did not phone home in time. This seems to happen whenever qemu can't bind the VNC or SSH ports for a virtual machine. This was fixed by taking the following actions:

1. Don't listen on VNC unless the `-use-vnc` flag is passed; in most cases this removes the need to listen on VNC at all. The option to use VNC is still kept for debugging virtual machines, but disabling it by default makes port allocation easier to deal with. VNC uses an odd system of "displays" that map to ports starting at 5900, and qemu doesn't offer a decent way to use a normal port number, so VNC is disabled by default as a compromise.
2. Use a (hopefully) inactive port for SSH. In an ideal world the VM's SSH port would be exposed via a Unix socket; however, the QEMU documentation doesn't really say whether this is possible. While I do more research, this stopgap will have to do.
3. Strictly tie more VM resource lifetimes to the tests themselves. Previously, the disk image layers for virtual machines were only cleaned up at the end of the test and lived in the parent test-scoped temporary folder. This could make your tmpfs run out of space, which is not ideal. This change should minimize the use of temporary storage as much as I know how to.
4. Strictly tie the qemu process lifetime to the lifetime of the test using `testing.T#Cleanup`. Previously, a defer statement cleaned up the qemu process; however, if the tests timed out, this defer was not run, leaving behind an orphaned qemu process that had to be killed manually. This change ensures that all qemu processes exit when their relevant tests finish.

Signed-off-by: Christine Dodrill <xe@tailscale.com>
End-to-End VM-based Integration Testing
This test spins up a bunch of common Linux distributions and then tries to get
them to connect to a `testcontrol` server.
Running
This test currently only runs on Linux.
This test depends on the following command line tools:

- qemu
- cdrkit
- openssh
This test also requires the following:
- about 10 GB of temporary storage
- about 10 GB of cached VM images
- at least 4 GB of RAM for virtual machines
- hardware virtualization support (KVM) enabled in the BIOS
- the `kvm` module to be loaded (`modprobe kvm`)
- the user running these tests must have access to `/dev/kvm` (being in the `kvm` group should suffice)
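It can help to check these prerequisites before starting a long run. The following is a minimal preflight sketch and is not part of the test suite. It assumes the tools are invoked as `qemu-system-x86_64`, `genisoimage` (the ISO tool shipped by cdrkit) and `ssh`, but the test itself may call different binaries. Opening `/dev/kvm` read-write fails if the `kvm` module isn't loaded or the current user lacks permission, so that one call covers the last two requirements. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// missingTools returns the names in tools that cannot be found on $PATH.
func missingTools(tools []string) []string {
	var missing []string
	for _, name := range tools {
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

// checkKVM reports whether the current user can open the KVM device
// read-write; this fails if the device node is absent or not accessible.
func checkKVM(path string) error {
	f, err := os.OpenFile(path, os.O_RDWR, 0)
	if err != nil {
		return fmt.Errorf("cannot open %s: %w", path, err)
	}
	return f.Close()
}

func main() {
	// Assumed binary names for qemu, cdrkit and OpenSSH.
	if m := missingTools([]string{"qemu-system-x86_64", "genisoimage", "ssh"}); len(m) > 0 {
		fmt.Println("missing tools:", m)
	}
	if err := checkKVM("/dev/kvm"); err != nil {
		fmt.Println("KVM unavailable:", err)
	}
}
```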
This optionally requires an AWS profile to be configured at the default
path.
The S3 bucket is set so that the requester pays. Please keep this in mind when
running these tests on your machine. If you are uncomfortable with the cost of
downloading from S3, you should pass the `-no-s3` flag to disable downloads from
S3. However, keep in mind that some distributions do not use stable URLs for each
individual image artifact, so there may be spurious test failures as a result.
If you are using Nix, you can run all of the tests with the correct command line tools using this command:
$ nix-shell -p openssh -p go -p qemu -p cdrkit --run "go test . --run-vm-tests --v --timeout 30m"
Keep the timeout high for the first run, especially if you are not downloading VM images from S3. The mirrors we pull images from have download rate limits, so the initial downloads will take a while.
Because of the hardware requirements of this test, this test will not run
without the `--run-vm-tests` flag set.
Other Fun Flags
This test's behavior is customized with command line flags.
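The flags described in this README can be modeled with Go's standard `flag` package. This sketch covers the gating flag above, the flags below, and the `-use-vnc` debugging flag from the commit message. It rests on several assumptions:

- The flag names and the 4096 MB `--ram-limit` default come from this README.
- The types, the `.*` default for `--distro-regex`, the help strings, and the `vmFlags`/`parseVMFlags` names are hypothetical.
- It parses a separate `flag.FlagSet` so it runs as a standalone program. In a `_test.go` file, such flags would normally be registered on the default flag set, which `go test` parses, and a test would call `t.Skip` when `--run-vm-tests` is unset.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// vmFlags holds the options this README describes.
type vmFlags struct {
	RunVMTests  bool
	NoS3        bool
	UseVNC      bool
	DistroRegex string
	RAMLimitMB  int
}

// parseVMFlags parses args into vmFlags, returning an error for bad input.
func parseVMFlags(args []string) (vmFlags, error) {
	var f vmFlags
	fs := flag.NewFlagSet("vms", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the program output
	fs.BoolVar(&f.RunVMTests, "run-vm-tests", false, "run VM-based integration tests")
	fs.BoolVar(&f.NoS3, "no-s3", false, "download images from upstream instead of S3")
	fs.BoolVar(&f.UseVNC, "use-vnc", false, "expose a VNC display for debugging")
	fs.StringVar(&f.DistroRegex, "distro-regex", ".*", "only test distributions matching this regex")
	fs.IntVar(&f.RAMLimitMB, "ram-limit", 4096, "MB of RAM the VMs may use at once")
	err := fs.Parse(args)
	return f, err
}

func main() {
	f, err := parseVMFlags([]string{"--run-vm-tests", "--ram-limit", "2048", "-distro-regex", "centos"})
	if err != nil {
		fmt.Println("bad flags:", err)
		return
	}
	if !f.RunVMTests {
		fmt.Println("skipping: pass --run-vm-tests to run these tests")
		return
	}
	fmt.Printf("%+v\n", f)
}
```

Go's `flag` package treats `-name` and `--name` identically, which is why both spellings work in the commands throughout this README.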
Don't Download Images From S3
If you pass the `-no-s3` flag to `go test`, the S3 step will be skipped in favor
of downloading the images directly from upstream sources, which may cause the
test to fail in odd places.
Distribution Picking
This test runs on a large number of distributions. By default it tries to run
everything, which may or may not be ideal for you. If you only want to test a
subset of distributions, you can use the --distro-regex
flag to match a subset
of distributions using a regular expression
such as like this:
$ go test -run-vm-tests -distro-regex centos
This would run all tests on all versions of CentOS.
$ go test -run-vm-tests -distro-regex '(debian|ubuntu)'
This would run all tests on all versions of Debian and Ubuntu.
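Selecting distributions with `--distro-regex` might work like the following sketch. It assumes the pattern is matched against each distribution's name with Go's `regexp` package, and the names in `main` are made up. `MatchString` is unanchored, so under that assumption `centos` selects every name containing that substring. To pick exactly one distribution, you would anchor the pattern, for example `^centos-8$`.

```go
package main

import (
	"fmt"
	"regexp"
)

// filterDistros returns the names containing a match for pattern, preserving
// order. The match is unanchored: "centos" matches "centos-8".
func filterDistros(names []string, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	// Hypothetical distribution names; the test's real list may differ.
	distros := []string{"alpine-3-14", "centos-7", "centos-8", "debian-10", "ubuntu-20-04"}
	for _, p := range []string{"centos", "(debian|ubuntu)"} {
		m, err := filterDistros(distros, p)
		if err != nil {
			fmt.Println("bad pattern:", err)
			continue
		}
		fmt.Println(p, "->", m)
	}
}
```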
RAM Limiting
This test uses a lot of memory. To avoid making machines run out of
memory while running this test, a semaphore is used to limit how many megabytes of RAM
are in use at once. By default this semaphore is set to 4096 MB of RAM
(about 4 gigabytes). You can customize this with the `--ram-limit` flag:
$ go test --run-vm-tests --ram-limit 2048
$ go test --run-vm-tests --ram-limit 65536
The first example sets the limit to 2048 MB of RAM (about 2 gigabytes). The second example sets the limit to 65536 MB of RAM (about 64 gigabytes). Please be careful with this flag; improper usage is known to cause the Linux out-of-memory killer to engage. To be on the safe side, try to keep it within 50-75% of your machine's available RAM, since virtualization adds some overhead.