This repository demonstrates building and 'sysprepping' a basic CentOS VM with Packer and Packer-Virt-Sysprep.
This repository is intended as a reference only. The resultant VM should not be used!
It should be fairly easy to cherry-pick the bits of config required to use the Packer-Virt-Sysprep operations within your own builds. Alternatively, you can use the templates and files included here as a starting point and go from there.
- sysprep-op-bash-history: Delete bash history for root and all users under /home
- sysprep-op-crash-data: Delete any crash data created by kexec-tools
- sysprep-op-dhcp-client-state: Delete any DHCP lease information
- sysprep-op-firewall-rules: Delete custom rules and firewall customisations
- sysprep-op-logfiles: Ensures the resultant image is devoid of log files
- sysprep-op-machine-id: Deletes the machine-id. This ensures a unique id is created the next time the machine is booted.
- sysprep-op-mail-spool: Removes any mail from the local spool
- sysprep-op-package-manager-cache: Removes cache files associated with the guest's package manager. Should work for apt, dnf, yum and zypper.
- sysprep-op-rpm-db: Removes host-specific RPM database files. RPM will recreate these files automatically when needed. Clearly intended for RPM-based distros (but should be safe to run against non-RPM distros as well)
- sysprep-op-ssh-hostkeys: Delete the host ssh keys. A new set of keys will be auto-generated by the host at next boot.
- sysprep-op-tmp-files: Ensures the resultant image is devoid of any temp files
- sysprep-op-yum-uuid: Remove the yum package manager UUID associated with the guest. A new UUID will be automatically generated the next time yum is run.
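To give a flavour of how simple most of these operations are, the bash-history operation essentially boils down to the following. This is an illustrative sketch only, not the actual submodule script; the prefix parameter is an addition made here purely so the logic can be exercised against a scratch directory rather than a live system.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the bash-history operation - NOT the actual
# packer-virt-sysprep script. Removes .bash_history for root and for
# every user under /home. The optional prefix argument is a testing
# convenience added for this sketch only.
clear_bash_history() {
    local prefix="${1:-}"
    rm -f "${prefix}/root/.bash_history"
    local home
    for home in "${prefix}"/home/*/; do
        [ -d "${home}" ] && rm -f "${home}.bash_history"
    done
    return 0
}
```

Calling `clear_bash_history` with no argument operates on the real filesystem; pass a scratch prefix to try it out safely.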
The Packer-Virt-Sysprep scripts are incorporated under the scripts directory as a submodule, so don't forget the --recurse-submodules option when cloning. It is assumed that you have Packer and Virtualbox installed. All testing has been done with Virtualbox 5.0.26.
$ git clone --recurse-submodules https://github.com/DanHam/packer-virt-sysprep-example.git
$ cd packer-virt-sysprep-example
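If you did forget --recurse-submodules and ended up with an empty scripts directory, the submodule can be fetched after the fact with a standard git command:

```shell
# Run from the root of the clone: fetches and checks out any
# submodules that were skipped during the initial clone
git submodule update --init --recursive
```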
- Open the centos.json build template with your favourite editor.
- Optionally adjust which Packer-Virt-Sysprep operations will be executed or skipped by setting true or false for each of the sysprep-op-* user variables in the variables section at the head of the template.
Build the box
$ packer build centos.json
The truly impatient can race ahead to the testing section below to see if the configured options worked correctly.
The scripts provided have been tested on CentOS 6.x and 7.x and Debian 8.x. While not tested, the scripts or 'operations' should also work on Red Hat 6 and 7 without issue. However, the software is provided as is and you should test thoroughly to ensure the results of running the scripts are as expected! In other words, there is no implied warranty of any kind!!
Usage on any other OS, such as SUSE or Ubuntu, may work but will require thorough testing. Let me know if you find, after thorough testing, that the scripts work for you on another OS. Similarly, if you adapt the scripts to make them work on another platform, or simply make some improvements, please feel free to issue a PR to incorporate the changes.
Some basic familiarity with Packer is assumed. If something below doesn't make sense, please read through the Packer documentation.
Generally speaking the packer-virt-sysprep operations should be among the last provisioning scripts you run against your build before shutting it down.
Each script or virt-sysprep style operation can be used individually or in conjunction with any or all of the other operations.
The example below will run the operations that ensure each machine created from the generated image will have a unique machine-id and host ssh keys. Be warned that you may need to change the "execute_command" to fit with how you do things in your build e.g. run with sudo. Additionally, note that all packer-virt-sysprep scripts expect to be run within a Bash shell!
{
"builders": [
{
...
VMware ISO builder
Virtualbox ISO builder
etc
...
}
],
"provisioners": [
{
"type": "shell"
"execute_command": "{{ .Vars }} $(command -v bash) '{{ .Path }}'",
"scripts": [
"scripts/packer-virt-sysprep/sysprep-op-machine-id.sh",
"scripts/packer-virt-sysprep/sysprep-op-ssh-hostkeys.sh"
]
}
]
}
Rather than referencing each operation script individually it is much better to run all packer-virt-sysprep operations from a wrapper or master control script. This helps to keep the Packer template fairly tidy since only the wrapper script needs to be referenced within the template. When used in conjunction with user variables and exported environment variables, the use of a wrapper script provides a convenient way to control what operations are performed without having to constantly rearrange the template.
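As a rough sketch of the idea (the real control script in this repo is scripts/99-packer-virt-sysprep-control-script.sh; the helper function and exact names below are hypothetical), such a wrapper might look something like:

```shell
#!/usr/bin/env bash
# Sketch of a wrapper/control script - not the repo's actual
# 99-packer-virt-sysprep-control-script.sh. Packer exports each
# SYSPREP_OP_* user variable as an environment variable; an operation
# script is executed only when its variable is set to "true".

# Directory the operation scripts were uploaded to
sysprep_dir="${PACKER_VIRT_SYSPREP_DIR:-/packer-virt-sysprep}"

run_op() {
    local var_name="$1" script_name="$2"
    # ${!var_name} is bash indirect expansion: the value of the
    # variable whose name is stored in var_name
    if [ "${!var_name:-false}" = "true" ]; then
        echo "Running ${script_name}"
        bash "${sysprep_dir}/${script_name}"
    else
        echo "Skipping ${script_name}"
    fi
}

run_op SYSPREP_OP_MACHINE_ID   sysprep-op-machine-id.sh
run_op SYSPREP_OP_SSH_HOSTKEYS sysprep-op-ssh-hostkeys.sh
```

An operation is skipped unless its environment variable is explicitly set to true, so a forgotten or mistyped variable fails safe.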
Be warned that by default Packer will use /tmp as the directory to which provisioning scripts are uploaded and from which they are executed. Since the sysprep-op-tmp-files operation aims to delete all files under /tmp, it's probably not a good idea to have Packer use /tmp when running the packer-virt-sysprep operation scripts! Thankfully, Packer provides a mechanism for customising the directory that scripts are uploaded to and run from.
All of this is probably best demonstrated through example - which, after all, is why this 'example' repo exists in the first place...
Taking each section of the Packer template in turn:
{
"variables": {
...
"packer_virt_sysprep_dir": "/packer-virt-sysprep",
"sysprep_op_bash_history": "true",
"sysprep_op_crash_data": "true",
"sysprep_op_dhcp_client_state": "true",
"sysprep_op_firewall_rules": "true",
"sysprep_op_logfiles": "true",
"sysprep_op_machine_id": "true",
"sysprep_op_mail_spool": "true",
"sysprep_op_package_manager_cache": "true",
"sysprep_op_rpm_db": "true",
"sysprep_op_ssh_hostkeys": "true",
"sysprep_op_tmp_files": "true",
"sysprep_op_yum_uuid": "true"
},
These user variables are referenced in the environment variables section of the remote shell provisioner.
- The sysprep_op_* variables provide the means by which the master control script decides which operations should be executed and which operations will be skipped.
- The packer_virt_sysprep_dir variable specifies the directory within the guest to which the packer-virt-sysprep scripts will be uploaded. It is also the directory the control script is uploaded to and then executed from.
The builders
section of the template is fairly standard. There's
nothing of particular relevance here with regard to the use of the
packer-virt-sysprep scripts.
The provisioners section is a little more involved, but breaking it down into its constituent parts makes it easily understandable.
The first shell provisioner creates the vagrant user and sets up sudoers. All fairly standard stuff and nothing really to do with packer-virt-sysprep. Moving on...
...
"provisioners": [
{
"type": "shell",
"execute_command": "{{ .Vars }} $(command -v bash) '{{.Path }}'",
"scripts": [
"scripts/01-create-vagrant-user.sh",
"scripts/02-configure-sudoers.sh"
]
},
...
The following provisioner creates the directory to which we'll be uploading all the packer-virt-sysprep scripts. As stated above, the packer_virt_sysprep_dir user variable is referenced in the environment_vars section and sets us up for running the mkdir command in the inline section. All fairly obvious stuff. Moving on...
...
{
"type": "shell",
"execute_command": "{{ .Vars }} $(command -v bash) '{{.Path }}'",
"environment_vars": [
"PACKER_VIRT_SYSPREP_DIR={{user `packer_virt_sysprep_dir`}}"
],
"inline": [
"mkdir $PACKER_VIRT_SYSPREP_DIR"
]
},
...
The provisioner shown below uploads all of the scripts under our
repo's scripts/packer-virt-sysprep
directory to the directory we
created in the previous step. Now our scripts are present on the target
machine we can proceed to the next provisioner...
...
{
"type": "file",
"source": "scripts/packer-virt-sysprep/",
"destination": "{{user `packer_virt_sysprep_dir`}}"
},
This is where the real action occurs.
The remote_folder stanza is important. This is where the control script will be uploaded to and executed from. Using the custom directory we specified earlier keeps the control script out of the way - saving it from potentially being obliterated by the sysprep-op-tmp-files operation.
All of the user variables are referenced and exported as environment variables of the same name in the environment_vars section. All of the environment variables are subsequently referenced and used in the control script.
The script stanza tells Packer to run our control script. The script is very simple - it will run the operation or script if the corresponding user variable is set to true. The operation will be skipped otherwise.
...
{
"type": "shell",
"remote_folder": "{{user `packer_virt_sysprep_dir`}}",
"environment_vars": [
"PACKER_VIRT_SYSPREP_DIR={{user `packer_virt_sysprep_dir`}}",
"SYSPREP_OP_BASH_HISTORY={{user `sysprep_op_bash_history`}}",
"SYSPREP_OP_CRASH_DATA={{user `sysprep_op_crash_data`}}",
"SYSPREP_OP_DHCP_CLIENT_STATE={{user `sysprep_op_dhcp_client_state`}}",
"SYSPREP_OP_FIREWALL_RULES={{user `sysprep_op_firewall_rules`}}",
"SYSPREP_OP_LOGFILES={{user `sysprep_op_logfiles`}}",
"SYSPREP_OP_MACHINE_ID={{user `sysprep_op_machine_id`}}",
"SYSPREP_OP_MAIL_SPOOL={{user `sysprep_op_mail_spool`}}",
"SYSPREP_OP_PACKAGE_MANAGER_CACHE={{user `sysprep_op_package_manager_cache`}}",
"SYSPREP_OP_RPM_DB={{user `sysprep_op_rpm_db`}}",
"SYSPREP_OP_SSH_HOSTKEYS={{user `sysprep_op_ssh_hostkeys`}}",
"SYSPREP_OP_TMP_FILES={{user `sysprep_op_tmp_files`}}",
"SYSPREP_OP_YUM_UUID={{user `sysprep_op_yum_uuid`}}"
],
"execute_command": "{{ .Vars }} $(command -v bash) '{{.Path }}'",
"script": "scripts/99-packer-virt-sysprep-control-script.sh"
},
...
The final provisioner simply cleans everything up.
...
{
"type": "shell",
"execute_command": "{{ .Vars }} $(command -v bash) '{{.Path }}'",
"environment_vars": [
"PACKER_VIRT_SYSPREP_DIR={{user `packer_virt_sysprep_dir`}}"
],
"inline": [
"rm -rf $PACKER_VIRT_SYSPREP_DIR"
]
}
...
One of the more challenging steps in performing operations of this kind is
testing that the script or operation actually did what you wanted it to.
For example, it's all well and good seeing the sysprep-op-logfiles.sh
script run, but has it actually done its stuff? Is the image we just
created with Packer now devoid of all its log files?
Clearly, if you boot the machine back up again to take a look, all of the logs (or at least some) will be present because a running system will recreate those logs as and when it needs them. The same thing goes for the host's ssh keys or the machine ID.
What we need is a way to look inside the image (actually the image's hard disk file) without actually starting it. Thankfully, there is an excellent tool available for doing just this - guestfish. Again, this is part of the libguestfs toolset. Unfortunately, libguestfs is not available natively on every platform. However...
If you're using Packer then you are undoubtedly aware of (and probably using) Vagrant. It's trivially easy to grab a box and then install the libguestfs tools we need. The libguestfs tools are available on both Debian and CentOS. Note that there are a few limitations with the libguestfs package on CentOS that are not present in the Debian package. As such, I would recommend using Debian for your testing.
Create a working directory for the box; initialise, start, and then ssh into the box.
$ mkdir ~/bento-debian
$ cd ~/bento-debian
$ vagrant init bento/debian-8.6
$ vagrant up
$ vagrant ssh
Once we're on the box we can install the tools we need:
$ sudo apt-get install -y libguestfs-tools
Debconf will fire up asking you to configure mdadm. In the dialog box asking about MD arrays needed for the root file system, delete all and enter none. That's it.
Create a working directory for the box; initialise, start, and then ssh into the box.
$ mkdir ~/bento-centos
$ cd ~/bento-centos
$ vagrant init bento/centos-7.2
$ vagrant up
$ vagrant ssh
Once we're on the box we can install the tools we need:
$ sudo yum install -y libguestfs-tools-c
That's it.
If you've deleted the virtualbox-iso
builder from the example
template and are instead using the vmware-iso
builder, the good news
is that there is nothing to do here.
Long and short - vmdk files produced by the vmware-iso
builder are
good to go.
However, if you've stuck with the virtualbox-iso builder you will first need to convert the exported vmdk file into a format that Guestfish can understand. To reduce the size of the exported image, Virtualbox exports the machine's hard disk in a compressed format that Guestfish cannot work with. Thankfully, the VBoxManage tool provides a way to convert the disk file so that we can work with it.
If you've not created the box yet, now is the time to do so:
$ packer build centos.json
Once the process is complete you should have a Virtualbox VM in OVF format under the output-virtualbox-iso directory. The VM consists of two files - the OVF file itself and the virtual hard disk or vmdk file.
We need to convert the vmdk file:
$ cd output-virtualbox-iso
$ VBoxManage clonehd \
packer-virtualbox-iso-1480432683-disk1.vmdk \
--format VMDK \
--variant Standard \
centos-guestfish.vmdk
The name of the vmdk file under the output-virtualbox-iso
directory
may differ from that shown above - just change the command appropriately.
After the command has finished you should have created the
centos-guestfish.vmdk
file. This will be somewhat larger than the
original vmdk but, importantly for us, will be usable with Guestfish.
The first step is to copy the vmdk file so that it is accessible within the Vagrant box. Simply copying the vmdk file to the root of Vagrant's working directory should be all that's needed - the vmdk will then be visible under the /vagrant directory within the guest.
The next step is to change to the Vagrant box's working directory and ssh into the box. From there change to the /vagrant directory. The vmdk file should be there.
Using the Debian Vagrant box example above:
$ cp centos-guestfish.vmdk ~/bento-debian/
$ cd ~/bento-debian
$ vagrant ssh
$ cd /vagrant
We can finally start guestfish and take a look at the contents of the vmdk. The start-up procedure for Guestfish is slightly different depending on whether you are using it on Debian or CentOS.
First Debian:
$ guestfish
You should see a small greeting message and be presented with the Guestfish prompt. Add the vmdk file and initialise with the following commands:
><fs> add centos-guestfish.vmdk
><fs> run
You should be presented with a progress bar as Guestfish does its stuff and reads in the vmdk.
Now CentOS. Under the covers Guestfish makes use of the qemu-img binary to access and manipulate virtual disks. For whatever reason the functionality of the qemu-img vmdk driver is limited to read-only on CentOS. We also need to tell libguestfs that it should use qemu directly without going through libvirt...
$ export LIBGUESTFS_BACKEND=direct
$ guestfish
Same as under Debian, you should see the greeting and be presented with the Guestfish prompt. When adding the vmdk under CentOS we need to specify that the vmdk should be loaded up read-only:
><fs> add centos-guestfish.vmdk readonly:true
><fs> run
Again, you should be presented with a progress bar as Guestfish reads in
the vmdk. If you see an error at this point exit out of the box and issue
a vagrant reload
to reboot. Log back in and try running Guestfish
again - it should succeed. If not... blow the box away and use Debian
instead!!!
At this point you should be at the same stage whatever OS you have chosen to install Guestfish on.
List the filesystems on the vmdk and mount the second one - for the example build this is the root file system. If you wish, you can then go ahead and mount the first filesystem as well - this is actually /boot. Note that if you don't, you will simply have an empty /boot directory under /.
><fs> list-filesystems
/dev/sda1: xfs
/dev/sda2: xfs
><fs> mount /dev/sda2 /
><fs> mount /dev/sda1 /boot
While somewhat limited when compared to the usual suite of tools available from a normal shell, Guestfish helpfully provides analogues of some of the most useful shell commands. Assuming the sysprep_op_ssh_hostkeys variable was set to true in the Packer template, we should see that the host keys under /etc/ssh have indeed been removed:
><fs> ls /etc/ssh
moduli
ssh_config
sshd_config
Similarly, if the sysprep-op-machine-id and sysprep-op-tmp-files operations were enabled, the /etc/machine-id file will be empty and there shouldn't be any files under /tmp or /var/tmp:
><fs> cat /etc/machine-id
><fs> ls /tmp
><fs> ls /var/tmp
Thanks to the awesomeness of Guestfish we can see we have success! Now would be a good time to explore the full range of commands you can run within the Guestfish shell by taking a look at the docs for the tool.
To finish up simply exit out of the Guestfish shell.
><fs> exit