This repository demonstrates building and 'sysprepping' a basic CentOS VM with Packer and Packer-Virt-Sysprep.
This repository is intended as a reference only. The resultant VM should not be used!
It should be fairly easy to cherry-pick the bits of config required to use the Packer-Virt-Sysprep operations within your own builds. Alternatively, you can use the templates and files included here as a starting point and go from there.
- sysprep-op-bash-history: Delete bash history for root and all users under /home
- sysprep-op-crash-data: Delete any crash data created by kexec-tools
- sysprep-op-dhcp-client-state: Delete any DHCP lease information
- sysprep-op-firewall-rules: Delete custom rules and firewall customisations
- sysprep-op-logfiles: Ensures the resultant image is devoid of log files
- sysprep-op-machine-id: Deletes the machine-id. This ensures a unique id is created the next time the machine is booted.
- sysprep-op-mail-spool: Removes any mail from the local spool
- sysprep-op-package-manager-cache: Removes cache files associated with the guest's package manager. Should work for apt, dnf, yum and zypper.
- sysprep-op-rpm-db: Removes host-specific RPM database files. RPM will recreate these files automatically when needed. Clearly intended for RPM-based distros (but should be safe to run against non-RPM distros as well)
- sysprep-op-ssh-hostkeys: Delete the host ssh keys. A new set of keys will be auto-generated by the host at next boot.
- sysprep-op-tmp-files: Ensures the resultant image is devoid of any temp files
- sysprep-op-yum-uuid: Remove the yum package manager UUID associated with the guest. A new UUID will be automatically generated the next time yum is run.
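To give a flavour of how simple most of these operations are, the bash-history operation essentially boils down to the following. This is an illustrative sketch only, not the actual submodule script; the prefix parameter is an addition made here purely so the logic can be exercised against a scratch directory rather than a live system.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the bash-history operation - NOT the actual
# packer-virt-sysprep script. Removes .bash_history for root and for
# every user under /home. The optional prefix argument is a testing
# convenience added for this sketch only.
clear_bash_history() {
    local prefix="${1:-}"
    rm -f "${prefix}/root/.bash_history"
    local home
    for home in "${prefix}"/home/*/; do
        [ -d "${home}" ] && rm -f "${home}.bash_history"
    done
    return 0
}
```

Calling `clear_bash_history` with no argument operates on the real filesystem; pass a scratch prefix to try it out safely.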
The Packer-Virt-Sysprep scripts are incorporated under the scripts directory as a submodule, so don't forget the --recurse-submodules option when cloning. It is assumed that you have Packer and Virtualbox installed. All testing has been done with Virtualbox 5.0.26.
$ git clone --recurse-submodules https://github.com/DanHam/packer-virt-sysprep-example.git
$ cd packer-virt-sysprep-example
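If you did forget --recurse-submodules and ended up with an empty scripts directory, the submodule can be fetched after the fact with a standard git command:

```shell
# Run from the root of the clone: fetches and checks out any
# submodules that were skipped during the initial clone
git submodule update --init --recursive
```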
- Open the centos.json build template with your favourite editor.
- Optionally adjust which Packer-Virt-Sysprep operations will be executed or skipped by setting true or false for each of the sysprep-op-* user variables in the variables section at the head of the template.
Build the box
$ packer build centos.json
The truly impatient can race ahead to the testing section below to see if the configured options worked correctly.
The scripts provided have been tested on CentOS 6.x and 7.x and Debian 8.x. While not tested, the scripts or 'operations' should also work on Red Hat 6 and 7 without issue. However, the software is provided as is and you should test thoroughly to ensure the results of running the scripts are as expected! In other words, there is no implied warranty of any kind!!
Usage on any other OS, such as SUSE or Ubuntu, may work but will require thorough testing. Let me know if you find, after thorough testing, that the scripts work for you on another OS. Similarly, if you adapt the scripts to make them work on another platform, or simply make some improvements, please feel free to issue a PR to incorporate the changes.
Some basic familiarity with Packer is assumed. If something below doesn't make sense, please read through the Packer documentation.
Generally speaking the packer-virt-sysprep operations should be among the last provisioning scripts you run against your build before shutting it down.
Each script or virt-sysprep style operation can be used individually or in conjunction with any or all of the other operations.
The example below will run the operations that ensure each machine created from the generated image will have a unique machine-id and host ssh keys. Be warned that you may need to change the "execute_command" to fit with how you do things in your build e.g. run with sudo. Additionally, note that all packer-virt-sysprep scripts expect to be run within a Bash shell!
{
"builders": [
{
...
VMware ISO builder
Virtualbox ISO builder
etc
...
}
],
"provisioners": [
{
"type": "shell"
"execute_command": "{{ .Vars }} $(command -v bash) '{{ .Path }}'",
"scripts": [
"scripts/packer-virt-sysprep/sysprep-op-machine-id.sh",
"scripts/packer-virt-sysprep/sysprep-op-ssh-hostkeys.sh"
]
}
]
}
Rather than referencing each operation script individually it is much better to run all packer-virt-sysprep operations from a wrapper or master control script. This helps to keep the Packer template fairly tidy since only the wrapper script needs to be referenced within the template. When used in conjunction with user variables and exported environment variables, the use of a wrapper script provides a convenient way to control what operations are performed without having to constantly rearrange the template.
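As a rough sketch of the idea (the real control script in this repo is scripts/99-packer-virt-sysprep-control-script.sh; the helper function and exact names below are hypothetical), such a wrapper might look something like:

```shell
#!/usr/bin/env bash
# Sketch of a wrapper/control script - not the repo's actual
# 99-packer-virt-sysprep-control-script.sh. Packer exports each
# SYSPREP_OP_* user variable as an environment variable; an operation
# script is executed only when its variable is set to "true".

# Directory the operation scripts were uploaded to
sysprep_dir="${PACKER_VIRT_SYSPREP_DIR:-/packer-virt-sysprep}"

run_op() {
    local var_name="$1" script_name="$2"
    # ${!var_name} is bash indirect expansion: the value of the
    # variable whose name is stored in var_name
    if [ "${!var_name:-false}" = "true" ]; then
        echo "Running ${script_name}"
        bash "${sysprep_dir}/${script_name}"
    else
        echo "Skipping ${script_name}"
    fi
}

run_op SYSPREP_OP_MACHINE_ID   sysprep-op-machine-id.sh
run_op SYSPREP_OP_SSH_HOSTKEYS sysprep-op-ssh-hostkeys.sh
```

An operation is skipped unless its environment variable is explicitly set to true, so a forgotten or mistyped variable fails safe.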
Be warned that by default Packer will use /tmp as the directory to which provisioning scripts are uploaded and from which they are executed. Since the sysprep-op-tmp-files operation aims to delete all files under /tmp, it's probably not a good idea to have Packer use /tmp when running the packer-virt-sysprep operation scripts! Thankfully, Packer provides a mechanism for customising the directory that scripts are uploaded to and run from.
All of this is probably best demonstrated through example - which, after all, is why this 'example' repo exists in the first place...
Taking each section of the Packer template in turn:
{
"variables": {
...
"packer_virt_sysprep_dir": "/packer-virt-sysprep",
"sysprep_op_bash_history": "true",
"sysprep_op_crash_data": "true",
"sysprep_op_dhcp_client_state": "true",
"sysprep_op_firewall_rules": "true",
"sysprep_op_logfiles": "true",
"sysprep_op_machine_id": "true",
"sysprep_op_mail_spool": "true",
"sysprep_op_package_manager_cache": "true",
"sysprep_op_rpm_db": "true",
"sysprep_op_ssh_hostkeys": "true",
"sysprep_op_tmp_files": "true",
"sysprep_op_yum_uuid": "true"
},
These user variables are referenced in the environment variables section of the remote shell provisioner.
- The sysprep_op_* variables provide the means by which the master control script decides which operations should be executed and which operations will be skipped.
- The packer_virt_sysprep_dir variable specifies the directory within the guest to which the packer-virt-sysprep scripts will be uploaded. It is also the directory the control script is uploaded to and then executed from.
The builders
section of the template is fairly standard. There's
nothing of particular relevance here with regard to the use of the
packer-virt-sysprep scripts.
The provisioners section is a little more involved, but breaking it down into its constituent parts makes it easily understandable.
The first shell provisioner creates the vagrant user and sets up sudoers. All fairly standard stuff and nothing really to do with packer-virt-sysprep. Moving on...
...
"provisioners": [
{
"type": "shell",
"execute_command": "{{ .Vars }} $(command -v bash) '{{.Path }}'",
"scripts": [
"scripts/01-create-vagrant-user.sh",
"scripts/02-configure-sudoers.sh"
]
},
...
The following provisioner creates the directory to which we'll be uploading all the packer-virt-sysprep scripts. As stated above, the packer_virt_sysprep_dir user variable is referenced in the environment_vars section and sets us up for running the mkdir command in the inline section. All fairly obvious stuff. Moving on...
...
{
"type": "shell",
"execute_command": "{{ .Vars }} $(command -v bash) '{{.Path }}'",
"environment_vars": [
"PACKER_VIRT_SYSPREP_DIR={{user `packer_virt_sysprep_dir`}}"
],
"inline": [
"mkdir $PACKER_VIRT_SYSPREP_DIR"
]
},
...
The provisioner shown below uploads all of the scripts under our
repo's scripts/packer-virt-sysprep
directory to the directory we
created in the previous step. Now our scripts are present on the target
machine we can proceed to the next provisioner...
...
{
"type": "file",
"source": "scripts/packer-virt-sysprep/",
"destination": "{{user `packer_virt_sysprep_dir`}}"
},
This is where the real action occurs.
The remote_folder stanza is important. This is where the control script will be uploaded to and executed from. Using the custom directory we specified earlier keeps the control script out of the way - saving it from potentially being obliterated by the sysprep-op-tmp-files operation.
All of the user variables are referenced and exported as environment variables of the same name in the environment_vars section. All of the environment variables are subsequently referenced and used in the control script.
The script stanza tells Packer to run our control script. The script is very simple - it will run the operation or script if the corresponding user variable is set to true. The operation will be skipped otherwise.
...
{
"type": "shell",
"remote_folder": "{{user `packer_virt_sysprep_dir`}}",
"environment_vars": [
"PACKER_VIRT_SYSPREP_DIR={{user `packer_virt_sysprep_dir`}}",
"SYSPREP_OP_BASH_HISTORY={{user `sysprep_op_bash_history`}}",
"SYSPREP_OP_CRASH_DATA={{user `sysprep_op_crash_data`}}",
"SYSPREP_OP_DHCP_CLIENT_STATE={{user `sysprep_op_dhcp_client_state`}}",
"SYSPREP_OP_FIREWALL_RULES={{user `sysprep_op_firewall_rules`}}",
"SYSPREP_OP_LOGFILES={{user `sysprep_op_logfiles`}}",
"SYSPREP_OP_MACHINE_ID={{user `sysprep_op_machine_id`}}",
"SYSPREP_OP_MAIL_SPOOL={{user `sysprep_op_mail_spool`}}",
"SYSPREP_OP_PACKAGE_MANAGER_CACHE={{user `sysprep_op_package_manager_cache`}}",
"SYSPREP_OP_RPM_DB={{user `sysprep_op_rpm_db`}}",
"SYSPREP_OP_SSH_HOSTKEYS={{user `sysprep_op_ssh_hostkeys`}}",
"SYSPREP_OP_TMP_FILES={{user `sysprep_op_tmp_files`}}",
"SYSPREP_OP_YUM_UUID={{user `sysprep_op_yum_uuid`}}"
],
"execute_command": "{{ .Vars }} $(command -v bash) '{{.Path }}'",
"script": "scripts/99-packer-virt-sysprep-control-script.sh"
},
...
The final provisioner simply cleans everything up.
...
{
"type": "shell",
"execute_command": "{{ .Vars }} $(command -v bash) '{{.Path }}'",
"environment_vars": [
"PACKER_VIRT_SYSPREP_DIR={{user `packer_virt_sysprep_dir`}}"
],
"inline": [
"rm -rf $PACKER_VIRT_SYSPREP_DIR"
]
}
...
One of the more challenging steps in performing operations of this kind is
testing that the script or operation actually did what you wanted it to.
For example, it's all well and good seeing the sysprep-op-logfiles.sh
script run, but has it actually done its stuff? Is the image we just
created with Packer now devoid of all its log files?
Clearly, if you boot the machine back up again to take a look, all of the logs (or at least some) will be present because a running system will recreate those logs as and when it needs them. The same thing goes for the host's ssh keys or the machine ID.
What we need is a way to look inside the image (actually the image's hard disk file) without actually starting it. Thankfully, there is an excellent tool available for doing just this - guestfish. Again, this is part of the libguestfs toolset. Unfortunately, libguestfs is not available natively on every platform. However...
If you're using Packer then you are undoubtedly aware of (and probably using) Vagrant. It's trivially easy to grab a box and then install the libguestfs tools we need. The libguestfs tools are available on both Debian and CentOS. Note that there are a few limitations with the libguestfs package on CentOS that are not present in the Debian package. As such, I would recommend using Debian for your testing.
Create a working directory for the box; initialise, start, and then ssh into the box.
$ mkdir ~/bento-debian
$ cd ~/bento-debian
$ vagrant init bento/debian-8.6
$ vagrant up
$ vagrant ssh
Once we're on the box we can install the tools we need:
$ sudo apt-get install -y libguestfs-tools
Debconf will fire up asking you to configure mdadm. In the dialog box asking about MD arrays needed for the root file system, delete all and enter none. That's it.
Create a working directory for the box; initialise, start, and then ssh into the box.
$ mkdir ~/bento-centos
$ cd ~/bento-centos
$ vagrant init bento/centos-7.2
$ vagrant up
$ vagrant ssh
Once we're on the box we can install the tools we need:
$ sudo yum install -y libguestfs-tools-c
That's it.
If you've deleted the virtualbox-iso
builder from the example
template and are instead using the vmware-iso
builder, the good news
is that there is nothing to do here.
Long and short - vmdk files produced by the vmware-iso
builder are
good to go.
However, if you've stuck with the virtualbox-iso builder you will first need to convert the exported vmdk file into a format that Guestfish can understand. To reduce the size of the exported image, Virtualbox exports the machine's hard disk in a compressed format that Guestfish cannot work with. Thankfully, the VBoxManage tool provides a way to convert the disk file so that we can work with it.
If you've not created the box yet, now is the time to do so:
$ packer build centos.json
Once the process is complete you should have a Virtualbox VM in OVF format under the output-virtualbox-iso directory. The VM consists of two files - the OVF file itself and the virtual hard disk or vmdk file.
We need to convert the vmdk file:
$ cd output-virtualbox-iso
$ VBoxManage clonehd \
packer-virtualbox-iso-1480432683-disk1.vmdk \
--format VMDK \
--variant Standard \
centos-guestfish.vmdk
The name of the vmdk file under the output-virtualbox-iso
directory
may differ from that shown above - just change the command appropriately.
After the command has finished you should have created the
centos-guestfish.vmdk
file. This will be somewhat larger than the
original vmdk but, importantly for us, will be usable with Guestfish.
The first step is to copy the vmdk file so that it is accessible within the Vagrant box. Simply copying the vmdk file to the root of Vagrant's working directory should be all that's needed - the vmdk will then be visible under the /vagrant directory within the guest.
The next step is to change to the Vagrant box's working directory and ssh into the box. From there change to the /vagrant directory. The vmdk file should be there.
Using the Debian Vagrant box example above:
$ cp centos-guestfish.vmdk ~/bento-debian/
$ cd ~/bento-debian
$ vagrant ssh
$ cd /vagrant
We can finally start guestfish and take a look at the contents of the vmdk. The start-up procedure for Guestfish is slightly different depending on whether you are using it on Debian or CentOS.
First Debian:
$ guestfish
You should see a small greeting message and be presented with the Guestfish prompt. Add the vmdk file and initialise with the following commands:
><fs> add centos-guestfish.vmdk
><fs> run
You should be presented with a progress bar as Guestfish does its stuff and reads in the vmdk.
Now CentOS. Under the covers Guestfish makes use of the qemu-img binary to access and manipulate virtual disks. For whatever reason the functionality of the qemu-img vmdk driver is limited to read-only on CentOS. We also need to tell libguestfs that it should use qemu directly without going through libvirt...
$ export LIBGUESTFS_BACKEND=direct
$ guestfish
Same as under Debian, you should see the greeting and be presented with the Guestfish prompt. When adding the vmdk under CentOS we need to specify that the vmdk should be loaded up read-only:
><fs> add centos-guestfish.vmdk readonly:true
><fs> run
Again, you should be presented with a progress bar as Guestfish reads in
the vmdk. If you see an error at this point exit out of the box and issue
a vagrant reload
to reboot. Log back in and try running Guestfish
again - it should succeed. If not... blow the box away and use Debian
instead!!!
At this point you should be at the same stage whatever OS you have chosen to install Guestfish on.
List the filesystems on the vmdk and mount the second one - for the example build this is the root file system. If you wish, you can then go ahead and mount the first filesystem as well - this is actually /boot. Note that if you don't, you will simply have an empty /boot directory under /.
><fs> list-filesystems
/dev/sda1: xfs
/dev/sda2: xfs
><fs> mount /dev/sda2 /
><fs> mount /dev/sda1 /boot
While somewhat limited when compared to the usual suite of tools available from a normal shell, Guestfish helpfully provides analogues of some of the most useful shell commands. Assuming the sysprep_op_ssh_hostkeys variable was set to true in the Packer template, we should see that the host keys under /etc/ssh have indeed been removed:
><fs> ls /etc/ssh
moduli
ssh_config
sshd_config
Similarly, if the sysprep-op-machine-id and sysprep-op-tmp-files operations were enabled, the /etc/machine-id file will be empty and there shouldn't be any files under /tmp or /var/tmp:
><fs> cat /etc/machine-id
><fs> ls /tmp
><fs> ls /var/tmp
Thanks to the awesomeness of Guestfish we can see we have success! Now would be a good time to explore the full range of commands you can run within the Guestfish shell by taking a look at the docs for the tool.
To finish up simply exit out of the Guestfish shell.
><fs> exit