What is peg?
============

peg stands for PostgreSQL Environment Generator.  It's a fairly complicated
script that automates most of the repetitive work required to install test
instances of PostgreSQL.  It's aimed primarily at developers building from
source control (cvs, git) and at reviewers trying out a source code package
(.tar.gz), including release source snapshots and the alpha/beta builds.

Motivation
==========

peg usage relies heavily on environment variables.  As much as possible, it
mirrors the environment setup of two common types of PostgreSQL
installation:

* The psql client and other programs that use the libpq library look for
  environment variables such as PGPORT to change their behavior.  Server
  startup does some of this as well, such as using PGDATA to set the
  database location.

* The RPM packaging used for RedHat's PostgreSQL server uses a sourced
  environment stored in /etc/sysconfig/postgresql/postgresql which defines
  PGDATA.  One useful idiom when working with PostgreSQL is to have the
  login profile of the postgresql user (and of anyone else using the
  database) source this file, and therefore pick up the exact same settings
  used to start the server.

The idea is that once you've set up an environment with peg, the environment
variables available to you will match what you'd get if you were logging
into a production RHEL server running the standard PostgreSQL RPM set.

Once peg has set up your environment, commands like the following will work,
exactly the same as if you'd sourced the sysconfig file on a production RHEL
server::

  pg_ctl start -l $PGLOG
  $EDITOR $PGDATA/postgresql.conf

TODO:  Show a sample RHEL sysconfig script for comparison's sake

Whether or not you find that useful, using peg also automates source code
builds as well as cluster creation, setup, and control.

peg takes commands similarly to common version control systems:  you call
the one executable script with a subcommand, followed by what you want it
to operate on, which right now is always a project name.

Installation
============

peg is so far a single script that doesn't rely on anything else.  You can
just copy or link it to any directory that's in your path, such as
/usr/local/bin for a system-wide install.  Here's a sample installation that
checks out a working copy and then links to that copy in a standard
directory found in the PATH on Linux systems::

  $ cd $HOME
  $ git clone git://github.com/gregs1104/peg.git
  $ sudo ln -s $HOME/peg/peg /usr/local/bin/

Directory layout
================

peg assumes you are installing your Postgres database as a regular user,
typically in your home directory.  It allows you to have multiple "projects"
installed in this structure, each of which gets its own source code,
binaries, and database.

The easiest way to make peg work is to create a $HOME/pgwork directory and
let peg manage the whole thing for you::

  cd
  mkdir -p pgwork
  # Repo setup:  CVS
  peg init test
  . peg build
  psql

This will synchronize with the master PostgreSQL CVS servers via rsync,
using the same techniques documented at
http://wiki.postgresql.org/wiki/Working_with_CVS (the outline given in its
"Initial setup" section is actually peg's distant ancestor).

This will get you a directory tree that looks like this (simplified to only
include the basic outline; there will be more in there)::

  pgwork/
  |-- data
  |   `-- test
  |-- inst
  |   `-- test
  |       `-- bin
  |-- repo
  |   `-- cvs
  `-- src
      `-- test

There will only be one repo directory shared by all of the projects hosted
in this work area, and each project will check out its own unique copy of
that repo into its work area--unless you are using git; see below.

The top level directories are intended to work like this:

* pgwork/repo:  The one copy of the master repo all projects share
* pgwork/src/project:  Checked out copy of the repo with this project's source
* pgwork/inst/project:  Binaries built by the "make install" step
* pgwork/data/project:  Holds the database cluster created by initdb
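None of those directories need to be managed by hand.  As a quick sanity
check (a sketch that isn't part of the original walkthrough), after the
sourced build above you can confirm where everything ended up, using the
status subcommand described in the command summary below plus the standard
environment variables peg exports::

  peg status test
  echo $PGDATA     # should point under pgwork/data/test
  pg_ctl status    # the cluster that "peg build" started should be running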
Source code layout for git
~~~~~~~~~~~~~~~~~~~~~~~~~~

If you are using git for your repo, the src/ directory is just a symbolic
link to the repo itself, so that every project shares the same repo.  Each
project is instead given its own branch when you first use "peg init".  This
seems to match the workflow of git users better than checking out a separate
copy for each project.

This is simple enough to change:  in between the init and build steps, you
can remove the symlink and manually copy the master repo.

TODO:  Provide an example of that
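Until that example is written and tested, here is a rough, untested sketch
of the idea.  It assumes you are already set up for git (see the git session
below), that the shared git repo lives at pgwork/repo/git, and that
pgwork/src/test is the per-project symlink described above; all of those are
assumptions to verify against what peg actually created on your system, and
a plain recursive copy of the repo would work just as well as the local
clone shown here::

  # Hypothetical, untested:  give the "test" project an independent copy
  # of the repo instead of a branch inside the shared one.
  peg init test
  rm pgwork/src/test                          # drop the symlink peg created
  git clone pgwork/repo/git pgwork/src/test   # independent local clone
  . peg build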
Sample tgz session
==================

Here's how you might use peg to test out an alpha build downloaded in source
code form.  To do that instead of using a regular repo, you merely need to
create a tgz repo directory and drop the file in there::

  # Repo setup:  tgz
  cd
  mkdir -p pgwork/repo/tgz
  cp postgresql-8.5alpha2.tar.gz pgwork/repo/tgz

  # Basic install
  peg init test
  . peg build
  psql

cvs or tgz repo with patch
~~~~~~~~~~~~~~~~~~~~~~~~~~

Here's how you might test a patch using CVS for the base repo::

  peg init test
  cd pgwork/src/test
  patch -p 0 < some.patch
  . peg build
  psql

TODO:  Test the above

Sample git session
==================

You can clone the postgresql.org git repo just by changing your default
PGVCS to be git::

  cd
  mkdir -p pgwork
  # Repo setup:  git
  export PGVCS=git
  peg init test
  . peg build
  psql

At this point your sole git repository will be switched to a new branch that
matches the name of the project.

TODO:  You can probably force this just by creating a repo/git directory
too.  Need to document that logic.

git repo with patch
~~~~~~~~~~~~~~~~~~~

Here's how you might test a patch using git for the base repo::

  peg init test
  cd pgwork/src/test
  git apply some.patch
  . peg build
  psql

TODO:  Test the above

Sample two-cluster session
==========================

Here is a more complicated peg installation.  The intent is to start two
database clusters that share the same source code and binary set, perhaps
for testing replication with two "nodes" on a single server.  This is
relatively easy to script, using peg to do most of the dirty work normally
required here::

  # Two node cluster setup
  peg init master
  peg init slave

  # Make the slave use the same source code and binaries as the master
  pushd pgwork/inst
  rm -rf slave
  ln -s master slave
  popd
  pushd pgwork/src
  rm -rf slave
  ln -s master slave
  popd

  # Start the master
  peg build master
  # Can't source the above yet, because then PGDATA will be set

  # Start the slave
  PGPORT=5434 ; peg start slave ; PGPORT=

  . peg switch master
  psql
  psql -p 5434

Note that if you now try to stop the slave like this::

  peg stop slave

This won't actually work, because it will still be using the PGDATA
environment variable you sourced in.  Instead you need to do this::

  . peg switch slave
  peg stop

TODO:  Test that last part

Base directory detection
========================

The entire peg directory tree is based in a directory recommended to be
named pgwork.  If you use another directory, you can make the script use it
by setting the PGWORK environment variable.  The sequence searched to find a
working area is:

1. The value passed for PGWORK
2. The current directory
3. $HOME/pgwork
4. $HOME

peg assumes it found a correct working area when there is a "repo"
subdirectory in one of these locations.

Command summary
===============

The following subcommands are accepted by peg:

* status:  Report on the environment variables that would be in place if you
  were to execute a command.  Useful mainly for troubleshooting.
* init:  Create a repo and a project based on it, if one is named
* update:  Update your repo and a project based on it, if one is named
* build-only:  Execute the build steps and create a database cluster, but
  don't start it.  This is typically for cases where you know you need to
  modify the database configuration before you start it.
* build:  Build binaries, install them, create a cluster, start the database
* initdb:  Create a cluster
* switch:  Switch to an existing built binary set and cluster
* start:  Start a cluster
* stop:  Stop a cluster
* rm:  Remove all data from a project (but not the repo)

Shortcuts
=========

Once you've gotten an environment set up, using it again can be as easy as::

  . peg switch

That will use whatever project name you last gave the script and start the
server up.  If you shut down your system, that should be all that's needed
to bring things back up again.

When you source the peg script, along with getting settings like PGDATA
available, it also creates a couple of quick aliases you can use instead of
calling peg for those functions:

* start:  If you already have a database cluster set up, start it
* stop:  Stop a running database with the slow shutdown code
* immediate:  Stop a running database immediately

Here again, the names were picked to be similar to the "service postgresql
start" and stop commands used by the RPM packaging.  start and stop are used
on some UNIX systems for job control or system initialization, but it's
unlikely you'll need those while doing PostgreSQL work, so re-using those
names here should save you some typing.

Environment variable reference
==============================

You can see the main environment variables peg uses internally with::

  peg status <project>

All of those values are set automatically only if you don't explicitly set
them to something first.  This allows you to do things like use peg to
manage your source and binary builds, while still having a PGDATA that
points to a separate location where you want your database to go.
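For example, here is a minimal sketch (not part of the original
documentation, and the target directory is just a placeholder) of pinning
the cluster to a different location before peg ever runs, while still
letting it handle the source and binary builds::

  export PGDATA=/some/other/disk/pgdata   # placeholder path of your choosing
  peg init test
  . peg build
  psql

Everything you left unset, such as PGPORT, is still filled in for you.  The
individual variables are described below: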
* PGPORT:  Client programs use this to determine what port to connect on; if
  set, any "peg start" commands will start the server on that port.  See the
  multi-cluster example for how this might be used.
* PGVCS:  Valid options are "cvs", "git", and "tgz".  If you have more than
  one type of source in your repo directory, you can use this to specify
  which of them you should use.
* PGWORK:  Base directory for the working area.  See "Base directory
  detection".
* PGPROJECT:  If this is set, it will become the project name used for all
  commands, regardless of what's passed on the command line.
* PGDEBUG:  By default, peg builds PostgreSQL with the standard flags you'd
  want to use for development and testing.  This includes assertions, which
  will slow down the code considerably.  If you want to build without
  assertions and debugging information, you'll need to set this to a
  non-empty value that can be passed through to "configure" without doing
  anything.  A space works for this, for example::

    export PGDEBUG=" "

* PGMAKE:  Program to run GNU make.  This defaults to "make" but can be
  overridden.

Solaris Use
===========

The defaults for peg are known to have issues building on a typical Solaris
system, where the GNU build toolchain is not necessarily the default one.
Here's a sample configuration you can put into your environment to change
the defaults to work on that platform, assuming you've installed the Sun
Freeware GNU tools in their default location::

  export PGMAKE="/usr/sfw/bin/gmake"
  export PGDEBUG="--enable-cassert --enable-debug --without-readline --with-libedit-preferred"

Known Issues
============

See the TODO notes in the peg source code (and even in this documentation)
for the open issues in the code.  Some of the rougher edges right now:

* peg creates a database matching your user name, which is what psql wants
  for a default.  It doesn't check whether it already exists first though,
  so you'll often see an error about that when starting a database.  This
  is harmless.
* If you source peg output repeatedly, it will pollute your PATH with
  multiple pointers to the same directory tree.  This is mostly harmless,
  just slowing down how fast commands can be found in your PATH a bit.

Credits
=======

Copyright (c) 2009, Gregory Smith
All rights reserved.
See COPYRIGHT file for full license details

peg was written by Greg Smith to make all of the PostgreSQL systems he works
on regularly have something closer to a common UI at the console.

peg's directory layout and general design were inspired by several members
of the PostgreSQL community, including:

* Heikki Linnakangas, who outlined his personal work habits for interacting
  with the CVS repo and inspired Greg to write the original "CVS+rsync
  Solutions" section of http://wiki.postgresql.org/wiki/Working_with_CVS
* Alan Li, Jeff Davis, and other members of Truviso, though I don't remember
  borrowing ideas from anyone else there as directly as from those two.
  Alan and Jeff both had their own ways of organizing PostgreSQL
  installations in their respective home directories that I found
  interesting when we worked together on projects.