SQL Version Control System Design Issues

The initial release of Pyrseas is nearing completion. As noted elsewhere, Pyrseas was born to follow through on the concepts of the Andromeda Project. The idea of using a data dictionary specified in JSON or YAML was perhaps the most attractive feature of Andromeda. However, I felt there were several areas that could be improved upon. In this post I’ll go over some of the design issues that influenced the creation of the Pyrseas version control utilities.

Test Driven Development

One of the goals of Pyrseas is to encourage, support or otherwise assist in the development of test-driven applications. The tools should allow the DBA or developer to experiment with changes to database tables and other objects with almost as much ease as a file-based application programmer can change the file structures (granted, the latter has much more work overall due to the lack of a DBMS). ALTER TABLE is great for experimenting and making changes, but you need something else if you also want (or need) to keep a current representation of the CREATE TABLE statement in your VCS.

In order to encourage TDD, I think it’s crucial for the tools to do their work speedily. Andromeda takes about 10 seconds to process a database change such as those I have been using as examples. The analogous Pyrseas tool completes the task in one to two orders of magnitude less time.

Team Development

Another objective is to make the tools convenient for use in a team environment. In other words, it should be easy for one DBA to make a change in her database and confidently share it, together with a set of related code changes, with her colleague via a distributed VCS. Note this does not preclude the tools being used by a single developer.

Modularity

Andromeda insists on a single data dictionary file. Although it requires “modules” to be defined to group tables, all user tables are created in the ‘public’ schema. It appears that Andromeda “modules” are primarily for user interface and security constraint purposes.

The initial release of Pyrseas recognizes the existence of PostgreSQL schemas and allows the DBA to place the tables in schemas as desired. Currently, the Pyrseas dbtoyaml tool (see below) can output specifications for a single schema or for selected tables. I hope to refine this capability in the future, so that, if desired, the data dictionary file could be split and stored in the VCS in a modular fashion, i.e., side by side with or close to related application code.

Existing System Support

One aspect of Andromeda that I found limiting when I started delving into it was what happens if you want to use it for an existing database, application or project. First, it requires you to create a YAML specification of your database manually, according to its rules; for example, columns need to be defined before tables and then “re-used” to specify the tables. Second, you can only use a limited set of data types, e.g., a ‘Y’ or ‘N’ column for a BOOLEAN.

For Pyrseas, the first tool designed was dbtoyaml which connects to a database and outputs its schemas, tables and columns in a YAML/JSON format. This format can then be input directly into the second tool, yamltodb. The latter generates SQL statements to bring a second database to match the first.
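The core idea behind yamltodb can be sketched in a few lines of Python. This is only an illustration of the approach, not Pyrseas code; the flat column-to-type mapping is an assumed, simplified layout of what the YAML actually contains:

```python
def diff_table(name, desired, current):
    """Emit SQL to make the `current` table match the `desired` spec.

    Each spec is a dict mapping column name to SQL type (a deliberate
    simplification of a real data dictionary entry).
    """
    stmts = []
    for col, typ in desired.items():
        if col not in current:
            # column exists in the spec but not in the database
            stmts.append("ALTER TABLE %s ADD COLUMN %s %s" % (name, col, typ))
        elif current[col] != typ:
            # column exists in both, but with a different type
            stmts.append("ALTER TABLE %s ALTER COLUMN %s TYPE %s"
                         % (name, col, typ))
    for col in current:
        if col not in desired:
            # column exists in the database but was dropped from the spec
            stmts.append("ALTER TABLE %s DROP COLUMN %s" % (name, col))
    return stmts


desired = {"id": "integer", "title": "character varying(32)",
           "length": "smallint"}
current = {"id": "integer", "title": "character varying(32)"}
print(diff_table("film", desired, current))
# -> ['ALTER TABLE film ADD COLUMN length smallint']
```

The real tool naturally has to handle constraints, indexes, dependencies and much more, but the compare-and-emit pattern is the same.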

To summarize, this allows a DBA to make a database change through psql, pgAdmin or some other tool, and then use dbtoyaml to quickly generate a YAML dictionary that can be committed to the VCS together with related code changes. Another DBA or programmer can pull the changes and promptly test or integrate them with their own database and code changes.

Platform Support

This is an area that concerns implementation but nevertheless affects design. SQL database version control tools can opt to support multiple databases or just one. Presently, I’ve decided to concentrate on PostgreSQL. I believe this can provide complete or nearly complete coverage of the many PG features, and will keep my hands full. If Pyrseas is found useful by others, I’ll be glad to assist anyone with expanding support to other DBMSs.

Development of Pyrseas was done under Linux, but since it’s written in Python, hopefully it will encounter few problems on other OSs.

Comments and discussion welcome!

SQL Database Version Control – Summary

Update: Please visit the Schema VC page for updated information.

Before moving on to discussing how the Pyrseas version control tools deal with the database changes we’ve been using as examples, it seems appropriate to summarize the tools reviewed so far.

The following classifies the tools according to the main feature or mode of operation, and lists them alphabetically within each group. I’ve added some other tools that were not reviewed but were mentioned in comments, in PostgreSQL mailing lists, or elsewhere. Except where otherwise mentioned, the tools are open source software and support PostgreSQL.

Difference Engines

Andromeda

Summary: More than a difference engine, compares input data dictionary to information_schema views, and applies changes to upgrade the connected database.

Input format: YAML

Dependencies: PHP, Apache, PEAR, PL/Perl

Limitations: Data dictionary needs to be created and maintained manually. Limited support of PostgreSQL data types and other features.

apgdiff

Summary: Compares two SQL DDL input files, and outputs SQL statements to upgrade the first schema to match the second.

Input format: output of pg_dump -s

Dependencies: Java

Limitations: Does not currently support all PostgreSQL features.

Pyrseas

Summary: Compares an input data dictionary to the pg_* catalogs of a connected database, and outputs SQL statements to upgrade the latter to match the former. Also provides utility to output catalog information in YAML/JSON format.

Input format: YAML (JSON subset)

Dependencies: Python, psycopg2, LibYAML

Limitations: Does not currently support all PostgreSQL features.

Triangulum

Summary: Successor to Andromeda (see above) but with more focused objectives.

Input format: YAML

Dependencies: PHP, Spyc (YAML library)

Limitations: Data dictionary needs to be created and maintained manually. Limited support of PostgreSQL data types (but better than Andromeda) and other features.

Version Control of Database Changes

dbdeploy

Summary: Database refactoring tool similar to Liquibase (see below). Converts database changes specified in XML to SQL statements to upgrade a target database.

Input format: XML

Dependencies: Java, JDBC

Limitations: Primary support is for DBMSs other than PostgreSQL, so lowest common denominator coverage of PG features.

dbsteward

Summary: Announced in the pgsql-general mailing list. Expected to be available as open source by PGCon 2011. Similar to Liquibase (see below), but see this message for more details.

Input format: XML

Dependencies: Unknown

Limitations: Apparently, XML-based data dictionary must be maintained manually.

depesz Versioning

Summary: Mechanism to control database upgrades implemented as named sets of “patches.”

Input format: SQL scripts

Dependencies: Perl (to list patch dependencies)

Limitations: Scripts to rollback changes need to be created manually.

Liquibase

Summary: Database refactoring tool. Converts database changes specified in XML to SQL statements to upgrade a target database.

Input format: XML

Dependencies: Java, JDBC driver

Limitations: Primary support is for DBMSs other than PostgreSQL, so lowest common denominator coverage of PG features.

Database as VCS

Post Facto

Summary: Uses a number of PostgreSQL databases to establish a Subversion-like repository, so that schema changes to one database can be committed to the repository, and then the changes can be propagated from the repository to other databases.

Input format: none

Dependencies: Python, pyPgSQL

Limitations: Still in alpha status. Last PG version supported: 8.3.

Catalog Snapshots

POV

Summary: Allows you to save a snapshot of the PostgreSQL catalogs, make some schema changes, and possibly roll back the changes by restoring the snapshot.

Input format: none

Dependencies: Uses pgcrypto if available

Other

ChronicDB

Commercial product, supports Linux (Debian and RPM packages).

EMS DB Comparer for PostgreSQL

Commercial product, only available for Microsoft Windows platforms.

neXtep Designer

GPL-licensed, IDE-based, with multiple-DBMS support.

This summary has also been posted in a separate page in this blog, and I’ll update that from time to time. If anyone has any corrections or updates, please let me know in the comments.

SQL Database Version Control – Andromeda

In my first review of database VC tools, I mentioned Kenneth Down’s article where he recommended storing a single metadata file in a VCS and using a difference engine to do database upgrades. In a subsequent post, Kenneth expanded on this, providing a step-by-step description of how to implement such a tool. In fact, he had already gone further: he created an open source project named Andromeda that attempted to deliver on that vision.

Let’s see how our DBAs Carol and Dave can use Andromeda to implement the database changes specified in Version Control, Part 2: SQL Databases.

Version 0.1

Andromeda is much more than a version control tool. Its SourceForge page advertises it as a “database development system” and a “complete tool for building database-centric” applications. So, to start off, Carol, Dave and Manny will have to install Apache, PHP and other dependencies before installing Andromeda. Then they’ll be able to access it via web browser at a URL such as http://localhost/~manny/andro_root/andro. After an initialization step, which includes creating a PostgreSQL database named andro, Manny can create the “movies” application as described here.

In order to create the first version of the film table, Carol has to create the YAML spec file movies.dd.yaml as follows (this is the file that will be placed under version control):

group admin:
    description: Administrators
    module movies:
        permsel: Y
        permins: Y
        permupd: Y
        permdel: Y

group users:
    description:  Employees
    module movies:
        permsel: Y

module movies:
    description: Film Database
    uisort: 100

column id:
    description: Unique film ID
    type_id: int

column film_title:
    description: Film title
    type_id: vchar
    colprec: 32

column release_year:
    description: Release year
    type_id: int
    value_min: 1888

table film:
    description: Films
    module: movies
    uisort: 30
    spaceafter: Y

    column id:
        uisearch: Y
        primary_key: Y

    column film_title:
        uisearch: Y

    column release_year:

The group definitions are so that Andromeda can implement security constraints. The module definition is used to group tables logically, e.g., in the full Pagila database we may group tables into movies, customers and accounting. The column definitions precede the table definitions since the former can be used in multiple tables. The YAML spec is processed by selecting a “Build” step from Andromeda. If successful, it will create a “movies” database including the film table.

Note that Carol had to use film_title instead of title because the latter conflicts with an undocumented predefined column. Using title causes the following error:

ERROR >> Duplicate Column definition: 

ERROR >>    column_id => title

Notice also that she used value_min instead of valuemin for the release_year constraint, although Andromeda documents the latter. In any case, no CHECK constraint is actually created and the trigger functions (see below) don’t implement the validation.
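For reference, what the value_min specification asks for amounts to a plain CHECK constraint like the following (the constraint name is made up; Andromeda generates no such statement):

```sql
ALTER TABLE film
    ADD CONSTRAINT film_release_year_check CHECK (release_year >= 1888);
```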

If you look at the created table, you may be surprised:

               Table "public.film"
    Column    |         Type          | Modifiers
--------------+-----------------------+-----------
 _agg         | character(1)          |
 skey_quiet   | character(1)          |
 film_title   | character varying(32) |
 skey         | integer               |
 release_year | integer               |
 id           | integer               |
Indexes:
    "film_idx_film_skey_idx" btree (skey)
    "film_pk" btree (id)
Triggers:
    film_ins_bef_r_t BEFORE INSERT ON film FOR EACH ROW EXECUTE PROCEDURE film_ins_bef_r_f()
    film_upd_bef_r_t BEFORE UPDATE ON film FOR EACH ROW EXECUTE PROCEDURE film_upd_bef_r_f()

As you can see, the order of the columns is not what you’d expect from the spec [1] and three internal columns were added. Note that although there is a film_pk index, PostgreSQL doesn’t recognize it as the PRIMARY KEY, and all columns are nullable. The “magic” is all in the generated trigger functions.


[1] Of course, according to the relational model, the order of the attributes is not significant, but in practice most people examining a relation expect some order.

Version 0.2

Carol and Dave implement the second set of changes by editing movies.dd.yaml. The combined edits are as follows:

@@ -30,4 +30,15 @@
 value_min: 1888

+column length:
+    description: Length in minutes
+    type_id: int
+    value_min: 1
+    value_max: 9999
+
+column rating:
+    description: Film rating
+    type_id: char
+    colprec: 5
+   
 table film:
     description: Films
@@ -44,2 +55,6 @@

 column release_year:
+
+    column length:
+
+    column rating:

The departure from the original is minor: length was defined as int instead of smallint because Andromeda only allows a limited set of types. And although Dave specified the CHECK constraint on length, it’s not functional.

Version 0.3

For the third iteration, Carol and Dave again edit movies.dd.yaml as shown below:

@@ -16,4 +16,44 @@
 uisort: 100

+column language_id:
+    description: Language ID
+    type_id: int
+
+table language:
+    description: Spoken language
+    module: movies
+    uisort: 10
+
+    column language_id:
+        primary_key: Y
+        uisearch: Y
+
+    column name:
+        description: Language name
+        uisearch: Y
+
+    column ts_upd:
+
+column category_id:
+    description: Film category ID
+    type_id: int
+
+table category:
+    description: Categories
+    module: movies
+    uisort: 20
+    spaceafter: Y
+
+    column category_id:
+        auto: sequence
+        primary_key: Y
+        uisearch: Y
+
+    column name:
+        description: Film category
+        uisearch: Y
+
+    column ts_upd:
+
 column id:
     description: Unique film ID
@@ -59,2 +99,17 @@

 column rating:
+
+    foreign_key language:
+
+table film_category:
+    description: Film categories
+    module: movies
+    uisort: 40
+
+    foreign_key film:
+        primary_key: Y
+
+    foreign_key category:
+        primary_key: Y
+
+    column ts_upd:

Notes:

  • The foreign_key specification names the table only. Generated column names match those of the referenced table, which may be confusing.
  • Each column of a PRIMARY KEY needs a primary_key: Y line.
  • The predefined column name was used instead of table-specific columns as in the original. The former creates varchar(100) columns (although the documentation says varchar(40)).
  • The predefined column ts_upd creates timestamp without time zone rather than with time zone as in the original. The latter is not supported by Andromeda.
  • The sequence category_seq_category_id is created, as a result of auto: sequence on category_id but is not linked directly to the category table (it’s used in the trigger functions).

Review

Andromeda is written in PHP (comments indicate it was earlier implemented in Java) and supports PostgreSQL databases. It also requires Apache, PEAR and PL/Perl.

Andromeda implements its difference engine by querying the information_schema views. It saves the data from the views and from the YAML spec into its own tables (it creates 105 tables in a schema named zdd and 77 tables in the public schema; not all of these are for version control or database automation tasks). It then uses SQL queries to determine what actions are needed for the upgrade. It applies the changes immediately.
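For instance, determining the current shape of the film table comes down to queries along these lines against the standard views (illustrative only; Andromeda’s actual queries are more involved):

```sql
SELECT column_name, data_type, character_maximum_length
  FROM information_schema.columns
 WHERE table_schema = 'public'
   AND table_name = 'film'
 ORDER BY ordinal_position;
```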

As pointed out previously, Andromeda supports a limited set of data types. For example, booleans are only implemented as CHAR(1) columns that accept ‘Y’ or ‘N’. A follow-on project, Triangulum, appears to expand on data type support.

Andromeda can load data from CSV files and is able to retain data in an existing table whose structure is changing. However, it cannot deal with RENAMEs of columns or tables and it doesn’t seem to be able to DROP a column.

The last official release of Andromeda occurred in 2009. Its successor, Triangulum, made a third alpha release available in Feb. 2011, and its guide implies that Andromeda will be superseded.

The concept of using a YAML specification for SQL database version control is very appealing (in fact, it’s what led to the Pyrseas project) because it disassociates SQL syntax from the definition and is not verbose like the XML used by other tools. Andromeda (and Triangulum) thus implement a useful mechanism for schema versioning.

SQL Database Version Control – Post Facto

In this series on SQL database version control solutions, so far we’ve seen a difference engine and some change control aids. The difference tool, apgdiff, suggests you need to store schema listings, i.e., the output from pg_dump -s, in your VCS repository. The change control tools, depesz Versioning and Liquibase (also dbdeploy), instead imply you keep the database change scripts (or XML specifications) in the repository. Of course, you may also want to store the generated SQL scripts in the first case, as well as the pg_dump -s output in the latter.

Today’s tool is probably unique. It’s a bit like saying: if the database will not come to version control, version control must go to the database. Post Facto builds a version control repository on top of a normal PostgreSQL database.

How can our DBAs Carol and Dave use Post Facto to implement the changes described in Version Control, Part 2: SQL Databases? Note: Unfortunately, the source code for Post Facto has not been updated for 15 months and it is still listed in alpha status. I do have the source, having retrieved it several months ago, but you may get errors checking it out if you use a recent Subversion client (e.g., svn 1.6.12). I briefly tried out Post Facto in the past, but this post may be less definitive than previous ones.

Version 0.1

For the first release, the manager Manny needs to create two databases: a postfacto database and a repodb database. The first name is fixed, but you can choose the second. Post Facto will create these databases at some point if it doesn’t find them, but by creating them ahead of time he can presumably assert some control over the process. Carol has to create her caroldev database and Manny can initialize the repository with the following command:

$ pf import caroldev postfacto://manny@localhost/repodb/movies/trunk
Committed revision 1.

Carol can then issue the CREATE TABLE film statement as shown in the original article, and commit her changes as follows:

$ pf add caroldev "*"
Added 5 objects:
  constraints: public.film
  indexes: public.film_pkey
  tables: public.film
  types: public._film, public.film
$ pf commit  --message "Add film table" caroldev
Committed revision 2.

Dave can check out version 0.1 as follows:

$ pf checkout postfacto:///repodb/movies/trunk davedev
Checked out revision 2.

Behind the scenes, Post Facto has created two additional databases, postfacto_w_caroldev and postfacto_w_davedev, the “working copies” for Carol and Dave, respectively. To install the release in production or in a QA environment, you seemingly need to check out from the repodb as well, probably requiring remote access to the machine holding the repository.

Technical Note 1: You may need to change max_prepared_transactions in postgresql.conf, e.g., set it to 3 (requires a restart), for the checkout to work.

Technical Note 2: The pf commands have to be issued under a superuser role (or one with SELECT permission on pg_authid).

Version 0.2

The next step would be for Dave to issue the ALTER TABLE statement to add the length column, as shown previously. Dave would then add and commit his changes with the following:

$ pf add davedev "*"
Added 0 objects:
$ pf commit -m "Add length column" davedev
Committed revision 3.

Similarly, Carol can issue her ALTER TABLE statement to add the rating column and commit her changes using a similar set of commands. The pf log command shows what transpired so far:

$ pf log caroldev
Rev     Who        Created             Message
4       carol      2011-03-07 10:31:55 Add rating
3       dave       2011-03-07 10:22:42 Add length column
2       carol      2011-03-07 09:23:41 Add film table
1       manny      2011-03-07 09:19:58 Initial import.

If Manny were to check out postfacto:///repodb/movies/trunk into a new database, say for QA purposes, he would see the table with the two new columns. However, I have not been able to merge or update the changes so that Carol and Dave can see the merged table. The command:

$ pf merge postfacto:///repodb/movies/trunk@4 davedev
Merging postfacto:///repodb/movies/trunk@4 into davedev... done.

says it has merged the change at revision 4 into davedev but psql shows that is not the case. The command pf update, which takes a database name and an optional revision number, fails with a Python error: ValueError: too many values to unpack (which I have not investigated further). It appears the only alternative is to use pf checkout --force to force each database to be recreated with the complete new schema.

Version 0.3

The process for subsequent releases is essentially the same. Each developer or DBA makes changes to his or her database and commits them to the repository. Ideally, changes would then be merged so they could continue working with an up-to-date schema.

Review

Post Facto was developed by Robert Brewer while he was at Etsy. He gave a presentation about it at PGCon 2009. In the first 15 minutes of the audio he gives a very useful summary of the typical problems associated with propagating changes to databases and some common solutions.

Post Facto is written in Python (requires 2.3 or higher, but has not been ported to 3.x) and uses pyPgSQL to access PostgreSQL (up to version 8.3). Since it works directly with the catalogs, AFAICT it supports nearly all PostgreSQL features, up to the supported release. As can be seen above, its user interface was designed to resemble the centralized VCS Subversion.

Regrettably, as noted earlier, Post Facto appears to be an inactive project. Although pyPgSQL is still available, its last release was made in June 2006. Aside from the PGCon presentation, a brief description on the project’s home page, and pf‘s usage message, its documentation is non-existent.

Post Facto’s concept of using the database as the VC repository is novel. However, it may be argued that doing so separates the database “code,” i.e., the logical structures, from the related application code which still resides in a conventional VCS.

SQL Database Version Control – Liquibase

Scott Ambler is a prolific author in the area of agile software and agile database development. He and Pramod Sadalage wrote Refactoring Databases: Evolutionary Database Design, a book that expounds the process and techniques for making changes to databases. Caveat for followers of C.J. Date: Chris does not hold Mr. Ambler’s writings in high regard (see this, for example). Sadalage, a ThoughtWorks consultant, has a web site that lists and provides examples of the various refactorings discussed in the book.

The book inspired at least two products for database version control. One, dbdeploy, stemmed from the work of some ThoughtWorks UK staff members. Liquibase, the other product/project, was started by Nathan Voxland after reading the book and noticing a lack of tools to address the problem.

Both dbdeploy and Liquibase are open source and both use XML for their change specification files. Liquibase uses Java, whereas dbdeploy has been ported to Java, .Net and PHP (Liquibase is working on a .Net port). According to its documentation, dbdeploy supports Oracle, SQL Server, Sybase, Hypersonic SQL and MySQL, but the release notes for 3.0M2 indicate support for PostgreSQL and DB2 has been added. Liquibase supports all those and more.

Today we’ll once again follow our favorite DBAs, Carol and Dave, as they use Liquibase to implement the changes described in Version Control, Part 2: SQL Databases.

Version 0.1

For the first release, Carol creates the changelog file film.xml, as follows:

<?xml version="1.0" encoding="UTF-8"?>

<databaseChangeLog
  xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
         http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-2.0.xsd">
    <changeSet id="1" author="carol">
        <createTable tableName="film">
            <column name="id" type="int">
                <constraints primaryKey="true" nullable="false"/>
            </column>
            <column name="title" type="varchar(32)">
                <constraints nullable="false"/>
            </column>
            <column name="release_year" type="int">
            </column>
        </createTable>
    </changeSet>
</databaseChangeLog>

Obviously, there’s very little that could be identified as SQL in the changelog. If you’re paying attention, you’ll notice the CHECK constraint on release_year is missing. Although Sadalage lists “Introduce Column Constraint” as one of the refactorings, Liquibase 2.0 doesn’t support that.

After creating film.xml, Carol can validate it, generate SQL from it, or apply it to her database. For example, to actually create the table she issues a command such as the following:

$ liquibase --driver=org.postgresql.Driver \
      --classpath=/usr/share/java/postgresql.jar \
      --url="jdbc:postgresql://localhost/caroldev" \
      --changeLogFile=film.xml --username=carol --password=pswd \
      update

The liquibase command is a shell script (or Windows .bat file) that invokes java -jar liquibase.jar. The options can be placed in a file named liquibase.properties for convenience.
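For example, a liquibase.properties file corresponding to the command above might look like this (key names as documented by Liquibase; the values are of course site-specific):

```
driver: org.postgresql.Driver
classpath: /usr/share/java/postgresql.jar
url: jdbc:postgresql://localhost/caroldev
username: carol
password: pswd
changeLogFile: film.xml
```

With the file in place, plain liquibase update suffices.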

Liquibase has the ability to back out several types of changes automatically and createTable is one of them. For example, the following sequence creates a tag for possible future rollback, applies the film.xml changes, and rolls back the changes, i.e., DROPs the film table:

$ liquibase tag rel0
$ liquibase --changeLogFile=film.xml update
$ liquibase rollback rel0

The databasechangelog table records the Liquibase activities:

moviesdb=> select id, author, filename, orderexecuted, tag
moviesdb-> from databasechangelog;
      id       |  author   |      filename      | orderexecuted | tag
---------------+-----------+--------------------+---------------+------
 1298957853145 | liquibase | liquibase-internal |             1 | rel0
 1             | carol     | film.xml           |             2 |
(2 rows)

Version 0.2

Next, Dave creates film-length.xml and Carol creates film-rating.xml. Below are the changeSet portions of those files:

    <changeSet id="1" author="dave">
        <addColumn tableName="film">
            <column name="length" type="smallint"/>
        </addColumn>
    </changeSet>
    <changeSet id="2" author="carol">
        <addColumn tableName="film">
            <column name="rating" type="char(5)"/>
        </addColumn>
    </changeSet>

Not much more to say except that we were again unable to specify the CHECK constraint (on the length column).
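If the constraint is essential, one possible workaround is Liquibase’s raw sql change, which accepts arbitrary statements (a sketch; the changeSet id and constraint name here are arbitrary, and the 1-to-9999 range is illustrative). The price is that raw SQL changes lose Liquibase’s automatic rollback support:

```xml
<changeSet id="3" author="dave">
    <sql>ALTER TABLE film ADD CONSTRAINT film_length_check
         CHECK (length BETWEEN 1 AND 9999)</sql>
</changeSet>
```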

Version 0.3

To implement the changes for version 0.3, Carol and Dave create film-category.xml and film-language.xml. Here are the respective changeSets:

<changeSet id="2" author="dave">
    <createTable tableName="language">
        <column name="language_id" type="int">
            <constraints primaryKey="true" nullable="false"/>
        </column>
        <column name="name" type="varchar(20)">
            <constraints nullable="false"/>
        </column>
        <column name="last_update" type="timestamp with time zone">
            <constraints nullable="false"/>
        </column>
    </createTable>
    <addColumn tableName="film">
      <column name="language_id" type="int">
        <constraints nullable="false"/>
        </column>
    </addColumn>
    <addForeignKeyConstraint constraintName="fk_film_language"
        baseTableName="film" baseColumnNames="language_id"
        referencedTableName="language" referencedColumnNames="language_id"
        />
</changeSet>
<changeSet id="3" author="carol">
    <createTable tableName="category">
        <column name="category_id" type="int" autoIncrement="true">
            <constraints primaryKey="true" nullable="false"/>
        </column>
        <column name="name" type="varchar(25)">
            <constraints nullable="false"/>
        </column>
        <column name="last_update" type="timestamp with time zone">
            <constraints nullable="false"/>
        </column>
    </createTable>
    <createTable tableName="film_category">
        <column name="film_id" type="int">
            <constraints primaryKey="true" nullable="false"/>
        </column>
        <column name="category_id" type="int">
            <constraints primaryKey="true" nullable="false"/>
        </column>
        <column name="last_update" type="timestamp with time zone">
            <constraints nullable="false"/>
        </column>
    </createTable>
        <addForeignKeyConstraint constraintName="fk_film_category"
            baseTableName="film_category" baseColumnNames="film_id"
            referencedTableName="film" referencedColumnNames="id"
            />
        <addForeignKeyConstraint constraintName="fk_category_film"
            baseTableName="film_category" baseColumnNames="category_id"
            referencedTableName="category" referencedColumnNames="category_id"
            />
</changeSet>

There may be a way to specify foreign key constraints when creating a table, or adding a column to a table, but the documentation for the “references” attribute of the column Constraints tag is not very helpful.

Review

Liquibase is installed by extracting liquibase.jar from a .zip or .tar.gz archive. It requires Java 1.5 or newer. To use it with PostgreSQL you also need to install the PostgreSQL JDBC Driver, available as the libpg-java package on Debian and Ubuntu.

In addition to the command line script, Liquibase can be invoked from the Apache Ant build tool, as well as various Java tools such as Maven and Grails. Although training videos indicate there are plugins for both Eclipse and IntelliJ IDEA, the former appears to be still under development. With an IDE plugin, instead of editing the XML changelog files by hand, you can build them by selecting from context menus and other aids.

Liquibase supports over 40 refactorings. It does not support PostgreSQL-specific features such as TYPEs, DOMAINs and AGGREGATEs. Its validate command verifies the XML syntax but delegates data type validation to the DBMS. So it would be possible to CREATE TYPE mpaa_rating AS ENUM and CREATE DOMAIN year with psql (i.e., without version control) and then use those types in a Liquibase changeSet.
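For example, definitions along these lines (modeled on the Pagila schema; check the exact labels and range against your copy) could be created with psql outside of version control:

```sql
CREATE TYPE mpaa_rating AS ENUM ('G', 'PG', 'PG-13', 'R', 'NC-17');

CREATE DOMAIN year AS integer
    CONSTRAINT year_check CHECK (VALUE >= 1901 AND VALUE <= 2155);
```

A column in a subsequent Liquibase changeSet could then simply say type="mpaa_rating" or type="year".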

One of the strengths of Liquibase is its semantic knowledge. Unlike a schema diff tool, it knows a table or column is being renamed and not dropped with a new one taking its place, because the developer has to make that explicit. On the other hand, the verbosity of XML tends to detract from its appeal, at least for those of us who prefer a concise or even terse approach.