SQL Database Version Control – Post Facto

In this series on SQL database version control solutions, so far we’ve seen a difference engine and some change control aids. The difference tool, apgdiff, suggests you need to store schema listings, i.e., the output from pg_dump -s, in your VCS repository. The change control tools, depesz Versioning and Liquibase (also dbdeploy), instead imply you keep the database change scripts (or XML specifications) in the repository. Of course, you may also want to store the generated SQL scripts in the first case, as well as the pg_dump -s output in the latter.

Today’s tool is probably unique. It’s a bit like saying: if the database will not come to version control, version control must go to the database. Post Facto builds a version control repository on top of a normal PostgreSQL database.

How can our DBAs Carol and Dave use Post Facto to implement the changes described in Version Control, Part 2: SQL Databases? Note: Unfortunately, the source code for Post Facto has not been updated for 15 months and it is still listed in alpha status. I do have the source, having retrieved it several months ago, but you may get errors checking it out if you use a recent Subversion client (e.g., svn 1.6.12). I briefly tried out Post Facto in the past, but this post may be less definitive than previous ones.

Version 0.1

For the first release, the manager Manny needs to create two databases: a postfacto database and a repodb database. The first name is fixed, but you can choose the second. Post Facto will create these databases at some point if it doesn’t find them, but by creating them ahead of time he can presumably assert some control over the process. Carol has to create her caroldev database and Manny can initialize the repository with the following command:

$ pf import caroldev postfacto://manny@localhost/repodb/movies/trunk
Committed revision 1.

Carol can then issue the CREATE TABLE film statement as shown in the original article, and commit her changes as follows:

$ pf add caroldev "*"
Added 5 objects:
  constraints: public.film
  indexes: public.film_pkey
  tables: public.film
  types: public._film, public.film
$ pf commit  --message "Add film table" caroldev
Committed revision 2.

Dave can check out version 0.1 as follows:

$ pf checkout postfacto:///repodb/movies/trunk davedev
Checked out revision 2.

Behind the scenes, Post Facto has created two additional databases, postfacto_w_caroldev and postfacto_w_davedev, the “working copies” for Carol and Dave, respectively. To install the release in production or in a QA environment, you seemingly need to checkout from the repodb as well, probably requiring remote access to the machine holding the repository.

Technical Note 1: You may need to change max_prepared_transactions, e.g., set it to 3, in postgresql.conf (requires restart) for the checkout to work. Technical Note 2: the pf commands have to be issued under a superuser role (or someone with SELECT permission on pg_authid).

Version 0.2

The next step would be for Dave to issue the ALTER TABLE statement to add the length column, as shown previously. Dave would then add and commit his changes with the following:

$ pf add davedev "*"
Added 0 objects:
$ pf commit -m "Add length column" davedev
Committed revision 3.

Similarly, Carol can issue her ALTER TABLE statement to add the rating column and commit her changes using a similar set of commands. The pf log command shows what transpired so far:

$ pf log caroldev
Rev     Who        Created             Message
4       carol      2011-03-07 10:31:55 Add rating
3       dave       2011-03-07 10:22:42 Add length column
2       carol      2011-03-07 09:23:41 Add film table
1       manny      2011-03-07 09:19:58 Initial import.

If Manny were to check out postfacto:///repodb/movies/trunk into a new database, say for QA purposes, he would see the table with the two new columns. However, I have not been able to merge or update the changes so that Carol and Dave can see the merged table. The command:

$ pf merge postfacto:///repodb/movies/trunk@4 davedev
Merging postfacto:///repodb/movies/trunk@4 into davedev... done.

says it has merged the change at revision 4 into davedev but psql shows that is not the case. The command pf update, which takes a database name and an optional revision number, fails with a Python error: ValueError: too many values to unpack (which I have not investigated further). It appears the only alternative is to use pf checkout --force to force each database to be recreated with the complete new schema.

Version 0.3

The process for subsequent releases is essentially the same. Each developer or DBA makes changes to his or her database and commits them to the repository. Ideally, changes would then be merged so they could continue working with an up-to-date schema.

Review

Post Facto was developed by Robert Brewer while he was at Etsy. He gave a presentation about it at PGCon 2009. In the first 15 minutes of the audio he gives a very useful summary of the typical problems associated with propagating changes to databases and some common solutions.

Post Facto is written in Python (requires 2.3 or higher, but has not been ported to 3.x) and uses pyPgSQL to access PostgreSQL (up to version 8.3). Since it works directly with the catalogs, AFAICT it supports nearly all PostgreSQL features, up to the supported release. As can be seen above, its user inteface was designed to resemble the centralized VCS Subversion.

Regrettably, as noted earlier, Post Facto appears to be an inactive project. Although pyPgSQL is still available its last release was made in June 2006. Aside from the PgCon presentation, a brief description in the project’s home page, and pf‘s usage message its documentation is non-existent.

Post Facto’s concept of using the database as the VC repository is novel. However, it may be argued that doing so separates the database “code,” i.e., the logical structures, from the related application code which still resides in a conventional VCS.

4 thoughts on “SQL Database Version Control – Post Facto”

  1. Thanks for the notice, Joe! Yes, Postfacto has languished a bit. I still think the central concept of “checkout gets you a database instead of files” is a very sound one. There’s a good bit more work to do to make it totally viable. I just haven’t had time on my own to ram it through a company or two to increase adoption and get some more developers on board. [I have a hard enough time keeping CherryPy up to date!] I think it has even more potential now that PG 9 brings replication support–one of the big showstoppers of the Postfacto design was how to deal with Slony scripts. So if any of your hardworking, intelligent readers would like to help take up the charge, I’d love to hear from them. :)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s