In this series on SQL database version control solutions, so far we’ve seen a difference engine and some change control aids. The difference tool, apgdiff, suggests you store schema listings, i.e., the output from pg_dump -s, in your VCS repository. The change control tools, depesz Versioning and Liquibase (also dbdeploy), instead imply you keep the database change scripts (or XML specifications) in the repository. Of course, you may also want to store the generated SQL scripts in the former case, as well as the pg_dump -s output in the latter.
Today’s tool is probably unique. It’s a bit like saying: if the database will not come to version control, version control must go to the database. Post Facto builds a version control repository on top of a normal PostgreSQL database.
How can our DBAs Carol and Dave use Post Facto to implement the changes described in Version Control, Part 2: SQL Databases? Note: Unfortunately, the source code for Post Facto has not been updated for 15 months and it is still listed in alpha status. I do have the source, having retrieved it several months ago, but you may get errors checking it out if you use a recent Subversion client (e.g., svn 1.6.12). I only briefly tried out Post Facto in the past, so this post may be less definitive than previous ones.
For the first release, the manager Manny needs to create two databases: a postfacto database and a repodb database. The first name is fixed, but you can choose the second. Post Facto will create these databases at some point if it doesn’t find them, but by creating them ahead of time he can presumably assert some control over the process. Carol has to create her caroldev database and Manny can initialize the repository with the following command:
```
$ pf import caroldev postfacto://manny@localhost/repodb/movies/trunk
Committed revision 1.
```
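Creating the databases ahead of time is ordinary PostgreSQL administration; a sketch, using the names from this walkthrough (postfacto is fixed, repodb and caroldev are our choices):

```sql
-- Issued by a suitably privileged role, e.g., from psql.
-- Only the postfacto name is required by the tool.
CREATE DATABASE postfacto;
CREATE DATABASE repodb;
CREATE DATABASE caroldev;
```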
Carol can then issue the CREATE TABLE film statement as shown in the original article, and commit her changes as follows:
```
$ pf add caroldev "*"
Added 5 objects:
constraints: public.film
indexes: public.film_pkey
tables: public.film
types: public._film, public.film
$ pf commit --message "Add film table" caroldev
Committed revision 2.
```
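The CREATE TABLE statement itself comes from the earlier article and is not reproduced here; a hypothetical reconstruction, consistent only with the objects pf add reported (a public.film table with a primary key constraint and its film_pkey index), might look like:

```sql
-- Hypothetical: column names and types are assumptions, not the
-- original article's definition. The PRIMARY KEY accounts for the
-- public.film constraint and film_pkey index reported by pf add.
CREATE TABLE film (
    id INTEGER PRIMARY KEY,
    title VARCHAR(32) NOT NULL
);
```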
Dave can check out version 0.1 as follows:
```
$ pf checkout postfacto:///repodb/movies/trunk davedev
Checked out revision 2.
```
Behind the scenes, Post Facto has created two additional databases, postfacto_w_caroldev and postfacto_w_davedev, the “working copies” for Carol and Dave, respectively. To install the release in production or in a QA environment, you seemingly need to check out from repodb as well, which probably requires remote access to the machine holding the repository.
Technical Note 1: You may need to increase max_prepared_transactions in postgresql.conf, e.g., set it to 3, for the checkout to work (this change requires a server restart).
Technical Note 2: The pf commands have to be issued under a superuser role (or one with SELECT permission on pg_authid).
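For the first note, the relevant postgresql.conf fragment would look something like this (3 is just the value that worked in this walkthrough, not a recommendation):

```
# postgresql.conf -- a server restart is required for this to take effect
max_prepared_transactions = 3
```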
The next step would be for Dave to issue the ALTER TABLE statement to add the length column, as shown previously. Dave would then add and commit his changes with the following:
```
$ pf add davedev "*"
Added 0 objects:
$ pf commit -m "Add length column" davedev
Committed revision 3.
```
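The ALTER TABLE statement itself is from the earlier article and is not reproduced here; a hypothetical version (the column type is an assumption) might be:

```sql
-- Hypothetical: the original article's column type is not shown in this post.
ALTER TABLE film ADD COLUMN length SMALLINT;
```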
Similarly, Carol can issue her ALTER TABLE statement to add the rating column and commit her changes using a similar set of commands. The pf log command shows what transpired so far:
```
$ pf log caroldev
Rev  Who    Created              Message
  4  carol  2011-03-07 10:31:55  Add rating
  3  dave   2011-03-07 10:22:42  Add length column
  2  carol  2011-03-07 09:23:41  Add film table
  1  manny  2011-03-07 09:19:58  Initial import.
```
If Manny were to check out postfacto:///repodb/movies/trunk into a new database, say for QA purposes, he would see the table with the two new columns. However, I have not been able to merge or update the changes so that Carol and Dave can see the merged table. The command:
```
$ pf merge postfacto:///repodb/movies/trunk@4 davedev
Merging postfacto:///repodb/movies/trunk@4 into davedev... done.
```
says it has merged the change at revision 4 into davedev, but psql shows that is not the case. The command pf update, which takes a database name and an optional revision number, fails with a Python error, ValueError: too many values to unpack, which I have not investigated further. It appears the only alternative is to use pf checkout --force to recreate each database with the complete new schema.
The process for subsequent releases is essentially the same. Each developer or DBA makes changes to his or her database and commits them to the repository. Ideally, changes would then be merged so they could continue working with an up-to-date schema.
Post Facto was developed by Robert Brewer while he was at Etsy. He gave a presentation about it at PGCon 2009. In the first 15 minutes of the audio he gives a very useful summary of the typical problems associated with propagating changes to databases and some common solutions.
Post Facto is written in Python (requires 2.3 or higher, but has not been ported to 3.x) and uses pyPgSQL to access PostgreSQL (up to version 8.3). Since it works directly with the catalogs, AFAICT it supports nearly all PostgreSQL features, up to the supported release. As can be seen above, its user interface was designed to resemble the centralized VCS Subversion.
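To illustrate what “working directly with the catalogs” means, the following query (plain PostgreSQL, not part of Post Facto’s interface) enumerates the objects in the public schema, much as pf add must do to discover tables, indexes and types:

```sql
-- List objects in the public schema straight from the system catalogs.
-- relkind is 'r' for tables, 'i' for indexes, etc.
SELECT c.relkind, c.relname
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE n.nspname = 'public'
 ORDER BY c.relname;
```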
Regrettably, as noted earlier, Post Facto appears to be an inactive project. Although pyPgSQL is still available, its last release was made in June 2006. Aside from the PGCon presentation, a brief description on the project’s home page, and pf’s usage message, its documentation is non-existent.
Post Facto’s concept of using the database as the VC repository is novel. However, it may be argued that doing so separates the database “code,” i.e., the logical structures, from the related application code, which still resides in a conventional VCS.