Tag Archives: apgdiff

SQL Database Version Control – apgdiff

In “Database Development: Table Structure Changes,” Kenneth Downs argued that build scripts to modify table structures “are usually the least tested part of the system.” He suggested using a single metadata file that describes the database and can be placed under version control, and a difference engine as the tool of choice to perform database upgrades.

Another PostgreSQL Diff Tool (aka apgdiff) is a utility that implements such an engine.  For its metadata, apgdiff takes two pg_dump -s output files, representing previous and new database schemas, and generates ALTER TABLE and other SQL statements needed to upgrade from previous to new.

In this installment of our series on using VCS tools while developing a SQL-based application, we’ll see how Carol and Dave can use apgdiff to control the changes described in Version Control, Part 2: SQL Databases.

Version 0.1

To start off, Carol creates the film table in her database, caroldev, using the CREATE TABLE script, version 0.1, which we saw in the previous article. She also creates a general development database, moviesdev, and issues the following commands:

$ pg_dump -s moviesdev > film0.sql
$ pg_dump -s caroldev > film.sql

The film0.sql file is essentially empty, consisting only of some SET, REVOKE and GRANT statements. The film.sql file includes statements to CREATE and ALTER the film table, as can be seen in the following diff:

$ diff -u film0.sql film.sql
--- film0.sql
+++ film.sql
@@ -9,6 +9,34 @@
 SET client_min_messages = warning;
 SET escape_string_warning = off;

+SET search_path = public, pg_catalog;
+
+SET default_tablespace = '';
+
+SET default_with_oids = false;
+
+--
+-- Name: film; Type: TABLE; Schema: public; Owner: carol; Tablespace:
+--
+
+CREATE TABLE film (
+    id integer NOT NULL,
+    title character varying(32) NOT NULL,
+    release_year integer NOT NULL,
+    CONSTRAINT film_release_year_check CHECK ((release_year >= 1888))
+);
+
+
+ALTER TABLE public.film OWNER TO carol;
+
+--
+-- Name: film_pkey; Type: CONSTRAINT; Schema: public; Owner: carol; Tablespace:
+--
+
+ALTER TABLE ONLY film
+    ADD CONSTRAINT film_pkey PRIMARY KEY (id);
+
+
 --
 -- Name: public; Type: ACL; Schema: -; Owner: postgres
 --

Then Carol can use apgdiff to generate the SQL statements needed to create the table in the development and production databases, e.g.,

$ apgdiff film0.sql film.sql
CREATE TABLE film (
 id integer NOT NULL,
 title character varying(32) NOT NULL,
 release_year integer NOT NULL
);

ALTER TABLE film
 ADD CONSTRAINT film_pkey PRIMARY KEY (id);

ALTER TABLE film
 ADD CONSTRAINT film_release_year_check CHECK ((release_year >= 1888));

In this initial phase, the diff and apgdiff outputs are not that … different. The value of apgdiff will be more obvious in subsequent versions.

As part of the 0.1 release process, Carol commits film.sql to the VCS (she could also commit film0.sql as the film.sql base version, if desired). She may also want to store the apgdiff output in the VCS, since the latter is what will actually be used to upgrade the production database(s).

Version 0.2

For the next version, Dave clones the VCS repository and uses the film.sql script or the saved apgdiff output to create the film table in his own database. He then issues an ALTER TABLE to add the length column, invokes pg_dump -s to create a new film.sql and commits that to his repository. Carol uses another ALTER TABLE to add the rating column in her database, and stores the new pg_dump output to her repository.

Manny, the integration manager, merges the new film.sql versions into the project-wide repository (he has to deal with table ownership statements but perhaps the pg_dump outputs can be stripped of such statements prior to committing them to the VCS). Manny then uses apgdiff to produce the statements needed to upgrade to version 0.2:

$ git diff 0.1 0.2
diff --git a/film.sql b/film.sql
index 799b418..db820bd 100644
--- a/film.sql
+++ b/film.sql
@@ -23,6 +23,9 @@ CREATE TABLE film (
 id integer NOT NULL,
 title character varying(32) NOT NULL,
 release_year integer NOT NULL,
+    length smallint,
+    rating character(5),
+    CONSTRAINT film_length_check CHECK (((length > 0) AND (length < 10000))),
 CONSTRAINT film_release_year_check CHECK ((release_year >= 1888))
 );

$ git checkout 0.1
$ cp film.sql film.sql-0.1
$ git checkout 0.2
$ apgdiff film.sql-0.1 film.sql

ALTER TABLE film
 ADD COLUMN length smallint,
 ADD COLUMN rating character(5);

ALTER TABLE film
 ADD CONSTRAINT film_length_check CHECK (((length > 0) AND (length < 10000)));

Version 0.3

For the next round, Carol and Dave issue the statements shown under Version 0.3 in the previous article and commit newer versions of film.sql which Manny then integrates into the project repository. The apgdiff output comparing the two versions is as follows:

$ apgdiff film.sql-0.2 film.sql

CREATE SEQUENCE category_category_id_seq
 START WITH 1
 INCREMENT BY 1
 NO MAXVALUE
 NO MINVALUE
 CACHE 1;

CREATE TABLE category (
 category_id integer DEFAULT nextval('category_category_id_seq'::regclass) NOT NULL,
 name character varying(25) NOT NULL,
 last_update timestamp with time zone NOT NULL
);

CREATE TABLE film_category (
 film_id integer NOT NULL,
 category_id integer NOT NULL,
 last_update timestamp with time zone NOT NULL
);

CREATE TABLE "language" (
 language_id integer NOT NULL,
 name character varying(20) NOT NULL,
 last_update timestamp with time zone NOT NULL
);

ALTER TABLE film
 ADD COLUMN language_id integer NOT NULL;

ALTER TABLE category
 ADD CONSTRAINT category_pkey PRIMARY KEY (category_id);

ALTER TABLE film_category
 ADD CONSTRAINT film_category_pkey PRIMARY KEY (film_id, category_id);

ALTER TABLE "language"
 ADD CONSTRAINT language_pkey PRIMARY KEY (language_id);

ALTER TABLE film
 ADD CONSTRAINT film_language_id_fkey FOREIGN KEY (language_id) REFERENCES language(language_id);

ALTER TABLE film_category
 ADD CONSTRAINT film_category_category_id_fkey FOREIGN KEY (category_id) REFERENCES category(category_id);

ALTER TABLE film_category
 ADD CONSTRAINT film_category_film_id_fkey FOREIGN KEY (film_id) REFERENCES film(id);

You may notice a small discrepancy when running the above (tested with PostgreSQL 8.4.5 and apgdiff 2.3) to upgrade a second database and then compare the output from the pg_dump -s commands. In the original database, the category_category_id_seq sequence is owned by category.category_id, but this is missing from the upgraded database.

Review

Apgdiff appears to be a useful tool to manage changes to PostgreSQL databases in conjunction with a VCS. It does require that you run pg_dump -s and that you store its output in the VCS. However, apgdiff could be used against a set of manually-maintained SQL scripts.

Apgdiff requires Java 1.6 and Debian, Ubuntu and Gentoo have native apgdiff packages. It supports several (but not all) PostgreSQL features. For example, it omits the TYPEs, DOMAINs and AGGREGATEs used in the Pagila sample database (it was able to recreate 14 out of 21 Pagila tables). It depends on parsing SQL DDL syntax —albeit not fully, so it is susceptible to syntax changes between PostgreSQL releases. It cannot deal with RENAMEs, e.g., if the rating column is renamed to mpaa_rating, apgdiff will drop the first and add the second—not ideal in a populated database.

Apgdiff does not operate on the actual database being upgraded but merely on textual representations. On the plus side, it can be used to generate backout scripts, e.g., by diffing film.sql version 0.3 vs. 0.2 you can generate statements to undo the changes from 0.2 to 0.3. The next tool we’ll review is particularly concerned with this aspect of the upgrade process.