Tag Archives: version control

Audit Trails

Back in February, I defined the film table version 0.1 with just three columns:

CREATE TABLE film (
    id INTEGER NOT NULL PRIMARY KEY,
    title VARCHAR(32) NOT NULL,
    release_year INTEGER NOT NULL CHECK (release_year >= 1888)
);

It was unclear then how far I’d go in supporting all of PostgreSQL data definition features. Now, Pyrseas is making progress towards covering all those capabilities.

Consequently, I’d like to add one column to the above and tag it as version 0.1.1. This is the column to be added (from the Pagila sample database):

   last_update timestamp with time zone NOT NULL DEFAULT now()

With this new column, I’ll be able to explore another kind of field in the database user interface: one that will not be present on new records (because the DEFAULT will take care of filling in that data), but will be displayed, in readonly mode, for updates. Note: In Pagila, last_update is defined as timestamp without time zone, but that seems too ambiguous for my taste.

To complete this, we need an ON UPDATE trigger together with a function to be called by the trigger. These are also provided by Pagila, but to highlight some of the capabilities of Pyrseas, I’ll first show the output of dbtoyaml against a database with all the necessary objects:

language plpgsql:
  trusted: true
schema public:
  description: standard public schema
  function last_updated():
    language: plpgsql
    returns: trigger
    source: "\nBEGIN\n    NEW.last_update = CURRENT_TIMESTAMP;\n    RETURN NEW;\n\
      END "
  table film:
    check_constraints:
      film_release_year_check:
        columns:
        - release_year
        expression: (release_year >= 1888)
    columns:
    - id:
        not_null: true
        type: integer
    - title:
        not_null: true
        type: character varying(32)
    - release_year:
        not_null: true
        type: integer
    - last_update:
        default: now()
        not_null: true
        type: timestamp with time zone
    primary_key:
      film_pkey:
        access_method: btree
        columns:
        - id
    triggers:
      last_updated:
        events:
        - update
        level: row
        procedure: last_updated()
        timing: before

If you feed this YAML to yamltodb against a database that already has the film table version 0.1, it generates the following (against a PostgreSQL 8.4 DBMS—vs. 9.0 the creation of PL/pgSQL is bypassed):

BEGIN;
CREATE LANGUAGE plpgsql;
SET check_function_bodies = false;
CREATE FUNCTION last_updated() RETURNS trigger
    LANGUAGE plpgsql
    AS $_$
BEGIN
    NEW.last_update = CURRENT_TIMESTAMP;
    RETURN NEW;
END $_$;
ALTER TABLE film
    ADD COLUMN last_update timestamp with time zone NOT NULL DEFAULT now();
CREATE TRIGGER last_updated
    BEFORE UPDATE ON film
    FOR EACH ROW
    EXECUTE PROCEDURE last_updated();
COMMIT;

Thus we can successfully upgrade the version 0.1 database to version 0.1.1, with our audit trail column.

Happy trails to you, ’till we meet again …

PostgreSQL Version Control Feature Coverage

In a comment to one of my early posts about version control, Peter Eisentraut stated he was “somewhat discouraged because [seeing how fast PostgreSQL develops and adds new features,] I don’t see how a tool like apgdiff can keep up and stay useful unless a lot more resources are put into it.” At the time, the first release of Pyrseas (0.1.0) was well over a month away. Currently, Pyrseas 0.3.0 is nearing release . This seems like a good time to summarize where Pyrseas stands vis-à-vis the list of PostgreSQL data definition features:

DDL Feature 0.1 0.2 0.3
AGGREGATE  X
CAST
CONSTRAINT TRIGGER
CONVERSION
DOMAIN  X
FOREIGN DATA WRAPPER
FUNCTION  X  X
GROUP
INDEX  X  X  X
LANGUAGE  X  X
OPERATOR
OPERATOR CLASS
OPERATOR FAMILY
ROLE
RULE X
SCHEMA X X X
SEQUENCE X X X
SERVER
TABLE  X  X  X
 - CHECK CONSTRAINT  X  X  X
 - PRIMARY KEY  X  X  X
 - FOREIGN KEY  X  X  X
 - UNIQUE CONSTRAINT  X  X  X
 - INHERIT  X  X
TABLESPACE
TEXT SEARCH CONFIGURATION
TEXT SEARCH DICTIONARY
TEXT SEARCH PARSER
TEXT SEARCH TEMPLATE
TRIGGER  X
TYPE  X
USER
USER MAPPING
VIEW  X  X
COMMENT  X  X

Note that PosgreSQL 9.1 will add EXTENSION to the list above. Note also that although the links in the matrix go to the corresponding CREATE object documentation page, Pyrseas supports CREATE, CREATE OR REPLACE, DROP and ALTER, as applicable. A final note: an “X” in a particular release doesn’t imply that the feature is supported in all its syntactic glory.

Caveats aside, progress has been made. Pyrseas 0.1.0 supported most of the basic relational database objects and concepts present in the SQL Standard, and usually covered by generic version control tools. Pyrseas 0.3.0 will support the features showcased in the Pagila sample database, which I hope is somewhat representative of the average PostgreSQL database (in terms of DDL features, of course).

The goal for the 0.4.0 release will be to cover the remaining features excluding Text Search, SQL/MED and ROLE/GROUP, which will be tackled in 0.5.0 or later. Is there a feature you or your organization is interested in? Let me know in the comments.

Gearing up for a second release

The first release of Pyrseas was aimed at addressing the essential version control issues described in my first posts. The second release, somewhat accidentally, has focused on being able to recreate the PostgreSQL autodoc regression database. In addition, Pyrseas will be released on the PostgreSQL Extension Network (PGXN).

Pyrseas already supports the basic schema, table, column, primary key, foreign key and index CREATE and ALTER operations provided by PostgreSQL (and every relational DBMS worthy of that designation). Here is a summary of PostgreSQL features that will be supported in the upcoming release:

  • COMMENTs on schemas, tables, columns and functions.
  • FOREIGN KEY ON UPDATE and ON DELETE actions.
  • ALTER TABLE RENAME COLUMN and enhanced support for other ALTER object RENAME statements.
  • VIEWs
  • INHERITed tables, and by extension, partitioned tables.
  • PROCEDURAL LANGUAGEs
  • FUNCTIONs

Support for some of these features is still elementary. However, the Pyrseas utilities are now able to recreate the autodoc database and revert back to an empty database, i.e., they also know how to drop each object in the correct order.

The Pyrseas unit tests have also been enhanced so they can be run against different PostgreSQL installations. For example, using the PYRSEAS_TEST_PORT environment variable I was able to run the tests against the 9.1 Beta 1 release, as well as 9.0 and 8.4.

Looking forward, the subsequent release will tackle being able to recreate the Pagila sample database, thus adding support for TYPEs, DOMAINs, AGGREGATEs, TRIGGERs and RULEs, and improving support for existing features.

Update: Pyrseas 0.2.0 has been released and is now available via PyPI, PGXN, PgFoundry and from the GitHub repository.

SQL Version Control Implementation Choices II

In the previous post, I discussed two alternative implementations for the yamltodb difference engine: SQL comparisons using extra dictionary tables, and comparing the YAML/JSON maps (Python dicts). Today I’ll review the last option: internal structure comparisons.

Internal Structures – Take 1

The original implementation for yamltodb was based on the implementation of dbtoyaml. Since the YAML maps output by the latter were hierarchical, the internal structures were also organized in layers: a Python class instance (Database) to represent the database, having a dict of classes representing the schemas, each in turn with a dict of “schema objects,” i.e., tables, sequences, etc. Each layer was populated by queries against the corresponding PostgreSQL catalogs.

This organization was also used for the difference engine. A second Database instance represented the YAML input. The two hierarchies were then traversed and compared at each layer, outputting SQL as differences were found.

This approach served to flesh out the SQL generation code and to deal with various feature tests, e.g., creation of a schema, creation of a table, adding columns to a table, and even RENAMEing schemas and tables. However, it couldn’t easily handle FOREIGN KEYs and SEQUENCEs owned by tables (resulting from a SERIAL data type).

Internal Structures – Take 2

To tackle the various dependencies, I refactored the code to use structures somewhat along the lines of Post Facto. The Database class now holds objects derived from class DbObjectDict which corresponds roughly to a single PostgreSQL catalog. Thus, SchemaDict is nearly equivalent to pg_namespace, ClassDict to pg_class and ConstraintDict to pg_constraint. Each one of those classes is also a Python dict and holds specialized objects, e.g., ClassDict holds objects (derived from DbClass) of class Sequence, Table or View (for the time being, class Index is handled separately).

Each XxxDict class is hashed by a Python tuple (an immutable sequence), paralleling the underlying catalog UNIQUE keys. After the dicts are populated, either from the catalogs or from the YAML map, methods are called to link related objects, e.g., columns to tables, foreign keys to the primary key tables, etc.

Note that Pyrseas implements dual sets of classes: collections (dicts) of objects and classes for individual object types. In constrast Post Facto has a single set of objects and uses a Python classmethod to fetch the catalog information. I found the dual sets more convenient to deal with certain implementation issues.

In order to generate the SQL in the correct sequence considering inter-object dependencies, Post Facto does a topological sort of its objects. Influenced by pg_dump, I preferred a lazier approach.  The diff_map methods of Database and lower level classes generate the SQL in a stratified manner with deferred actions.

For example, since one cannot drop tables or primary keys if foreign keys still refer to the former, ClassDict‘s diff_map, upon finding a table has been dropped (it doesn’t exist in the input YAML map), only marks it for dropping. Then it has a pass to drop foreign keys and views, followed by another pass to drop other constraints and indexes and finally the tables themselves.

As a second illustration, I recently added support for inherited tables (and by extension, partitioned tables). Since PostgreSQL allows for multiple inheritance, creating (and dropping) tables in an inheritance “network” requires special care. My solution: the first pass stacks children tables and only creates “root” tables, while the second pass iterates over the stack to create the rest in the right order.

I’d love to hear feedback on this or from anyone who has experimented with the Pyrseas utilities.

SQL Version Control Implementation Choices I

The Andromeda “data dictionary” is organized as multiple hierarchies, e.g., modules, generic columns, and tables with columns. Here is a very simple example:

module movies:
    description: Movies

column id:
    type_id: int

column title:
    type_id: vchar
    colprec: 32

column release_year:
    type_id: int

table film:
    module: movies
    description: Film information
    column film_id:
        primary_key: Y
    column title:
    column release_year:

This has to be created and maintained manually which may be OK if you’re starting off on a new database project, but not so cool if you have an existing database with a hundred tables (or hundreds of thousands of tables–see page 6).

Thus for Pyrseas I decided to create first a tool (dbtoyaml) to output an existing database schema in YAML format. Structurally, a single hierarchy, e.g., database → schemas → tables (somewhat like pgAdmin’s left panel), appeared preferable to Andromeda’s design. After a couple of iterations, the Pyrseas YAML specification ended as follows:

schema public:
  table film:
    columns:
    - id:
        not_null: true
        type: integer
    - title:
        type: character varying(32)
    - release_year:
        type: integer
    description: Film information
    primary_key:
      film_pkey:
        columns:
        - id

The above is the output from PyYAML’s dump function with default_flow_style=False. Internally, the structure is a Python dict or map which looks just like JSON, e.g.,

{
  'schema public':
  {
    'table film':
    {
      'primary_key':
      {
        'film_pkey':
        {
          'columns': ['id']
        }
      },
      'columns':
      [
        {
          'id':
          {
            'not_null': True,
            'type': 'integer'
          }
        },
        {
          'title':
          {
            'type': 'character varying(32)'
          }
        },
        {
          'release_year':
          {
            'type': 'integer'
          }
        }
      ],
      'description':
        'Film information'
    }
  }
}

PyYAML’s load function can read the YAML file and turn it into the Python dict, so the second tool (yamltodb) can deal with the structure above directly.

The most important implementation decision for yamltodb was how to compare the input map to the map created from the current database’s catalogs. A few options were possible:

  1. SQL-based comparisons
  2. Map comparisons
  3. Internal structure comparisons

SQL Comparisons

This is the approach taken by Andromeda. Its essence is described in step 4 of Dictionary Based Database Upgrades. It consists of storing the input specification into a set of tables and the existing catalogs into a parallel set (potentially one could use the information_schema views), and then issuing SQL queries to compare the two sets.

The presumed advantage of this method is that some of the operations can be done –as SQL set operations– in the DBMS. However, the input spec first has to be inserted into the tables, a row-at-a-time, thus negating the former benefit. In addition, the affected database has to carry at least one set of extraneous catalog-like tables.

Map Comparisons

This is the naive approach consisting of “walking the trees,” i.e., comparing the input map to an internal map created from the current database. This works, up to a point. For example, if a table in the input map is not present in the internal map, it’s easy enough to call a function to generate a CREATE TABLE statement.

However, as most readers know, a PostgreSQL database usually has many dependencies between the various system entities, e.g., foreign keys on primary keys, views on tables, etc. So comparing hierarchies isn’t ideal.

Internal Comparisons

In a forthcoming post, I’ll explore this method as well as other issues such as generating the SQL statements in the correct order.

SQL Database Version Control and RENAMEs

You’ve just watched Josh Berkus’ presentation “Ten Ways to Wreck Your Database” and bearing in mind point #2 “ranDom_naming(s),” your team has decided to get your act together and adopt a naming convention (although this may not be advantageous to your job security, per josh :-)).

PostgreSQL makes it fairly easy to rename schemas, tables and columns via an appropriate ALTER object RENAME statement. Aside: Other SQL implementations provide similar capabilities to various degrees of completeness. Some SQL version control tools support such changes. For example, Liquibase supports renaming tables, columns and views.

On the other hand, an ALTER statement –like a biological mutation– destroys information, which creates a problem for difference engines such as apgdiff, Andromeda and Pyrseas. It may be obvious to a human that the film.rating column and category table in the production database correspond to the renamed film.mpaa_rating column and categories table, respectively, in the development database. However, there is nothing in the latter’s system catalogs that tie the new names to the previous ones. As a result, difference engines will usually DROP the former objects and CREATE others with the new names.

The Pyrseas solution to this problem is to add an oldname field to the YAML spec. For example, if you have the following abbreviated spec (generated by dbtoyaml from the development database):

schema public:
  table film:
    check_constraints:
      ...
    columns:
    ...
    - mpaa_rating:
        type: character(5)
    ...
    primary_key:
    ...

By editing the spec and making the following change to the mpaa_rating column:

    - mpaa_rating:
        oldname: rating
        type: character(5)

Then yamltodb will, when run against a database that still uses the previous colum name, generate the following SQL script:

ALTER TABLE film RENAME COLUMN rating TO mpaa_rating;

Similarly, you can use oldname on a table to generate ALTER TABLE name RENAME TO new_name and on a schema to generate ALTER SCHEMA name RENAME TO new_name.

This is not a foolproof solution because the rename is a one-time action. You could commit the changed spec to your VCS, but if you ran yamltodb against a database after the RENAME has been applied, it will give an error because it cannot find the oldname object. Aside: I’m considering changing that to simply a warning. Another option would be to allow yamltodb to read more than one YAML spec or one that lists some objects more than once, so the extra information could be added when needed. For these types of changes, a complementary tool such as depesz Versioning can be used to introduce the RENAMEs into production.

Pyrseas 0.1.0 includes the RENAME capability for schemas and tables and I’ve recently committed the changes needed for RENAMEing columns.

In other project news, I’ve also added support for FOREIGN KEY ON UPDATE and ON DELETE actions, support for COMMENT statements on schemas, tables and columns, and corrected a problem with indexes being created in the wrong schema. With these changes, the autodoc regression database schemas, tables, columns, primary keys, foreign keys, indexes and comments on these objects can all be properly recreated using dbtoyaml and yamltodb, with the exception of the inheritance tables.

SQL Database Version Control – Pyrseas

Now that Pyrseas has been released, it’s time to see how the DBAs Carol and Dave would use it to manage the changes described in Version Control, Part 2: SQL Databases.

Version 0.1

Carol starts by creating the film table in her database, caroldev, using the CREATE TABLE script, version 0.1, which we saw in the original article. She then uses the Pyrseas dbtoyaml tool to ouput the database specification:

$ dbtoyaml caroldev
schema public:
  table film:
    check_constraints:
      film_release_year_check:
        columns:
        - release_year
        expression: (release_year >= 1888)
    columns:
    - id:
        not_null: true
        type: integer
    - title:
        not_null: true
        type: character varying(32)
    - release_year:
        not_null: true
        type: integer
    primary_key:
      film_pkey:
        access_method: btree
        columns:
        - id

This should be mostly self-explanatory, but let’s go over it. Indentation indicates structure (nesting) and hyphens are used for each element in a list. Thus, there is one schema (public) and one table (film) defined within it. The table has three columns (id, title and release_year) with the given data types and none of them are nullable. The PRIMARY KEY (film_pkey) is on the id column and there is a CHECK constraint on the release_year column. By the way, the alphabetical order within each level is simply the default output format of PyYAML.

Carol can redirect the output of dbtoyaml to a file, say, film.yaml and store it in a Git repository or some other VCS. Dave will then be able to pull the file from the repository and use it to generate the SQL statements needed to create the table in his database, or the development, test and production databases, using the yamltodb tool as follows:

$ yamltodb -1 davedev film.yaml
BEGIN;
CREATE TABLE film (
    id integer NOT NULL,
    title character varying(32) NOT NULL,
    release_year integer NOT NULL);
ALTER TABLE film ADD CONSTRAINT film_pkey PRIMARY KEY (id);
ALTER TABLE film ADD CONSTRAINT film_release_year_check CHECK (release_year >= 1888);
COMMIT;

If he is satisfied with the SQL, he can pipe the output to psql, e.g.,

$ yamltodb -1 davedev film.yaml | psql davedev

Version 0.2

For the second version, Dave issues an ALTER TABLE to add the length column, invokes dbtoyaml to create a new film.yaml and commits that to his repository. Carol uses another ALTER TABLE to add the rating column in her database, and stores a new film.yaml in her repository.

Manny, the integration manager, merges the new film.yaml versions into the project-wide repository. He uses yamltodb to output the statements needed to upgrade to version 0.2:

$ git diff 0.1 0.2
diff --git a/film.yaml b/film.yaml
index 0633c1b..98f19c4 100644
--- a/film.yaml
+++ b/film.yaml
@@ -1,6 +1,10 @@
 schema public:
   table film:
     check_constraints:
+      film_length_check:
+        columns:
+        - length
+        expression: ((length > 0) AND (length < 10000))
       film_release_year_check:
         columns:
         - release_year
@@ -15,6 +19,10 @@ schema public:
     - release_year:
         not_null: true
         type: integer
+    - length:
+        type: smallint
+    - rating:
+        type: character(5)
     primary_key:
       film_pkey:
         access_method: btree
$ yamltodb moviesdev film.yaml
ALTER TABLE film
    ADD COLUMN length smallint;
ALTER TABLE film
    ADD COLUMN rating character(5);
ALTER TABLE film ADD CONSTRAINT film_length_check CHECK ((length > 0) AND (length < 10000));

Version 0.3

For the next round, Carol and Dave issue the statements shown under Version 0.3 in the previous article and commit newer versions of film.yaml which Manny integrates into the project repository. The yamltodb output against the project database is as follows:

$ yamltodb -1 film.yaml
BEGIN;
CREATE SEQUENCE category_category_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MAXVALUE
    NO MINVALUE
    CACHE 1;
CREATE TABLE film_category (
    film_id integer NOT NULL,
    category_id integer NOT NULL,
    last_update timestamp with time zone NOT NULL);
CREATE TABLE category (
    category_id integer NOT NULL DEFAULT nextval('category_category_id_seq'::regclass),
    name character varying(25) NOT NULL,
    last_update timestamp with time zone NOT NULL);
ALTER SEQUENCE category_category_id_seq OWNED BY category.category_id;
ALTER TABLE category ADD CONSTRAINT category_pkey PRIMARY KEY (category_id);
ALTER TABLE film_category ADD CONSTRAINT film_category_pkey PRIMARY KEY (film_id, category_id);
ALTER TABLE film_category ADD CONSTRAINT film_category_category_id_fkey FOREIGN KEY (category_id) REFERENCES category (category_id);
ALTER TABLE film_category ADD CONSTRAINT film_category_film_id_fkey FOREIGN KEY (film_id) REFERENCES film (id);
COMMIT;

Summary

I cannot objectively review the tools I have created, but sincerely hope others will find them useful in managing changes to PostgreSQL databases in conjunction with a VCS.

Pyrseas dbtoyaml and yamltodb require Python, psycopg2 and the PyYAML library. They currently support the basic PostgreSQL features (schemas, tables, PRIMARY KEYs, FOREIGN KEYs, UNIQUE and CHECK constraints, indexes) but the intention is to expand them to cover all features.

The primary audience of the tools are DBAs and developers, but they can be used in conjunction with something like depesz Versioning to implement a stricter system for control over production databases.

I’d like to take this opportunity to thank Adam Cornett, an early adopter/tester, who found an issue with a FOREIGN KEY across schemas, which has been fixed in the master repository.

SQL Version Control System Design Issues

The initial release of Pyrseas is nearing completion. As noted elsewhere, Pyrseas was born to follow through on the concepts of the Andromeda Project. The idea of using a data dictionary specified in JSON or YAML was perhaps the most attractive feature of Andromeda. However, I felt there were several areas that could be improved upon. In this post I’ll go over some of the design issues that influenced the creation of the Pyrseas version control utilities.

Test Driven Development

One of the goals of Pyrseas is to encourage, support or otherwise assist in the development of test-driven applications. The tools should allow the DBA or developer to experiment with changes to database tables and other objects with almost as much ease as a file-based application programmer can change the file structures (granted, the latter has much more work overall due to the lack of a DBMS). ALTER TABLE is great to experiment and make changes, but you need something else if you also want (or need) to keep a current representation of the CREATE TABLE statement in your VCS.

In order to encourage TDD, I think it’s crucial for the tools to do their work speedily. Andromeda takes about 10 seconds to process a database change such as those I have been using as examples. The analogous Pyrseas tool completes the task in one to two orders of magnitude less time.

Team Development

Another objective is to make the tools convenient for use in a team environment. In other words, it should be easy for one DBA to make a change in her database and confidently share it –together with a set of related code changes– with her colleague via a distributed VCS. Note this does not preclude them being used by a single developer.

Modularity

Andromeda insists on a single data dictionary file. Although it requires “modules” to be defined to group tables, all user tables are created in the ‘public’ schema. It appears that Andromeda “modules” are primarily for user interface and security constraint purposes.

The initial release of Pyrseas recognizes the existence of PostgreSQL schemas and allows the DBA to place the tables in schemas as desired. Currently, the Pyrseas dbtoyaml tool (see below) can output specifications for a single schema or for selected tables. I hope to refine this capability in the future, so that –if desired– the data dictionary file could be split and stored in the VCS in a modular fashion, i.e., side-by-side or close to related application code.

Existing System Support

One aspect of Andromeda that I found limiting when I started delving into it was what happens if you want to use if for an existing database, application or project. First, it requires you to create a YAML specification of your database manually, according to its rules, for example, columns need to defined before tables and then “re-used” to specify the tables. Secondly, you can only use a limited set of data types, e.g., a ‘Y’ or ‘N’ column for a BOOLEAN.

For Pyrseas, the first tool designed was dbtoyaml which connects to a database and outputs its schemas, tables and columns in a YAML/JSON format. This format can then be input directly into the second tool, yamltodb. The latter generates SQL statements to bring a second database to match the first.

Summarizing all of the above, this allows a DBA to make a database change either through psql, pgAdmin or some other tool, and then use dbtoyaml to quickly generate a YAML dictionary which can be submitted to the VCS together with related code changes. Another DBA or programmer can pull the changes and apply them promptly to their database to test or integrate them with their own database and code changes.

Platform Support

This an area that concerns implementation but nevertheless affects design. SQL database version control tools can opt to support multiple databases or just one. Presently, I’ve decided to concentrate on PostgreSQL. I believe this can provide complete or nearly complete coverage of the many PG features, and will keep my hands full. If Pyrseas is found useful by others, I’ll be glad to assist anyone with expanding support to other DBMSs.

Development of Pyrseas was done under Linux, but since it’s written in Python, hopefully it will encounter few problems on other OSs.

Comments and discussion welcome!

SQL Database Version Control – Summary

Update: Please visit the Schema VC page for updated information.

Before moving on to discussing how the Pyrseas version control tools deal with the database changes we’ve been using as examples, it seems appropriate to summarize the tools reviewed so far.

The following classifies the tools according to the main feature or mode of operation, and lists them alphabetically within each group. I’ve added some other tools that were not reviewed but were mentioned in comments, in PostgreSQL mailing lists, or elsewhere. Except where otherwise mentioned, the tools are open source software and support PostgreSQL.

Difference Engines

Andromeda

Summary: More than a difference engine, compares input data dictionary to information_schema views, and applies changes to upgrade the connected database.

Input format: YAML

Dependencies: PHP, Apache, PEAR, PL/Perl

Limitations: Data dictionary needs to be created and maintained manually. Limited support of PostgreSQL data types and other features.

apgdiff

Summary: Compares two SQL DDL input files, and outputs SQL statements to upgrade the first schema to match the second.

Input format: output of pg_dump -s

Dependencies: Java

Limitations: Does not currently support all PostgreSQL features.

Pyrseas

Summary: Compares an input data dictionary to the pg_* catalogs of a connected database, and outputs SQL statements to upgrade the latter to match the former. Also provides utility to output catalog information in YAML/JSON format.

Input format: YAML (JSON subset)

Dependencies: Python, psycopg2, LibYAML

Limitations: Does not currently support all PostgreSQL features.

Triangulum

Summary: Successor to Andromeda (see above) but with more focused objectives.

Input format: YAML

Dependencies: PHP, Spyc (YAML library)

Limitations: Data dictionary needs to be created and maintained manually. Limited support of PostgreSQL data types (but better than Andromeda) and other features.

Version Control of Database Changes

dbdeploy

Summary: Database refactoring tool similar to Liquibase (see below). Converts database changes specified in XML to SQL statements to upgrade a target database.

Input format: XML

Dependencies: Java, JDBC

Limitations: Primary support is for DBMSs other than PostgreSQL, so lowest common denominator coverage of PG features.

dbsteward

Summary: Announced in the pgsql-general mailing list. Expected to be available as open source by PGCon 2011. Similar to Liquibase (see below), but see this message for more details.

Input format: XML

Dependencies: Unknown

Limitations: Apparently, XML-based data dictionary must be maintained manually.

depesz Versioning

Summary: Mechanism to control database upgrades implemented as named sets of “patches.”

Input format: SQL scripts

Dependencies: Perl (to list patch dependencies)

Limitations: Scripts to rollback changes need to be created manually.

Liquibase

Summary: Database refactoring tool. Converts database changes specified in XML to SQL statements to upgrade a target database.

Input format: XML

Dependencies: Java, JDBC driver

Limitations: Primary support is for DBMSs other than PostgreSQL, so lowest common denominator coverage of PG features.

Database as VCS

Post Facto

Summary: Uses a number of PostgreSQL databases to establish a Subversion-like repository, so that schema changes to one database can be committed to the repository, and then the changes can be propagated from the repository to other databases.

Input format: none

Dependencies: Python, pyPgSQL

Limitations: Still in alpha status. Last PG version supported: 8.3.

Catalog Snapshots

POV

Summary: Allows you to save a snapshot of the PostgreSQL catalogs, make some schema changes, and possibly roll back the changes by restoring the snapshot.

Input format: none

Dependencies: Uses pgcrypto if available

Other

ChronicDB

Commercial product, supports Linux (Debian and RPM packages).

EMS DB Comparer for PostgreSQL

Commercial product, only available for Microsoft Windows platforms.

neXtep Designer

GPL licensed, IDE based. multiple DBMS support.

This summary has also been posted in a separate page in this blog, and I’ll update that from time to time. If anyone has any corrections or updates, please let me know in the comments.

SQL Database Version Control – Andromeda

In my first review of database VC tools, I mentioned Kenneth Down’s article where he recommended storing a single metadata file in a VCS and using a difference engine to do database upgrades. In a subsequent post, Kenneth expanded on this, providing a step-by-step description of how to implement such a tool. In fact, he had already gone further: he created an open source project named Andromeda that attempted to deliver on that vision.

Let’s see how our DBAs Carol and Dave can use Andromeda to implement the database changes specified in Version Control, Part 2: SQL Databases.

Version 0.1

Andromeda is much more than a version control tool. Its SourceForge page advertises it as a “database development system” and a “complete tool for building database-centric” applications. So, to start off, Carol, Dave and Manny will have to install Apache, PHP and other dependencies before installing Andromeda. Then they’ll be able to access it via web browser at a URL such as http://localhost/~manny/andro_root/andro. After an initialization step, which includes creating a PostgreSQL database named andro, Manny can create the “movies” application as described here.

In order to create the first version of the film table, Carol has to create the YAML spec file movies.dd.yaml as follows (this is the file that will be placed under version control):

group admin:
    description: Administrators
    module movies:
        permsel: Y
        permins: Y
        permupd: Y
        permdel: Y

group users:
    description:  Employees
    module movies:
        permsel: Y

module movies:
    description: Film Database
    uisort: 100

column id:
    description: Unique film ID
    type_id: int

column film_title:
    description: Film title
    type_id: vchar
    colprec: 32

column release_year:
    description: Release year
    type_id: int
    value_min: 1888

table film:
    description: Films
    module: movies
    uisort: 30
    spaceafter: Y

    column id:
        uisearch: Y
        primary_key: Y

    column film_title:
        uisearch: Y

    column release_year:

The group definitions are so that Andromeda can implement security constraints. The module definition is used to group tables logically, e.g., in the full Pagila database we may group tables into movies, customers and accounting. The column definitions precede the table definitions since the former can be used in multiple tables. The YAML spec is processed by selecting a “Build” step from Andromeda. If successful, it will create a “movies” database including the film table.

Note that Carol had to use film_title instead of title because the latter conflicts with an undocumented predefined column. Using title causes the following error:

ERROR >> Duplicate Column definition: 

ERROR >>    column_id => title

Notice also that she used value_min instead of valuemin for the release_year constraint. Andromeda documents the latter. However, no CHECK constraint is actually created and the trigger functions (see below) don’t implement a validation.

If you look at the created table, you may be surprised:

               Table "public.film"
    Column    |         Type          | Modifiers
--------------+-----------------------+-----------
 _agg         | character(1)          |
 skey_quiet   | character(1)          |
 film_title   | character varying(32) |
 skey         | integer               |
 release_year | integer               |
 id           | integer               |
Indexes:
    "film_idx_film_skey_idx" btree (skey)
    "film_pk" btree (id)
Triggers:
    film_ins_bef_r_t BEFORE INSERT ON film FOR EACH ROW EXECUTE PROCEDURE film_ins_bef_r_f()
    film_upd_bef_r_t BEFORE UPDATE ON film FOR EACH ROW EXECUTE PROCEDURE film_upd_bef_r_f()

As you can see, the order of the columns is not what you’d expect from the spec1 and three internal columns were added. Note that although there is an film_pk index, PostgreSQL doesn’t recognize it as the PRIMARY KEY, and all columns are nullable. The “magic” is all in the generated trigger functions.


1 Of course, according to the relational model, the order of the attributes is not significant, but in practice most people examining a relation expect some order.

Version 0.2

Carol and Dave implement the second set of changes by editing movies.dd.yaml. The combined edits are as follows:

@@ -30,4 +30,15 @@
 value_min: 1888

+column length:
+    description: Length in minutes
+    type_id: int
+    value_min: 1
+    value_max: 9999
+
+column rating:
+    description: Film rating
+    type_id: char
+    colprec: 5
+   
 table film:
     description: Films
@@ -44,2 +55,6 @@

 column release_year:
+
+    column length:
+
+    column rating:

The departure from the original is minor: length was defined as int instead of smallint because Andromeda only allows a limited set of types. And although Dave specified the CHECK constraint on length, it’s not functional.

Version 0.3

For the third iteration, Carol and Dave again edit movies.dd.yaml as shown below:

@@ -16,4 +16,44 @@
 uisort: 100

+column language_id:
+    description: Language ID
+    type_id: int
+
+table language:
+    description: Spoken language
+    module: movies
+    uisort: 10
+
+    column language_id:
+        primary_key: Y
+        uisearch: Y
+
+    column name:
+        description: Language name
+        uisearch: Y
+
+    column ts_upd:
+
+column category_id:
+    description: Film category ID
+    type_id: int
+
+table category:
+    description: Categories
+    module: movies
+    uisort: 20
+    spaceafter: Y
+
+    column category_id:
+        auto: sequence
+        primary_key: Y
+        uisearch: Y
+
+    column name:
+        description: Film category
+        uisearch: Y
+
+    column ts_upd:
+
 column id:
     description: Unique film ID
@@ -59,2 +99,17 @@

 column rating:
+
+    foreign_key language:
+
+table film_category:
+    description: Film categories
+    module: movies
+    uisort: 40
+
+    foreign_key film:
+        primary_key: Y
+
+    foreign_key category:
+        primary_key: Y
+
+    column ts_upd:

Notes:

  • The foreign_key specification names the table only. Generated column names match those of the referenced table, which may be confusing.
  • Each column of a PRIMARY KEY needs a primary_key: Y line.
  • The predefined column name was used instead of table-specific columns as in the original. The former creates varchar(100) columns (although the documentation says varchar(40)).
  • The predefined column ts_upd creates timestamp without time zone rather than with time zone as in the original. The latter is not supported by Andromeda.
  • The sequence category_seq_category_id is created, as a result of auto: sequence on category_id but is not linked directly to the category table (it’s used in the trigger functions).

Review

Andromeda is written in PHP (comments indicate it was earlier implemented in Java) and supports PostgreSQL databases. It also requires Apache, PEAR and PL/Perl.

Andromeda implements its difference engine by querying the information_schema views. It saves the data from the views and from the YAML spec into its own tables (it creates 105 tables in a schema named zdd and 77 tables in the public schema –not all of these are for version control or database automation tasks). It then uses SQL queries to determine what actions are needed for the upgrade. It applies the changes immediately.

As pointed out previously, Andromeda supports a limited set of data types. For example, booleans are only implemented as CHAR(1)’s that accept ‘Y’ or ‘N’. A follow-on project, Triangulum, appears to expand on data type support.

Andromeda can load data from CSV files and is able to retain data in an existing table whose structure is changing. However, it cannot deal with RENAMEs of columns or tables and it doesn’t seem to be able to DROP a column.

The last official release of Andromeda occurred in 2009. The guide for its successor, which made a third alpha release available in Feb. 2011, implies Andromeda will be superseded by Triangulum.

The concept of using a YAML specification for SQL database version control is very appealing (in fact, it’s what led to the Pyrseas project) because it disassociates SQL syntax from the definition and is not verbose like the XML used by other tools. Andromeda (and Triangulum) thus implement a useful mechanism for schema versioning.