Category Archives: Version control

Database Redesign and User Interface Refactoring

In this series about database user interfaces, until now I’ve focused on a single table with just three attributes. As I did with version control, I’ll be introducing additional entities and attributes and exploring how they could be presented or updated via the interface.

Here are some of the attributes we could add to our lonely film table:

  • a single-valued attribute
  • a multi-valued attribute with few values per entity
  • a multi-valued attribute with a potentially large number of values per entity
  • an audit trail attribute

Single valued attributes

Practically every film in existence has a primary spoken language (or intertitle language in the case of silent films). Rather than using the language table I showed before (patterned after Pagila), i.e., a smallint language_id, it is preferable to use a standard coding scheme, i.e., ISO 639. For purposes of this tutorial, ISO 639-1 will suffice. Another attribute we’ll add is the film running time in minutes.

Multi-valued attribute, small fan-out

Films are classified into one or more categories, usually identifying a genre, e.g., action, comedy, science fiction. Although there is no definitive list of genres and there are disagreements on categorizing any given film, there is a limited set of genres and most films fall into one or two categories and rarely more than six. Instead of the serial integer and separate film_category table used before (also as in Pagila), since we are using PostgreSQL, perhaps we’ll experiment with an array of varchar (or text) to list the categories in the film table.

Multi-valued attribute, large fan-out

Making a movie is of course a collective endeavor, employing several individuals as cast and crew (even student films). We’ll need a table to record their names and possibly other information and a second table to connect each film to those involved in it.

Challenges

Each of these kinds of attributes present different issues for the user interface.

Assuming we limit ourselves to a short list of genres, we could use a drop-down list to allow the user to choose the categories for each film, but we have to cater for multiple items being selected.

We may allow the user to enter the film language using a two-letter code, but we probably want to display the full language name as confirmation.

On the other hand, we don’t want to force the user to remember (or even be aware of) person identifiers to enter the cast and crew, so some search capability seems necessary.

Refactoring the Interface

To explore these additions, we’ll go back to the minimalist command line interface. However, we’ll first refactor it with the knowledge we gained up to this point.

Instead of the bl.py module we had earlier, we’ll copy the bl/film.py module from the CherryPy interface. We’ll also replace the dblib.py module by the more robust version, but adding a DbConnection.close() method.

The major refactoring occurs in dbapp.py. First, we add a top level “menu” (only two options for now: Films and Quit) and split film-related functionality to a separate film.py module. The latter is patterned after the corresponding module in the CherryPy version. A brief review of the result may be useful:

  • The Film class (in bl/film.py) is the part that knows how to fetch from and update the film table
  • The FilmForm class is the component that obtains valid data from the user
  • The FilmHandler, for the most part, acts a controller, sequencing the various operations

The code is available on GitHub, tagged as v0.4.0.

Audit Trails

Back in February, I defined the film table version 0.1 with just three columns:

CREATE TABLE film (
    id INTEGER NOT NULL PRIMARY KEY,
    title VARCHAR(32) NOT NULL,
    release_year INTEGER NOT NULL CHECK (release_year >= 1888)
);

It was unclear then how far I’d go in supporting all of PostgreSQL data definition features. Now, Pyrseas is making progress towards covering all those capabilities.

Consequently, I’d like to add one column to the above and tag it as version 0.1.1. This is the column to be added (from the Pagila sample database):

   last_update timestamp with time zone NOT NULL DEFAULT now()

With this new column, I’ll be able to explore another kind of field in the database user interface: one that will not be present on new records (because the DEFAULT will take care of filling in that data), but will be displayed, in readonly mode, for updates. Note: In Pagila, last_update is defined as timestamp without time zone, but that seems too ambiguous for my taste.

To complete this, we need an ON UPDATE trigger together with a function to be called by the trigger. These are also provided by Pagila, but to highlight some of the capabilities of Pyrseas, I’ll first show the output of dbtoyaml against a database with all the necessary objects:

language plpgsql:
  trusted: true
schema public:
  description: standard public schema
  function last_updated():
    language: plpgsql
    returns: trigger
    source: "\nBEGIN\n    NEW.last_update = CURRENT_TIMESTAMP;\n    RETURN NEW;\n\
      END "
  table film:
    check_constraints:
      film_release_year_check:
        columns:
        - release_year
        expression: (release_year >= 1888)
    columns:
    - id:
        not_null: true
        type: integer
    - title:
        not_null: true
        type: character varying(32)
    - release_year:
        not_null: true
        type: integer
    - last_update:
        default: now()
        not_null: true
        type: timestamp with time zone
    primary_key:
      film_pkey:
        access_method: btree
        columns:
        - id
    triggers:
      last_updated:
        events:
        - update
        level: row
        procedure: last_updated()
        timing: before

If you feed this YAML to yamltodb against a database that already has the film table version 0.1, it generates the following (against a PostgreSQL 8.4 DBMS—vs. 9.0 the creation of PL/pgSQL is bypassed):

BEGIN;
CREATE LANGUAGE plpgsql;
SET check_function_bodies = false;
CREATE FUNCTION last_updated() RETURNS trigger
    LANGUAGE plpgsql
    AS $_$
BEGIN
    NEW.last_update = CURRENT_TIMESTAMP;
    RETURN NEW;
END $_$;
ALTER TABLE film
    ADD COLUMN last_update timestamp with time zone NOT NULL DEFAULT now();
CREATE TRIGGER last_updated
    BEFORE UPDATE ON film
    FOR EACH ROW
    EXECUTE PROCEDURE last_updated();
COMMIT;

Thus we can successfully upgrade the version 0.1 database to version 0.1.1, with our audit trail column.

Happy trails to you, ’till we meet again …

PostgreSQL Version Control Feature Coverage

In a comment to one of my early posts about version control, Peter Eisentraut stated he was “somewhat discouraged because [seeing how fast PostgreSQL develops and adds new features,] I don’t see how a tool like apgdiff can keep up and stay useful unless a lot more resources are put into it.” At the time, the first release of Pyrseas (0.1.0) was well over a month away. Currently, Pyrseas 0.3.0 is nearing release . This seems like a good time to summarize where Pyrseas stands vis-à-vis the list of PostgreSQL data definition features:

DDL Feature 0.1 0.2 0.3
AGGREGATE  X
CAST
CONSTRAINT TRIGGER
CONVERSION
DOMAIN  X
FOREIGN DATA WRAPPER
FUNCTION  X  X
GROUP
INDEX  X  X  X
LANGUAGE  X  X
OPERATOR
OPERATOR CLASS
OPERATOR FAMILY
ROLE
RULE X
SCHEMA X X X
SEQUENCE X X X
SERVER
TABLE  X  X  X
 - CHECK CONSTRAINT  X  X  X
 - PRIMARY KEY  X  X  X
 - FOREIGN KEY  X  X  X
 - UNIQUE CONSTRAINT  X  X  X
 - INHERIT  X  X
TABLESPACE
TEXT SEARCH CONFIGURATION
TEXT SEARCH DICTIONARY
TEXT SEARCH PARSER
TEXT SEARCH TEMPLATE
TRIGGER  X
TYPE  X
USER
USER MAPPING
VIEW  X  X
COMMENT  X  X

Note that PosgreSQL 9.1 will add EXTENSION to the list above. Note also that although the links in the matrix go to the corresponding CREATE object documentation page, Pyrseas supports CREATE, CREATE OR REPLACE, DROP and ALTER, as applicable. A final note: an “X” in a particular release doesn’t imply that the feature is supported in all its syntactic glory.

Caveats aside, progress has been made. Pyrseas 0.1.0 supported most of the basic relational database objects and concepts present in the SQL Standard, and usually covered by generic version control tools. Pyrseas 0.3.0 will support the features showcased in the Pagila sample database, which I hope is somewhat representative of the average PostgreSQL database (in terms of DDL features, of course).

The goal for the 0.4.0 release will be to cover the remaining features excluding Text Search, SQL/MED and ROLE/GROUP, which will be tackled in 0.5.0 or later. Is there a feature you or your organization is interested in? Let me know in the comments.

Gearing up for a second release

The first release of Pyrseas was aimed at addressing the essential version control issues described in my first posts. The second release, somewhat accidentally, has focused on being able to recreate the PostgreSQL autodoc regression database. In addition, Pyrseas will be released on the PostgreSQL Extension Network (PGXN).

Pyrseas already supports the basic schema, table, column, primary key, foreign key and index CREATE and ALTER operations provided by PostgreSQL (and every relational DBMS worthy of that designation). Here is a summary of PostgreSQL features that will be supported in the upcoming release:

  • COMMENTs on schemas, tables, columns and functions.
  • FOREIGN KEY ON UPDATE and ON DELETE actions.
  • ALTER TABLE RENAME COLUMN and enhanced support for other ALTER object RENAME statements.
  • VIEWs
  • INHERITed tables, and by extension, partitioned tables.
  • PROCEDURAL LANGUAGEs
  • FUNCTIONs

Support for some of these features is still elementary. However, the Pyrseas utilities are now able to recreate the autodoc database and revert back to an empty database, i.e., they also know how to drop each object in the correct order.

The Pyrseas unit tests have also been enhanced so they can be run against different PostgreSQL installations. For example, using the PYRSEAS_TEST_PORT environment variable I was able to run the tests against the 9.1 Beta 1 release, as well as 9.0 and 8.4.

Looking forward, the subsequent release will tackle being able to recreate the Pagila sample database, thus adding support for TYPEs, DOMAINs, AGGREGATEs, TRIGGERs and RULEs, and improving support for existing features.

Update: Pyrseas 0.2.0 has been released and is now available via PyPI, PGXN, PgFoundry and from the GitHub repository.

SQL Version Control Implementation Choices II

In the previous post, I discussed two alternative implementations for the yamltodb difference engine: SQL comparisons using extra dictionary tables, and comparing the YAML/JSON maps (Python dicts). Today I’ll review the last option: internal structure comparisons.

Internal Structures – Take 1

The original implementation for yamltodb was based on the implementation of dbtoyaml. Since the YAML maps output by the latter were hierarchical, the internal structures were also organized in layers: a Python class instance (Database) to represent the database, having a dict of classes representing the schemas, each in turn with a dict of “schema objects,” i.e., tables, sequences, etc. Each layer was populated by queries against the corresponding PostgreSQL catalogs.

This organization was also used for the difference engine. A second Database instance represented the YAML input. The two hierarchies were then traversed and compared at each layer, outputting SQL as differences were found.

This approach served to flesh out the SQL generation code and to deal with various feature tests, e.g., creation of a schema, creation of a table, adding columns to a table, and even RENAMEing schemas and tables. However, it couldn’t easily handle FOREIGN KEYs and SEQUENCEs owned by tables (resulting from a SERIAL data type).

Internal Structures – Take 2

To tackle the various dependencies, I refactored the code to use structures somewhat along the lines of Post Facto. The Database class now holds objects derived from class DbObjectDict which corresponds roughly to a single PostgreSQL catalog. Thus, SchemaDict is nearly equivalent to pg_namespace, ClassDict to pg_class and ConstraintDict to pg_constraint. Each one of those classes is also a Python dict and holds specialized objects, e.g., ClassDict holds objects (derived from DbClass) of class Sequence, Table or View (for the time being, class Index is handled separately).

Each XxxDict class is hashed by a Python tuple (an immutable sequence), paralleling the underlying catalog UNIQUE keys. After the dicts are populated, either from the catalogs or from the YAML map, methods are called to link related objects, e.g., columns to tables, foreign keys to the primary key tables, etc.

Note that Pyrseas implements dual sets of classes: collections (dicts) of objects and classes for individual object types. In constrast Post Facto has a single set of objects and uses a Python classmethod to fetch the catalog information. I found the dual sets more convenient to deal with certain implementation issues.

In order to generate the SQL in the correct sequence considering inter-object dependencies, Post Facto does a topological sort of its objects. Influenced by pg_dump, I preferred a lazier approach.  The diff_map methods of Database and lower level classes generate the SQL in a stratified manner with deferred actions.

For example, since one cannot drop tables or primary keys if foreign keys still refer to the former, ClassDict‘s diff_map, upon finding a table has been dropped (it doesn’t exist in the input YAML map), only marks it for dropping. Then it has a pass to drop foreign keys and views, followed by another pass to drop other constraints and indexes and finally the tables themselves.

As a second illustration, I recently added support for inherited tables (and by extension, partitioned tables). Since PostgreSQL allows for multiple inheritance, creating (and dropping) tables in an inheritance “network” requires special care. My solution: the first pass stacks children tables and only creates “root” tables, while the second pass iterates over the stack to create the rest in the right order.

I’d love to hear feedback on this or from anyone who has experimented with the Pyrseas utilities.

SQL Version Control Implementation Choices I

The Andromeda “data dictionary” is organized as multiple hierarchies, e.g., modules, generic columns, and tables with columns. Here is a very simple example:

module movies:
    description: Movies

column id:
    type_id: int

column title:
    type_id: vchar
    colprec: 32

column release_year:
    type_id: int

table film:
    module: movies
    description: Film information
    column film_id:
        primary_key: Y
    column title:
    column release_year:

This has to be created and maintained manually which may be OK if you’re starting off on a new database project, but not so cool if you have an existing database with a hundred tables (or hundreds of thousands of tables–see page 6).

Thus for Pyrseas I decided to create first a tool (dbtoyaml) to output an existing database schema in YAML format. Structurally, a single hierarchy, e.g., database → schemas → tables (somewhat like pgAdmin’s left panel), appeared preferable to Andromeda’s design. After a couple of iterations, the Pyrseas YAML specification ended as follows:

schema public:
  table film:
    columns:
    - id:
        not_null: true
        type: integer
    - title:
        type: character varying(32)
    - release_year:
        type: integer
    description: Film information
    primary_key:
      film_pkey:
        columns:
        - id

The above is the output from PyYAML’s dump function with default_flow_style=False. Internally, the structure is a Python dict or map which looks just like JSON, e.g.,

{
  'schema public':
  {
    'table film':
    {
      'primary_key':
      {
        'film_pkey':
        {
          'columns': ['id']
        }
      },
      'columns':
      [
        {
          'id':
          {
            'not_null': True,
            'type': 'integer'
          }
        },
        {
          'title':
          {
            'type': 'character varying(32)'
          }
        },
        {
          'release_year':
          {
            'type': 'integer'
          }
        }
      ],
      'description':
        'Film information'
    }
  }
}

PyYAML’s load function can read the YAML file and turn it into the Python dict, so the second tool (yamltodb) can deal with the structure above directly.

The most important implementation decision for yamltodb was how to compare the input map to the map created from the current database’s catalogs. A few options were possible:

  1. SQL-based comparisons
  2. Map comparisons
  3. Internal structure comparisons

SQL Comparisons

This is the approach taken by Andromeda. Its essence is described in step 4 of Dictionary Based Database Upgrades. It consists of storing the input specification into a set of tables and the existing catalogs into a parallel set (potentially one could use the information_schema views), and then issuing SQL queries to compare the two sets.

The presumed advantage of this method is that some of the operations can be done –as SQL set operations– in the DBMS. However, the input spec first has to be inserted into the tables, a row-at-a-time, thus negating the former benefit. In addition, the affected database has to carry at least one set of extraneous catalog-like tables.

Map Comparisons

This is the naive approach consisting of “walking the trees,” i.e., comparing the input map to an internal map created from the current database. This works, up to a point. For example, if a table in the input map is not present in the internal map, it’s easy enough to call a function to generate a CREATE TABLE statement.

However, as most readers know, a PostgreSQL database usually has many dependencies between the various system entities, e.g., foreign keys on primary keys, views on tables, etc. So comparing hierarchies isn’t ideal.

Internal Comparisons

In a forthcoming post, I’ll explore this method as well as other issues such as generating the SQL statements in the correct order.

SQL Database Version Control and RENAMEs

You’ve just watched Josh Berkus’ presentation “Ten Ways to Wreck Your Database” and bearing in mind point #2 “ranDom_naming(s),” your team has decided to get your act together and adopt a naming convention (although this may not be advantageous to your job security, per josh :-)).

PostgreSQL makes it fairly easy to rename schemas, tables and columns via an appropriate ALTER object RENAME statement. Aside: Other SQL implementations provide similar capabilities to various degrees of completeness. Some SQL version control tools support such changes. For example, Liquibase supports renaming tables, columns and views.

On the other hand, an ALTER statement –like a biological mutation– destroys information, which creates a problem for difference engines such as apgdiff, Andromeda and Pyrseas. It may be obvious to a human that the film.rating column and category table in the production database correspond to the renamed film.mpaa_rating column and categories table, respectively, in the development database. However, there is nothing in the latter’s system catalogs that tie the new names to the previous ones. As a result, difference engines will usually DROP the former objects and CREATE others with the new names.

The Pyrseas solution to this problem is to add an oldname field to the YAML spec. For example, if you have the following abbreviated spec (generated by dbtoyaml from the development database):

schema public:
  table film:
    check_constraints:
      ...
    columns:
    ...
    - mpaa_rating:
        type: character(5)
    ...
    primary_key:
    ...

By editing the spec and making the following change to the mpaa_rating column:

    - mpaa_rating:
        oldname: rating
        type: character(5)

Then yamltodb will, when run against a database that still uses the previous colum name, generate the following SQL script:

ALTER TABLE film RENAME COLUMN rating TO mpaa_rating;

Similarly, you can use oldname on a table to generate ALTER TABLE name RENAME TO new_name and on a schema to generate ALTER SCHEMA name RENAME TO new_name.

This is not a foolproof solution because the rename is a one-time action. You could commit the changed spec to your VCS, but if you ran yamltodb against a database after the RENAME has been applied, it will give an error because it cannot find the oldname object. Aside: I’m considering changing that to simply a warning. Another option would be to allow yamltodb to read more than one YAML spec or one that lists some objects more than once, so the extra information could be added when needed. For these types of changes, a complementary tool such as depesz Versioning can be used to introduce the RENAMEs into production.

Pyrseas 0.1.0 includes the RENAME capability for schemas and tables and I’ve recently committed the changes needed for RENAMEing columns.

In other project news, I’ve also added support for FOREIGN KEY ON UPDATE and ON DELETE actions, support for COMMENT statements on schemas, tables and columns, and corrected a problem with indexes being created in the wrong schema. With these changes, the autodoc regression database schemas, tables, columns, primary keys, foreign keys, indexes and comments on these objects can all be properly recreated using dbtoyaml and yamltodb, with the exception of the inheritance tables.

SQL Database Version Control – Pyrseas

Now that Pyrseas has been released, it’s time to see how the DBAs Carol and Dave would use it to manage the changes described in Version Control, Part 2: SQL Databases.

Version 0.1

Carol starts by creating the film table in her database, caroldev, using the CREATE TABLE script, version 0.1, which we saw in the original article. She then uses the Pyrseas dbtoyaml tool to ouput the database specification:

$ dbtoyaml caroldev
schema public:
  table film:
    check_constraints:
      film_release_year_check:
        columns:
        - release_year
        expression: (release_year >= 1888)
    columns:
    - id:
        not_null: true
        type: integer
    - title:
        not_null: true
        type: character varying(32)
    - release_year:
        not_null: true
        type: integer
    primary_key:
      film_pkey:
        access_method: btree
        columns:
        - id

This should be mostly self-explanatory, but let’s go over it. Indentation indicates structure (nesting) and hyphens are used for each element in a list. Thus, there is one schema (public) and one table (film) defined within it. The table has three columns (id, title and release_year) with the given data types and none of them are nullable. The PRIMARY KEY (film_pkey) is on the id column and there is a CHECK constraint on the release_year column. By the way, the alphabetical order within each level is simply the default output format of PyYAML.

Carol can redirect the output of dbtoyaml to a file, say, film.yaml and store it in a Git repository or some other VCS. Dave will then be able to pull the file from the repository and use it to generate the SQL statements needed to create the table in his database, or the development, test and production databases, using the yamltodb tool as follows:

$ yamltodb -1 davedev film.yaml
BEGIN;
CREATE TABLE film (
    id integer NOT NULL,
    title character varying(32) NOT NULL,
    release_year integer NOT NULL);
ALTER TABLE film ADD CONSTRAINT film_pkey PRIMARY KEY (id);
ALTER TABLE film ADD CONSTRAINT film_release_year_check CHECK (release_year >= 1888);
COMMIT;

If he is satisfied with the SQL, he can pipe the output to psql, e.g.,

$ yamltodb -1 davedev film.yaml | psql davedev

Version 0.2

For the second version, Dave issues an ALTER TABLE to add the length column, invokes dbtoyaml to create a new film.yaml and commits that to his repository. Carol uses another ALTER TABLE to add the rating column in her database, and stores a new film.yaml in her repository.

Manny, the integration manager, merges the new film.yaml versions into the project-wide repository. He uses yamltodb to output the statements needed to upgrade to version 0.2:

$ git diff 0.1 0.2
diff --git a/film.yaml b/film.yaml
index 0633c1b..98f19c4 100644
--- a/film.yaml
+++ b/film.yaml
@@ -1,6 +1,10 @@
 schema public:
   table film:
     check_constraints:
+      film_length_check:
+        columns:
+        - length
+        expression: ((length > 0) AND (length < 10000))
       film_release_year_check:
         columns:
         - release_year
@@ -15,6 +19,10 @@ schema public:
     - release_year:
         not_null: true
         type: integer
+    - length:
+        type: smallint
+    - rating:
+        type: character(5)
     primary_key:
       film_pkey:
         access_method: btree
$ yamltodb moviesdev film.yaml
ALTER TABLE film
    ADD COLUMN length smallint;
ALTER TABLE film
    ADD COLUMN rating character(5);
ALTER TABLE film ADD CONSTRAINT film_length_check CHECK ((length > 0) AND (length < 10000));

Version 0.3

For the next round, Carol and Dave issue the statements shown under Version 0.3 in the previous article and commit newer versions of film.yaml which Manny integrates into the project repository. The yamltodb output against the project database is as follows:

$ yamltodb -1 film.yaml
BEGIN;
CREATE SEQUENCE category_category_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MAXVALUE
    NO MINVALUE
    CACHE 1;
CREATE TABLE film_category (
    film_id integer NOT NULL,
    category_id integer NOT NULL,
    last_update timestamp with time zone NOT NULL);
CREATE TABLE category (
    category_id integer NOT NULL DEFAULT nextval('category_category_id_seq'::regclass),
    name character varying(25) NOT NULL,
    last_update timestamp with time zone NOT NULL);
ALTER SEQUENCE category_category_id_seq OWNED BY category.category_id;
ALTER TABLE category ADD CONSTRAINT category_pkey PRIMARY KEY (category_id);
ALTER TABLE film_category ADD CONSTRAINT film_category_pkey PRIMARY KEY (film_id, category_id);
ALTER TABLE film_category ADD CONSTRAINT film_category_category_id_fkey FOREIGN KEY (category_id) REFERENCES category (category_id);
ALTER TABLE film_category ADD CONSTRAINT film_category_film_id_fkey FOREIGN KEY (film_id) REFERENCES film (id);
COMMIT;

Summary

I cannot objectively review the tools I have created, but sincerely hope others will find them useful in managing changes to PostgreSQL databases in conjunction with a VCS.

Pyrseas dbtoyaml and yamltodb require Python, psycopg2 and the PyYAML library. They currently support the basic PostgreSQL features (schemas, tables, PRIMARY KEYs, FOREIGN KEYs, UNIQUE and CHECK constraints, indexes) but the intention is to expand them to cover all features.

The primary audience of the tools are DBAs and developers, but they can be used in conjunction with something like depesz Versioning to implement a stricter system for control over production databases.

I’d like to take this opportunity to thank Adam Cornett, an early adopter/tester, who found an issue with a FOREIGN KEY across schemas, which has been fixed in the master repository.

First release of Pyrseas

just a quick note to announce that the first release of Pyrseas is now available as follows:

Github

https://github.com/jmafc/Pyrseas: Git repository. You can report any issues there.

Python Package Index

http://pypi.python.org/pypi/Pyrseas:  Source tar archive.

PgFoundry

http://pgfoundry.org/projects/pyrseas/: Source tar archive.

Discussion

You can of course leave comments below, but there is also a general mailing list available at PgFoundry, which also hosts forums.

My deepest thanks to the PgFoundry team for all the support tools.

In addition there is a website which is currently mostly bare, but I hope to upload the documentation soon. Update: The initial documentation is now available here.

Pyrseas was developed on Debian Linux with PostgreSQL 8.4, and also tested against 9.0, using Python 2.6, psycopg 2.2 (also tested with 2.4), and PyYAML 3.09. I intend to test on Windows, but I’ll be thankful if some adventurous soul tries it out first.

SQL Version Control System Design Issues

The initial release of Pyrseas is nearing completion. As noted elsewhere, Pyrseas was born to follow through on the concepts of the Andromeda Project. The idea of using a data dictionary specified in JSON or YAML was perhaps the most attractive feature of Andromeda. However, I felt there were several areas that could be improved upon. In this post I’ll go over some of the design issues that influenced the creation of the Pyrseas version control utilities.

Test Driven Development

One of the goals of Pyrseas is to encourage, support or otherwise assist in the development of test-driven applications. The tools should allow the DBA or developer to experiment with changes to database tables and other objects with almost as much ease as a file-based application programmer can change the file structures (granted, the latter has much more work overall due to the lack of a DBMS). ALTER TABLE is great to experiment and make changes, but you need something else if you also want (or need) to keep a current representation of the CREATE TABLE statement in your VCS.

In order to encourage TDD, I think it’s crucial for the tools to do their work speedily. Andromeda takes about 10 seconds to process a database change such as those I have been using as examples. The analogous Pyrseas tool completes the task in one to two orders of magnitude less time.

Team Development

Another objective is to make the tools convenient for use in a team environment. In other words, it should be easy for one DBA to make a change in her database and confidently share it –together with a set of related code changes– with her colleague via a distributed VCS. Note this does not preclude them being used by a single developer.

Modularity

Andromeda insists on a single data dictionary file. Although it requires “modules” to be defined to group tables, all user tables are created in the ‘public’ schema. It appears that Andromeda “modules” are primarily for user interface and security constraint purposes.

The initial release of Pyrseas recognizes the existence of PostgreSQL schemas and allows the DBA to place the tables in schemas as desired. Currently, the Pyrseas dbtoyaml tool (see below) can output specifications for a single schema or for selected tables. I hope to refine this capability in the future, so that –if desired– the data dictionary file could be split and stored in the VCS in a modular fashion, i.e., side-by-side or close to related application code.

Existing System Support

One aspect of Andromeda that I found limiting when I started delving into it was what happens if you want to use if for an existing database, application or project. First, it requires you to create a YAML specification of your database manually, according to its rules, for example, columns need to defined before tables and then “re-used” to specify the tables. Secondly, you can only use a limited set of data types, e.g., a ‘Y’ or ‘N’ column for a BOOLEAN.

For Pyrseas, the first tool designed was dbtoyaml which connects to a database and outputs its schemas, tables and columns in a YAML/JSON format. This format can then be input directly into the second tool, yamltodb. The latter generates SQL statements to bring a second database to match the first.

Summarizing all of the above, this allows a DBA to make a database change either through psql, pgAdmin or some other tool, and then use dbtoyaml to quickly generate a YAML dictionary which can be submitted to the VCS together with related code changes. Another DBA or programmer can pull the changes and apply them promptly to their database to test or integrate them with their own database and code changes.

Platform Support

This an area that concerns implementation but nevertheless affects design. SQL database version control tools can opt to support multiple databases or just one. Presently, I’ve decided to concentrate on PostgreSQL. I believe this can provide complete or nearly complete coverage of the many PG features, and will keep my hands full. If Pyrseas is found useful by others, I’ll be glad to assist anyone with expanding support to other DBMSs.

Development of Pyrseas was done under Linux, but since it’s written in Python, hopefully it will encounter few problems on other OSs.

Comments and discussion welcome!