Tag Archives: alter table

SQL Database Version Control and RENAMEs

You’ve just watched Josh Berkus’ presentation “Ten Ways to Wreck Your Database” and bearing in mind point #2 “ranDom_naming(s),” your team has decided to get your act together and adopt a naming convention (although this may not be advantageous to your job security, per josh :-)).

PostgreSQL makes it fairly easy to rename schemas, tables and columns via an appropriate ALTER object RENAME statement. Aside: Other SQL implementations provide similar capabilities to various degrees of completeness. Some SQL version control tools support such changes. For example, Liquibase supports renaming tables, columns and views.

On the other hand, an ALTER statement –like a biological mutation– destroys information, which creates a problem for difference engines such as apgdiff, Andromeda and Pyrseas. It may be obvious to a human that the film.rating column and category table in the production database correspond to the renamed film.mpaa_rating column and categories table, respectively, in the development database. However, there is nothing in the latter’s system catalogs that tie the new names to the previous ones. As a result, difference engines will usually DROP the former objects and CREATE others with the new names.

The Pyrseas solution to this problem is to add an oldname field to the YAML spec. For example, if you have the following abbreviated spec (generated by dbtoyaml from the development database):

schema public:
  table film:
    check_constraints:
      ...
    columns:
    ...
    - mpaa_rating:
        type: character(5)
    ...
    primary_key:
    ...

By editing the spec and making the following change to the mpaa_rating column:

    - mpaa_rating:
        oldname: rating
        type: character(5)

Then yamltodb will, when run against a database that still uses the previous colum name, generate the following SQL script:

ALTER TABLE film RENAME COLUMN rating TO mpaa_rating;

Similarly, you can use oldname on a table to generate ALTER TABLE name RENAME TO new_name and on a schema to generate ALTER SCHEMA name RENAME TO new_name.

This is not a foolproof solution because the rename is a one-time action. You could commit the changed spec to your VCS, but if you ran yamltodb against a database after the RENAME has been applied, it will give an error because it cannot find the oldname object. Aside: I’m considering changing that to simply a warning. Another option would be to allow yamltodb to read more than one YAML spec or one that lists some objects more than once, so the extra information could be added when needed. For these types of changes, a complementary tool such as depesz Versioning can be used to introduce the RENAMEs into production.

Pyrseas 0.1.0 includes the RENAME capability for schemas and tables and I’ve recently committed the changes needed for RENAMEing columns.

In other project news, I’ve also added support for FOREIGN KEY ON UPDATE and ON DELETE actions, support for COMMENT statements on schemas, tables and columns, and corrected a problem with indexes being created in the wrong schema. With these changes, the autodoc regression database schemas, tables, columns, primary keys, foreign keys, indexes and comments on these objects can all be properly recreated using dbtoyaml and yamltodb, with the exception of the inheritance tables.

Version Control, Part 2: SQL Databases

In Part 1 we saw how Alice and Bob used a VCS to manage changes to a file of movie information.  In this post, we’ll examine how Carol and Dave, database specialists, deal with similar changes while developing the same application against a SQL-based DBMS.

Version 0.1

When using a relational DBMS, the database designer, administrator (DBA) or programmer defines the data structures, i.e., the tables and columns, using SQL data definition language (DDL), such as this:

CREATE TABLE film (
    id INTEGER NOT NULL PRIMARY KEY,
    title VARCHAR(32) NOT NULL,
    release_year INTEGER NOT NULL CHECK (release_year >= 1888)
);

By adding PRIMARY KEY on the id line, the responsibility for ensuring unique identifiers no longer rests with the application but with the DBMS (but the application needs to be prepared to deal with duplicate ID errors).

Similarly, the validation of release_year is now performed by the DBMS as specified by the CHECK constraint. The application may still implement validations, but the DBMS ensures data integrity even when records are updated using tools external to the primary application.

The DDL can be added to a VCS, say, as a script named film.sql. The application may still use a file like the film.h seen previously to represent the collection of data retrieved from the DBMS or input by a user.  For example, Alice can surround the typedef struct film by EXEC SQL BEGIN|END DECLARE SECTION statements to use it in an ECPG program.

Version 0.2

For the second version, Dave wants to add the movie length column, just as Bob did earlier. In his test environment, he can simply edit film.sql and add length SMALLINT at the end. Similarly, Carol can add rating CHAR(5) to her copy of film.sql and recreate the film table to test her changes. They could even merge their changes using the VCS and store a version 0.2 as follows:

CREATE TABLE film (
    id INTEGER NOT NULL PRIMARY KEY,
    title VARCHAR(32) NOT NULL,
    release_year INTEGER NOT NULL CHECK (release_year >= 1888),
    length SMALLINT CHECK (length > 0 AND length < 10000),
    rating CHAR(5)
);

This approach may be OK for testing (and new customers), but is not satisfactory for production environments with numerous records already in the database. For the flat-file scenario, there was no choice but to create a migration program to reformat or reload the file into the new structure. For a SQL database, you can use the following DDL statements:

ALTER TABLE film ADD COLUMN length SMALLINT
    CHECK (length > 0 AND length < 10000);
ALTER TABLE film ADD COLUMN rating CHAR(5);

Unfortunately, while the ALTER statement allows easy modification of the DBMS storage structures—even in tables with millions of rows, it is not orthogonal with respect to version control. Carol and Dave can commit the CREATE TABLE film.sql version 0.2 to the VCS for use with new installations, but they also have to put the ALTER TABLE statements under version control, probably as part of a script to upgrade existing databases.

The ideal solution may be a CREATE OR REPLACE TABLE statement, but that does not exist.

Version 0.3

For the next release, Dave adds the following:

CREATE TABLE language (
    language_id INTEGER NOT NULL PRIMARY KEY,
    name VARCHAR(20) NOT NULL,
    last_update TIMESTAMP WITH TIME ZONE NOT NULL
);
ALTER TABLE film ADD COLUMN language_id INTEGER NOT NULL;
ALTER TABLE film ADD FOREIGN KEY (language_id)
    REFERENCES language (language_id);

And Carol adds this:

CREATE TABLE category (
    category_id SERIAL NOT NULL PRIMARY KEY,
    name VARCHAR(25) NOT NULL,
    last_update TIMESTAMP WITH TIME ZONE NOT NULL
);
CREATE TABLE film_category (
    film_id INTEGER NOT NULL REFERENCES film (id),
    category_id INTEGER NOT NULL
        REFERENCES category (category_id),
    last_update TIMESTAMP WITH TIME ZONE NOT NULL,
    PRIMARY KEY (film_id, category_id)
);

We again have to deal with storing in the VCS both the CREATE and ALTER statements in order to accomodate new as well as existing customers at different release levels.  However, these changes also exemplify two other issues. The first is that some default DBMS actions are not visible in the DDL.  For example, PostgreSQL will give default names to PRIMARY and FOREIGN KEYS, e.g., film_language_id_fkey. It will also create a SEQUENCE in response to the SERIAL data type specification and add a column DEFAULT based on that sequence.  It may be useful to record these names and other objects in the VCS.

More importantly, there are now dependencies between the various tables. Previously, issuing a certain ALTER TABLE ADD COLUMN statement before another did not (or should not) have much impact on the database or application. Now, film_category can only be created after category and film exist, and language must be created before film (or before the FOREIGN KEY is added to it).

How can projects using PostgreSQL or other RDBMSs deal with the complexities of version control over the database? We’ll examine some solutions in forthcoming posts.