In Part 1 we saw how Alice and Bob used a VCS to manage changes to a file of movie information. In this post, we’ll examine how Carol and Dave, database specialists, deal with similar changes while developing the same application against a SQL-based DBMS.
When using a relational DBMS, the database designer, administrator (DBA) or programmer defines the data structures, i.e., the tables and columns, using SQL data definition language (DDL), such as this:
CREATE TABLE film ( id INTEGER NOT NULL PRIMARY KEY, title VARCHAR(32) NOT NULL, release_year INTEGER NOT NULL CHECK (release_year >= 1888) );
By adding PRIMARY KEY on the id line, the responsibility for ensuring unique identifiers no longer rests with the application but with the DBMS (but the application needs to be prepared to deal with duplicate ID errors).
Similarly, the validation of release_year is now performed by the DBMS as specified by the CHECK constraint. The application may still implement validations, but the DBMS ensures data integrity even when records are updated using tools external to the primary application.
The DDL can be added to a VCS, say, as a script named film.sql. The application may still use a file like the film.h seen previously to represent the collection of data retrieved from the DBMS or input by a user. For example, Alice can surround the typedef struct film by EXEC SQL BEGIN|END DECLARE SECTION statements to use it in an ECPG program.
For the second version, Dave wants to add the movie length column, just as Bob did earlier. In his test environment, he can simply edit film.sql and add length SMALLINT at the end. Similarly, Carol can add rating CHAR(5) to her copy of film.sql and recreate the film table to test her changes. They could even merge their changes using the VCS and store a version 0.2 as follows:
CREATE TABLE film ( id INTEGER NOT NULL PRIMARY KEY, title VARCHAR(32) NOT NULL, release_year INTEGER NOT NULL CHECK (release_year >= 1888), length SMALLINT CHECK (length > 0 AND length < 10000), rating CHAR(5) );
This approach may be OK for testing (and new customers), but is not satisfactory for production environments with numerous records already in the database. For the flat-file scenario, there was no choice but to create a migration program to reformat or reload the file into the new structure. For a SQL database, you can use the following DDL statements:
ALTER TABLE film ADD COLUMN length SMALLINT CHECK (length > 0 AND length < 10000); ALTER TABLE film ADD COLUMN rating CHAR(5);
Unfortunately, while the ALTER statement allows easy modification of the DBMS storage structures—even in tables with millions of rows, it is not orthogonal with respect to version control. Carol and Dave can commit the CREATE TABLE film.sql version 0.2 to the VCS for use with new installations, but they also have to put the ALTER TABLE statements under version control, probably as part of a script to upgrade existing databases.
The ideal solution may be a CREATE OR REPLACE TABLE statement, but that does not exist.
CREATE TABLE language ( language_id INTEGER NOT NULL PRIMARY KEY, name VARCHAR(20) NOT NULL, last_update TIMESTAMP WITH TIME ZONE NOT NULL ); ALTER TABLE film ADD COLUMN language_id INTEGER NOT NULL; ALTER TABLE film ADD FOREIGN KEY (language_id) REFERENCES language (language_id);
And Carol adds this:
CREATE TABLE category ( category_id SERIAL NOT NULL PRIMARY KEY, name VARCHAR(25) NOT NULL, last_update TIMESTAMP WITH TIME ZONE NOT NULL ); CREATE TABLE film_category ( film_id INTEGER NOT NULL REFERENCES film (id), category_id INTEGER NOT NULL REFERENCES category (category_id), last_update TIMESTAMP WITH TIME ZONE NOT NULL, PRIMARY KEY (film_id, category_id) );
We again have to deal with storing in the VCS both the CREATE and ALTER statements in order to accomodate new as well as existing customers at different release levels. However, these changes also exemplify two other issues. The first is that some default DBMS actions are not visible in the DDL. For example, PostgreSQL will give default names to PRIMARY and FOREIGN KEYS, e.g., film_language_id_fkey. It will also create a SEQUENCE in response to the SERIAL data type specification and add a column DEFAULT based on that sequence. It may be useful to record these names and other objects in the VCS.
More importantly, there are now dependencies between the various tables. Previously, issuing a certain ALTER TABLE ADD COLUMN statement before another did not (or should not) have much impact on the database or application. Now, film_category can only be created after category and film exist, and language must be created before film (or before the FOREIGN KEY is added to it).
How can projects using PostgreSQL or other RDBMSs deal with the complexities of version control over the database? We’ll examine some solutions in forthcoming posts.