Interesting database tool: Adminer

No, it’s not a tool to extract extra revenue from Google Ads. Adminer (which I guess is pronounced admin-er, rather than ad-minor) is a nifty little tool that competes with phpMyAdmin, the web-browser-based MySQL administration tool. What is interesting from my perspective is that it’s been so successful that it now also supports PostgreSQL –as well as SQLite, Oracle and SQL Server (so it also competes with phpPgAdmin).

Nifty, in the sense of “quick, agile,” is an apt description for Adminer: adminer-3.1.0.php weighs in at only 292kB and that single file is all you have to download and install. If you prefer an English-only version it’s just 188kB (there are also MySQL-only versions that are smaller yet).  It installs quickly, provided you have Apache and PHP already installed, and its responsiveness is admirable.

But perhaps what I found most impressive was its ability to display the database schema.  Here is Adminer showing the schema for the “movies” database I’ve been using in recent posts:

Its MySQL roots are visible in some places, e.g., in the table display below, the SERIAL column language_id is shown as Auto increment:

Overall, a very responsive, easy-to-use tool for administering a database.

Hat tip: Selena Marie.

SQL Database Version Control – depesz Versioning

In the previous article, we reviewed how a difference engine, as implemented by apgdiff, in conjuction with a VCS, can assist in controlling upgrades to a SQL database,

In “How to manage changes to your database?” Hubert “depesz” Lubaczewski explains why the schema diff approach hasn’t worked for him. His primary concern is that a development database tends to have extraneous objects, e.g., secondary indexes, other tables in a non-finalized state. Hubert goes on to introduce his own Versioning project designed to aid in managing database “patches.”

Today we’ll explore how Carol and Dave could use Versioning to deal with the changes described in Version Control, Part 2: SQL Databases.

Version 0.1

The Versioning tool is complementary to the overall process of developing a database application. It is more concerned with getting the deployment correct rather than helping the programmers or DBAs with figuring out what needs to be changed for the current release.

After installing Versioning, for the first release, Carol needs to create two scripts, film.sql:

BEGIN;
  SELECT _v.register_patch('film');
  CREATE TABLE film (
      id INTEGER NOT NULL PRIMARY KEY,
      title VARCHAR(32) NOT NULL,
      release_year INTEGER NOT NULL CHECK (release_year >= 1888)
  );
COMMIT;

and rollback-film.sql:

BEGIN;
  SELECT _v.unregister_patch('film');
  DROP TABLE film;
COMMIT;

The argument to the register_patch and unregister_patch functions (created in the Versioning schema _v) is the name given to the patch. The film.sql script is executed first. If there are any problems after installation and testing, rollback-film.sql can be used to remove the patch. The patches table records what patches have been installed:

moviesdb=> select * from _v.patches;
 patch_name |          applied_tsz          | applied_by | requires | conflicts
------------+-------------------------------+------------+----------+-----------
 film       | 2011-02-18 12:51:06.780811-05 | carol      | {}       | {}
(1 row)

Version 0.2

For the second version, Dave has to create two other scripts:

BEGIN;
  SELECT _v.register_patch('film-length', ARRAY['film'], NULL);
  ALTER TABLE film ADD COLUMN length SMALLINT
      CHECK (length > 0 AND length < 10000);
COMMIT;
BEGIN;
  SELECT _v.unregister_patch('film-length');
  ALTER TABLE film DROP COLUMN length;
COMMIT;

Carol also creates two scripts for her change:

BEGIN;
  SELECT _v.register_patch('film-rating', ARRAY['film'], NULL);
  ALTER TABLE film ADD COLUMN rating CHAR(5);
COMMIT;
BEGIN;
  SELECT _v.unregister_patch('film-rating');
  ALTER TABLE film DROP COLUMN rating;
COMMIT;

The second argument to the register_patch function is an array of requirements, i.e., patches that should have been installed prior to the current one. If required patches are missing, Versioning rejects the patch with an error:

moviesdb=>   SELECT _v.register_patch('film-length', ARRAY['film'], NULL);
ERROR:  Missing prerequisite(s): .

Versioning also provides a shell script, list-dependencies-from-patches.sh, that can be used to determine the order in which patches should be applied to avoid such dependency errors.

The third argument to the register_patch function is an array of conflicts, i.e., patches that would conflict with the current one.

Version 0.3

For the third release, it should be obvious that Carol and Dave have to create two new scripts each: an installation script and a rollback script. The installation scripts contain the same SQL statements as shown under Version 0.3 in the previous article but preceded by a SELECT _v.register_patch() and wrapped in transaction statements. The rollback scripts consist of SQL statements to undo the effect of the installation statements, again together with an unregister_patch call and wrapped by BEGIN/COMMIT, e.g., here is Dave’s rollback-film-language.sql:

BEGIN;
  SELECT _v.unregister_patch('film-language');
  ALTER TABLE film DROP COLUMN language_id;
  DROP TABLE language;
COMMIT;

Review

Versioning is simply a set of SQL scripts so it should run on any supported PostgreSQL environment, but the list-dependencies-from-patches.sh script requires bash. It needs PL/pgSQL so you’ll have to create the language in any pre-9.0 database if not present.  With Versioning, the VCS will store the patch installation and rollback scripts, but not necessarily a representation of the overall database, although that could be satisfied by storing the pg_dump -s outputs at each release.

To determine which SQL statements to include in the patch scripts, Carol and Dave will have to (a) use Versioning in their own development databases, (b) keep a record of the database changes as they make them, or (c) inspect the differences between the development and production databases, perhaps using a tool such as apgdiff. The first option seems to preclude the use of developer-friendly tools such as pgAdmin to modify table structures. Most options require fairly knowledgeable database developers since they have to understand not only how to make a change but also how to undo it, including being aware of dependencies between database objects.

On the other hand, use of Versioning during development will tend to ensure that changes to table structures are adequately tested–a concern voiced by Kenneth Downs. The Versioning tool can also help with complex upgrade scenarios such as renaming tables or columns without losing existing data, splitting a column into two, or fancier database refactorings.

SQL Database Version Control – apgdiff

In “Database Development: Table Structure Changes,” Kenneth Downs argued that build scripts to modify table structures “are usually the least tested part of the system.” He suggested using a single metadata file that describes the database and can be placed under version control, and a difference engine as the tool of choice to perform database upgrades.

Another PostgreSQL Diff Tool (aka apgdiff) is a utility that implements such an engine.  For its metadata, apgdiff takes two pg_dump -s output files, representing previous and new database schemas, and generates ALTER TABLE and other SQL statements needed to upgrade from previous to new.

In this installment of our series on using VCS tools while developing a SQL-based application, we’ll see how Carol and Dave can use apgdiff to control the changes described in Version Control, Part 2: SQL Databases.

Version 0.1

To start off, Carol creates the film table in her database, caroldev, using the CREATE TABLE script, version 0.1, which we saw in the previous article. She also creates a general development database, moviesdev, and issues the following commands:

$ pg_dump -s moviesdev > film0.sql
$ pg_dump -s caroldev > film.sql

The film0.sql file is essentially empty, consisting only of some SET, REVOKE and GRANT statements. The film.sql file includes statements to CREATE and ALTER the film table, as can be seen in the following diff:

$ diff -u film0.sql film.sql
--- film0.sql
+++ film.sql
@@ -9,6 +9,34 @@
 SET client_min_messages = warning;
 SET escape_string_warning = off;

+SET search_path = public, pg_catalog;
+
+SET default_tablespace = '';
+
+SET default_with_oids = false;
+
+--
+-- Name: film; Type: TABLE; Schema: public; Owner: carol; Tablespace:
+--
+
+CREATE TABLE film (
+    id integer NOT NULL,
+    title character varying(32) NOT NULL,
+    release_year integer NOT NULL,
+    CONSTRAINT film_release_year_check CHECK ((release_year >= 1888))
+);
+
+
+ALTER TABLE public.film OWNER TO carol;
+
+--
+-- Name: film_pkey; Type: CONSTRAINT; Schema: public; Owner: carol; Tablespace:
+--
+
+ALTER TABLE ONLY film
+    ADD CONSTRAINT film_pkey PRIMARY KEY (id);
+
+
 --
 -- Name: public; Type: ACL; Schema: -; Owner: postgres
 --

Then Carol can use apgdiff to generate the SQL statements needed to create the table in the development and production databases, e.g.,

$ apgdiff film0.sql film.sql
CREATE TABLE film (
 id integer NOT NULL,
 title character varying(32) NOT NULL,
 release_year integer NOT NULL
);

ALTER TABLE film
 ADD CONSTRAINT film_pkey PRIMARY KEY (id);

ALTER TABLE film
 ADD CONSTRAINT film_release_year_check CHECK ((release_year >= 1888));

In this initial phase, the diff and apgdiff outputs are not that … different. The value of apgdiff will be more obvious in subsequent versions.

As part of the 0.1 release process, Carol commits film.sql to the VCS (she could also commit film0.sql as the film.sql base version, if desired). She may also want to store the apgdiff output in the VCS, since the latter is what will actually be used to upgrade the production database(s).

Version 0.2

For the next version, Dave clones the VCS repository and uses the film.sql script or the saved apgdiff output to create the film table in his own database. He then issues an ALTER TABLE to add the length column, invokes pg_dump -s to create a new film.sql and commits that to his repository. Carol uses another ALTER TABLE to add the rating column in her database, and stores the new pg_dump output to her repository.

Manny, the integration manager, merges the new film.sql versions into the project-wide repository (he has to deal with table ownership statements but perhaps the pg_dump outputs can be stripped of such statements prior to committing them to the VCS). Manny then uses apgdiff to produce the statements needed to upgrade to version 0.2:

$ git diff 0.1 0.2
diff --git a/film.sql b/film.sql
index 799b418..db820bd 100644
--- a/film.sql
+++ b/film.sql
@@ -23,6 +23,9 @@ CREATE TABLE film (
 id integer NOT NULL,
 title character varying(32) NOT NULL,
 release_year integer NOT NULL,
+    length smallint,
+    rating character(5),
+    CONSTRAINT film_length_check CHECK (((length > 0) AND (length < 10000))),
 CONSTRAINT film_release_year_check CHECK ((release_year >= 1888))
 );

$ git checkout 0.1
$ cp film.sql film.sql-0.1
$ git checkout 0.2
$ apgdiff film.sql-0.1 film.sql

ALTER TABLE film
 ADD COLUMN length smallint,
 ADD COLUMN rating character(5);

ALTER TABLE film
 ADD CONSTRAINT film_length_check CHECK (((length > 0) AND (length < 10000)));

Version 0.3

For the next round, Carol and Dave issue the statements shown under Version 0.3 in the previous article and commit newer versions of film.sql which Manny then integrates into the project repository. The apgdiff output comparing the two versions is as follows:

$ apgdiff film.sql-0.2 film.sql

CREATE SEQUENCE category_category_id_seq
 START WITH 1
 INCREMENT BY 1
 NO MAXVALUE
 NO MINVALUE
 CACHE 1;

CREATE TABLE category (
 category_id integer DEFAULT nextval('category_category_id_seq'::regclass) NOT NULL,
 name character varying(25) NOT NULL,
 last_update timestamp with time zone NOT NULL
);

CREATE TABLE film_category (
 film_id integer NOT NULL,
 category_id integer NOT NULL,
 last_update timestamp with time zone NOT NULL
);

CREATE TABLE "language" (
 language_id integer NOT NULL,
 name character varying(20) NOT NULL,
 last_update timestamp with time zone NOT NULL
);

ALTER TABLE film
 ADD COLUMN language_id integer NOT NULL;

ALTER TABLE category
 ADD CONSTRAINT category_pkey PRIMARY KEY (category_id);

ALTER TABLE film_category
 ADD CONSTRAINT film_category_pkey PRIMARY KEY (film_id, category_id);

ALTER TABLE "language"
 ADD CONSTRAINT language_pkey PRIMARY KEY (language_id);

ALTER TABLE film
 ADD CONSTRAINT film_language_id_fkey FOREIGN KEY (language_id) REFERENCES language(language_id);

ALTER TABLE film_category
 ADD CONSTRAINT film_category_category_id_fkey FOREIGN KEY (category_id) REFERENCES category(category_id);

ALTER TABLE film_category
 ADD CONSTRAINT film_category_film_id_fkey FOREIGN KEY (film_id) REFERENCES film(id);

You may notice a small discrepancy when running the above (tested with PostgreSQL 8.4.5 and apgdiff 2.3) to upgrade a second database and then compare the output from the pg_dump -s commands. In the original database, the category_category_id_seq sequence is owned by category.category_id, but this is missing from the upgraded database.

Review

Apgdiff appears to be a useful tool to manage changes to PostgreSQL databases in conjunction with a VCS. It does require that you run pg_dump -s and that you store its output in the VCS. However, apgdiff could be used against a set of manually-maintained SQL scripts.

Apgdiff requires Java 1.6 and Debian, Ubuntu and Gentoo have native apgdiff packages. It supports several (but not all) PostgreSQL features. For example, it omits the TYPEs, DOMAINs and AGGREGATEs used in the Pagila sample database (it was able to recreate 14 out of 21 Pagila tables). It depends on parsing SQL DDL syntax —albeit not fully, so it is susceptible to syntax changes between PostgreSQL releases. It cannot deal with RENAMEs, e.g., if the rating column is renamed to mpaa_rating, apgdiff will drop the first and add the second—not ideal in a populated database.

Apgdiff does not operate on the actual database being upgraded but merely on textual representations. On the plus side, it can be used to generate backout scripts, e.g., by diffing film.sql version 0.3 vs. 0.2 you can generate statements to undo the changes from 0.2 to 0.3. The next tool we’ll review is particularly concerned with this aspect of the upgrade process.

Version Control, Part 2: SQL Databases

In Part 1 we saw how Alice and Bob used a VCS to manage changes to a file of movie information.  In this post, we’ll examine how Carol and Dave, database specialists, deal with similar changes while developing the same application against a SQL-based DBMS.

Version 0.1

When using a relational DBMS, the database designer, administrator (DBA) or programmer defines the data structures, i.e., the tables and columns, using SQL data definition language (DDL), such as this:

CREATE TABLE film (
    id INTEGER NOT NULL PRIMARY KEY,
    title VARCHAR(32) NOT NULL,
    release_year INTEGER NOT NULL CHECK (release_year >= 1888)
);

By adding PRIMARY KEY on the id line, the responsibility for ensuring unique identifiers no longer rests with the application but with the DBMS (but the application needs to be prepared to deal with duplicate ID errors).

Similarly, the validation of release_year is now performed by the DBMS as specified by the CHECK constraint. The application may still implement validations, but the DBMS ensures data integrity even when records are updated using tools external to the primary application.

The DDL can be added to a VCS, say, as a script named film.sql. The application may still use a file like the film.h seen previously to represent the collection of data retrieved from the DBMS or input by a user.  For example, Alice can surround the typedef struct film by EXEC SQL BEGIN|END DECLARE SECTION statements to use it in an ECPG program.

Version 0.2

For the second version, Dave wants to add the movie length column, just as Bob did earlier. In his test environment, he can simply edit film.sql and add length SMALLINT at the end. Similarly, Carol can add rating CHAR(5) to her copy of film.sql and recreate the film table to test her changes. They could even merge their changes using the VCS and store a version 0.2 as follows:

CREATE TABLE film (
    id INTEGER NOT NULL PRIMARY KEY,
    title VARCHAR(32) NOT NULL,
    release_year INTEGER NOT NULL CHECK (release_year >= 1888),
    length SMALLINT CHECK (length > 0 AND length < 10000),
    rating CHAR(5)
);

This approach may be OK for testing (and new customers), but is not satisfactory for production environments with numerous records already in the database. For the flat-file scenario, there was no choice but to create a migration program to reformat or reload the file into the new structure. For a SQL database, you can use the following DDL statements:

ALTER TABLE film ADD COLUMN length SMALLINT
    CHECK (length > 0 AND length < 10000);
ALTER TABLE film ADD COLUMN rating CHAR(5);

Unfortunately, while the ALTER statement allows easy modification of the DBMS storage structures—even in tables with millions of rows, it is not orthogonal with respect to version control. Carol and Dave can commit the CREATE TABLE film.sql version 0.2 to the VCS for use with new installations, but they also have to put the ALTER TABLE statements under version control, probably as part of a script to upgrade existing databases.

The ideal solution may be a CREATE OR REPLACE TABLE statement, but that does not exist.

Version 0.3

For the next release, Dave adds the following:

CREATE TABLE language (
    language_id INTEGER NOT NULL PRIMARY KEY,
    name VARCHAR(20) NOT NULL,
    last_update TIMESTAMP WITH TIME ZONE NOT NULL
);
ALTER TABLE film ADD COLUMN language_id INTEGER NOT NULL;
ALTER TABLE film ADD FOREIGN KEY (language_id)
    REFERENCES language (language_id);

And Carol adds this:

CREATE TABLE category (
    category_id SERIAL NOT NULL PRIMARY KEY,
    name VARCHAR(25) NOT NULL,
    last_update TIMESTAMP WITH TIME ZONE NOT NULL
);
CREATE TABLE film_category (
    film_id INTEGER NOT NULL REFERENCES film (id),
    category_id INTEGER NOT NULL
        REFERENCES category (category_id),
    last_update TIMESTAMP WITH TIME ZONE NOT NULL,
    PRIMARY KEY (film_id, category_id)
);

We again have to deal with storing in the VCS both the CREATE and ALTER statements in order to accomodate new as well as existing customers at different release levels.  However, these changes also exemplify two other issues. The first is that some default DBMS actions are not visible in the DDL.  For example, PostgreSQL will give default names to PRIMARY and FOREIGN KEYS, e.g., film_language_id_fkey. It will also create a SEQUENCE in response to the SERIAL data type specification and add a column DEFAULT based on that sequence.  It may be useful to record these names and other objects in the VCS.

More importantly, there are now dependencies between the various tables. Previously, issuing a certain ALTER TABLE ADD COLUMN statement before another did not (or should not) have much impact on the database or application. Now, film_category can only be created after category and film exist, and language must be created before film (or before the FOREIGN KEY is added to it).

How can projects using PostgreSQL or other RDBMSs deal with the complexities of version control over the database? We’ll examine some solutions in forthcoming posts.

Version Control, Part I: Pre-SQL

A version control system (VCS) such as Git or Subversion is an indispensable component of modern software projects. Many applications use a relational database management system (DBMS) such as PostgreSQL. While a DBMS facilitates an application’s data management tasks, it presents some version control challenges.

To understand the version control issues we’ll examine a project with two programmers, Alice and Bob, developing an application loosely based on the Pagila sample database. We’ll start our tour with an application predating the use of SQL.

When developing a file-based application in C, C++ or other programming languages, you typically define the data structures in your source code, e.g., in header files that are included or otherwise referenced by the application code.

Version 0.1

For example, a C application to maintain information about movies may start off by defining the following in a file named film.h:

typedef struct
{
    int id;
    char title[32];
    int release_year;
} FILM;

The typedef may represent the actual layout of the file records as well as the application buffers used to hold the data input by users or to be displayed on screen.  In addition, the application will need code to ensure the uniqueness of the film identifiers, e.g., to maintain a file index, and to validate the release year upon input, e.g., release_year >= 1888.

The film.h file and the rest of the application are submitted to a VCS and are released as version 0.1.

Version 0.2

For the next version, Bob adds short length; to the end of the FILM structure.  He also adds help text to tell users the new field is to capture the length of the movie in minutes and code to validate that length > 0 and length < 10000.

Separately, Alice adds char rating[6]; to the end of the struct, application help text explaining this is for the MPAA rating, and code to validate the input against a list of known values.

Finally, Bob and Alice submit their changes to the VCS and either he or she (or an integration manager) merge the changes and settle on the following structure:

typedef struct
{
    int id;
    char title[32];
    int release_year;
    short length;
    char rating[6];
} FILM;

The VCS allows Bob and Alice to keep track of changes to the FILM structure, e.g., to diff the tagged revisions of film.h:

$ git diff 0.1 0.2
diff --git a/film.h b/film.h
index 8670425..1df5aaa 100644
--- a/film.h
+++ b/film.h
@@ -5,4 +5,6 @@ typedef struct
     int id;
     char title[32];
     int release_year;
+    short length;
+    char rating[6];
 } FILM;

To allow users to migrate existing applications, a conversion program is also needed to reformat the data file.

Version 0.3

For the third revision, Alice adds a genre array to FILM, and a CATEGORY structure to allow users to maintain the allowed values for genres.  In the meantime, Bob adds a language field to the FILM structure, together with an application-maintained array of language codes and names.  Once merged and released, the structures look as follows (with comments):

typedef struct
{
    int id;            /* genre/category ID */
    char name[26];     /* genre name */
} CATEGORY;

typedef struct
{
    int id;            /* film ID */
    char title[32];    /* film title */
    int release_year;  /* release year */
    short length;      /* length in minutes */
    char rating[6];    /* MPAA rating */
    int categories[5]; /* film genres */
    int language;      /* spoken language code */
} FILM;

The conversion program has to be updated to migrate to the expanded structure, but the VCS still supports comparison of different revisions.  You can see it also allows each programmer to work independently on unrelated features and a distributed VCS such as Git can keep track of the merge actions, as seen below—version 0.2 was merged by the project manager whereas for version 0.3 Bob pulled Alice’s change before committing:

$ git log --pretty=format:"%h %s" --graph
* eb0a385 Add language.
* e6568c5 Add film genres.
*   f404fe2 Merge Alice's add rating.
|\ 
| * bdc8247 Add MPAA rating.
* | 5772f38 Add length.
|/ 
* 76ecd04 First version

With this preliminary flat-file scenario established, we’re ready to move to our next installment where we will contrast the above to the situation using a SQL DBMS.