Where to CHECK your premises

“Contradictions do not exist. Whenever you think that you are facing a contradiction, check your premises. You will find that one of them is wrong.” — Ayn Rand 1

In both the command line and WSGI database user interfaces that I presented previously, there is an overlap (or redundancy) between the application and the database. For example, in the WSGI version, film.py has the following code:

    if not self.release_year.isdigit() or int(self.release_year) < 1888:
        self.errors['release_year'] = \
            "Release year must be a number greater than 1887"

This overlaps with the CHECK constraint on the film table, seen below in the film.yaml file:

  table film:
        - release_year
        expression: (release_year >= 1888)

Since the application prevents the user from entering a value less than 1888 for the release year, it seems the database CHECK is redundant and could be eliminated.

On the other hand, the application does not verify the uniqueness of the film’s PRIMARY KEY, its id field. Instead, if the user enters a duplicate id, the application will get a “duplicate key value violates unique constraint” error from PostgreSQL and will inform the user accordingly. Should the application implement the uniqueness check itself, by querying the database prior to sending the INSERT?

The reason for waxing philosophical about these alternatives is that in order to design a generic database user interface it would be helpful to have guidelines that resulted, as much as possible, in an optimal trade-off between redundancy and efficiency or performance.

Database Primacy

The first rule, I believe, needs to be that the database takes precedence: PRIMARY KEYs, UNIQUEness constraints, FOREIGN KEYs (referential integrity), CHECK and other constraints should be implemented, preferably declaratively, in the database schema.

Type and Attribute Constraints

In section 9.9 of An Introduction to Database Systems (8th edition), C.J. Date presents a constraint classification scheme. He defines a type constraint as “a specification of the values that make up the type in question.” The second proposed guideline is that type constraints —such as the release year CHECK constraint— should also be implemented in the user interface.

This makes it possible to give near immediate feedback to the user at a small cost in redundancy. The database will still invoke its constraint checking so the application has to be prepared to handle a database error message. Consider for example, if—in order to better restrict the input—the database were changed as follows:

ALTER TABLE film DROP CONSTRAINT film_release_year_check;
ALTER TABLE film ADD CONSTRAINT film_release_year_check
    CHECK (release_year >= 1888
           AND release_year < extract(year FROM current_date)::integer + 5);

The application could still use the simpler CHECK without affecting the database integrity. On the other hand, if we drop the constraint from the database, an invalid value can be input from outside the application.

Attribute constraints, i.e., whether “a specified attribute … is of a specified type,” are essentially handled by the application mapping an input field to a suitable programming language data type.

Table and Database Constraints

Relvar and database constraints, i.e., those involving one relvar (what most people refer as a table) or two or more relvars, are probably best checked by the DBMS, rather than pre-validated by the application. For example, although we could check if the film id was already present in the table before doing the INSERT, some other transaction could INSERT that same id just ahead of ours, and we’d still get the duplicate key message.

There may be exceptions to these rules, but I think they’ll serve most applications well.

1 Francisco d’Anconia speaking in Part I, Chapter VII, “The Exploiters and the Exploited,” of Ayn Rand’s Atlas Shrugged.