Where to CHECK your premises

“Contradictions do not exist. Whenever you think that you are facing a contradiction, check your premises. You will find that one of them is wrong.” — Ayn Rand 1

In both the command line and WSGI database user interfaces that I presented previously, there is an overlap (or redundancy) between the application and the database. For example, in the WSGI version, film.py has the following code:

    if not self.release_year.isdigit() or int(self.release_year) < 1888:
        self.errors['release_year'] = \
            "Release year must be a number greater than 1887"

This overlaps with the CHECK constraint on the film table, seen below in the film.yaml file:

  table film:
        - release_year
        expression: (release_year >= 1888)

Since the application prevents the user from entering a value less than 1888 for the release year, it seems the database CHECK is redundant and could be eliminated.

On the other hand, the application does not verify the uniqueness of the film’s PRIMARY KEY, its id field. Instead, if the user enters a duplicate id, the application will get a “duplicate key value violates unique constraint” error from PostgreSQL and will inform the user accordingly. Should the application implement the uniqueness check itself, by querying the database prior to sending the INSERT?

The reason for waxing philosophical about these alternatives is that in order to design a generic database user interface it would be helpful to have guidelines that resulted, as much as possible, in an optimal trade-off between redundancy and efficiency or performance.

Database Primacy

The first rule, I believe, needs to be that the database takes precedence: PRIMARY KEYs, UNIQUEness constraints, FOREIGN KEYs (referential integrity), CHECK and other constraints should be implemented, preferably declaratively, in the database schema.

Type and Attribute Constraints

In section 9.9 of An Introduction to Database Systems (8th edition), C.J. Date presents a constraint classification scheme. He defines a type constraint as “a specification of the values that make up the type in question.” The second proposed guideline is that type constraints —such as the release year CHECK constraint— should also be implemented in the user interface.

This makes it possible to give near immediate feedback to the user at a small cost in redundancy. The database will still invoke its constraint checking so the application has to be prepared to handle a database error message. Consider for example, if—in order to better restrict the input—the database were changed as follows:

ALTER TABLE film DROP CONSTRAINT film_release_year_check;
ALTER TABLE film ADD CONSTRAINT film_release_year_check
    CHECK (release_year >= 1888
           AND release_year < extract(year FROM current_date)::integer + 5);

The application could still use the simpler CHECK without affecting the database integrity. On the other hand, if we drop the constraint from the database, an invalid value can be input from outside the application.

Attribute constraints, i.e., whether “a specified attribute … is of a specified type,” are essentially handled by the application mapping an input field to a suitable programming language data type.

Table and Database Constraints

Relvar and database constraints, i.e., those involving one relvar (what most people refer as a table) or two or more relvars, are probably best checked by the DBMS, rather than pre-validated by the application. For example, although we could check if the film id was already present in the table before doing the INSERT, some other transaction could INSERT that same id just ahead of ours, and we’d still get the duplicate key message.

There may be exceptions to these rules, but I think they’ll serve most applications well.

1 Francisco d’Anconia speaking in Part I, Chapter VII, “The Exploiters and the Exploited,” of Ayn Rand’s Atlas Shrugged.

About these ads

6 thoughts on “Where to CHECK your premises”

  1. The argument has been going on since IDMS/IMS came on the scene. OO proponents are just louder, and dumber.

    One of the MySql begetters (or someone like him) asserted that there are two kinds of databases: application databases and others (enterprise, generally). This, of course, is crap. The one and only reason for using a RDBMS is to support arbitrary operations by arbitrary applications against a datastore; and have the datastore come out the other side intact.

    If your data is *only* for some code you’re going to write, don’t bother with the overhead of an RDBMS, just write to *your* files and be done with it. Suck it up, and write native file I/O. You do know how to do that, Right?

    By the mentioned definition, application databases are just a sql parser in front of the file system, which is what MySql is sans innoDB. Most coders look at MySql that way and have from the beginning.

    If you’re going to pay the overhead of an RDBMS, you may as well get the benefit. Which benefit is that any client can use your data. Back in the Olde Days of VT-100 and *nix databases on RS-232, the client was just a patch of memory in the same server as the database. “Client” code executed against the “server”; the client only sent a screen image back to the VT-100. AJAX, and similar, support just this semantic just with pretty pixels instead of characters.

    Failing that, generating client side edit code from the schema is doable and has been done.

    By having the database as the implementer of constraints, one gets additional benefits, among them clean data loads from foreign sources. No need to write convoluted extra code to replicate screens for mass loads. Very neat.

    And, of course, moving data from (un/de)normalized schemas to BCNF running on multi-core/processor SSD machines means that 5NF is SOP, and fairly easy to implement with even naive’ code generators.

    1. I never imagined that my waxing philosophical would elicit such a response (I’ll let readers come up with suitable adjectives).

  2. Well, just remember, this is the title of Dr. Codd’s paper: “A Relational Model of Data for Large Shared Data Banks”. If one parses that title, and takes the stated intent seriously, as well as its corollaries, then one has the answer to the question.

    1. The point of my post is that although the database should be the primary repository and actor when it comes to checking integrity constraints, it is also reasonable to validate certain kinds of constraints by the application. Your rambling comments speak only peripherally to that (what does BCNF, 5NF and SSD have to do with my topic?).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s