Business Logic in the Database

Chris Travers recently responded to Tony Marston’s critique of an earlier post where Chris advocated “intelligent databases”1. Chris’ response is well reasoned, particularly his point that once a database is accessed by more than a single application or via third-party tools, it’s almost a given that one should attempt to push “intelligence” and business logic into the database if possible.

However, there is a paragraph in Tony’s post that merits further scrutiny:

The database has always been a dumb data store, with all the business logic held separately within the application. This is an old idea which is now reinforced by the single responsibility principle. It is the application code which is responsible for the application logic while the database is responsible for the storing and retrieval of data. Modern databases also have the ability to enforce data integrity, manage concurrency control, backup, recovery and replication while also controlling data access and maintaining database security.

If a database (actually, a shortening of DBMS—a combination of the data and the software that manages it) has always been dumb, then presumably one would never specify UNIQUE indexes. It is a business requirement that invoice or employee numbers be unique, so if all the business logic should reside in the application, then the DBA should only create a regular index and the application —all the applications and tools!— should enforce uniqueness.

Tony’s mention of “data integrity” is somewhat ambiguous because different people have varied understandings of what that covers. As C. J. Date points out, “integrity … is the part [of the relational model] that has changed [or evolved] the most over the years.”2 Perhaps Tony believes that primary keys, and unique and referential constraints should be enforced by the DBMS, but apparently an integrity constraint such as “No supplier with status less than 20 supplies any part in a quantity greater than 500”3 should instead only be code in an application (make that all applications that access that database).

As for me, as I pointed out earlier, “constraints should be implemented, preferably declaratively, in the database schema” while “type constraints … should also be implemented in the user interface” (emphasis added). Ideally, the user interface should derive its code directly from the schema.

Update: Many thanks to everyone who participated in the discussion. I think we’ve covered just about every angle pro or con incorporating business logic in the database, so I’ve closed comments now.

Update 2: Chris has an update post that may be of interest.

1 Interestingly, the first time I heard the term “intelligent database” it was from Dave Kellogg giving a marketing presentation for Ingres 6.3, which had incorporated some of the features in the UC Berkeley POSTGRES project.
2 Date, C. J. An Introduction to Database Systems, 8th Edition. 2004, p. 253.
3 Ibid., p. 254.


Where to CHECK your premises

“Contradictions do not exist. Whenever you think that you are facing a contradiction, check your premises. You will find that one of them is wrong.” — Ayn Rand 1

In both the command line and WSGI database user interfaces that I presented previously, there is an overlap (or redundancy) between the application and the database. For example, in the WSGI version, film.py has the following code:

    if not self.release_year.isdigit() or int(self.release_year) < 1888:
        self.errors['release_year'] = \
            "Release year must be a number greater than 1887"

This overlaps with the CHECK constraint on the film table, seen below in the film.yaml file:

  table film:
        - release_year
        expression: (release_year >= 1888)

Since the application prevents the user from entering a value less than 1888 for the release year, it seems the database CHECK is redundant and could be eliminated.

On the other hand, the application does not verify the uniqueness of the film’s PRIMARY KEY, its id field. Instead, if the user enters a duplicate id, the application will get a “duplicate key value violates unique constraint” error from PostgreSQL and will inform the user accordingly. Should the application implement the uniqueness check itself, by querying the database prior to sending the INSERT?

The reason for waxing philosophical about these alternatives is that in order to design a generic database user interface it would be helpful to have guidelines that resulted, as much as possible, in an optimal trade-off between redundancy and efficiency or performance.

Database Primacy

The first rule, I believe, needs to be that the database takes precedence: PRIMARY KEYs, UNIQUEness constraints, FOREIGN KEYs (referential integrity), CHECK and other constraints should be implemented, preferably declaratively, in the database schema.

Type and Attribute Constraints

In section 9.9 of An Introduction to Database Systems (8th edition), C.J. Date presents a constraint classification scheme. He defines a type constraint as “a specification of the values that make up the type in question.” The second proposed guideline is that type constraints —such as the release year CHECK constraint— should also be implemented in the user interface.

This makes it possible to give near immediate feedback to the user at a small cost in redundancy. The database will still invoke its constraint checking so the application has to be prepared to handle a database error message. Consider for example, if—in order to better restrict the input—the database were changed as follows:

ALTER TABLE film DROP CONSTRAINT film_release_year_check;
ALTER TABLE film ADD CONSTRAINT film_release_year_check
    CHECK (release_year >= 1888
           AND release_year < extract(year FROM current_date)::integer + 5);

The application could still use the simpler CHECK without affecting the database integrity. On the other hand, if we drop the constraint from the database, an invalid value can be input from outside the application.

Attribute constraints, i.e., whether “a specified attribute … is of a specified type,” are essentially handled by the application mapping an input field to a suitable programming language data type.

Table and Database Constraints

Relvar and database constraints, i.e., those involving one relvar (what most people refer as a table) or two or more relvars, are probably best checked by the DBMS, rather than pre-validated by the application. For example, although we could check if the film id was already present in the table before doing the INSERT, some other transaction could INSERT that same id just ahead of ours, and we’d still get the duplicate key message.

There may be exceptions to these rules, but I think they’ll serve most applications well.

1 Francisco d’Anconia speaking in Part I, Chapter VII, “The Exploiters and the Exploited,” of Ayn Rand’s Atlas Shrugged.