The Phantom of the Database – Part 4

At the end of November, I finished the third episode with mild suspense: I suggested that the problem of optimistic “locking” could perhaps be solved in PostgreSQL with something other than extra qualifications, row version numbers or timestamps.

Let’s start this episode with action!

moviesdb=> INSERT INTO film VALUES (47478, 'Seven  Samurai', 1956);
INSERT 0 1
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |     title      | release_year
--------+------+-------+----------------+--------------
 853969 |    0 | (0,1) | Seven  Samurai |         1956
(1 row)

moviesdb=> UPDATE film SET title = 'Seven Samurai' WHERE id = 47478;
UPDATE 1
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |     title     | release_year
--------+------+-------+---------------+--------------
 853970 |    0 | (0,2) | Seven Samurai |         1956
(1 row)

moviesdb=> UPDATE film SET title = 'Sichinin Samurai' WHERE id = 47478;
UPDATE 1
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
--------+------+-------+------------------+--------------
 853971 |    0 | (0,3) | Sichinin Samurai |         1956
(1 row)

moviesdb=> UPDATE film SET release_year = 1954 WHERE id = 47478;
UPDATE 1
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
--------+------+-------+------------------+--------------
 853972 |    0 | (0,4) | Sichinin Samurai |         1954
(1 row)

moviesdb=> VACUUM FULL film;
VACUUM
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
--------+------+-------+------------------+--------------
 853972 |    0 | (0,1) | Sichinin Samurai |         1954
(1 row)

moviesdb=> BEGIN; DELETE FROM film WHERE id = 47478; ROLLBACK;
BEGIN
DELETE 1
ROLLBACK
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  |  xmax  | ctid  |      title       | release_year
--------+--------+-------+------------------+--------------
 853972 | 853974 | (0,1) | Sichinin Samurai |         1954
(1 row)

moviesdb=> BEGIN; UPDATE film SET release_year = 1956 WHERE id = 47478;
BEGIN
UPDATE 1
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
--------+------+-------+------------------+--------------
 853975 |    0 | (0,2) | Sichinin Samurai |         1956
(1 row)

moviesdb=> ROLLBACK;
ROLLBACK
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  |  xmax  | ctid  |      title       | release_year
--------+--------+-------+------------------+--------------
 853972 | 853975 | (0,1) | Sichinin Samurai |         1954
(1 row)

What element in the queries above could be used as a surrogate “row version identifier?”  If you examine the changes carefully, you’ll notice that the xmin system column provides that capability. The ctid, a row locator, on the other hand, does not survive the VACUUM operation, and xmax is only used when a row is deleted (or updated, causing it to move).

So my suggestion, in terms of web user interfaces, is that fetching a row for a possible update should include the xmin value, e.g., in the get() method of the Film class, use the following:

    def get(self, db):
        try:
            row = db.fetchone(
                "SELECT xmin, title, release_year FROM film WHERE id = %s",
                (self.id,))

The xmin value can then be sent as a hidden field to the web client, and used in the update() method to implement optimistic concurrency control.

Update: The DB UI Tutorial tag v0.4.3 now has an implementation of this concept for the command line client.

Python Web Frameworks – Application Object

In response to my previous post, someone asked “… what’s the point? How one way is better than the other?” My response was that I’m “not trying to judge these frameworks from the perspective of a developer (at least not completely), but rather from the POV of ‘if I were writing a framework, which features are essential, which nice to have, which I can disregard'” (emphasis added). Of course, I have my own biases as a developer as to what I consider “nice” or essential.

With that clarification out of the way, let’s look at the next feature: the WSGI callable generally known as the application object.

Django doesn’t want developers to be concerned with this artifact. A Django application is the collection of model, views (controllers in standard MVC terminology) and templates (views) that gets run by the server. For those interested, the Django WSGI callable is an instance of WSGIHandler (in django.core.handlers.wsgi).

As best as I can tell, Twisted Web expects the developer to write the application object. It will get called from the WSGIResponse run() method in twisted.web.wsgi. As its documentation says, Twisted Web is “a framework for doing things with the web” but is “not a ‘web framework’ in the same sense” as Django, so it doesn’t fit well in this comparison.

Pyramid‘s Router class (in pyramid.router) is what implements the WSGI callable. The Router instance gets created from the Configurator class’s make_wsgi_app() method. Select Router attributes and methods:

  • logger
  • root_factory
  • routes_mapper
  • request_factory
  • registry (from args)
  • handle_request()

In CherryPy, the Application class is the application object, although it’s not generallly used directly. Instead a user root object is mounted on a Tree which is a registry of these applications. Major attributes and methods:

  • root (from args): root object (application handler)
  • namespaces, config: for configuration
  • wsgiapp(): a CPWSGIApp instance, to handle apps in a pipeline

Being a support library, Werkzeug doesn’t include an application object, but it has a sample in its Shortly example. Major attributes and methods:

  • dispatch_request(): dispatcher
  • jinja_env: template enviroment
  • url_map: Map() taking list of Rule()’s
  • error_404(): error handler
  • render_template(): template renderer

Like Django, Web2py isn’t interested in having developers worry about the application object. Its WSGI callable is the standalone wsgibase() function, in gluon.main.

The web.py application class (yes, not capitalized) implements its WSGI callable. Major attributes and methods:

  • autoreload (from args)
  • init_mapping (from args)
  • request()
  • handle(): matches path to mapping
  • run(): runs the developer server
  • notfound(), internalerror(): error handlers

Flask‘s Flask class is its application object, derived from flask.helpers._PackageBoundObject. Selected attributes and methods (in addition to those in Werkzeug’s example above):

  • static_url_path, static_folder (from args)
  • instance_path (from args or auto-determined)
  • view_functions
  • before/after_xxx_funcs (dicts)
  • logger
  • run()
  • route(): decorator for URL rules
  • error_handler_spec (a dict)
  • errorhandler(): decorator for error handler functions

Like Werkzeug, WebOb doesn’t include an application object, but it has a sample in its wiki example, most of it very specific to the example.

Bottle‘s Bottle class is its WSGI callable. Major attributes and methods:

  • routes (list) and router (class Router instance)
  • config (class ConfigDict instance)
  • mount(): to mount an app at a given point
  • match(): to search for matching routes
  • route(): decorator to bind a function to a URL

Pesto‘s DispatcherApp class is its application object. Major attributes and methods:

  • prefix (from args): a “mount” point
  • matchpattern(), match(): matches URLs (match is decorator)
  • urlfor(): handler/dispatcher to URL converter
  • gettarget(): returns a four-part tuple from the URI
  • status404_application(): error handler

Diva‘s Application class is the object of interest. Select attributes and methods:

  • config (from kwargs), configure()
  • routing_cfg and routing
  • locale_dir (from args)
  • template_dirs (from args)
  • templates(): template loader
  • prepare() and cleanup(): before and after methods

Summary: For developers who care about writing application code, Django and Web2py and to some extent Pyramid and CherryPy hide the existence of the WSGI callable, while other frameworks require that the programmer instantiate it or invoke it. If I were to write a framework, I’d want to make it easy on the developer as the former projects do, but wouldn’t keep it completely out of sight.

Business Logic in the Database

Chris Travers recently responded to Tony Marston’s critique of an earlier post where Chris advocated “intelligent databases”1. Chris’ response is well reasoned, particularly his point that once a database is accessed by more than a single application or via third-party tools, it’s almost a given that one should attempt to push “intelligence” and business logic into the database if possible.

However, there is a paragraph in Tony’s post that merits further scrutiny:

The database has always been a dumb data store, with all the business logic held separately within the application. This is an old idea which is now reinforced by the single responsibility principle. It is the application code which is responsible for the application logic while the database is responsible for the storing and retrieval of data. Modern databases also have the ability to enforce data integrity, manage concurrency control, backup, recovery and replication while also controlling data access and maintaining database security.

If a database (actually, a shortening of DBMS—a combination of the data and the software that manages it) has always been dumb, then presumably one would never specify UNIQUE indexes. It is a business requirement that invoice or employee numbers be unique, so if all the business logic should reside in the application, then the DBA should only create a regular index and the application —all the applications and tools!— should enforce uniqueness.

Tony’s mention of “data integrity” is somewhat ambiguous because different people have varied understandings of what that covers. As C. J. Date points out, “integrity … is the part [of the relational model] that has changed [or evolved] the most over the years.”2 Perhaps Tony believes that primary keys, and unique and referential constraints should be enforced by the DBMS, but apparently an integrity constraint such as “No supplier with status less than 20 supplies any part in a quantity greater than 500″3 should instead only be code in an application (make that all applications that access that database).

As for me, as I pointed out earlier, “constraints should be implemented, preferably declaratively, in the database schema” while “type constraints … should also be implemented in the user interface” (emphasis added). Ideally, the user interface should derive its code directly from the schema.

Update: Many thanks to everyone who participated in the discussion. I think we’ve covered just about every angle pro or con incorporating business logic in the database, so I’ve closed comments now.

Update 2: Chris has an update post that may be of interest.


1 Interestingly, the first time I heard the term “intelligent database” it was from Dave Kellogg giving a marketing presentation for Ingres 6.3, which had incorporated some of the features in the UC Berkeley POSTGRES project.
2 Date, C. J. An Introduction to Database Systems, 8th Edition. 2004, p. 253.
3 Ibid., p. 254.