Dueling Frameworks

Cue the music

In this corner, the time-tested CherryPy, standing at version 3.2, six years old, enabled for Python 3, sporting an HTTP 1.1-compliant WSGI webserver, support for other WSGI servers or adapters, a plugin mechanism, built-in tools and much more.

In the other corner, the newcomer Flask, at version 0.7.2, about one year old (but with older ancestry), claiming to be a “microframework … with good intentions” and “fun.”

In my post A Funny Thing Happened on the Way to the Webserver I mentioned I took a look at the second contender and liked what I saw. First impressions, as they say, can be deceiving. What may seem appealing is sometimes harder to use or disappointing when you get down to writing code and testing it.

Coincidentally, after I had decided on the “Dueling Banjos” theme, Audrey Roy pointed me1 to Richard Jones’ “Web micro-framework Battle!” which provided additional food for thought.

As you may expect, my selection criteria don’t exactly match Richard’s. Perhaps the most striking difference is how he implemented routing (URL mapping, page 14 of the presentation) for the baseline cgi+wsgiref implementation vs. my choice for doing dispatching in my minimalist WSGI database UI (see in particular, the dispatch methods in dbapp.py and film.py).

Like Richard, I don’t appreciate “magic” in a framework or application. but I’m less concerned with strictly RESTful URLs. Richard appears to prefer the decorator approach to URL mapping (as seen by his Bottle, Flask and Pesto examples, among others), but also used URL mapping tables (seen in the web.py, Werkzeug and baseline examples).

The latter approach reminds me of a generic menu interface I wrote eons ago in C. It may be OK for a vtable in C++, but I usually find it inappropriate for application level code. In fact, the mapping table seems “magical” since it doesn’t encapsulate the code with the corresponding URL.

Although Flask looked attractive due to its URL routing decorator, it now appears limiting because it can only be used on functions but not on class methods. (Note: I haven’t explored all of Flask, e.g., haven’t looked at its Blueprints).

As a result, refactoring the WSGI application to use CherryPy was quite straightforward, whereas changing it to accomodate Flask involved quite a bit of surgery. This is exemplified by these Git stats:

 cherrypy/film.py |   73 ++++++++++++++++-------------------------------------
 1 files changed, 22 insertions(+), 51 deletions(-)

 flask/film.py |  235 ++++++++++++++++++++++++++-------------------------------
 1 files changed, 107 insertions(+), 128 deletions(-)

While many modifications simply involved indentation and removing the self parameter, other changes were necessary. For example, as there can only be one mapping to a function named ‘index’, I had to rename the film.py ‘index’ function/method to ‘list’ to avoid conflict with dbapp.py. This does not bode well for extensibility: a real application would support multiple entities—films, actors, customers, etc.—with similar function/methods. Maybe this is what Blueprints are for, but why invent another wheel when Python classes are available?

There were other minor annoyances with Flask, like how to instantiate its application object without adding a subdirectory, or how to pass the database connection to the film.py functions without using a global variable.

I may continue to explore Flask, although at the moment CherryPy seems more suitable to my purposes. I’m also intrigued by some concepts of Richard’s winners—Pesto and Bottle. A Pesto-like dispatcher with support for class methods on top of CherryPy would probably make my day.

The code implementing the CherryPy database UI interface is on GitHub tagged as version v0.3.0 and the corresponding Flask code is also there tagged as v0.3.1.

1 BTW, +1 for Audrey’s wish list item: Python in the Browser (see pages 16-18 of her presentation).

Audit Trails

Back in February, I defined the film table version 0.1 with just three columns:

    title VARCHAR(32) NOT NULL,
    release_year INTEGER NOT NULL CHECK (release_year >= 1888)

It was unclear then how far I’d go in supporting all of PostgreSQL data definition features. Now, Pyrseas is making progress towards covering all those capabilities.

Consequently, I’d like to add one column to the above and tag it as version 0.1.1. This is the column to be added (from the Pagila sample database):

   last_update timestamp with time zone NOT NULL DEFAULT now()

With this new column, I’ll be able to explore another kind of field in the database user interface: one that will not be present on new records (because the DEFAULT will take care of filling in that data), but will be displayed, in readonly mode, for updates. Note: In Pagila, last_update is defined as timestamp without time zone, but that seems too ambiguous for my taste.

To complete this, we need an ON UPDATE trigger together with a function to be called by the trigger. These are also provided by Pagila, but to highlight some of the capabilities of Pyrseas, I’ll first show the output of dbtoyaml against a database with all the necessary objects:

language plpgsql:
  trusted: true
schema public:
  description: standard public schema
  function last_updated():
    language: plpgsql
    returns: trigger
    source: "\nBEGIN\n    NEW.last_update = CURRENT_TIMESTAMP;\n    RETURN NEW;\n\
      END "
  table film:
        - release_year
        expression: (release_year >= 1888)
    - id:
        not_null: true
        type: integer
    - title:
        not_null: true
        type: character varying(32)
    - release_year:
        not_null: true
        type: integer
    - last_update:
        default: now()
        not_null: true
        type: timestamp with time zone
        access_method: btree
        - id
        - update
        level: row
        procedure: last_updated()
        timing: before

If you feed this YAML to yamltodb against a database that already has the film table version 0.1, it generates the following (against a PostgreSQL 8.4 DBMS—vs. 9.0 the creation of PL/pgSQL is bypassed):

SET check_function_bodies = false;
CREATE FUNCTION last_updated() RETURNS trigger
    LANGUAGE plpgsql
    AS $_$
    NEW.last_update = CURRENT_TIMESTAMP;
END $_$;
    ADD COLUMN last_update timestamp with time zone NOT NULL DEFAULT now();
CREATE TRIGGER last_updated
    EXECUTE PROCEDURE last_updated();

Thus we can successfully upgrade the version 0.1 database to version 0.1.1, with our audit trail column.

Happy trails to you, ’till we meet again …

A Funny Thing Happened on the Way to the Webserver

Some years ago, I was looking for forum software, preferably written in Python, so I found Pocoo’s original bulletin board project. Since then, I’ve kept an eye on the Pocoo team, experimenting early on with Werkzeug, and using Pygments (indirectly) and Sphinx (for the documentation of the Pyrseas project). I also heard about Jinja but, I guess because I was already invested in Mako Templates, wasn’t much interested in a Django Templates lookalike.

Fast forward to the present and this database user interfaces tutorial and project. My next step was going to replace my dinky WSGI “server” and associated routing code by a robust and substantial alternative. I was almost set on using CherryPy, but had some doubts.

CherryPy is a solid WSGI server framework. I chose it on previous projects, over others such as Zope, Django, TurboGears and Pylons, primarily because it’s relatively lightweight and, as they say, “agnostic.” CP doesn’t include nor even suggest its own template language, ORM or database layer, or JavaScript library. The one thing that may take some getting used is its URI mapping scheme.

In spite of the above, I thought that CherryPy was perhaps too heavy a dependency to add to the project and particularly to this tutorial. So I kept my eyes and ears open to alternatives.

About a month ago, I briefly encountered Flask and dismissed it, thinking, “Hmm, just Werkzeug and Jinja put together.” I’m not quite sure what prompted me to look further this past week. Maybe it was the recent comments by Armin and P.J. Eby’s responses introducing WSGI Lite.

The funny thing is I delved into Flask and liked what I saw. The URL routing decorator is quite attractive, and useful. I was largely convinced by the argument for having one template engine. The question then became: should I replace Mako by Jinja2? And, can Jinja2 do something similar to those Mako <%def%> tags I talked about earlier?

I was pleasantly surprised. In fact, Jinja’s equivalent for those tags are called macros and its documentation uses an HTML <input> for its example of a macro.

The bottom line is I very easily replaced Mako by Jinja2 (code in GitHub tagged as v0.2.3). I’m still exploring Flask and considering it as an alternative to CherryPy as the underlying framework.

A More Robust Database Interface

The PostgreSQL server, like other client-server DBMSs, is complex. Many things can go awry. Here is a sampling of possible problems, as reported by psycopg:

psycopg2.OperationalError: FATAL:  database "None" does not exist

psycopg2.OperationalError: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

psycopg2.InterfaceError: connection already closed

psycopg2.ProgrammingError: column "film_id" does not exist
LINE 1: select * from film where film_id=19777

psycopg2.DataError: integer out of range

psycopg2.IntegrityError: duplicate key value violates unique constraint "film_pkey"

psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block

The first version of dblib.py, extracted from the minimalist command line interface, paid little or no attention to such errors. For example, connection errors simply caused a Python traceback and exit. The fetch functions caught exceptions, but returned None, implying to the caller that simply no rows were found. The more recent version (v0.2.1) used with the WSGI interface didn’t improve on much, therefore causing either HTTP 500 Internal Server errors or misleading the user about the problem.

This installment of the continuing database user interface tutorial aims to correct these shortcomings.

Since eventually we’ll deal with more logical entities than just films, the bl.py module has been moved to a bl subdirectory and renamed film.py. Instead of standalone functions, we now have a Film class, similar to the one present in the command line interface, but with the functions implemented as instance methods of the class (the get_all function is a classmethod). Each method catches database exceptions and re-raises them.

The dblib.py module now includes all the other elements of the PostgreSQL connection string. The connect method uses a DictConnection factory to allow easier (more readable) reference to row attributes. The execute method takes an additional argument, expcount, which indicates the number of rows expected to be affected by the database statement (with 2 meaning “more than 1″) and raises a DatabaseError if the rowcount does not match expectations.  The method also does a rollback automatically in case of an error (thus avoiding the InternalError seen above), and connects to the database if not already connected. The fetch methods raise exceptions rather than returning None.

Finally, the top level film.py module has been reorganized to respond to the changes above. The FilmHandler methods no longer have to connect prior to invoking a business logic or database method. The methods also catch exceptions and deal with them appropriately.

The new code is available on GitHub tagged as v0.2.2.

Templating for Web Database Interfaces

If you count the number of lines in the HTML files of the (corrected) v0.2 of the minimalist WSGI database interface I presented earlier, you’ll find 228 non-blank lines. However, there are only 91 unique lines. In other words, there is 60% repetition. Not a good example of the Don’t Repeat Yourself (DRY) principle.

In addition, film.py has seven lines with embedded HTML tags. It would be best if the HTML formatting code (i.e., the View code in MVC terms) resided all within the templates.

Therefore, it is time to replace the bare-bones templating scheme with something more full-featured.

Over the past few years, I have used or experimented with Cheetah, Django templates, Genshi, Kid, Mako, Mighty,  Zope’s TAL and perhaps others that don’t readily come to mind. I’ve settled on Mako primarily for the features I’ll explain below.

When creating database interfaces, particularly of the CRUD or administrative type, much of the repetition comes from defining input elements for each database attribute, e.g., <input>, <select> and <textarea> HTML elements. Moreover, the HTML form to input a new record may be almost identical —but not quite— to the form to update or display an existing record.

My solution to this DRY dilemma has been to use Mako’s <%def> tags, in conjunction with Mako namespaces, to create a library of HTML form “functions” that simplify the specification of input elements.

The following excerpt from templates/forms.html illustrates these concepts:

% if errors and name in errors:
  <div class="errmsg">${errors[name]}</div>
% endif
<%def name="text(name, label, size, value=None, readonly=False)">
  <label for="${name}">${label}: </label>
  <input type="text" id="${name}" name="${name}" size="${size}"\
${value and (' value="%s"' % unicode(value).replace('"', '&quot;')) or ''}\
${readonly and ' readonly="readonly"' or ''} />

The following is templates/film/fields.html which is used by both the new.html and edit.html templates to define the same three input fields:

<%namespace name="form" file="/forms.html"/>
<%page args="edit=False"/>
  <%form:text name="id" label="Id" size="10" value="${id}"
  <%form:text name="title" label="Title" size="64" value="${title}"/>
  <%form:text name="release_year" label="Release Year" size="10"

The templates/film/new.html file incorporates the above as follows:

<%namespace file="fields.html" name="fields"/>

If you run the application and press the Save button on the New Film page without entering anything, the three files above combine to generate the following HTML for the Id field:

<div class="form-row">
  <div class="errmsg">Id must be a positive integer</div>
  <label for="id">Id: </label>
  <input type="text" id="id" name="id" size="10" />

The errmsg <div> comes courtesy of the application passing a form ‘errors’ argument to the render() function and the opening %if in forms.html. You can also see that the text %def creates the form-row <div> as well as both the <label> and <input> elements. In edit.html, fields.body() is called with edit=True and as a result, the input element also gets a readonly attribute.

The forms.html “library” includes support for text, checkbox, select and textarea elements (the latter three are not currently used by this application). Counting unique vs. overall lines in the templates we now find only 25% repetition which is a good improvement over the original.

The complete code is available at GitHub, tagged as v0.2.1.

I’d like to take this opportunity to thank Mike Bayer for creating Mako templates and to encourage others to give them a try.

Where to CHECK your premises

“Contradictions do not exist. Whenever you think that you are facing a contradiction, check your premises. You will find that one of them is wrong.” — Ayn Rand 1

In both the command line and WSGI database user interfaces that I presented previously, there is an overlap (or redundancy) between the application and the database. For example, in the WSGI version, film.py has the following code:

    if not self.release_year.isdigit() or int(self.release_year) < 1888:
        self.errors['release_year'] = \
            "Release year must be a number greater than 1887"

This overlaps with the CHECK constraint on the film table, seen below in the film.yaml file:

  table film:
        - release_year
        expression: (release_year >= 1888)

Since the application prevents the user from entering a value less than 1888 for the release year, it seems the database CHECK is redundant and could be eliminated.

On the other hand, the application does not verify the uniqueness of the film’s PRIMARY KEY, its id field. Instead, if the user enters a duplicate id, the application will get a “duplicate key value violates unique constraint” error from PostgreSQL and will inform the user accordingly. Should the application implement the uniqueness check itself, by querying the database prior to sending the INSERT?

The reason for waxing philosophical about these alternatives is that in order to design a generic database user interface it would be helpful to have guidelines that resulted, as much as possible, in an optimal trade-off between redundancy and efficiency or performance.

Database Primacy

The first rule, I believe, needs to be that the database takes precedence: PRIMARY KEYs, UNIQUEness constraints, FOREIGN KEYs (referential integrity), CHECK and other constraints should be implemented, preferably declaratively, in the database schema.

Type and Attribute Constraints

In section 9.9 of An Introduction to Database Systems (8th edition), C.J. Date presents a constraint classification scheme. He defines a type constraint as “a specification of the values that make up the type in question.” The second proposed guideline is that type constraints —such as the release year CHECK constraint— should also be implemented in the user interface.

This makes it possible to give near immediate feedback to the user at a small cost in redundancy. The database will still invoke its constraint checking so the application has to be prepared to handle a database error message. Consider for example, if—in order to better restrict the input—the database were changed as follows:

ALTER TABLE film DROP CONSTRAINT film_release_year_check;
ALTER TABLE film ADD CONSTRAINT film_release_year_check
    CHECK (release_year >= 1888
           AND release_year < extract(year FROM current_date)::integer + 5);

The application could still use the simpler CHECK without affecting the database integrity. On the other hand, if we drop the constraint from the database, an invalid value can be input from outside the application.

Attribute constraints, i.e., whether “a specified attribute … is of a specified type,” are essentially handled by the application mapping an input field to a suitable programming language data type.

Table and Database Constraints

Relvar and database constraints, i.e., those involving one relvar (what most people refer as a table) or two or more relvars, are probably best checked by the DBMS, rather than pre-validated by the application. For example, although we could check if the film id was already present in the table before doing the INSERT, some other transaction could INSERT that same id just ahead of ours, and we’d still get the duplicate key message.

There may be exceptions to these rules, but I think they’ll serve most applications well.

1 Francisco d’Anconia speaking in Part I, Chapter VII, “The Exploiters and the Exploited,” of Ayn Rand’s Atlas Shrugged.