ANFSCD: Revisiting the Web Server

Nearly two years ago, I was considering which Python web framework to use for a user interface to Postgres: CherryPy, Flask, Werkzeug? Not entirely satisfied with the choices, I started reviewing even more frameworks thinking I might want to write my own minimalist framework.

Several months later, somebody (through Planet Python, IIRC) referred me to a presentation by Jacob Kaplan-Moss on the history and future of Python on the web. Surprisingly, halfway through the talk Jacob started raving about Meteor, a pure JavaScript framework, saying “we’re deluding ourselves if we think this [something like Meteor] is not the future of web applications.” This prompted me to take a close look at Meteor and several other JS frameworks.

Tarek Ziadé’s “A new development era” essay reinforced this change in direction. Ultimately, I settled on AngularJS as the (client) framework. Two-way data binding, dependency injection and testability are some of the features that won me over.

Angular opened the door to the Node.js world—which appears somewhat chaotic compared to Python’s (and even more to the staidness of Postgres). Like Python, Node.js has an abundance of web frameworks, templating libraries and other tools to choose from (and master). Aside from that, are there any negatives in continuing down this path?

For one, although Angular is an open source project, unlike Python and PostgreSQL, its destiny is controlled by a behemoth. A saving grace is its large community of contributors. And perhaps some of Angular’s innovations may eventually become part of standard HTML.

Second, in spite of Selena Deckelmann’s recent comments on JS and PG, I’m strongly partial to Python and not fond of JavaScript as an implementation language. It’s liberating not to have to use braces (and semicolons) for code structure! To compensate, CoffeeScript appears to be the obvious alternative.

When it comes to interfacing to Postgres, although I haven’t explored it enough to do justice, node-postgres doesn’t seem to be up to par with psycopg, and I’m not about to throw away the work I’ve done on Pyrseas, in particular the TTM-inspired interface. So Werkzeug may still play a part, as a Postgres-Python-to-JSON service, particularly now that it support Python 3. However, for contrast I will use node-postgres in an early implementation.

Last, the Angular team’s choice for “workflow” tool (Yeoman) did not sit well with me: I don’t care for “scaffolding” and my first experience with Grunt rubbed me the wrong way. Fortunately, in the Node.js “chaos” I found Brunch, which although not without problems, looks suitable for my purposes.

Having addressed the negatives, I’ve started work on this at GitHub, and plan to post more about it later on.

Update: Due to the change in direction, I was wondering whether I should also change the title of this blog to something like “Taming Serpents, Pachyderms and White A’s in Red Shields”, but fortunately I discovered that at least O’Reilly uses a rhinoceros as the JavaScript mascot and rhinos are considered pachyderms. 🙂


Testing Python and PostgreSQL on Windows, Part 3

As a commenter mentioned in response to Part 2, an alternative to using pip install psycopg2, which requires that you first install VC++ 2008 Express, is to download and install the Windows port, aka win-psycopg. Jason Erickson makes these builds available for several versions of Python for both 32- and 64-bit Windows.

If you’re not planning to use virtualenvs or tox (which creates virtualenvs for you), then the win-psycopg installer is the easiest way to satisfy the psycopg2 dependency (in fact, that’s how I started after the initial failure with pip). Simply download the Python 2.7 and 3.2 installers, run them and you’re done. However, the installers don’t work in a virtualenv (there is a workaround to extract the files, but I didn’t explore it because I wanted to use tox).

Another option is to build psycopg2 with the MinGW compiler. Daniele Varrazzo has a post describing this. Daniele used a special MinGW package, but I chose to install the latest (mingw-get-inst-20120426.exe) from MinGW.org (click on the Navigation – Downloads link which will take you to SourceForge).

To get the psycopg2 sources, you can download the .tar.gz package, or, since you have Git, do this from Git Bash:

git clone git://luna.dndg.it/public/psycopg2.git
cd psycopg2
git checkout 2_4_5  # or latest tag

Create a pydistutils.cfg file in your home directory (%USERPROFILE%) with the following (an alternative is to use the --compiler option to the python setup.py command below):


Make sure MinGW, Python, and PostgreSQL are in your PATH, e.g., set PATH=C:\MinGW\bin;C:\Python27;C:\Program Files\PostgreSQL\8.4\bin;%PATH%, and then run:

python setup.py build_ext build

With the latest MinGW and unless Python bug 12641 has been resolved, you’ll see the following error message:

cc1.exe: error: unrecognized command line option '-mno-cygwin'

The workaround is to edit the cygwincompiler.py file in your Python Lib\distutils directory and remove all instances of -mno-cygwin. Hopefully after that, the python setup.py above will work and then you can run the following to install it:

python setup.py install

Rinse and repeat for the Python 3.2 version and you should be ready to test connecting from both Python versions to PostgreSQL.

Aside: To remove the pip-installed psycopg2, you run pip uninstall psycopg2 from the corresponding Python environment. To remove win-psycopg, you use Control Panel’s Add or Remove Programs and click Remove on the desired version. To remove the versions installed with MinGW, I’m afraid you’ll have to resort to deleting the Lib\site-packages\psycopg2 directories and the related psycopg2-*.egg-info files.


Testing Python and PostgreSQL on Windows, Part 2

In the previous post, I covered installation of Git, PostgreSQL and Python under Windows in order to set up a Pyrseas testing and development environment. Today, we’ll explore installation of the Python dependencies.

The Hitchhiker’s Guide to Python recommends first downloading and running the distribute_setup.py script. This gives you the easy_install command but the Guide recommends installing pip (with easy_install pip) and then using pip to install all other modules.

You can use pip to install pyyaml with the following command:

pip install pyyaml

However, if you try pip install psycopg2 (or even easy_install psycopg2), it’s very likely you’ll see the error:

error: Unable to find vcvarsall.bat

As best as I’ve been able to determine the only way to get around this is by installing Microsoft Visual Express Studio. According to this email and this post, for Python 2.7, it must be the 2008 Express Studio which, to make things interesting, is no longer available from the download links given. If you search enough you may find it here (download vcsetup.exe) (Update below). After installing VC++ 2008 Express (and if you haven’t installed Strawberry Perl—a later installment in our saga), the pip install psycopg2 command should succeed.

However, if you try to import psycopg2 at the Python 2.7 prompt you may be surprised with a traceback ending in:

    from psycopg2._psycopg import BINARY, NUMBER, STRING, DATETIME, ROWID
ImportError: DLL load failed: The specified module could not be found.

Ahh … the mysteries of Windows DLLs. Don’t despair: this probably means you don’t have the PostgreSQL DLLs (libpq.dll in particular) in your PATH. Add one of the postgres\x.x\bin directories to your PATH and (hopefully) you should then be able to connect from Python 2.7 to your PostgreSQL installations.

OK, let’s turn our attention to Python 3.2. If you followed the Hitchhiker’s Guide instructions previously and added C:\Python27 to your PATH, you’ll now have to change that to C:\Python32. Suggestion: create a couple of batch scripts, e.g., env27.bat and env32.bat, so you can easily switch between the two Python installations. And don’t forget to add the postgres\x.x\bin directory as well.

For 3.2, once again run the distribute_setup.py script, easy_install pip, and pip install pyyaml, as for 2.7 above. Then you can run pip install psycopg2, and if you installed VC++ previously, the gods may smile upon you and you may see the following message:

Successfully installed psycopg2
Cleaning up...

At this point, if you followed along, you’ll have four versions of PostgreSQL (8.4 through 9.2), two versions of Python (2.7 and 3.2), each with PyYAML and psycopg2, ready for testing. If you’re anxious to check things out, invoke one of the PATH setup scripts and try the following, from the Pyrseas source directory:

C:\...\src\Pyrseas>python tests\dbobject\test_schema.py
Ran 12 tests in 1.452s


There are some alternatives to installing psycopg2 using pip and VC++ 2008.  I’ll cover those in a subsequent post.

Update: Microsoft seems to keep changing download URLs. Your best bet is to search for “Visual C++ 2008 Express download.” Currently, that should lead you to the following download link.


Database User Interfaces – Pagination

Since it’s been a while from my last post on this subject, let me recap what we’ve covered:

I also took a detour to explore Python web frameworks. One of the first comments inquired about Tornado. I recently had the opportunity to experiment with it. There’s much to admire, particularly in terms of speed. However, the use of one-or-two-method handler classes essentially for each action (see the blog demo, for example) struck me as counterproductive. It reminded me of Jack Diederich’s “Stop Writing Classes” presentation.

Getting back to database UIs, I have added basic pagination to the listing of films. Here is an example of what it looks like:

The number of lines per page is currently hard-coded (see maxlines in FilmHandler.index), but eventually it should be configurable. The principal additions are the count() and slice() methods in the bl/film.py module. The first does a SELECT COUNT(*) FROM films, so that we can determine how many pages will be needed (the upcoming PostgreSQL 9.2 should improve performance of that query). The slice() method does a SELECT like the all() method but uses LIMIT and OFFSET to retrieve the subset of rows needed for the requested page number.

There are further refinements possible. For example, the list of page numbers/links at the bottom could get very long given enough data in the table. For now, correcting this is left as an exercise for the reader.

1 I’m glad to hear that Kenneth Reitz, Armin Ronacher and others are working on “merging” Werkzeug with Requests.


A Multi-Layered Test Cake

Dobos layered cake*

A recurrent theme in software engineering are the multiple layers (or tiers) into which systems are subdivided. In the ideal case, a given layer only interacts with the immediate layers “above” and “below” it. In practice, that clean separation of responsibilities is not always possible or, some may argue, desirable.

Nevertheless, thinking in terms of layers is quite useful. For example, the database application I’ve been presenting in this series consists of a PostgreSQL database, psycopg2 (a Python DB-API adapter for PostgreSQL), a business logic module that calls on psycopg2, Werkzeug (a WSGI library), the application modules that call on the business logic and Werkzeug, the HTTP protocol, and the user’s web browser.

As application developers, we need not concern ourselves with the details of how PostgreSQL stores the data that we send to it, nor with testing its internal functions. We rely on PG hackers to cover that testing. Similarly, we rely on Psycopg2 programmers to test communications with PostgreSQL and the Werkzeug team to test its WSGI and HTTP-related functionality. We do have responsibility for testing the business logic, the application code, and the interface presented to the user via the browser.

Tag v0.4.1 of the Database UI Tutorial at GitHub adds a set of unit tests for exercising the business logic module (i.e., werkzeug/bl/film.py). The test module (werkzeug/tests/bl/test_film.py) is pretty straightforward: there is a test method for each method in the business logic class. In addition, there are methods for verifying failure conditions, such as duplicate inserts or deleting a missing record.

Although the unit tests were written after the business logic module (i.e., not in keeping with test-driven development), they helped to resolve an open design issue. Earlier, some methods in dblib.py (thin interface to psycopg2) took an “expected count” argument to check the number of rows affected by a database operation. That didn’t look right, so I had removed it.  While developing the unit tests, it became clear that the checks ought to live in the business logic layer.

Tag v0.4.2 adds unit tests for testing the WSGI or web interface. I was very pleased with how easy it was to use Werkzeug’s testing faciities, Client and BaseResponse, to create the tests: simply “drop” the database app instance into the Client and then call its get or post methods to get the response headers and data. I only had to add the standard library’s xml.dom.minidom to examine particular elements in the HTML body.

Both the business logic and web test modules depend on a utility module which is “reheated” from the Pyrseas version control testing. A nice capability is that the base DbAppTestCase, in its setUpClass method (a Python 2.7 feature), ensures the test database is in sync with the YAML spec.

Coming up: testing the user (browser) interface, via Selenium.

* Dobos cake photograph, courtesy of Bruce Tuten.


A More Robust Database Interface

The PostgreSQL server, like other client-server DBMSs, is complex. Many things can go awry. Here is a sampling of possible problems, as reported by psycopg:

psycopg2.OperationalError: FATAL:  database "None" does not exist

psycopg2.OperationalError: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

psycopg2.InterfaceError: connection already closed

psycopg2.ProgrammingError: column "film_id" does not exist
LINE 1: select * from film where film_id=19777

psycopg2.DataError: integer out of range

psycopg2.IntegrityError: duplicate key value violates unique constraint "film_pkey"

psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block

The first version of dblib.py, extracted from the minimalist command line interface, paid little or no attention to such errors. For example, connection errors simply caused a Python traceback and exit. The fetch functions caught exceptions, but returned None, implying to the caller that simply no rows were found. The more recent version (v0.2.1) used with the WSGI interface didn’t improve on much, therefore causing either HTTP 500 Internal Server errors or misleading the user about the problem.

This installment of the continuing database user interface tutorial aims to correct these shortcomings.

Since eventually we’ll deal with more logical entities than just films, the bl.py module has been moved to a bl subdirectory and renamed film.py. Instead of standalone functions, we now have a Film class, similar to the one present in the command line interface, but with the functions implemented as instance methods of the class (the get_all function is a classmethod). Each method catches database exceptions and re-raises them.

The dblib.py module now includes all the other elements of the PostgreSQL connection string. The connect method uses a DictConnection factory to allow easier (more readable) reference to row attributes. The execute method takes an additional argument, expcount, which indicates the number of rows expected to be affected by the database statement (with 2 meaning “more than 1”) and raises a DatabaseError if the rowcount does not match expectations.  The method also does a rollback automatically in case of an error (thus avoiding the InternalError seen above), and connects to the database if not already connected. The fetch methods raise exceptions rather than returning None.

Finally, the top level film.py module has been reorganized to respond to the changes above. The FilmHandler methods no longer have to connect prior to invoking a business logic or database method. The methods also catch exceptions and deal with them appropriately.

The new code is available on GitHub tagged as v0.2.2.


To MVC or not to MVC

Many (or most) application frameworks have adopted a Model-View-Controller (MVC) architecture, for example, Django (they call it MTV but it’s very similar), Pylons, Ruby on Rails, and Struts.

In The Helsinki Declaration (IT Version), Toon Koppelaars makes the case for a different method to architect and implement database applications. Instead of using MVC for what he calls “Window-on-Data” (WoD) applications, Toon suggests using the Helsinki approach, which using acronyms could perhaps be named the UI-BL-DL architecture. It consists of three layers:

  • User Interface (UI): this is equivalent to the View and Controller of MVC. It can be implemented with the front-end technology “du jour.”
  • Business Logic (BL): this is the Model code that implements queries (the read-BL sublayer) or transactions (the write-BL sublayer) in a manner required by the enterprise using the application.
  • Data Logic (DL): this includes the database design as well as integrity constraints.

In the last point, integrity constraints are meant in the broadest sense, as used in Chris Date’s An Introduction to Database Systems (8th edition), Chapter 9, i.e., much more than primary and foreign keys. Toon has in mind something like the SQL Standard CREATE ASSERTION, but he’ll settle for TRIGGERs as an implementation vehicle.

The Helsinki approach appears eminently suitable for the design of a generic database user interface (in this context, the latter is more than the UI layer listed above). Therefore, let me apply these concepts to the minimalist command line interface (a technology “du jour” at some distant time in the past) I presented before.

As a first step, I factored all the calls to Psycopg2 objects out of dbapp.py and into their own module named dblib.py. Then I went further and split out of the latter the application-specific code into a “business logic” module named bl.py, shown below:

# -*- coding: utf-8 -*-

from dblib import fetchall, fetchone, execute

def get_all(dbconn):
    return fetchall(
        dbconn, "SELECT id, title, release_year FROM film ORDER BY id")

def get_one(dbconn, id):
    return fetchone(
        dbconn, "SELECT id, title, release_year FROM film WHERE id = %s",

def insert(dbconn, film):
    return execute(
        dbconn, "INSERT INTO film VALUES "
        "(%(id)s, %(title)s, %(release_year)s)", film.__dict__)

def update(dbconn, film):
    return execute(
        dbconn, "UPDATE film SET title = %s, release_year = %s "
        "WHERE id = %s", (film.title, film.release_year, film.id))

def delete(dbconn, id):
    return execute(dbconn, "DELETE FROM film WHERE id = %s", (id,))

As can be seen, segregating this code into its own module means it could easily be re-used by a web (or some other kind of) interface. If you look at the refactored dbapp.py (v0.1.1), you’ll notice that now you’d be hard-pressed to tell which database technology is behind the application. Finally, the dblib.py module is the start of a generic low-level PostgreSQL library.

There are still some rough edges and “magic.” For example, the edit function in dbapp.py validates the release year redundantly with the CHECK constraint in the database. As I understand it, according to the Helsinki method, the UI ought to be calling the DL layer to perform these validations. Furthermore, there is magic in the BL API: the structures (film, row) passed as arguments or returned lack proper definition. Toon recommends using Bertrand Meyer’s Design by Contract (see Object Oriented Object Construction) to specify the UI to BL interface. We’ll revisit these and other topics in forthcoming posts.

As before, the code is available at GitHub.


A Minimalist Command Line Database User Interface

To begin exploring the design of a generic database user interface as mentioned in my previous post, I’ve written a minimalist command line program. To try it, do this:

$ git clone git://github.com/jmafc/database-ui-tutorial.git dbui
$ cd dbui
$ createdb moviesdev
$ yamltodb moviesdev film.yaml | psql moviesdev

Of course, there are some pre-requisites: Git, PostgreSQL, Python, Psycopg2 and Pyrseas. If you don’t want to install the latter, create the film table according to the 0.1 version here. If you simply want to look at the code, you can find dbapp.py at GitHub here.

The following is a sample usage session:

$ cmdline/dbapp.py moviesdev
   A - Add
   L - List
   U - Update
   D - Delete
   Q - Quit
Command? l
    Id Title                            Year
 19777 The Cocoanuts                    1929
 20629 All Quiet on the Western Front   1930
 20640 Animal Crackers                  1930
   A - Add
   L - List
   U - Update
   D - Delete
   Q - Quit
Command? a
Id [0]: 1234
Title []: Testing
Release year [0]: 2010
Film 'Testing - 2010' added
   A - Add
   L - List
   U - Update
   D - Delete
   Q - Quit
Command? u
Id: 1234
Updating 'Testing - 2010'
Title [Testing]: A Test Movie
Release year [2010]:
Film 'A Test Movie - 2010' updated
   A - Add
   L - List
   U - Update
   D - Delete
   Q - Quit
Command? d
Id: 1234
Delete film 'A Test Movie - 2010' (y/n) [n]: y
Film 'A Test Movie - 2010' deleted
   A - Add
   L - List
   U - Update
   D - Delete
   Q - Quit
Command? q

Caveat emptor: The code is very succint and has limited error checking.

Database User Interface Basics

Examining the sample application leads us to identify the following essential features:

  1. The application needs to present data both as single records and in list or tabular form.
  2. The data is entered and displayed as characters but possibly stored or interfaced to the DBMS in binary formats, hence conversion procedures between external and internal formats are necessary.
  3. Data items need an initialization or default value.
  4. Many or most data items need validation procedures, including a failure message. Some validations are generic to the data type, others are specific to the item.
  5. Depending on the operation mode, modification of some items may be prevented, e.g., primary key cannot be entered or changed in Update mode.
  6. One or more facilities are needed to select a record for Update or Delete, e.g., by the primary key, from a list, etc.
  7. When reporting actions taken, a short external representation of a record is desirable1.

Some of these may seem tautological, but they apply regardless of whether the user interface is line-oriented, character-oriented (like ncurses), graphical or web-based.

A program to develop database applications needs to assist the developer in specifying these features. For example, the program has to provide a means to design a single record or tabular format. The advantage of having access to the database catalogs is that the program can facilitate this by supplying useful defaults at various stages in the design process.

Once the design is finalized, we’ll probably want to store it in the database itself, but –mindful of version control– it’s recommended we keep the design in a YAML specification file that can be stored in a VCS.

1 The primary key for the film table is an integer, and in the sample session, the IMDb identifier was used as it were a “natural” key. In most applications, such an id would be auto-generated and possibly not shown to the user.


SQL Database Version Control – Pyrseas

Now that Pyrseas has been released, it’s time to see how the DBAs Carol and Dave would use it to manage the changes described in Version Control, Part 2: SQL Databases.

Version 0.1

Carol starts by creating the film table in her database, caroldev, using the CREATE TABLE script, version 0.1, which we saw in the original article. She then uses the Pyrseas dbtoyaml tool to ouput the database specification:

$ dbtoyaml caroldev
schema public:
  table film:
        - release_year
        expression: (release_year >= 1888)
    - id:
        not_null: true
        type: integer
    - title:
        not_null: true
        type: character varying(32)
    - release_year:
        not_null: true
        type: integer
        access_method: btree
        - id

This should be mostly self-explanatory, but let’s go over it. Indentation indicates structure (nesting) and hyphens are used for each element in a list. Thus, there is one schema (public) and one table (film) defined within it. The table has three columns (id, title and release_year) with the given data types and none of them are nullable. The PRIMARY KEY (film_pkey) is on the id column and there is a CHECK constraint on the release_year column. By the way, the alphabetical order within each level is simply the default output format of PyYAML.

Carol can redirect the output of dbtoyaml to a file, say, film.yaml and store it in a Git repository or some other VCS. Dave will then be able to pull the file from the repository and use it to generate the SQL statements needed to create the table in his database, or the development, test and production databases, using the yamltodb tool as follows:

$ yamltodb -1 davedev film.yaml
    id integer NOT NULL,
    title character varying(32) NOT NULL,
    release_year integer NOT NULL);
ALTER TABLE film ADD CONSTRAINT film_release_year_check CHECK (release_year >= 1888);

If he is satisfied with the SQL, he can pipe the output to psql, e.g.,

$ yamltodb -1 davedev film.yaml | psql davedev

Version 0.2

For the second version, Dave issues an ALTER TABLE to add the length column, invokes dbtoyaml to create a new film.yaml and commits that to his repository. Carol uses another ALTER TABLE to add the rating column in her database, and stores a new film.yaml in her repository.

Manny, the integration manager, merges the new film.yaml versions into the project-wide repository. He uses yamltodb to output the statements needed to upgrade to version 0.2:

$ git diff 0.1 0.2
diff --git a/film.yaml b/film.yaml
index 0633c1b..98f19c4 100644
--- a/film.yaml
+++ b/film.yaml
@@ -1,6 +1,10 @@
 schema public:
   table film:
+      film_length_check:
+        columns:
+        - length
+        expression: ((length > 0) AND (length < 10000))
         - release_year
@@ -15,6 +19,10 @@ schema public:
     - release_year:
         not_null: true
         type: integer
+    - length:
+        type: smallint
+    - rating:
+        type: character(5)
         access_method: btree
$ yamltodb moviesdev film.yaml
    ADD COLUMN length smallint;
    ADD COLUMN rating character(5);
ALTER TABLE film ADD CONSTRAINT film_length_check CHECK ((length > 0) AND (length < 10000));

Version 0.3

For the next round, Carol and Dave issue the statements shown under Version 0.3 in the previous article and commit newer versions of film.yaml which Manny integrates into the project repository. The yamltodb output against the project database is as follows:

$ yamltodb -1 film.yaml
CREATE SEQUENCE category_category_id_seq
    CACHE 1;
CREATE TABLE film_category (
    film_id integer NOT NULL,
    category_id integer NOT NULL,
    last_update timestamp with time zone NOT NULL);
CREATE TABLE category (
    category_id integer NOT NULL DEFAULT nextval('category_category_id_seq'::regclass),
    name character varying(25) NOT NULL,
    last_update timestamp with time zone NOT NULL);
ALTER SEQUENCE category_category_id_seq OWNED BY category.category_id;
ALTER TABLE category ADD CONSTRAINT category_pkey PRIMARY KEY (category_id);
ALTER TABLE film_category ADD CONSTRAINT film_category_pkey PRIMARY KEY (film_id, category_id);
ALTER TABLE film_category ADD CONSTRAINT film_category_category_id_fkey FOREIGN KEY (category_id) REFERENCES category (category_id);
ALTER TABLE film_category ADD CONSTRAINT film_category_film_id_fkey FOREIGN KEY (film_id) REFERENCES film (id);


I cannot objectively review the tools I have created, but sincerely hope others will find them useful in managing changes to PostgreSQL databases in conjunction with a VCS.

Pyrseas dbtoyaml and yamltodb require Python, psycopg2 and the PyYAML library. They currently support the basic PostgreSQL features (schemas, tables, PRIMARY KEYs, FOREIGN KEYs, UNIQUE and CHECK constraints, indexes) but the intention is to expand them to cover all features.

The primary audience of the tools are DBAs and developers, but they can be used in conjunction with something like depesz Versioning to implement a stricter system for control over production databases.

I’d like to take this opportunity to thank Adam Cornett, an early adopter/tester, who found an issue with a FOREIGN KEY across schemas, which has been fixed in the master repository.