More Database Tools?

It’s been over year since I started blogging on these pages about Pyrseas and version control. In a month it will also be the first anniversary of the initial commit to GitHub. Much code and many words have flown under these “bridges,” so this seems an appropriate time to reflect.

When I discovered Andromeda, I was looking for a framework to do simple (CRUD-type) database updates

  • with more flexibility (read, programability) than pgAdmin or phpPgAdmin
  • without being tied to an object-relational mapper, either built-in as in Django or external as SQLAlchemy (Pylons/Pyramid)
  • without having to write repetitive code, either SQL or ORM.

Andromeda appeared to satisfy these objectives (although I wasn’t thrilled about having to customize it in PHP).

When I conceived dbtoyaml, I was being lazy: reacting to Andromeda’s requirement to handcraft a YAML description of a database before I could use it to manage SQL changes to it. I thought: why not create the YAML from the database catalogs?

Since my concept for a YAML database specification didn’t match well to Andromeda’s, that led to yamltodb, my attempt to recreate the SQL “diff’ing” features of Andromeda in Python. Andromeda did it using the information_schema catalogs, which made it portable to other DBMSs that had those. Andromeda also did the comparisons by issuing SQL queries (which didn’t perform well). I chose to use the pg_catalog tables and did the comparisons directly on Python structures.

At first, I had intended to only diff schemas and tables and not much more, since that sufficed for my purposes. However, Peter Eisentraut’s comment eventually convinced me that Pyrseas had to support ALL PostgreSQL DDL features1. I’m very pleased with what was accomplished. Pyrseas 0.5, to be released shortly2, will add support for TEXTSEARCH and FOREIGN DATA WRAPPER related objects. The only gaps left are TABLESPACE, GROUP/ROLE and the EXTENSIONs added in PG 9.1.

2012 brought another turn of events. My post on the controversy between Chris Travers and Tony Marston on whether business logic ought to reside in the database led to collaboration with Roger Hunwicks to create dbextend, a tool to automate database augmentation. A first submission was made and work continues on that front.

The latter effort raises other possibilities. For example, since yamltodb already knows how to create nearly all PG objects, it would be trivial to create a schemadump utility (equivalent to pg_dump -s). Another potential tool of interest to PostgreSQL advocates: dbtoyaml for other databases (mytoyaml, oratoyaml anyone?) together with a conversion utility that operates on the YAML specification so it can be accepted by the PG-only yamltodb (the YAML converter seems should be easier than editing SQL statements). The YAML/JSON output from dbtoyaml is amenable to other analysis or automation tasks.

I hope to get back to my database user interface “dream” … one of these days, but in the meantime, I’m glad for having taken these detours. I’d like to thank those who helped along the way: Josh Berkus, Robert Brewer, Adam Cornett, Ronan Dunklau, Peter Eisentraut, David Fetter, Dickson Guedes, Matthias Howell, Roger Hunwicks, Toon Koppelaars, Marko Kreen, Fabrízio Mello, Regina Obe, Filip Rembialkowski, Dariusz Suchojad, Daniele Varrazzo, Evgeni Vasilev, David Wheeler and others I may have missed.

1 Actually, Josh Berkus was the first one who mentioned (in a private email) that I ought to support all PG objects.
2 And just in time for PyCon, I’m happy to announce that it will support Python 3.


The Phantom of the Database – Part 4

At the end of November, I finished the third episode with mild suspense: I suggested that the problem of optimistic “locking” could perhaps be solved in PostgreSQL with something other than extra qualifications, row version numbers or timestamps.

Let’s start this episode with action!

moviesdb=> INSERT INTO film VALUES (47478, 'Seven  Samurai', 1956);
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |     title      | release_year
 853969 |    0 | (0,1) | Seven  Samurai |         1956
(1 row)

moviesdb=> UPDATE film SET title = 'Seven Samurai' WHERE id = 47478;
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |     title     | release_year
 853970 |    0 | (0,2) | Seven Samurai |         1956
(1 row)

moviesdb=> UPDATE film SET title = 'Sichinin Samurai' WHERE id = 47478;
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
 853971 |    0 | (0,3) | Sichinin Samurai |         1956
(1 row)

moviesdb=> UPDATE film SET release_year = 1954 WHERE id = 47478;
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
 853972 |    0 | (0,4) | Sichinin Samurai |         1954
(1 row)

moviesdb=> VACUUM FULL film;
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
 853972 |    0 | (0,1) | Sichinin Samurai |         1954
(1 row)

moviesdb=> BEGIN; DELETE FROM film WHERE id = 47478; ROLLBACK;
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  |  xmax  | ctid  |      title       | release_year
 853972 | 853974 | (0,1) | Sichinin Samurai |         1954
(1 row)

moviesdb=> BEGIN; UPDATE film SET release_year = 1956 WHERE id = 47478;
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
 853975 |    0 | (0,2) | Sichinin Samurai |         1956
(1 row)

moviesdb=> ROLLBACK;
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  |  xmax  | ctid  |      title       | release_year
 853972 | 853975 | (0,1) | Sichinin Samurai |         1954
(1 row)

What element in the queries above could be used as a surrogate “row version identifier?”  If you examine the changes carefully, you’ll notice that the xmin system column provides that capability. The ctid, a row locator, on the other hand, does not survive the VACUUM operation, and xmax is only used when a row is deleted (or updated, causing it to move).

So my suggestion, in terms of web user interfaces, is that fetching a row for a possible update should include the xmin value, e.g., in the get() method of the Film class, use the following:

    def get(self, db):
            row = db.fetchone(
                "SELECT xmin, title, release_year FROM film WHERE id = %s",

The xmin value can then be sent as a hidden field to the web client, and used in the update() method to implement optimistic concurrency control.

Update: The DB UI Tutorial tag v0.4.3 now has an implementation of this concept for the command line client.


Python Web Frameworks – Application Object

In response to my previous post, someone asked “… what’s the point? How one way is better than the other?” My response was that I’m “not trying to judge these frameworks from the perspective of a developer (at least not completely), but rather from the POV of ‘if I were writing a framework, which features are essential, which nice to have, which I can disregard'” (emphasis added). Of course, I have my own biases as a developer as to what I consider “nice” or essential.

With that clarification out of the way, let’s look at the next feature: the WSGI callable generally known as the application object.

Django doesn’t want developers to be concerned with this artifact. A Django application is the collection of model, views (controllers in standard MVC terminology) and templates (views) that gets run by the server. For those interested, the Django WSGI callable is an instance of WSGIHandler (in django.core.handlers.wsgi).

As best as I can tell, Twisted Web expects the developer to write the application object. It will get called from the WSGIResponse run() method in twisted.web.wsgi. As its documentation says, Twisted Web is “a framework for doing things with the web” but is “not a ‘web framework’ in the same sense” as Django, so it doesn’t fit well in this comparison.

Pyramid‘s Router class (in pyramid.router) is what implements the WSGI callable. The Router instance gets created from the Configurator class’s make_wsgi_app() method. Select Router attributes and methods:

  • logger
  • root_factory
  • routes_mapper
  • request_factory
  • registry (from args)
  • handle_request()

In CherryPy, the Application class is the application object, although it’s not generallly used directly. Instead a user root object is mounted on a Tree which is a registry of these applications. Major attributes and methods:

  • root (from args): root object (application handler)
  • namespaces, config: for configuration
  • wsgiapp(): a CPWSGIApp instance, to handle apps in a pipeline

Being a support library, Werkzeug doesn’t include an application object, but it has a sample in its Shortly example. Major attributes and methods:

  • dispatch_request(): dispatcher
  • jinja_env: template enviroment
  • url_map: Map() taking list of Rule()’s
  • error_404(): error handler
  • render_template(): template renderer

Like Django, Web2py isn’t interested in having developers worry about the application object. Its WSGI callable is the standalone wsgibase() function, in gluon.main.

The web.py application class (yes, not capitalized) implements its WSGI callable. Major attributes and methods:

  • autoreload (from args)
  • init_mapping (from args)
  • request()
  • handle(): matches path to mapping
  • run(): runs the developer server
  • notfound(), internalerror(): error handlers

Flask‘s Flask class is its application object, derived from flask.helpers._PackageBoundObject. Selected attributes and methods (in addition to those in Werkzeug’s example above):

  • static_url_path, static_folder (from args)
  • instance_path (from args or auto-determined)
  • view_functions
  • before/after_xxx_funcs (dicts)
  • logger
  • run()
  • route(): decorator for URL rules
  • error_handler_spec (a dict)
  • errorhandler(): decorator for error handler functions

Like Werkzeug, WebOb doesn’t include an application object, but it has a sample in its wiki example, most of it very specific to the example.

Bottle‘s Bottle class is its WSGI callable. Major attributes and methods:

  • routes (list) and router (class Router instance)
  • config (class ConfigDict instance)
  • mount(): to mount an app at a given point
  • match(): to search for matching routes
  • route(): decorator to bind a function to a URL

Pesto‘s DispatcherApp class is its application object. Major attributes and methods:

  • prefix (from args): a “mount” point
  • matchpattern(), match(): matches URLs (match is decorator)
  • urlfor(): handler/dispatcher to URL converter
  • gettarget(): returns a four-part tuple from the URI
  • status404_application(): error handler

Diva‘s Application class is the object of interest. Select attributes and methods:

  • config (from kwargs), configure()
  • routing_cfg and routing
  • locale_dir (from args)
  • template_dirs (from args)
  • templates(): template loader
  • prepare() and cleanup(): before and after methods

Summary: For developers who care about writing application code, Django and Web2py and to some extent Pyramid and CherryPy hide the existence of the WSGI callable, while other frameworks require that the programmer instantiate it or invoke it. If I were to write a framework, I’d want to make it easy on the developer as the former projects do, but wouldn’t keep it completely out of sight.


Python Web Frameworks – Development Server

One of the most frequent tasks of a web developer is running the app while it’s being written. So we’ll begin our examination of Python web frameworks by looking at the development servers provided and how does one run the app.

Django developers are used to invoking

python manage.py runserver

The runserver option runs Django’s development server which is invoked via the run() function in django.core.servers.basehttp. This in turn runs a WSGIServer which is derived from Python’s wsgiref’s simple_server.

Twisted Web has its own server based on its core reactor object. Here’s a simple usage example:

from twisted.web import server, resource
from twisted.internet import reactor

class HelloResource(resource.Resource):

reactor.listenTCP(8080, server.Site(HelloResource()))

Pyramid uses Paste’s paste.httpserver which in turn is based on Python’s BaseHTTPServer. Sample usage:

from paste.httpserver import serve
from pyramid.config import Configurator

def hello_world(request):

if __name__ == '__main__':
   config = Configurator()
   app = config.make_wsgi_app()
   serve(app, host='')

In CherryPy, the normal server is cherrypy.engine which is invoked as:


cherrypy.engine is actually an instance of cherrypy.process.wspbus.Bus. A simpler alternative to the engine calls is to use:


Under the covers, CherryPy also defines a global server variable that is an instance of Server which implements the HTTP server, with help from cherrypy._cpwsgi_server.CPWSGIServer.

Werkzeug has a built-in server, used as follows:

from werkzeug.serving import run_simple
from myproject import make_app

app = make_app(...)
run_simple('localhost', 8080, app, use_reloader=True)

run_simple() invokes its own (werkzeug.serving) make_server which creates one of three types of XxxWSGIServer’s, which are derived from Python’s BaseHTTPServer.HTTPServer.

When Web2py is invoked, a GUI dialog is shown. The dialog is implemented in gluon.widget and its web2pyDialog.start method creates a main.HttpServer. gluon.main’s HttpServer class uses the Rocket server whose code is in gluon.rocket.

In web.py, to run an application you instantiate a web.application and invoke its run() method, e.g.:

app = web.application(urls, globals())

This eventually runs a BaseHTTPServer.HTTPServer from the Python standard library.

Flask‘s WSGI application class (Flask) has a run() method:

run(host='', port=5000, debug=None, **options)

This invokes the Werkzeug serving.run_simple function (see above).

WebOb doesn’t offer any server per se, but suggests using Paste’s httpserver.serve or wsgiref.simple_server.

Bottle has a standalone run() function:

run(app=None, server='wsgiref', host='', port=8080,
    interval=1, reloader=False, quiet=False, plugins=None, **kargs)

This runs one of several web servers supported, by default, wsgiref, which uses the wsgiref.simple_server.

Pesto suggests using the Python wsgiref directly, e.g.:

import pesto
dispatcher = pesto.dispatcher_app()
httpd = make_server('', 8080, dispatcher)

Diva uses Python’s WSGIServer from the standard library to implement its main() and serve() standalone functions. The former can be used as follows:

from diva.core import Application
from diva.server import main

class MyApp(Application): pass


Summary: With the exception of Twisted and CherryPy, all frameworks base their development servers on the standard library’s wsgiref or BaseHTTPServer, Paste or Rocket (which claims to be CherryPy-compatible). Lesson for prospective framework creators: Don’t write your own server. However, you may want to enhance your server with auto-reloading, threading, debugging. etc.

There are two approaches to running the app or server. Django and Web2py don’t want the developer to bother with writing app.run() or similar, the other frameworks expect the programmer to plug the various pieces together.

Best wishes for 2012 to all readers!


Python Web Frameworks – Candidates

I’ve been researching web frameworks, partly with a view to writing my own, and thought I’d share my findings. This first post will present the candidates briefly and subsequent posts will delve into particular features.

The candidates are presented based on the number of users as found on Ohloh, as a rough measure of popularity.  Each includes the number of downloads for the latest release on PyPI (sometimes misleading if a package had a recent release) and the salient parts of its advertised description (caveat emptor!). All of them are open source and written in Python, but some of them are WSGI libraries rather than full frameworks.

Django (843 users, 126,426 downloads): A “high-level … Web framework that encourages rapid development and clean, pragmatic design.”

Twisted Web (136 users, 14,413 downloads [both for Twisted]):  An “HTPP server that can be used as a library or run … stand-alone …, an HTML templating engine [and] an HTTP client library.”

Pyramid (aka Pylons on Ohloh, 125 users, 606 + 640 downloads): A “very general … web framework [designed] to make it easier for a developer to create an arbitrary web application.”

CherryPy (75 users, PyPI download info not available): A “minimalist … pythonic, object-oriented web framework.”

Werkzeug (31 users, 4225 downloads): A “WSGI utility library, … [it] is Simple … And Powerful.”

Web2py (21 users, 1157 downloads): A “full-stack framework for rapid development of fast, scalable, secure and portable database-driven web-based applications.”

Web.py (18 users, 11,708 downloads): A “web framework … that is as simple as it is powerful” (hmmm … see Werkzeug above).

Flask (16 users, 42,657 downloads): A “microframework … based on Werkzeug, Jinja 2 and good intentions.”

WebOb (16 users, 29,664 downloads): A “library that provides wrappers around the WSGI request environment, and an object to help create WSGI responses.”

Bottle (4 users, 163 downloads): A “fast, simple and lightweight WSGI micro web-framework.”

Pesto (no users, 421 downloads): Not a framework, but a library for “writing WSGI web applications.”

Diva (not on Ohloh or PyPI): This is Christopher Lenz’s  framework experimentation sandbox: “a lightweight web framework … built on top of WSGI and integrated with … Genshi.” My first, unpublished attempt at writing a framework was based on Diva.

Left Out: I had to draw the line somewhere. Compared to Richard Jones’ micro-framework battle, I included three of what he called “mega frameworks” and excluded three of his micro frameworks. For the record, here are the ones I left out: Aspen, Bobo, CubicWeb, Grok, Itty, Milla, Nagare, Nevow, Pump, Pyjamas, Pylons, Python Paste, Quixote, Spyce, Tornado, TurboGears, Webware, Zope 2 and Zope 3Bluebream. But that’s not all: see the Python.org wiki, WSGI.org and Wikipedia.

Unlike Richard, I will not attempt to develop an application with each framework. Instead, I’ll analyze specific features along the lines of Christopher’s Diva documentation.


Design Notes on Database Application Development

I recently took a brief look at web2py since its Database Abstraction Layer (DAL) sounded interesting. After following the tutorials in the Overview chapter, I ended up not delving into DAL.  Instead, I revisited Ruby on Rails which I had last looked circa 2005 when its support for PostgreSQL was rather weak. Although that has improved, the basic approach of Rails hasn’t and is very similar to web2py’s as well as Django.

The essence of these full stack frameworks is that the developer —or should I say programmer— knows how to design classes and that that design can be transformed into a relational database design mechanically and flawlessly. Maybe I’m biased because my first involvements in software engineering were in the areas of IT system design, but the latter premise seems like a very weak foundation for developing robust systems.

Maybe this also explains, in part, why NoSQL solutions have cropped up and found a following. It’s much easier —albeit not reliable or sound— to design a system when you “liberate” yourself from the “straitjacket” constraints of a database design (aka schema). However, in my experience, most systems beyond the blog/wiki/toy database examples used in tutorials benefit from specialized data analysis and modeling that result in a formal database design.

If the database precedes the application development, it appears the Rails/Django/web2py approach is back asswards. If the database is available during development, doesn’t it make more sense to re-use the database design effort in creating the application rather than try to repeat it with Python or Ruby classes?

The Pyrseas project, in its quest to rescue Andromeda, plans to implement a utility to help in database application development (tentatively named dbappgen but that may change). The utility will start off by connecting to a database and will allow the developer to create the application based on the tables and other objects present in it, not the other way around. If database changes are needed, they can be made there first, not by imposing some application-induced “migration” (as web2py does).

Since Pyrseas’ dbtoyaml utility already outputs a YAML/JSON description of a database, it may even be possible to conduct an application development exercise, at least in part, without connecting to the database. For example, if the developer invokes dbappgen moviesdb, the utility could first check whether a file named moviesdb.yaml (or with similar extensions) exists in the current directory and if so, read it and recreate in memory the PostgreSQL catalogs, thus allowing “offline” development.

In other news, due to the recent discussions regarding Python 3, my hopes that Werkzeug will be ported soon have been dashed. It appears it’s either use WebOb or write my own framework (which ironically is what Armin recommended).


The Phantom of the Database – Part 3

In the previous episode: Alice and Bob were trying to simultaneously update the title of the following row of the film table:

  id   |     title      | release_year
 47478 | Seven  Samurai |         1956

User Transaction Correctness

In the first episode, Carol also wanted to concurrently change the release_year to 1954. Let us now consider her update in conjunction with one of the others.

If Bob modifies the title and Carol changes the year —provided our SQL statements only include the updated fields— there will be no problem. In other words, the changes can be merged, just as when Alice updates one section of a Wikipedia page and Bob changes another section of the same page (Note to self: have to talk to them about this moonlighting).

So the problem with Alice’s and Bob’s updates to the title (one to remove the extra space, the other to change it to Japanese) is that they cannot be merged. How can we implement a correct user transaction without incurring the penalty of locking the row in question?

Optimistic Concurrency Control

Richard Stephan was on the right track when he asked in a comment to the previous post: Should the application be using a pessimistic or an optimistic locking technique?

Optimistic concurrency control (aka optimistic “locking”) is a transaction design approach that avoids the overhead of locking. As stated in the Wikipedia article, the “stateless nature of HTTP makes locking infeasible for web user interfaces,” so it’s almost a given that we should use OCC for our interface.

OCC involves validating that the record we updated was the correct one. A simple way to do this is to add a qualification to the original UPDATE that does the validation for us. For example, the DB-API cursor execute call could be coded as follows:

cursor.execute("UPDATE film SET title = %s "
               "WHERE id = %s AND title = %s",
               (newrec.title, oldrec.id, oldrec.title))

Above, newrec refers to the new record instance and oldrec to the old one. So both Alice and Bob would send UPDATEs with “AND title = ‘Seven  Samurai'”. Naturally, since they’re both modifying the title, only one of the statements can succeed.

The downside of having to add the extra qualification —in the general case— is that the user could’ve changed several fields so each UPDATE would have to be customized accordingly. This will have a negative impact, among other things, on the query planner. In addition, the web interface would have to deal with two values for each field.

Alternatives to the extra qualification are to add either a version number for each row or a sufficiently granular timestamp that also acts as a versioning scheme. However, since we’re using PostgreSQL, perhaps we can use something else …

Stay tuned.