Python Web Frameworks – Application Object

In response to my previous post, someone asked “… what’s the point? How one way is better than the other?” My response was that I’m “not trying to judge these frameworks from the perspective of a developer (at least not completely), but rather from the POV of ‘if I were writing a framework, which features are essential, which nice to have, which I can disregard’” (emphasis added). Of course, I have my own biases as a developer as to what I consider “nice” or essential.

With that clarification out of the way, let’s look at the next feature: the WSGI callable generally known as the application object.

Django doesn’t want developers to be concerned with this artifact. A Django application is the collection of model, views (controllers in standard MVC terminology) and templates (views) that gets run by the server. For those interested, the Django WSGI callable is an instance of WSGIHandler (in django.core.handlers.wsgi).

As best as I can tell, Twisted Web expects the developer to write the application object. It will get called from the WSGIResponse run() method in twisted.web.wsgi. As its documentation says, Twisted Web is “a framework for doing things with the web” but is “not a ‘web framework’ in the same sense” as Django, so it doesn’t fit well in this comparison.

Pyramid‘s Router class (in pyramid.router) is what implements the WSGI callable. The Router instance gets created from the Configurator class’s make_wsgi_app() method. Select Router attributes and methods:

  • logger
  • root_factory
  • routes_mapper
  • request_factory
  • registry (from args)
  • handle_request()

In CherryPy, the Application class is the application object, although it’s not generallly used directly. Instead a user root object is mounted on a Tree which is a registry of these applications. Major attributes and methods:

  • root (from args): root object (application handler)
  • namespaces, config: for configuration
  • wsgiapp(): a CPWSGIApp instance, to handle apps in a pipeline

Being a support library, Werkzeug doesn’t include an application object, but it has a sample in its Shortly example. Major attributes and methods:

  • dispatch_request(): dispatcher
  • jinja_env: template enviroment
  • url_map: Map() taking list of Rule()’s
  • error_404(): error handler
  • render_template(): template renderer

Like Django, Web2py isn’t interested in having developers worry about the application object. Its WSGI callable is the standalone wsgibase() function, in gluon.main.

The web.py application class (yes, not capitalized) implements its WSGI callable. Major attributes and methods:

  • autoreload (from args)
  • init_mapping (from args)
  • request()
  • handle(): matches path to mapping
  • run(): runs the developer server
  • notfound(), internalerror(): error handlers

Flask‘s Flask class is its application object, derived from flask.helpers._PackageBoundObject. Selected attributes and methods (in addition to those in Werkzeug’s example above):

  • static_url_path, static_folder (from args)
  • instance_path (from args or auto-determined)
  • view_functions
  • before/after_xxx_funcs (dicts)
  • logger
  • run()
  • route(): decorator for URL rules
  • error_handler_spec (a dict)
  • errorhandler(): decorator for error handler functions

Like Werkzeug, WebOb doesn’t include an application object, but it has a sample in its wiki example, most of it very specific to the example.

Bottle‘s Bottle class is its WSGI callable. Major attributes and methods:

  • routes (list) and router (class Router instance)
  • config (class ConfigDict instance)
  • mount(): to mount an app at a given point
  • match(): to search for matching routes
  • route(): decorator to bind a function to a URL

Pesto‘s DispatcherApp class is its application object. Major attributes and methods:

  • prefix (from args): a “mount” point
  • matchpattern(), match(): matches URLs (match is decorator)
  • urlfor(): handler/dispatcher to URL converter
  • gettarget(): returns a four-part tuple from the URI
  • status404_application(): error handler

Diva‘s Application class is the object of interest. Select attributes and methods:

  • config (from kwargs), configure()
  • routing_cfg and routing
  • locale_dir (from args)
  • template_dirs (from args)
  • templates(): template loader
  • prepare() and cleanup(): before and after methods

Summary: For developers who care about writing application code, Django and Web2py and to some extent Pyramid and CherryPy hide the existence of the WSGI callable, while other frameworks require that the programmer instantiate it or invoke it. If I were to write a framework, I’d want to make it easy on the developer as the former projects do, but wouldn’t keep it completely out of sight.

Posted in Python, User interfaces | Tagged , , , , , , , , , , , , , | 3 Comments

Business Logic in the Database

Chris Travers recently responded to Tony Marston’s critique of an earlier post where Chris advocated “intelligent databases”1. Chris’ response is well reasoned, particularly his point that once a database is accessed by more than a single application or via third-party tools, it’s almost a given that one should attempt to push “intelligence” and business logic into the database if possible.

However, there is a paragraph in Tony’s post that merits further scrutiny:

The database has always been a dumb data store, with all the business logic held separately within the application. This is an old idea which is now reinforced by the single responsibility principle. It is the application code which is responsible for the application logic while the database is responsible for the storing and retrieval of data. Modern databases also have the ability to enforce data integrity, manage concurrency control, backup, recovery and replication while also controlling data access and maintaining database security.

If a database (actually, a shortening of DBMS—a combination of the data and the software that manages it) has always been dumb, then presumably one would never specify UNIQUE indexes. It is a business requirement that invoice or employee numbers be unique, so if all the business logic should reside in the application, then the DBA should only create a regular index and the application —all the applications and tools!— should enforce uniqueness.

Tony’s mention of “data integrity” is somewhat ambiguous because different people have varied understandings of what that covers. As C. J. Date points out, “integrity … is the part [of the relational model] that has changed [or evolved] the most over the years.”2 Perhaps Tony believes that primary keys, and unique and referential constraints should be enforced by the DBMS, but apparently an integrity constraint such as “No supplier with status less than 20 supplies any part in a quantity greater than 500″3 should instead only be code in an application (make that all applications that access that database).

As for me, as I pointed out earlier, “constraints should be implemented, preferably declaratively, in the database schema” while “type constraints … should also be implemented in the user interface” (emphasis added). Ideally, the user interface should derive its code directly from the schema.

Update: Many thanks to everyone who participated in the discussion. I think we’ve covered just about every angle pro or con incorporating business logic in the database, so I’ve closed comments now.

Update 2: Chris has an update post that may be of interest.


1 Interestingly, the first time I heard the term “intelligent database” it was from Dave Kellogg giving a marketing presentation for Ingres 6.3, which had incorporated some of the features in the UC Berkeley POSTGRES project.
2 Date, C. J. An Introduction to Database Systems, 8th Edition. 2004, p. 253.
3 Ibid., p. 254.

Posted in PostgreSQL | Tagged , , | 55 Comments

Python Web Frameworks – Development Server

One of the most frequent tasks of a web developer is running the app while it’s being written. So we’ll begin our examination of Python web frameworks by looking at the development servers provided and how does one run the app.

Django developers are used to invoking

python manage.py runserver

The runserver option runs Django’s development server which is invoked via the run() function in django.core.servers.basehttp. This in turn runs a WSGIServer which is derived from Python’s wsgiref’s simple_server.

Twisted Web has its own server based on its core reactor object. Here’s a simple usage example:

from twisted.web import server, resource
from twisted.internet import reactor

class HelloResource(resource.Resource):
    ...

reactor.listenTCP(8080, server.Site(HelloResource()))
reactor.run()

Pyramid uses Paste’s paste.httpserver which in turn is based on Python’s BaseHTTPServer. Sample usage:

from paste.httpserver import serve
from pyramid.config import Configurator

def hello_world(request):
   ... 

if __name__ == '__main__':
   config = Configurator()
   ...
   app = config.make_wsgi_app()
   serve(app, host='0.0.0.0')

In CherryPy, the normal server is cherrypy.engine which is invoked as:

cherrypy.engine.start()
cherrypy.engine.block()

cherrypy.engine is actually an instance of cherrypy.process.wspbus.Bus. A simpler alternative to the engine calls is to use:

cherrypy.quickstart(app())

Under the covers, CherryPy also defines a global server variable that is an instance of Server which implements the HTTP server, with help from cherrypy._cpwsgi_server.CPWSGIServer.

Werkzeug has a built-in server, used as follows:

from werkzeug.serving import run_simple
from myproject import make_app

app = make_app(...)
run_simple('localhost', 8080, app, use_reloader=True)

run_simple() invokes its own (werkzeug.serving) make_server which creates one of three types of XxxWSGIServer’s, which are derived from Python’s BaseHTTPServer.HTTPServer.

When Web2py is invoked, a GUI dialog is shown. The dialog is implemented in gluon.widget and its web2pyDialog.start method creates a main.HttpServer. gluon.main’s HttpServer class uses the Rocket server whose code is in gluon.rocket.

In web.py, to run an application you instantiate a web.application and invoke its run() method, e.g.:

app = web.application(urls, globals())
app.run()

This eventually runs a BaseHTTPServer.HTTPServer from the Python standard library.

Flask‘s WSGI application class (Flask) has a run() method:

run(host='127.0.0.1', port=5000, debug=None, **options)

This invokes the Werkzeug serving.run_simple function (see above).

WebOb doesn’t offer any server per se, but suggests using Paste’s httpserver.serve or wsgiref.simple_server.

Bottle has a standalone run() function:

run(app=None, server='wsgiref', host='127.0.0.1', port=8080,
    interval=1, reloader=False, quiet=False, plugins=None, **kargs)

This runs one of several web servers supported, by default, wsgiref, which uses the wsgiref.simple_server.

Pesto suggests using the Python wsgiref directly, e.g.:

import pesto
dispatcher = pesto.dispatcher_app()
...
httpd = make_server('', 8080, dispatcher)
httpd.serve_forever()

Diva uses Python’s WSGIServer from the standard library to implement its main() and serve() standalone functions. The former can be used as follows:

from diva.core import Application
from diva.server import main

class MyApp(Application): pass

main(MyApp())

Summary: With the exception of Twisted and CherryPy, all frameworks base their development servers on the standard library’s wsgiref or BaseHTTPServer, Paste or Rocket (which claims to be CherryPy-compatible). Lesson for prospective framework creators: Don’t write your own server. However, you may want to enhance your server with auto-reloading, threading, debugging. etc.

There are two approaches to running the app or server. Django and Web2py don’t want the developer to bother with writing app.run() or similar, the other frameworks expect the programmer to plug the various pieces together.

Best wishes for 2012 to all readers!

Posted in Python, User interfaces | Tagged , , , , , , , , , , , , , | 15 Comments

Python Web Frameworks – Candidates

I’ve been researching web frameworks, partly with a view to writing my own, and thought I’d share my findings. This first post will present the candidates briefly and subsequent posts will delve into particular features.

The candidates are presented based on the number of users as found on Ohloh, as a rough measure of popularity.  Each includes the number of downloads for the latest release on PyPI (sometimes misleading if a package had a recent release) and the salient parts of its advertised description (caveat emptor!). All of them are open source and written in Python, but some of them are WSGI libraries rather than full frameworks.

Django (843 users, 126,426 downloads): A “high-level … Web framework that encourages rapid development and clean, pragmatic design.”

Twisted Web (136 users, 14,413 downloads [both for Twisted]):  An “HTPP server that can be used as a library or run … stand-alone …, an HTML templating engine [and] an HTTP client library.”

Pyramid (aka Pylons on Ohloh, 125 users, 606 + 640 downloads): A “very general … web framework [designed] to make it easier for a developer to create an arbitrary web application.”

CherryPy (75 users, PyPI download info not available): A ”minimalist … pythonic, object-oriented web framework.”

Werkzeug (31 users, 4225 downloads): A “WSGI utility library, … [it] is Simple … And Powerful.”

Web2py (21 users, 1157 downloads): A “full-stack framework for rapid development of fast, scalable, secure and portable database-driven web-based applications.”

Web.py (18 users, 11,708 downloads): A “web framework … that is as simple as it is powerful” (hmmm … see Werkzeug above).

Flask (16 users, 42,657 downloads): A “microframework … based on Werkzeug, Jinja 2 and good intentions.”

WebOb (16 users, 29,664 downloads): A “library that provides wrappers around the WSGI request environment, and an object to help create WSGI responses.”

Bottle (4 users, 163 downloads): A “fast, simple and lightweight WSGI micro web-framework.”

Pesto (no users, 421 downloads): Not a framework, but a library for “writing WSGI web applications.”

Diva (not on Ohloh or PyPI): This is Christopher Lenz’s  framework experimentation sandbox: “a lightweight web framework … built on top of WSGI and integrated with … Genshi.” My first, unpublished attempt at writing a framework was based on Diva.

Left Out: I had to draw the line somewhere. Compared to Richard Jones’ micro-framework battle, I included three of what he called “mega frameworks” and excluded three of his micro frameworks. For the record, here are the ones I left out: Aspen, Bobo, CubicWeb, Grok, Itty, Milla, Nagare, Nevow, Pump, Pyjamas, Pylons, Python Paste, Quixote, Spyce, Tornado, TurboGears, Webware, Zope 2 and Zope 3Bluebream. But that’s not all: see the Python.org wiki, WSGI.org and Wikipedia.

Unlike Richard, I will not attempt to develop an application with each framework. Instead, I’ll analyze specific features along the lines of Christopher’s Diva documentation.

Posted in Python, User interfaces | Tagged , , , , , , , , , , , , , | 8 Comments

Design Notes on Database Application Development

I recently took a brief look at web2py since its Database Abstraction Layer (DAL) sounded interesting. After following the tutorials in the Overview chapter, I ended up not delving into DAL.  Instead, I revisited Ruby on Rails which I had last looked circa 2005 when its support for PostgreSQL was rather weak. Although that has improved, the basic approach of Rails hasn’t and is very similar to web2py’s as well as Django.

The essence of these full stack frameworks is that the developer —or should I say programmer— knows how to design classes and that that design can be transformed into a relational database design mechanically and flawlessly. Maybe I’m biased because my first involvements in software engineering were in the areas of IT system design, but the latter premise seems like a very weak foundation for developing robust systems.

Maybe this also explains, in part, why NoSQL solutions have cropped up and found a following. It’s much easier —albeit not reliable or sound— to design a system when you “liberate” yourself from the “straitjacket” constraints of a database design (aka schema). However, in my experience, most systems beyond the blog/wiki/toy database examples used in tutorials benefit from specialized data analysis and modeling that result in a formal database design.

If the database precedes the application development, it appears the Rails/Django/web2py approach is back asswards. If the database is available during development, doesn’t it make more sense to re-use the database design effort in creating the application rather than try to repeat it with Python or Ruby classes?

The Pyrseas project, in its quest to rescue Andromeda, plans to implement a utility to help in database application development (tentatively named dbappgen but that may change). The utility will start off by connecting to a database and will allow the developer to create the application based on the tables and other objects present in it, not the other way around. If database changes are needed, they can be made there first, not by imposing some application-induced “migration” (as web2py does).

Since Pyrseas’ dbtoyaml utility already outputs a YAML/JSON description of a database, it may even be possible to conduct an application development exercise, at least in part, without connecting to the database. For example, if the developer invokes dbappgen moviesdb, the utility could first check whether a file named moviesdb.yaml (or with similar extensions) exists in the current directory and if so, read it and recreate in memory the PostgreSQL catalogs, thus allowing “offline” development.

In other news, due to the recent discussions regarding Python 3, my hopes that Werkzeug will be ported soon have been dashed. It appears it’s either use WebOb or write my own framework (which ironically is what Armin recommended).

Posted in PostgreSQL, Python, User interfaces | Tagged , , , , , , , | 26 Comments

The Phantom of the Database – Part 3

In the previous episode: Alice and Bob were trying to simultaneously update the title of the following row of the film table:

  id   |     title      | release_year
-------+----------------+--------------
 47478 | Seven  Samurai |         1956

User Transaction Correctness

In the first episode, Carol also wanted to concurrently change the release_year to 1954. Let us now consider her update in conjunction with one of the others.

If Bob modifies the title and Carol changes the year —provided our SQL statements only include the updated fields— there will be no problem. In other words, the changes can be merged, just as when Alice updates one section of a Wikipedia page and Bob changes another section of the same page (Note to self: have to talk to them about this moonlighting).

So the problem with Alice’s and Bob’s updates to the title (one to remove the extra space, the other to change it to Japanese) is that they cannot be merged. How can we implement a correct user transaction without incurring the penalty of locking the row in question?

Optimistic Concurrency Control

Richard Stephan was on the right track when he asked in a comment to the previous post: Should the application be using a pessimistic or an optimistic locking technique?

Optimistic concurrency control (aka optimistic “locking”) is a transaction design approach that avoids the overhead of locking. As stated in the Wikipedia article, the “stateless nature of HTTP makes locking infeasible for web user interfaces,” so it’s almost a given that we should use OCC for our interface.

OCC involves validating that the record we updated was the correct one. A simple way to do this is to add a qualification to the original UPDATE that does the validation for us. For example, the DB-API cursor execute call could be coded as follows:

cursor.execute("UPDATE film SET title = %s "
               "WHERE id = %s AND title = %s",
               (newrec.title, oldrec.id, oldrec.title))

Above, newrec refers to the new record instance and oldrec to the old one. So both Alice and Bob would send UPDATEs with “AND title = ‘Seven  Samurai’”. Naturally, since they’re both modifying the title, only one of the statements can succeed.

The downside of having to add the extra qualification —in the general case— is that the user could’ve changed several fields so each UPDATE would have to be customized accordingly. This will have a negative impact, among other things, on the query planner. In addition, the web interface would have to deal with two values for each field.

Alternatives to the extra qualification are to add either a version number for each row or a sufficiently granular timestamp that also acts as a versioning scheme. However, since we’re using PostgreSQL, perhaps we can use something else …

Stay tuned.

Posted in PostgreSQL, Python, User interfaces | Tagged , , , | 9 Comments

The Phantom of the Database – Part 2

In the previous episode: Alice, Bob and Carol were trying to simultaneously update the following row of the film table:

  id   |     title      | release_year
-------+----------------+--------------
 47478 | Seven  Samurai |         1956

Alice wanted to remove the extra space in the title, Bob was trying to change the title to the phonetic Japanese version and Carol was correcting the release year to 1954. To simplify the analysis we’ll now limit ourselves to Alice’s and Bob’s updates.

The Lost Update Problem

If Alice’s statement executes first, Bob’s change will overwrite her update. Similarly, if Bob’s statement takes precedence, his change will be overwritten. Appropriately, this is conventionally known as the lost update problem. The updates are known as blind writes or dirty writes. How can our Python user interface prevent this problem?

The traditional solution to the lost update problem is to use two-phase locking. You can use PostgreSQL’s psql application to verify how this works (I used the PROMPT1 variable to show which user is issuing the statements).

Alice’s session starts as follows:

alice@moviesdb=> begin;
BEGIN
alice@moviesdb=> select title from film where id = 47478;
     title      
----------------
 Seven  Samurai
(1 row)

Bob’s session is identical but then he issues the UPDATE statement:

bob@moviesdb=> begin;
BEGIN
bob@moviesdb=> select title from film where id = 47478;
     title      
----------------
 Seven  Samurai
(1 row)

bob@moviesdb=> update film set title = 'Sichinin no Samurai' where id = 47478;
UPDATE 1

When Alice tries her UPDATE, her session hangs:

alice@moviesdb=> update film set title = 'Seven Samurai' where id = 47478;

You can examine the situation from another psql session (I cheated and excluded that session’s data). I won’t try to explain (or understand) all this but you can see that Alice’s session is waiting due to an ungranted lock.

moviesdb=# select procpid, waiting, current_query from pg_stat_activity;
 procpid | waiting |                       current_query
---------+---------+-----------------------------------------------------------
   25747 | t       | update film set title = 'Seven Samurai' where id = 47478;
   25900 | f       | <IDLE> in transaction
(2 rows)

moviesdb=# select pid, relation::regclass, locktype, page, tuple, mode, granted
moviesdb-# from pg_locks order by pid, relation, locktype;
  pid  | relation  |   locktype    | page | tuple |       mode       | granted
-------+-----------+---------------+------+-------+------------------+---------
 25747 | film      | relation      |      |       | AccessShareLock  | t
 25747 | film      | relation      |      |       | RowExclusiveLock | t
 25747 | film      | tuple         |    0 |    37 | ExclusiveLock    | t
 25747 | film_pkey | relation      |      |       | AccessShareLock  | t
 25747 | film_pkey | relation      |      |       | RowExclusiveLock | t
 25747 |           | transactionid |      |       | ShareLock        | f
 25747 |           | transactionid |      |       | ExclusiveLock    | t
 25747 |           | virtualxid    |      |       | ExclusiveLock    | t
 25900 | film      | relation      |      |       | RowExclusiveLock | t
 25900 | film      | relation      |      |       | AccessShareLock  | t
 25900 | film_pkey | relation      |      |       | RowExclusiveLock | t
 25900 | film_pkey | relation      |      |       | AccessShareLock  | t
 25900 |           | transactionid |      |       | ExclusiveLock    | t
 25900 |           | virtualxid    |      |       | ExclusiveLock    | t
(14 rows)

Bob’s COMMIT releases his locks …

bob@moviesdb=> commit;
COMMIT

and Alice’s UPDATE now goes through:

UPDATE 1
alice@moviesdb=> commit;
COMMIT
alice@moviesdb=> select title from film where id = 47478;
     title     
---------------
 Seven Samurai
(1 row)

Hey, what happened? Alice’s UPDATE overwrote Bob’s! Wasn’t that supposed to be prevented?

Here is the rub: if it is important for the application to update the row as was presented to the user, then we need to add another qualification to the UPDATE, i.e., we need something like “and title = 'Seven  Samurai'“. We’ll discuss this in a future installment.

Posted in PostgreSQL, Python, User interfaces | Tagged , , | 16 Comments