Automated Database Augmentation

Suppose you have a PostgreSQL database like the Pagila sample with 14 tables, each with a last_update timestamp column to record the date and time each row was modified, and it is now a requirement to capture which user effected each change. Or perhaps you have several tables without such audit trail columns and need to add them quickly. Or maybe you have decided to denormalize your design by adding a calculated column, e.g., extended price = unit price times quantity ordered, or a derived column, e.g., carrying the customer name in the invoice table.

If you have some experience as a DBA, the word “drudgery” may have come to mind at the prospect of implementing the above features. It’s possible that, after a while, you’ve developed an approach for dealing with some of them but still wish there’d be some way to automate these thankless tasks.

You may have looked at the Andromeda project’s “automations” which provide some of these capabilities. However, in order to take advantage of the automations, you’ll first have to manually describe your database in a YAML format (and you’ll have to install Apache and PHP). Or you could have tried to use the follow-on project, Triangulum, but essentially you’d still have to create a YAML schema (no need for Apache, but you still need PHP).

Some relief is forthcoming. As a result of discussions resulting from my Business Logic in the Database post, I have been collaborating with Roger Hunwicks on a potential solution to these common DBA needs. The new Pyrseas tool is tentatively named dbextend1 and its initial documentation is available in the Pyrseas extender branch. This is how I envision dbextend being used.

Consider the opening example. The DBA would create a simple YAML file such as the (abbreviated) one below, listing the tables and the needed features:

schema public:
  table actor:
    audit_columns: default
  table category:
    audit_columns: default
...
  table store:
    audit_columns: default

The DBA would then use this file, say audext.yaml, as input to dbextend, e.g.,

dbextend pagiladb audext.yaml

dbextend reads the PostgreSQL catalogs (using code shared with dbtoyaml and yamltodb), building its internal representation. It also reads the YAML extensions file and builds a parallel (albeit much smaller) structure. Thirdly, it reads extension configuration information, e.g., a definition of what columns need to be added for “audit_columns: default“, for example, modified_timestamp and modified_by_user, what trigger(s) to add, and what function(s) to be created.

The output of dbextend is a YAML schema file, just like the one output by dbtoyaml, which can be piped directly to yamltodb to generate SQL to implement the desired features.

In case you’re wondering, dbextend —like other Pyrseas tools— will require Python, psycopg2 and pyyaml.

What features would you like to see automated? What are your suggested best practices for automating these common needs?


Picture credit: Thanks to Mr. O’Brien, a fourth-grade teacher in Minnesota.

1 We’re still receptive to some other suitable name.

Posted in PostgreSQL, Python, Database tools | Tagged , , , , , , , | 5 Comments

The Phantom of the Database – Part 4

At the end of November, I finished the third episode with mild suspense: I suggested that the problem of optimistic “locking” could perhaps be solved in PostgreSQL with something other than extra qualifications, row version numbers or timestamps.

Let’s start this episode with action!

moviesdb=> INSERT INTO film VALUES (47478, 'Seven  Samurai', 1956);
INSERT 0 1
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |     title      | release_year
--------+------+-------+----------------+--------------
 853969 |    0 | (0,1) | Seven  Samurai |         1956
(1 row)

moviesdb=> UPDATE film SET title = 'Seven Samurai' WHERE id = 47478;
UPDATE 1
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |     title     | release_year
--------+------+-------+---------------+--------------
 853970 |    0 | (0,2) | Seven Samurai |         1956
(1 row)

moviesdb=> UPDATE film SET title = 'Sichinin Samurai' WHERE id = 47478;
UPDATE 1
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
--------+------+-------+------------------+--------------
 853971 |    0 | (0,3) | Sichinin Samurai |         1956
(1 row)

moviesdb=> UPDATE film SET release_year = 1954 WHERE id = 47478;
UPDATE 1
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
--------+------+-------+------------------+--------------
 853972 |    0 | (0,4) | Sichinin Samurai |         1954
(1 row)

moviesdb=> VACUUM FULL film;
VACUUM
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
--------+------+-------+------------------+--------------
 853972 |    0 | (0,1) | Sichinin Samurai |         1954
(1 row)

moviesdb=> BEGIN; DELETE FROM film WHERE id = 47478; ROLLBACK;
BEGIN
DELETE 1
ROLLBACK
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  |  xmax  | ctid  |      title       | release_year
--------+--------+-------+------------------+--------------
 853972 | 853974 | (0,1) | Sichinin Samurai |         1954
(1 row)

moviesdb=> BEGIN; UPDATE film SET release_year = 1956 WHERE id = 47478;
BEGIN
UPDATE 1
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
--------+------+-------+------------------+--------------
 853975 |    0 | (0,2) | Sichinin Samurai |         1956
(1 row)

moviesdb=> ROLLBACK;
ROLLBACK
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  |  xmax  | ctid  |      title       | release_year
--------+--------+-------+------------------+--------------
 853972 | 853975 | (0,1) | Sichinin Samurai |         1954
(1 row)

What element in the queries above could be used as a surrogate “row version identifier?”  If you examine the changes carefully, you’ll notice that the xmin system column provides that capability. The ctid, a row locator, on the other hand, does not survive the VACUUM operation, and xmax is only used when a row is deleted (or updated, causing it to move).

So my suggestion, in terms of web user interfaces, is that fetching a row for a possible update should include the xmin value, e.g., in the get() method of the Film class, use the following:

    def get(self, db):
        try:
            row = db.fetchone(
                "SELECT xmin, title, release_year FROM film WHERE id = %s",
                (self.id,))

The xmin value can then be sent as a hidden field to the web client, and used in the update() method to implement optimistic concurrency control.

Update: The DB UI Tutorial tag v0.4.3 now has an implementation of this concept for the command line client.

Posted in PostgreSQL, Python, User interfaces | Tagged , , , | 2 Comments

Python Web Frameworks – Application Object

In response to my previous post, someone asked “… what’s the point? How one way is better than the other?” My response was that I’m “not trying to judge these frameworks from the perspective of a developer (at least not completely), but rather from the POV of ‘if I were writing a framework, which features are essential, which nice to have, which I can disregard’” (emphasis added). Of course, I have my own biases as a developer as to what I consider “nice” or essential.

With that clarification out of the way, let’s look at the next feature: the WSGI callable generally known as the application object.

Django doesn’t want developers to be concerned with this artifact. A Django application is the collection of model, views (controllers in standard MVC terminology) and templates (views) that gets run by the server. For those interested, the Django WSGI callable is an instance of WSGIHandler (in django.core.handlers.wsgi).

As best as I can tell, Twisted Web expects the developer to write the application object. It will get called from the WSGIResponse run() method in twisted.web.wsgi. As its documentation says, Twisted Web is “a framework for doing things with the web” but is “not a ‘web framework’ in the same sense” as Django, so it doesn’t fit well in this comparison.

Pyramid‘s Router class (in pyramid.router) is what implements the WSGI callable. The Router instance gets created from the Configurator class’s make_wsgi_app() method. Select Router attributes and methods:

  • logger
  • root_factory
  • routes_mapper
  • request_factory
  • registry (from args)
  • handle_request()

In CherryPy, the Application class is the application object, although it’s not generallly used directly. Instead a user root object is mounted on a Tree which is a registry of these applications. Major attributes and methods:

  • root (from args): root object (application handler)
  • namespaces, config: for configuration
  • wsgiapp(): a CPWSGIApp instance, to handle apps in a pipeline

Being a support library, Werkzeug doesn’t include an application object, but it has a sample in its Shortly example. Major attributes and methods:

  • dispatch_request(): dispatcher
  • jinja_env: template enviroment
  • url_map: Map() taking list of Rule()’s
  • error_404(): error handler
  • render_template(): template renderer

Like Django, Web2py isn’t interested in having developers worry about the application object. Its WSGI callable is the standalone wsgibase() function, in gluon.main.

The web.py application class (yes, not capitalized) implements its WSGI callable. Major attributes and methods:

  • autoreload (from args)
  • init_mapping (from args)
  • request()
  • handle(): matches path to mapping
  • run(): runs the developer server
  • notfound(), internalerror(): error handlers

Flask‘s Flask class is its application object, derived from flask.helpers._PackageBoundObject. Selected attributes and methods (in addition to those in Werkzeug’s example above):

  • static_url_path, static_folder (from args)
  • instance_path (from args or auto-determined)
  • view_functions
  • before/after_xxx_funcs (dicts)
  • logger
  • run()
  • route(): decorator for URL rules
  • error_handler_spec (a dict)
  • errorhandler(): decorator for error handler functions

Like Werkzeug, WebOb doesn’t include an application object, but it has a sample in its wiki example, most of it very specific to the example.

Bottle‘s Bottle class is its WSGI callable. Major attributes and methods:

  • routes (list) and router (class Router instance)
  • config (class ConfigDict instance)
  • mount(): to mount an app at a given point
  • match(): to search for matching routes
  • route(): decorator to bind a function to a URL

Pesto‘s DispatcherApp class is its application object. Major attributes and methods:

  • prefix (from args): a “mount” point
  • matchpattern(), match(): matches URLs (match is decorator)
  • urlfor(): handler/dispatcher to URL converter
  • gettarget(): returns a four-part tuple from the URI
  • status404_application(): error handler

Diva‘s Application class is the object of interest. Select attributes and methods:

  • config (from kwargs), configure()
  • routing_cfg and routing
  • locale_dir (from args)
  • template_dirs (from args)
  • templates(): template loader
  • prepare() and cleanup(): before and after methods

Summary: For developers who care about writing application code, Django and Web2py and to some extent Pyramid and CherryPy hide the existence of the WSGI callable, while other frameworks require that the programmer instantiate it or invoke it. If I were to write a framework, I’d want to make it easy on the developer as the former projects do, but wouldn’t keep it completely out of sight.

Posted in Python, User interfaces | Tagged , , , , , , , , , , , , , | 3 Comments

Business Logic in the Database

Chris Travers recently responded to Tony Marston’s critique of an earlier post where Chris advocated “intelligent databases”1. Chris’ response is well reasoned, particularly his point that once a database is accessed by more than a single application or via third-party tools, it’s almost a given that one should attempt to push “intelligence” and business logic into the database if possible.

However, there is a paragraph in Tony’s post that merits further scrutiny:

The database has always been a dumb data store, with all the business logic held separately within the application. This is an old idea which is now reinforced by the single responsibility principle. It is the application code which is responsible for the application logic while the database is responsible for the storing and retrieval of data. Modern databases also have the ability to enforce data integrity, manage concurrency control, backup, recovery and replication while also controlling data access and maintaining database security.

If a database (actually, a shortening of DBMS—a combination of the data and the software that manages it) has always been dumb, then presumably one would never specify UNIQUE indexes. It is a business requirement that invoice or employee numbers be unique, so if all the business logic should reside in the application, then the DBA should only create a regular index and the application —all the applications and tools!— should enforce uniqueness.

Tony’s mention of “data integrity” is somewhat ambiguous because different people have varied understandings of what that covers. As C. J. Date points out, “integrity … is the part [of the relational model] that has changed [or evolved] the most over the years.”2 Perhaps Tony believes that primary keys, and unique and referential constraints should be enforced by the DBMS, but apparently an integrity constraint such as “No supplier with status less than 20 supplies any part in a quantity greater than 500″3 should instead only be code in an application (make that all applications that access that database).

As for me, as I pointed out earlier, “constraints should be implemented, preferably declaratively, in the database schema” while “type constraints … should also be implemented in the user interface” (emphasis added). Ideally, the user interface should derive its code directly from the schema.

Update: Many thanks to everyone who participated in the discussion. I think we’ve covered just about every angle pro or con incorporating business logic in the database, so I’ve closed comments now.

Update 2: Chris has an update post that may be of interest.


1 Interestingly, the first time I heard the term “intelligent database” it was from Dave Kellogg giving a marketing presentation for Ingres 6.3, which had incorporated some of the features in the UC Berkeley POSTGRES project.
2 Date, C. J. An Introduction to Database Systems, 8th Edition. 2004, p. 253.
3 Ibid., p. 254.

Posted in PostgreSQL | Tagged , , | 55 Comments

Python Web Frameworks – Development Server

One of the most frequent tasks of a web developer is running the app while it’s being written. So we’ll begin our examination of Python web frameworks by looking at the development servers provided and how does one run the app.

Django developers are used to invoking

python manage.py runserver

The runserver option runs Django’s development server which is invoked via the run() function in django.core.servers.basehttp. This in turn runs a WSGIServer which is derived from Python’s wsgiref’s simple_server.

Twisted Web has its own server based on its core reactor object. Here’s a simple usage example:

from twisted.web import server, resource
from twisted.internet import reactor

class HelloResource(resource.Resource):
    ...

reactor.listenTCP(8080, server.Site(HelloResource()))
reactor.run()

Pyramid uses Paste’s paste.httpserver which in turn is based on Python’s BaseHTTPServer. Sample usage:

from paste.httpserver import serve
from pyramid.config import Configurator

def hello_world(request):
   ... 

if __name__ == '__main__':
   config = Configurator()
   ...
   app = config.make_wsgi_app()
   serve(app, host='0.0.0.0')

In CherryPy, the normal server is cherrypy.engine which is invoked as:

cherrypy.engine.start()
cherrypy.engine.block()

cherrypy.engine is actually an instance of cherrypy.process.wspbus.Bus. A simpler alternative to the engine calls is to use:

cherrypy.quickstart(app())

Under the covers, CherryPy also defines a global server variable that is an instance of Server which implements the HTTP server, with help from cherrypy._cpwsgi_server.CPWSGIServer.

Werkzeug has a built-in server, used as follows:

from werkzeug.serving import run_simple
from myproject import make_app

app = make_app(...)
run_simple('localhost', 8080, app, use_reloader=True)

run_simple() invokes its own (werkzeug.serving) make_server which creates one of three types of XxxWSGIServer’s, which are derived from Python’s BaseHTTPServer.HTTPServer.

When Web2py is invoked, a GUI dialog is shown. The dialog is implemented in gluon.widget and its web2pyDialog.start method creates a main.HttpServer. gluon.main’s HttpServer class uses the Rocket server whose code is in gluon.rocket.

In web.py, to run an application you instantiate a web.application and invoke its run() method, e.g.:

app = web.application(urls, globals())
app.run()

This eventually runs a BaseHTTPServer.HTTPServer from the Python standard library.

Flask‘s WSGI application class (Flask) has a run() method:

run(host='127.0.0.1', port=5000, debug=None, **options)

This invokes the Werkzeug serving.run_simple function (see above).

WebOb doesn’t offer any server per se, but suggests using Paste’s httpserver.serve or wsgiref.simple_server.

Bottle has a standalone run() function:

run(app=None, server='wsgiref', host='127.0.0.1', port=8080,
    interval=1, reloader=False, quiet=False, plugins=None, **kargs)

This runs one of several web servers supported, by default, wsgiref, which uses the wsgiref.simple_server.

Pesto suggests using the Python wsgiref directly, e.g.:

import pesto
dispatcher = pesto.dispatcher_app()
...
httpd = make_server('', 8080, dispatcher)
httpd.serve_forever()

Diva uses Python’s WSGIServer from the standard library to implement its main() and serve() standalone functions. The former can be used as follows:

from diva.core import Application
from diva.server import main

class MyApp(Application): pass

main(MyApp())

Summary: With the exception of Twisted and CherryPy, all frameworks base their development servers on the standard library’s wsgiref or BaseHTTPServer, Paste or Rocket (which claims to be CherryPy-compatible). Lesson for prospective framework creators: Don’t write your own server. However, you may want to enhance your server with auto-reloading, threading, debugging. etc.

There are two approaches to running the app or server. Django and Web2py don’t want the developer to bother with writing app.run() or similar, the other frameworks expect the programmer to plug the various pieces together.

Best wishes for 2012 to all readers!

Posted in Python, User interfaces | Tagged , , , , , , , , , , , , , | 14 Comments

Python Web Frameworks – Candidates

I’ve been researching web frameworks, partly with a view to writing my own, and thought I’d share my findings. This first post will present the candidates briefly and subsequent posts will delve into particular features.

The candidates are presented based on the number of users as found on Ohloh, as a rough measure of popularity.  Each includes the number of downloads for the latest release on PyPI (sometimes misleading if a package had a recent release) and the salient parts of its advertised description (caveat emptor!). All of them are open source and written in Python, but some of them are WSGI libraries rather than full frameworks.

Django (843 users, 126,426 downloads): A “high-level … Web framework that encourages rapid development and clean, pragmatic design.”

Twisted Web (136 users, 14,413 downloads [both for Twisted]):  An “HTPP server that can be used as a library or run … stand-alone …, an HTML templating engine [and] an HTTP client library.”

Pyramid (aka Pylons on Ohloh, 125 users, 606 + 640 downloads): A “very general … web framework [designed] to make it easier for a developer to create an arbitrary web application.”

CherryPy (75 users, PyPI download info not available): A ”minimalist … pythonic, object-oriented web framework.”

Werkzeug (31 users, 4225 downloads): A “WSGI utility library, … [it] is Simple … And Powerful.”

Web2py (21 users, 1157 downloads): A “full-stack framework for rapid development of fast, scalable, secure and portable database-driven web-based applications.”

Web.py (18 users, 11,708 downloads): A “web framework … that is as simple as it is powerful” (hmmm … see Werkzeug above).

Flask (16 users, 42,657 downloads): A “microframework … based on Werkzeug, Jinja 2 and good intentions.”

WebOb (16 users, 29,664 downloads): A “library that provides wrappers around the WSGI request environment, and an object to help create WSGI responses.”

Bottle (4 users, 163 downloads): A “fast, simple and lightweight WSGI micro web-framework.”

Pesto (no users, 421 downloads): Not a framework, but a library for “writing WSGI web applications.”

Diva (not on Ohloh or PyPI): This is Christopher Lenz’s  framework experimentation sandbox: “a lightweight web framework … built on top of WSGI and integrated with … Genshi.” My first, unpublished attempt at writing a framework was based on Diva.

Left Out: I had to draw the line somewhere. Compared to Richard Jones’ micro-framework battle, I included three of what he called “mega frameworks” and excluded three of his micro frameworks. For the record, here are the ones I left out: Aspen, Bobo, CubicWeb, Grok, Itty, Milla, Nagare, Nevow, Pump, Pyjamas, Pylons, Python Paste, Quixote, Spyce, Tornado, TurboGears, Webware, Zope 2 and Zope 3Bluebream. But that’s not all: see the Python.org wiki, WSGI.org and Wikipedia.

Unlike Richard, I will not attempt to develop an application with each framework. Instead, I’ll analyze specific features along the lines of Christopher’s Diva documentation.

Posted in Python, User interfaces | Tagged , , , , , , , , , , , , , | 8 Comments

Design Notes on Database Application Development

I recently took a brief look at web2py since its Database Abstraction Layer (DAL) sounded interesting. After following the tutorials in the Overview chapter, I ended up not delving into DAL.  Instead, I revisited Ruby on Rails which I had last looked circa 2005 when its support for PostgreSQL was rather weak. Although that has improved, the basic approach of Rails hasn’t and is very similar to web2py’s as well as Django.

The essence of these full stack frameworks is that the developer —or should I say programmer— knows how to design classes and that that design can be transformed into a relational database design mechanically and flawlessly. Maybe I’m biased because my first involvements in software engineering were in the areas of IT system design, but the latter premise seems like a very weak foundation for developing robust systems.

Maybe this also explains, in part, why NoSQL solutions have cropped up and found a following. It’s much easier —albeit not reliable or sound— to design a system when you “liberate” yourself from the “straitjacket” constraints of a database design (aka schema). However, in my experience, most systems beyond the blog/wiki/toy database examples used in tutorials benefit from specialized data analysis and modeling that result in a formal database design.

If the database precedes the application development, it appears the Rails/Django/web2py approach is back asswards. If the database is available during development, doesn’t it make more sense to re-use the database design effort in creating the application rather than try to repeat it with Python or Ruby classes?

The Pyrseas project, in its quest to rescue Andromeda, plans to implement a utility to help in database application development (tentatively named dbappgen but that may change). The utility will start off by connecting to a database and will allow the developer to create the application based on the tables and other objects present in it, not the other way around. If database changes are needed, they can be made there first, not by imposing some application-induced “migration” (as web2py does).

Since Pyrseas’ dbtoyaml utility already outputs a YAML/JSON description of a database, it may even be possible to conduct an application development exercise, at least in part, without connecting to the database. For example, if the developer invokes dbappgen moviesdb, the utility could first check whether a file named moviesdb.yaml (or with similar extensions) exists in the current directory and if so, read it and recreate in memory the PostgreSQL catalogs, thus allowing “offline” development.

In other news, due to the recent discussions regarding Python 3, my hopes that Werkzeug will be ported soon have been dashed. It appears it’s either use WebOb or write my own framework (which ironically is what Armin recommended).

Posted in PostgreSQL, Python, User interfaces | Tagged , , , , , , , | 26 Comments