The Future of Pyrseas: Part 3

The second Andromeda feature, aside from schema version control, that Pyrseas wanted to rescue was the automatic generation of search and detailed editing screens.

It is somewhat amazing that 40 years after Moshé Zloof’s “Query by Example” and over 30 years after “A Forms Application Development System” by Prof. Larry Rowe and Kurt Shoens, most applications for searching, entering and maintaining data are still reinventing the wheel.

There are of course apps such as PgAdmin and Adminer that can be used for general querying and editing, but their target audience is developers and DBAs, not end users. Furthermore, these tools typically only allow querying a single table at a time (see QBE queries #3 and subsequent, in the aforementioned paper, for examples that usually can’t be handled).

At first, I thought I would develop a typical Python WSGI application. I planned to use Werkzeug because the extra machinery in larger frameworks such as Django is unnecessary for the envisioned purpose. An additional consideration was that a smaller library or framework would make it easier for anyone wanting to extend the automatically generated pieces into a full-fledged app.

I started designing a prototype application and also exploring other WSGI contenders. Then Jacob Kaplan-Moss and Tarek Ziadé jolted my thought process: a JavaScript-based framework was the way forward for a web-based app with Python (and Werkzeug) relegated to a REST API server.

So I took a look at AngularJS, Ember.js, Backbone.js and several other JavaScript frameworks and libraries. At this time, it seems a simpler library such as Backbone.js, or perhaps the Marionette framework, may be more appropriate. It is comforting to see that PgAdmin4 has chosen a similar path for its web-deployed client.

This capability is ultimately quite ambitious, so perhaps Pyrseas ought to stick just to schema version control. Nevertheless, whether integrated with Pyrseas or not, this functionality ought to re-use the database-catalog-to-Python-structures-to-YAML (and vice versa) features developed for Pyrseas.

This concludes this three-part series. Please also see part 1 and part 2.


A short detour into AngularJS and Brunch

I was planning to continue exploring database user interfaces using something completely different, as mentioned in the last post. However, I’ve taken a short detour to experiment further with the technologies in question.

In case you’re interested in AngularJS, the Karma test runner, the Brunch application assembler, CoffeeScript and the Jade templating engine, you can follow along at GitHub where I’ve created angular-phonecat-brunch, a derivative of the AngularJS phone catalog tutorial. And if you’re an expert on these topics, you can visit too, and let me know how to do it better!

Update: The tutorial has been completed, including changes to use the latest releases of Bower and Brunch, and also incorporates Stylus dynamic CSS stylesheets. Don’t miss the documentation README.



ANFSCD: Revisiting the Web Server

Nearly two years ago, I was considering which Python web framework to use for a user interface to Postgres: CherryPy, Flask, Werkzeug? Not entirely satisfied with the choices, I started reviewing even more frameworks thinking I might want to write my own minimalist framework.

Several months later, somebody (through Planet Python, IIRC) referred me to a presentation by Jacob Kaplan-Moss on the history and future of Python on the web. Surprisingly, halfway through the talk Jacob started raving about Meteor, a pure JavaScript framework, saying “we’re deluding ourselves if we think this [something like Meteor] is not the future of web applications.” This prompted me to take a close look at Meteor and several other JS frameworks.

Tarek Ziadé’s “A new development era” essay reinforced this change in direction. Ultimately, I settled on AngularJS as the (client) framework. Two-way data binding, dependency injection and testability are some of the features that won me over.

Angular opened the door to the Node.js world—which appears somewhat chaotic compared to Python’s (and even more to the staidness of Postgres). Like Python, Node.js has an abundance of web frameworks, templating libraries and other tools to choose from (and master). Aside from that, are there any negatives in continuing down this path?

For one, although Angular is an open source project, unlike Python and PostgreSQL, its destiny is controlled by a behemoth. A saving grace is its large community of contributors. And perhaps some of Angular’s innovations may eventually become part of standard HTML.

Second, in spite of Selena Deckelmann’s recent comments on JS and PG, I’m strongly partial to Python and not fond of JavaScript as an implementation language. It’s liberating not to have to use braces (and semicolons) for code structure! To compensate, CoffeeScript appears to be the obvious alternative.

When it comes to interfacing to Postgres, although I haven’t explored it enough to do justice, node-postgres doesn’t seem to be up to par with psycopg, and I’m not about to throw away the work I’ve done on Pyrseas, in particular the TTM-inspired interface. So Werkzeug may still play a part, as a Postgres-Python-to-JSON service, particularly now that it support Python 3. However, for contrast I will use node-postgres in an early implementation.

Last, the Angular team’s choice for “workflow” tool (Yeoman) did not sit well with me: I don’t care for “scaffolding” and my first experience with Grunt rubbed me the wrong way. Fortunately, in the Node.js “chaos” I found Brunch, which although not without problems, looks suitable for my purposes.

Having addressed the negatives, I’ve started work on this at GitHub, and plan to post more about it later on.

Update: Due to the change in direction, I was wondering whether I should also change the title of this blog to something like “Taming Serpents, Pachyderms and White A’s in Red Shields”, but fortunately I discovered that at least O’Reilly uses a rhinoceros as the JavaScript mascot and rhinos are considered pachyderms. 🙂


Tuples in the Pythonic, TTM-inspired interface to PostgreSQL

The Third Manifesto formally describes tuple types (RM prescription 6), tuple values (prescription 9), tuple variables (prescription 12) as well as other tuple-related elements. As mentioned in the previous post, a tuple value is a set of ordered triples each consisting of attribute name, type and value.

Class Tuple of the TTM-inspired interface to PostgreSQL models TTM tuples as Python lists of TTM Attribute objects. Lists were used rather than sets because for many practical purposes the order of the attributes is useful (or has “meaning”), e.g., the first attribute listed is most often –even in purist relational theory presentations– the primary key or part of the primary key.

The interface stores the Tuple heading as a (Python) n-tuple of name-type tuples, in the “internal use” _heading attribute. The n-tuple was chosen over a list due to its immutability. The interface also sets each Attribute as a Python attribute of the Tuple object. Thus, if you define a Tuple variable as follows:

film = Tuple([
    Attribute('id', int, sysdefault=True),
    Attribute('release_year', int)])

You can then assign or access an Attribute using simple Python syntax:

film.title = "Seven Samurai"
if film.year == 1954:
    do something

The interface also stores two other internal use lists, one for nullable attributes and another for attributes that allow default values. These are to be used by upstream classes such as RelVar.

Class Tuple has a __setattr__ method tailored to deal with assignment to TTM Attributes. It disallows assignment to internal attributes after initialization, with one exception: the _tuple_version attribute (used by RelVar for optimistic concurrency). It also doesn’t allow assignment to undefined Attributes, e.g., given the film variable above, attempting to assign to film.length will raise an AttributeError. Finally, the assignment is “filtered” through class Attribute, so that an attempt such as film.title = 8.5 will result in a ValueError from that class.

The pyrseas.relation.tuple module defines a standalone function: tuple_values_dict. This is used to generate a dictionary of attribute values suitable for passing to Psycopg’s cursor.execute method. For INSERT, a single currtuple argument is expected. For UPDATE, the modified Tuple is passed as a second argument and tuple_values_dict will return a dictionary solely for the attribute and values that have changed.


Attributes in the Pythonic, TTM-inspired interface to PostgreSQL

The Third Manifesto‘s Relational Model (RM) prescription 9 defines a relation heading as “a set of ordered pairs or attributes of the form <A,T>,” where A is the name of the attribute and T is the name of the declared type of the attribute. It then defines a tuple value (or tuple for short) as a set of ordered triples of form <A,T,v> where v is an arbitrary value of type T, called the attribute value.

Class Attribute of the TTM-inspired interface to PostgreSQL models the latter ordered triple rather than the ordered pair. I considered implementing the pair, say as AttribType, and deriving Attribute from it, but opted for the leaner, no-hierarchy solution for now.

An Attribute object should be initialized with a name, Python type and a value. However, only the name is required. The default type is str and for Python 2, unicode is treated almost as a synonym for str (I realize this goes against everything that your mother taught you, but I’m hoping that widespread Python 3 adoption will make this moot soon).

I wholehearteadly agree with TTM RM proscription 5 banning attributes that do not have a value, i.e., like SQL NULLs. However, I hope the interface is useful on existing databases, so in the interest of practicality, class Attribute has two additional, optional arguments: nullable and sysdefault. The former is to specify that an attribute allows NULLs (or None in Python). The latter can be used to indicate the corresponding table column has an SQL DEFAULT specification (this includes those defined as SERIAL or BIGSERIAL).

If an attribute is not nullable and not sysdefault, a value must be specified. If omitted, a suitable default is created based on the type (if possible), i.e., an empty string for str, 0 for int, 0.0 for float. If the attribute is nullable, empty string, 0 or 0.0 are converted to None. This approach is to facilitate dealing with empty HTML form fields, i.e., if the user skips a nullable field, the attribute should end up as a NULL in the database.

If a value is provided, the code raises ValueError if the value does not agree with the specified (or defaulted) Python type. The only exceptions are that an int value is cast to float, and a unicode value is allowed if the type is str and we’re working in Python 2.x.


A Pythonic, TTM-inspired interface to PostgreSQL – Requirements

Several moons ago, I started a series of posts about “designing and implementing a generic end user interface for PostgreSQL.” After a while, the series got sidetracked by other issues.

More recently, I have returned to the original endeavor. Partly from reading Database Explorations: Essays on The Third Manifesto and related topics by C.J. Date and Hugh Darwen, I decided to use relational concepts as presented in The Third Manifesto (TTM) in my implementation. This post provides an overview of the requirements.

Limited Scope

The interface is not a full-blown replacement for an object-relational mapper (ORM) (although in theory it could eventually grow in that direction). The interface is intended to assist with two typical needs of a database “admin” application: browsing and CRUD.

Browsing refers to presenting a subset of rows (tuples) of a table (relation variable or relvar) for subsequent editing. The relvar will typically be normalized so it may be necessary to join it to other relvars. Browsing will usually display a limited number of columns (attributes) so relational projection will be needed.

CRUD refers to the ability to create, read, update and delete single tuples in a relvar. The interface should only support relvars with a properly defined, possibly composite primary key.


The user (developer) should have to define only the attributes of each relvar together with the key, and for browsing, the projected attributes plus a JOIN specification if multiple relvars are involved. The definitions should be simple enough so that most of them could be (at a later date) derived automatically from the database catalogs.

From the definitions, the interface should generate all necessary SQL commands to INSERT a single tuple (possibly returning a generated key value), retrieve, UPDATE or DELETE a single tuple using the key, and fetch subsets of projected/joined tuples in a given order.

Optimistic Concurrency Control

The interface should take advantage of PostgreSQL features to implement optimistic locking when handling updates or deletes, as described in a previous series of posts.

Query by Example Support

The interface should facilitate querying of the browsed tuples using something similar to Query-By-Example. For example, when browsing movies if the argument release_year is passed as “>= 1969“, the results should only include films released on that year or later. This feature was not discussed in a post but had been committed to the tutorial repository.


The interface should follow the TTM guidelines when possible. For example, although implemented in Python, assignment to a relvar attribute defined as int should not be allowed if the value is of type str, and duplicate attribute names in a join expression should not be permitted. However, since the interface ought to be usable against existing SQL databases, allowance should be made for certain SQL features such as nullable attributes.

The implementation has been committed to the Pyrseas repository and changes were made to the DBUI tutorial to use the new interface. Subsequent posts will cover the interface in more detail.


Database User Interfaces – Pagination

Since it’s been a while from my last post on this subject, let me recap what we’ve covered:

I also took a detour to explore Python web frameworks. One of the first comments inquired about Tornado. I recently had the opportunity to experiment with it. There’s much to admire, particularly in terms of speed. However, the use of one-or-two-method handler classes essentially for each action (see the blog demo, for example) struck me as counterproductive. It reminded me of Jack Diederich’s “Stop Writing Classes” presentation.

Getting back to database UIs, I have added basic pagination to the listing of films. Here is an example of what it looks like:

The number of lines per page is currently hard-coded (see maxlines in FilmHandler.index), but eventually it should be configurable. The principal additions are the count() and slice() methods in the bl/film.py module. The first does a SELECT COUNT(*) FROM films, so that we can determine how many pages will be needed (the upcoming PostgreSQL 9.2 should improve performance of that query). The slice() method does a SELECT like the all() method but uses LIMIT and OFFSET to retrieve the subset of rows needed for the requested page number.

There are further refinements possible. For example, the list of page numbers/links at the bottom could get very long given enough data in the table. For now, correcting this is left as an exercise for the reader.

1 I’m glad to hear that Kenneth Reitz, Armin Ronacher and others are working on “merging” Werkzeug with Requests.


More Database Tools?

It’s been over year since I started blogging on these pages about Pyrseas and version control. In a month it will also be the first anniversary of the initial commit to GitHub. Much code and many words have flown under these “bridges,” so this seems an appropriate time to reflect.

When I discovered Andromeda, I was looking for a framework to do simple (CRUD-type) database updates

  • with more flexibility (read, programability) than pgAdmin or phpPgAdmin
  • without being tied to an object-relational mapper, either built-in as in Django or external as SQLAlchemy (Pylons/Pyramid)
  • without having to write repetitive code, either SQL or ORM.

Andromeda appeared to satisfy these objectives (although I wasn’t thrilled about having to customize it in PHP).

When I conceived dbtoyaml, I was being lazy: reacting to Andromeda’s requirement to handcraft a YAML description of a database before I could use it to manage SQL changes to it. I thought: why not create the YAML from the database catalogs?

Since my concept for a YAML database specification didn’t match well to Andromeda’s, that led to yamltodb, my attempt to recreate the SQL “diff’ing” features of Andromeda in Python. Andromeda did it using the information_schema catalogs, which made it portable to other DBMSs that had those. Andromeda also did the comparisons by issuing SQL queries (which didn’t perform well). I chose to use the pg_catalog tables and did the comparisons directly on Python structures.

At first, I had intended to only diff schemas and tables and not much more, since that sufficed for my purposes. However, Peter Eisentraut’s comment eventually convinced me that Pyrseas had to support ALL PostgreSQL DDL features1. I’m very pleased with what was accomplished. Pyrseas 0.5, to be released shortly2, will add support for TEXTSEARCH and FOREIGN DATA WRAPPER related objects. The only gaps left are TABLESPACE, GROUP/ROLE and the EXTENSIONs added in PG 9.1.

2012 brought another turn of events. My post on the controversy between Chris Travers and Tony Marston on whether business logic ought to reside in the database led to collaboration with Roger Hunwicks to create dbextend, a tool to automate database augmentation. A first submission was made and work continues on that front.

The latter effort raises other possibilities. For example, since yamltodb already knows how to create nearly all PG objects, it would be trivial to create a schemadump utility (equivalent to pg_dump -s). Another potential tool of interest to PostgreSQL advocates: dbtoyaml for other databases (mytoyaml, oratoyaml anyone?) together with a conversion utility that operates on the YAML specification so it can be accepted by the PG-only yamltodb (the YAML converter seems should be easier than editing SQL statements). The YAML/JSON output from dbtoyaml is amenable to other analysis or automation tasks.

I hope to get back to my database user interface “dream” … one of these days, but in the meantime, I’m glad for having taken these detours. I’d like to thank those who helped along the way: Josh Berkus, Robert Brewer, Adam Cornett, Ronan Dunklau, Peter Eisentraut, David Fetter, Dickson Guedes, Matthias Howell, Roger Hunwicks, Toon Koppelaars, Marko Kreen, Fabrízio Mello, Regina Obe, Filip Rembialkowski, Dariusz Suchojad, Daniele Varrazzo, Evgeni Vasilev, David Wheeler and others I may have missed.

1 Actually, Josh Berkus was the first one who mentioned (in a private email) that I ought to support all PG objects.
2 And just in time for PyCon, I’m happy to announce that it will support Python 3.


The Phantom of the Database – Part 4

At the end of November, I finished the third episode with mild suspense: I suggested that the problem of optimistic “locking” could perhaps be solved in PostgreSQL with something other than extra qualifications, row version numbers or timestamps.

Let’s start this episode with action!

moviesdb=> INSERT INTO film VALUES (47478, 'Seven  Samurai', 1956);
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |     title      | release_year
 853969 |    0 | (0,1) | Seven  Samurai |         1956
(1 row)

moviesdb=> UPDATE film SET title = 'Seven Samurai' WHERE id = 47478;
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |     title     | release_year
 853970 |    0 | (0,2) | Seven Samurai |         1956
(1 row)

moviesdb=> UPDATE film SET title = 'Sichinin Samurai' WHERE id = 47478;
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
 853971 |    0 | (0,3) | Sichinin Samurai |         1956
(1 row)

moviesdb=> UPDATE film SET release_year = 1954 WHERE id = 47478;
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
 853972 |    0 | (0,4) | Sichinin Samurai |         1954
(1 row)

moviesdb=> VACUUM FULL film;
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
 853972 |    0 | (0,1) | Sichinin Samurai |         1954
(1 row)

moviesdb=> BEGIN; DELETE FROM film WHERE id = 47478; ROLLBACK;
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  |  xmax  | ctid  |      title       | release_year
 853972 | 853974 | (0,1) | Sichinin Samurai |         1954
(1 row)

moviesdb=> BEGIN; UPDATE film SET release_year = 1956 WHERE id = 47478;
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
 853975 |    0 | (0,2) | Sichinin Samurai |         1956
(1 row)

moviesdb=> ROLLBACK;
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  |  xmax  | ctid  |      title       | release_year
 853972 | 853975 | (0,1) | Sichinin Samurai |         1954
(1 row)

What element in the queries above could be used as a surrogate “row version identifier?”  If you examine the changes carefully, you’ll notice that the xmin system column provides that capability. The ctid, a row locator, on the other hand, does not survive the VACUUM operation, and xmax is only used when a row is deleted (or updated, causing it to move).

So my suggestion, in terms of web user interfaces, is that fetching a row for a possible update should include the xmin value, e.g., in the get() method of the Film class, use the following:

    def get(self, db):
            row = db.fetchone(
                "SELECT xmin, title, release_year FROM film WHERE id = %s",

The xmin value can then be sent as a hidden field to the web client, and used in the update() method to implement optimistic concurrency control.

Update: The DB UI Tutorial tag v0.4.3 now has an implementation of this concept for the command line client.


Python Web Frameworks – Application Object

In response to my previous post, someone asked “… what’s the point? How one way is better than the other?” My response was that I’m “not trying to judge these frameworks from the perspective of a developer (at least not completely), but rather from the POV of ‘if I were writing a framework, which features are essential, which nice to have, which I can disregard'” (emphasis added). Of course, I have my own biases as a developer as to what I consider “nice” or essential.

With that clarification out of the way, let’s look at the next feature: the WSGI callable generally known as the application object.

Django doesn’t want developers to be concerned with this artifact. A Django application is the collection of model, views (controllers in standard MVC terminology) and templates (views) that gets run by the server. For those interested, the Django WSGI callable is an instance of WSGIHandler (in django.core.handlers.wsgi).

As best as I can tell, Twisted Web expects the developer to write the application object. It will get called from the WSGIResponse run() method in twisted.web.wsgi. As its documentation says, Twisted Web is “a framework for doing things with the web” but is “not a ‘web framework’ in the same sense” as Django, so it doesn’t fit well in this comparison.

Pyramid‘s Router class (in pyramid.router) is what implements the WSGI callable. The Router instance gets created from the Configurator class’s make_wsgi_app() method. Select Router attributes and methods:

  • logger
  • root_factory
  • routes_mapper
  • request_factory
  • registry (from args)
  • handle_request()

In CherryPy, the Application class is the application object, although it’s not generallly used directly. Instead a user root object is mounted on a Tree which is a registry of these applications. Major attributes and methods:

  • root (from args): root object (application handler)
  • namespaces, config: for configuration
  • wsgiapp(): a CPWSGIApp instance, to handle apps in a pipeline

Being a support library, Werkzeug doesn’t include an application object, but it has a sample in its Shortly example. Major attributes and methods:

  • dispatch_request(): dispatcher
  • jinja_env: template enviroment
  • url_map: Map() taking list of Rule()’s
  • error_404(): error handler
  • render_template(): template renderer

Like Django, Web2py isn’t interested in having developers worry about the application object. Its WSGI callable is the standalone wsgibase() function, in gluon.main.

The web.py application class (yes, not capitalized) implements its WSGI callable. Major attributes and methods:

  • autoreload (from args)
  • init_mapping (from args)
  • request()
  • handle(): matches path to mapping
  • run(): runs the developer server
  • notfound(), internalerror(): error handlers

Flask‘s Flask class is its application object, derived from flask.helpers._PackageBoundObject. Selected attributes and methods (in addition to those in Werkzeug’s example above):

  • static_url_path, static_folder (from args)
  • instance_path (from args or auto-determined)
  • view_functions
  • before/after_xxx_funcs (dicts)
  • logger
  • run()
  • route(): decorator for URL rules
  • error_handler_spec (a dict)
  • errorhandler(): decorator for error handler functions

Like Werkzeug, WebOb doesn’t include an application object, but it has a sample in its wiki example, most of it very specific to the example.

Bottle‘s Bottle class is its WSGI callable. Major attributes and methods:

  • routes (list) and router (class Router instance)
  • config (class ConfigDict instance)
  • mount(): to mount an app at a given point
  • match(): to search for matching routes
  • route(): decorator to bind a function to a URL

Pesto‘s DispatcherApp class is its application object. Major attributes and methods:

  • prefix (from args): a “mount” point
  • matchpattern(), match(): matches URLs (match is decorator)
  • urlfor(): handler/dispatcher to URL converter
  • gettarget(): returns a four-part tuple from the URI
  • status404_application(): error handler

Diva‘s Application class is the object of interest. Select attributes and methods:

  • config (from kwargs), configure()
  • routing_cfg and routing
  • locale_dir (from args)
  • template_dirs (from args)
  • templates(): template loader
  • prepare() and cleanup(): before and after methods

Summary: For developers who care about writing application code, Django and Web2py and to some extent Pyramid and CherryPy hide the existence of the WSGI callable, while other frameworks require that the programmer instantiate it or invoke it. If I were to write a framework, I’d want to make it easy on the developer as the former projects do, but wouldn’t keep it completely out of sight.