Category Archives: User interfaces

Python Web Frameworks – Candidates

I’ve been researching web frameworks, partly with a view to writing my own, and thought I’d share my findings. This first post will present the candidates briefly and subsequent posts will delve into particular features.

The candidates are presented based on the number of users as found on Ohloh, as a rough measure of popularity.  Each includes the number of downloads for the latest release on PyPI (sometimes misleading if a package had a recent release) and the salient parts of its advertised description (caveat emptor!). All of them are open source and written in Python, but some of them are WSGI libraries rather than full frameworks.

Django (843 users, 126,426 downloads): A “high-level … Web framework that encourages rapid development and clean, pragmatic design.”

Twisted Web (136 users, 14,413 downloads [both for Twisted]):  An “HTPP server that can be used as a library or run … stand-alone …, an HTML templating engine [and] an HTTP client library.”

Pyramid (aka Pylons on Ohloh, 125 users, 606 + 640 downloads): A “very general … web framework [designed] to make it easier for a developer to create an arbitrary web application.”

CherryPy (75 users, PyPI download info not available): A “minimalist … pythonic, object-oriented web framework.”

Werkzeug (31 users, 4225 downloads): A “WSGI utility library, … [it] is Simple … And Powerful.”

Web2py (21 users, 1157 downloads): A “full-stack framework for rapid development of fast, scalable, secure and portable database-driven web-based applications.”

Web.py (18 users, 11,708 downloads): A “web framework … that is as simple as it is powerful” (hmmm … see Werkzeug above).

Flask (16 users, 42,657 downloads): A “microframework … based on Werkzeug, Jinja 2 and good intentions.”

WebOb (16 users, 29,664 downloads): A “library that provides wrappers around the WSGI request environment, and an object to help create WSGI responses.”

Bottle (4 users, 163 downloads): A “fast, simple and lightweight WSGI micro web-framework.”

Pesto (no users, 421 downloads): Not a framework, but a library for “writing WSGI web applications.”

Diva (not on Ohloh or PyPI): This is Christopher Lenz’s  framework experimentation sandbox: “a lightweight web framework … built on top of WSGI and integrated with … Genshi.” My first, unpublished attempt at writing a framework was based on Diva.

Left Out: I had to draw the line somewhere. Compared to Richard Jones’ micro-framework battle, I included three of what he called “mega frameworks” and excluded three of his micro frameworks. For the record, here are the ones I left out: Aspen, Bobo, CubicWeb, Grok, Itty, Milla, Nagare, Nevow, Pump, Pyjamas, Pylons, Python Paste, Quixote, Spyce, Tornado, TurboGears, Webware, Zope 2 and Zope 3Bluebream. But that’s not all: see the Python.org wiki, WSGI.org and Wikipedia.

Unlike Richard, I will not attempt to develop an application with each framework. Instead, I’ll analyze specific features along the lines of Christopher’s Diva documentation.

Design Notes on Database Application Development

I recently took a brief look at web2py since its Database Abstraction Layer (DAL) sounded interesting. After following the tutorials in the Overview chapter, I ended up not delving into DAL.  Instead, I revisited Ruby on Rails which I had last looked circa 2005 when its support for PostgreSQL was rather weak. Although that has improved, the basic approach of Rails hasn’t and is very similar to web2py’s as well as Django.

The essence of these full stack frameworks is that the developer —or should I say programmer— knows how to design classes and that that design can be transformed into a relational database design mechanically and flawlessly. Maybe I’m biased because my first involvements in software engineering were in the areas of IT system design, but the latter premise seems like a very weak foundation for developing robust systems.

Maybe this also explains, in part, why NoSQL solutions have cropped up and found a following. It’s much easier —albeit not reliable or sound— to design a system when you “liberate” yourself from the “straitjacket” constraints of a database design (aka schema). However, in my experience, most systems beyond the blog/wiki/toy database examples used in tutorials benefit from specialized data analysis and modeling that result in a formal database design.

If the database precedes the application development, it appears the Rails/Django/web2py approach is back asswards. If the database is available during development, doesn’t it make more sense to re-use the database design effort in creating the application rather than try to repeat it with Python or Ruby classes?

The Pyrseas project, in its quest to rescue Andromeda, plans to implement a utility to help in database application development (tentatively named dbappgen but that may change). The utility will start off by connecting to a database and will allow the developer to create the application based on the tables and other objects present in it, not the other way around. If database changes are needed, they can be made there first, not by imposing some application-induced “migration” (as web2py does).

Since Pyrseas’ dbtoyaml utility already outputs a YAML/JSON description of a database, it may even be possible to conduct an application development exercise, at least in part, without connecting to the database. For example, if the developer invokes dbappgen moviesdb, the utility could first check whether a file named moviesdb.yaml (or with similar extensions) exists in the current directory and if so, read it and recreate in memory the PostgreSQL catalogs, thus allowing “offline” development.

In other news, due to the recent discussions regarding Python 3, my hopes that Werkzeug will be ported soon have been dashed. It appears it’s either use WebOb or write my own framework (which ironically is what Armin recommended).

The Phantom of the Database – Part 3

In the previous episode: Alice and Bob were trying to simultaneously update the title of the following row of the film table:

  id   |     title      | release_year
-------+----------------+--------------
 47478 | Seven  Samurai |         1956

User Transaction Correctness

In the first episode, Carol also wanted to concurrently change the release_year to 1954. Let us now consider her update in conjunction with one of the others.

If Bob modifies the title and Carol changes the year —provided our SQL statements only include the updated fields— there will be no problem. In other words, the changes can be merged, just as when Alice updates one section of a Wikipedia page and Bob changes another section of the same page (Note to self: have to talk to them about this moonlighting).

So the problem with Alice’s and Bob’s updates to the title (one to remove the extra space, the other to change it to Japanese) is that they cannot be merged. How can we implement a correct user transaction without incurring the penalty of locking the row in question?

Optimistic Concurrency Control

Richard Stephan was on the right track when he asked in a comment to the previous post: Should the application be using a pessimistic or an optimistic locking technique?

Optimistic concurrency control (aka optimistic “locking”) is a transaction design approach that avoids the overhead of locking. As stated in the Wikipedia article, the “stateless nature of HTTP makes locking infeasible for web user interfaces,” so it’s almost a given that we should use OCC for our interface.

OCC involves validating that the record we updated was the correct one. A simple way to do this is to add a qualification to the original UPDATE that does the validation for us. For example, the DB-API cursor execute call could be coded as follows:

cursor.execute("UPDATE film SET title = %s "
               "WHERE id = %s AND title = %s",
               (newrec.title, oldrec.id, oldrec.title))

Above, newrec refers to the new record instance and oldrec to the old one. So both Alice and Bob would send UPDATEs with “AND title = ‘Seven  Samurai'”. Naturally, since they’re both modifying the title, only one of the statements can succeed.

The downside of having to add the extra qualification —in the general case— is that the user could’ve changed several fields so each UPDATE would have to be customized accordingly. This will have a negative impact, among other things, on the query planner. In addition, the web interface would have to deal with two values for each field.

Alternatives to the extra qualification are to add either a version number for each row or a sufficiently granular timestamp that also acts as a versioning scheme. However, since we’re using PostgreSQL, perhaps we can use something else …

Stay tuned.

The Phantom of the Database – Part 2

In the previous episode: Alice, Bob and Carol were trying to simultaneously update the following row of the film table:

  id   |     title      | release_year
-------+----------------+--------------
 47478 | Seven  Samurai |         1956

Alice wanted to remove the extra space in the title, Bob was trying to change the title to the phonetic Japanese version and Carol was correcting the release year to 1954. To simplify the analysis we’ll now limit ourselves to Alice’s and Bob’s updates.

The Lost Update Problem

If Alice’s statement executes first, Bob’s change will overwrite her update. Similarly, if Bob’s statement takes precedence, his change will be overwritten. Appropriately, this is conventionally known as the lost update problem. The updates are known as blind writes or dirty writes. How can our Python user interface prevent this problem?

The traditional solution to the lost update problem is to use two-phase locking. You can use PostgreSQL’s psql application to verify how this works (I used the PROMPT1 variable to show which user is issuing the statements).

Alice’s session starts as follows:

alice@moviesdb=> begin;
BEGIN
alice@moviesdb=> select title from film where id = 47478;
     title      
----------------
 Seven  Samurai
(1 row)

Bob’s session is identical but then he issues the UPDATE statement:

bob@moviesdb=> begin;
BEGIN
bob@moviesdb=> select title from film where id = 47478;
     title      
----------------
 Seven  Samurai
(1 row)

bob@moviesdb=> update film set title = 'Sichinin no Samurai' where id = 47478;
UPDATE 1

When Alice tries her UPDATE, her session hangs:

alice@moviesdb=> update film set title = 'Seven Samurai' where id = 47478;

You can examine the situation from another psql session (I cheated and excluded that session’s data). I won’t try to explain (or understand) all this but you can see that Alice’s session is waiting due to an ungranted lock.

moviesdb=# select procpid, waiting, current_query from pg_stat_activity;
 procpid | waiting |                       current_query
---------+---------+-----------------------------------------------------------
   25747 | t       | update film set title = 'Seven Samurai' where id = 47478;
   25900 | f       | <IDLE> in transaction
(2 rows)

moviesdb=# select pid, relation::regclass, locktype, page, tuple, mode, granted
moviesdb-# from pg_locks order by pid, relation, locktype;
  pid  | relation  |   locktype    | page | tuple |       mode       | granted
-------+-----------+---------------+------+-------+------------------+---------
 25747 | film      | relation      |      |       | AccessShareLock  | t
 25747 | film      | relation      |      |       | RowExclusiveLock | t
 25747 | film      | tuple         |    0 |    37 | ExclusiveLock    | t
 25747 | film_pkey | relation      |      |       | AccessShareLock  | t
 25747 | film_pkey | relation      |      |       | RowExclusiveLock | t
 25747 |           | transactionid |      |       | ShareLock        | f
 25747 |           | transactionid |      |       | ExclusiveLock    | t
 25747 |           | virtualxid    |      |       | ExclusiveLock    | t
 25900 | film      | relation      |      |       | RowExclusiveLock | t
 25900 | film      | relation      |      |       | AccessShareLock  | t
 25900 | film_pkey | relation      |      |       | RowExclusiveLock | t
 25900 | film_pkey | relation      |      |       | AccessShareLock  | t
 25900 |           | transactionid |      |       | ExclusiveLock    | t
 25900 |           | virtualxid    |      |       | ExclusiveLock    | t
(14 rows)

Bob’s COMMIT releases his locks …

bob@moviesdb=> commit;
COMMIT

and Alice’s UPDATE now goes through:

UPDATE 1
alice@moviesdb=> commit;
COMMIT
alice@moviesdb=> select title from film where id = 47478;
     title     
---------------
 Seven Samurai
(1 row)

Hey, what happened? Alice’s UPDATE overwrote Bob’s! Wasn’t that supposed to be prevented?

Here is the rub: if it is important for the application to update the row as was presented to the user, then we need to add another qualification to the UPDATE, i.e., we need something like “and title = 'Seven  Samurai'“. We’ll discuss this in a future installment.

The Phantom of the Database – Part 1

Scenario: A row with “Seven  Samurai” (notice two spaces between the words) as the movie title and 1956 as the release year, in the film table.

Plot: User Alice updates the row to remove the extra space. In the meantime, user Bob updates it to change the title to its phonetic Japanese equivalent “Shichinin no Samurai” and user Carol updates the release date to 1954, the year the movie was first shown in Japan. Whose updates will survive?

If the updates take place sequentially and each user fetches the previously updated row prior to saving his or her change, we should end up with a row with “Shichinin no Samurai” as the title and 1954 as the year.

However, if each user starts viewing the original row in a web browser and the application is incorrectly designed, one or two changes may be overwritten.

For example, my minimalist user interface application even in its more robust incarnation, issues an UPDATE with both the title and the year as received from the user. This was done intentionally, for simplicity, to be corrected later. As a result, if the users press the “Save” button in the sequence shown, both Alice’s correction to the English title and Bob’s change to a Japanese title will be lost because they will be overwritten by Carol’s change.

If you’ve used a centralized VCS, like Subversion, you’re probably familiar with the issue. If our users were developers updating a source file with the movie information, they would (typically) be prevented from overwriting because the VCS would inform the second and third users that they had to merge the changes in prior commits.

So, is it enough if the application only updates the columns that were changed by the users, somewhat like a VCS automatically merging non-conflicting changed lines?

To be continued …

A Multi-Layered Test Cake

Dobos layered cake*

A recurrent theme in software engineering are the multiple layers (or tiers) into which systems are subdivided. In the ideal case, a given layer only interacts with the immediate layers “above” and “below” it. In practice, that clean separation of responsibilities is not always possible or, some may argue, desirable.

Nevertheless, thinking in terms of layers is quite useful. For example, the database application I’ve been presenting in this series consists of a PostgreSQL database, psycopg2 (a Python DB-API adapter for PostgreSQL), a business logic module that calls on psycopg2, Werkzeug (a WSGI library), the application modules that call on the business logic and Werkzeug, the HTTP protocol, and the user’s web browser.

As application developers, we need not concern ourselves with the details of how PostgreSQL stores the data that we send to it, nor with testing its internal functions. We rely on PG hackers to cover that testing. Similarly, we rely on Psycopg2 programmers to test communications with PostgreSQL and the Werkzeug team to test its WSGI and HTTP-related functionality. We do have responsibility for testing the business logic, the application code, and the interface presented to the user via the browser.

Tag v0.4.1 of the Database UI Tutorial at GitHub adds a set of unit tests for exercising the business logic module (i.e., werkzeug/bl/film.py). The test module (werkzeug/tests/bl/test_film.py) is pretty straightforward: there is a test method for each method in the business logic class. In addition, there are methods for verifying failure conditions, such as duplicate inserts or deleting a missing record.

Although the unit tests were written after the business logic module (i.e., not in keeping with test-driven development), they helped to resolve an open design issue. Earlier, some methods in dblib.py (thin interface to psycopg2) took an “expected count” argument to check the number of rows affected by a database operation. That didn’t look right, so I had removed it.  While developing the unit tests, it became clear that the checks ought to live in the business logic layer.

Tag v0.4.2 adds unit tests for testing the WSGI or web interface. I was very pleased with how easy it was to use Werkzeug’s testing faciities, Client and BaseResponse, to create the tests: simply “drop” the database app instance into the Client and then call its get or post methods to get the response headers and data. I only had to add the standard library’s xml.dom.minidom to examine particular elements in the HTML body.

Both the business logic and web test modules depend on a utility module which is “reheated” from the Pyrseas version control testing. A nice capability is that the base DbAppTestCase, in its setUpClass method (a Python 2.7 feature), ensures the test database is in sync with the YAML spec.

Coming up: testing the user (browser) interface, via Selenium.


* Dobos cake photograph, courtesy of Bruce Tuten.

Dueling Frameworks, revisited

Werkzeug

In the closing remarks of “Dueling Frameworks”, I hinted that I may continue to explore Flask. That road first led me to take a closer look at the recent state of Werkzeug.

As I had done with Flask and CherryPy, the natural first step was to port the minimalist WSGI database user interface (but taking advantage of the robustness and templating (Jinja2) enhancements) to Werkzeug. This turned out to be very straightforward (see GitHub tag v0.3.2). Key changes:

  • replace my Request/Response objects by Werkzeug’s own
  • add a request argument to the various handler methods
  • simplify the dispatch methods, using request.path and request.form
  • replace the DatabaseApp.__call__ by a much simpler wsgi_app
  • eliminate serve_css (static files are served by Werkzeug’s SharedDataMiddleware)
  • use Werkzeug’s development server.

The second modification (see tag v0.3.3) was to use Werkzeug’s Map and Rule classes to refactor the URL routing. What I liked about this, compared to Flask’s, is that I could keep the delegation of routing within the FilmHandler class.

The final version —for now (see tag v0.3.4), improves the URLs to make them more RESTful. While they’re not completely in agreement with Oliver Charles comments, they’re a step in the right direction. This is something I could not achieve with CherryPy (without bringing in or writing an external module).

The bottom line is that I’ve decided to use Werkzeug for continuing with this tutorial. Perhaps what won me over was its debugger. Also, Werkzeug is a library, not a framework or even a micro-framework, so it feels like a much lighter dependency than CherryPy or Flask. If there is a concern, it is that Werkzeug lacks support for Python 3. This may make a difference —in terms of “traction”— now that WebOb has been ported, but I’m hoping Werkzeug/Flask fans will correct that in the not too distant future.

Quo vadis, Pyrseas?

When I found Andromeda, over a year ago, I was searching for a tool that would help me maintain PostgreSQL tables supporting web sites, somewhat like a Django admin app, but without Django itself since that was not part of the stack. Ideally, this tool would also facilitate table redesign. I had already written a maintenance app, using CherryPy, Mako and SQLAlchemy, but it was cumbersome to add or make changes to it, and database version control was limited to storing the latest CREATE TABLE statements in a Subversion repository.

Andromeda looked attractive because of its version control features and the capability of automatically generating admin apps. However, after experimenting, I found it lacked the ability to instropect an existing database, requiring you to input the schema in its own YAML format. Pyrseas was born as a result of thinking on how to improve this.

Fast forward to the present: Pyrseas 0.4.0 has been released, supporting about 70% of the PostgreSQL DDL features. This seems like a good time to reflect on where to go next, both short term and long term.

Database Version Control

On the version control front, the following tasks come to mind:

  • Add support for missing DDL features: TEXT SEARCH objects, USERs/ROLEs, TABLESPACEs, FDWs and EXTENSIONs.
  • Ensure existing DDL features are covered fully, i.e., all syntax options
  • Add further options to dbtoyaml/yamltodb for selective output or SQL generation
  • Package Pyrseas as a PG 9.1 EXTENSION, allowing access from other languages and tools
  • Support migration of data, i.e., when ALTER TABLE is not enough
  • Your favorite version control feature

Application Generation

Progress on this front has been slow. Pyrseas could generate minimalist web admin apps with CherryPy, Jinja2 and perhaps something like WTForms, but how many users would want to use them? Conversely, how many would also want support for HTML5, JQuery, or some other WSGI server or templating engine?

I’m not decided, but I still need that admin app, so perhaps I’ll build the capability and blog about it, for tutorial purposes, as I have done so far.

What do you think? What is your favorite database version control feature? What would you like to see in a database web app generation tool? Let me know …

P.S. I’ll be attending PGBR 2011, so hopefully I can hear from some of you personally, in English, em Português o en español.

Database Redesign and User Interface Refactoring

In this series about database user interfaces, until now I’ve focused on a single table with just three attributes. As I did with version control, I’ll be introducing additional entities and attributes and exploring how they could be presented or updated via the interface.

Here are some of the attributes we could add to our lonely film table:

  • a single-valued attribute
  • a multi-valued attribute with few values per entity
  • a multi-valued attribute with a potentially large number of values per entity
  • an audit trail attribute

Single valued attributes

Practically every film in existence has a primary spoken language (or intertitle language in the case of silent films). Rather than using the language table I showed before (patterned after Pagila), i.e., a smallint language_id, it is preferable to use a standard coding scheme, i.e., ISO 639. For purposes of this tutorial, ISO 639-1 will suffice. Another attribute we’ll add is the film running time in minutes.

Multi-valued attribute, small fan-out

Films are classified into one or more categories, usually identifying a genre, e.g., action, comedy, science fiction. Although there is no definitive list of genres and there are disagreements on categorizing any given film, there is a limited set of genres and most films fall into one or two categories and rarely more than six. Instead of the serial integer and separate film_category table used before (also as in Pagila), since we are using PostgreSQL, perhaps we’ll experiment with an array of varchar (or text) to list the categories in the film table.

Multi-valued attribute, large fan-out

Making a movie is of course a collective endeavor, employing several individuals as cast and crew (even student films). We’ll need a table to record their names and possibly other information and a second table to connect each film to those involved in it.

Challenges

Each of these kinds of attributes present different issues for the user interface.

Assuming we limit ourselves to a short list of genres, we could use a drop-down list to allow the user to choose the categories for each film, but we have to cater for multiple items being selected.

We may allow the user to enter the film language using a two-letter code, but we probably want to display the full language name as confirmation.

On the other hand, we don’t want to force the user to remember (or even be aware of) person identifiers to enter the cast and crew, so some search capability seems necessary.

Refactoring the Interface

To explore these additions, we’ll go back to the minimalist command line interface. However, we’ll first refactor it with the knowledge we gained up to this point.

Instead of the bl.py module we had earlier, we’ll copy the bl/film.py module from the CherryPy interface. We’ll also replace the dblib.py module by the more robust version, but adding a DbConnection.close() method.

The major refactoring occurs in dbapp.py. First, we add a top level “menu” (only two options for now: Films and Quit) and split film-related functionality to a separate film.py module. The latter is patterned after the corresponding module in the CherryPy version. A brief review of the result may be useful:

  • The Film class (in bl/film.py) is the part that knows how to fetch from and update the film table
  • The FilmForm class is the component that obtains valid data from the user
  • The FilmHandler, for the most part, acts a controller, sequencing the various operations

The code is available on GitHub, tagged as v0.4.0.

Dueling Frameworks

Cue the music

In this corner, the time-tested CherryPy, standing at version 3.2, six years old, enabled for Python 3, sporting an HTTP 1.1-compliant WSGI webserver, support for other WSGI servers or adapters, a plugin mechanism, built-in tools and much more.

In the other corner, the newcomer Flask, at version 0.7.2, about one year old (but with older ancestry), claiming to be a “microframework … with good intentions” and “fun.”

In my post A Funny Thing Happened on the Way to the Webserver I mentioned I took a look at the second contender and liked what I saw. First impressions, as they say, can be deceiving. What may seem appealing is sometimes harder to use or disappointing when you get down to writing code and testing it.

Coincidentally, after I had decided on the “Dueling Banjos” theme, Audrey Roy pointed me1 to Richard Jones’ “Web micro-framework Battle!” which provided additional food for thought.

As you may expect, my selection criteria don’t exactly match Richard’s. Perhaps the most striking difference is how he implemented routing (URL mapping, page 14 of the presentation) for the baseline cgi+wsgiref implementation vs. my choice for doing dispatching in my minimalist WSGI database UI (see in particular, the dispatch methods in dbapp.py and film.py).

Like Richard, I don’t appreciate “magic” in a framework or application. but I’m less concerned with strictly RESTful URLs. Richard appears to prefer the decorator approach to URL mapping (as seen by his Bottle, Flask and Pesto examples, among others), but also used URL mapping tables (seen in the web.py, Werkzeug and baseline examples).

The latter approach reminds me of a generic menu interface I wrote eons ago in C. It may be OK for a vtable in C++, but I usually find it inappropriate for application level code. In fact, the mapping table seems “magical” since it doesn’t encapsulate the code with the corresponding URL.

Although Flask looked attractive due to its URL routing decorator, it now appears limiting because it can only be used on functions but not on class methods. (Note: I haven’t explored all of Flask, e.g., haven’t looked at its Blueprints).

As a result, refactoring the WSGI application to use CherryPy was quite straightforward, whereas changing it to accomodate Flask involved quite a bit of surgery. This is exemplified by these Git stats:

 cherrypy/film.py |   73 ++++++++++++++++-------------------------------------
 1 files changed, 22 insertions(+), 51 deletions(-)

 flask/film.py |  235 ++++++++++++++++++++++++++-------------------------------
 1 files changed, 107 insertions(+), 128 deletions(-)

While many modifications simply involved indentation and removing the self parameter, other changes were necessary. For example, as there can only be one mapping to a function named ‘index’, I had to rename the film.py ‘index’ function/method to ‘list’ to avoid conflict with dbapp.py. This does not bode well for extensibility: a real application would support multiple entities—films, actors, customers, etc.—with similar function/methods. Maybe this is what Blueprints are for, but why invent another wheel when Python classes are available?

There were other minor annoyances with Flask, like how to instantiate its application object without adding a subdirectory, or how to pass the database connection to the film.py functions without using a global variable.

I may continue to explore Flask, although at the moment CherryPy seems more suitable to my purposes. I’m also intrigued by some concepts of Richard’s winners—Pesto and Bottle. A Pesto-like dispatcher with support for class methods on top of CherryPy would probably make my day.

The code implementing the CherryPy database UI interface is on GitHub tagged as version v0.3.0 and the corresponding Flask code is also there tagged as v0.3.1.


1 BTW, +1 for Audrey’s wish list item: Python in the Browser (see pages 16-18 of her presentation).