Pyrseas/PostgreSQL Feature Matrix

In my last post, I wrote—referring to the state of Pyrseas after version 0.5 is released:

The only gaps left are TABLESPACE, GROUP/ROLE and the EXTENSIONs added in PG 9.1.

I’m afraid I should’ve double checked the list of 9.1 SQL CREATE statements. I missed COLLATIONs. I’ve created a new page, Feature Matrix, that shows the correct picture, which will be updated as subsequent releases are made.


More Database Tools?

It’s been over year since I started blogging on these pages about Pyrseas and version control. In a month it will also be the first anniversary of the initial commit to GitHub. Much code and many words have flown under these “bridges,” so this seems an appropriate time to reflect.

When I discovered Andromeda, I was looking for a framework to do simple (CRUD-type) database updates

  • with more flexibility (read, programability) than pgAdmin or phpPgAdmin
  • without being tied to an object-relational mapper, either built-in as in Django or external as SQLAlchemy (Pylons/Pyramid)
  • without having to write repetitive code, either SQL or ORM.

Andromeda appeared to satisfy these objectives (although I wasn’t thrilled about having to customize it in PHP).

When I conceived dbtoyaml, I was being lazy: reacting to Andromeda’s requirement to handcraft a YAML description of a database before I could use it to manage SQL changes to it. I thought: why not create the YAML from the database catalogs?

Since my concept for a YAML database specification didn’t match well to Andromeda’s, that led to yamltodb, my attempt to recreate the SQL “diff’ing” features of Andromeda in Python. Andromeda did it using the information_schema catalogs, which made it portable to other DBMSs that had those. Andromeda also did the comparisons by issuing SQL queries (which didn’t perform well). I chose to use the pg_catalog tables and did the comparisons directly on Python structures.

At first, I had intended to only diff schemas and tables and not much more, since that sufficed for my purposes. However, Peter Eisentraut’s comment eventually convinced me that Pyrseas had to support ALL PostgreSQL DDL features1. I’m very pleased with what was accomplished. Pyrseas 0.5, to be released shortly2, will add support for TEXTSEARCH and FOREIGN DATA WRAPPER related objects. The only gaps left are TABLESPACE, GROUP/ROLE and the EXTENSIONs added in PG 9.1.

2012 brought another turn of events. My post on the controversy between Chris Travers and Tony Marston on whether business logic ought to reside in the database led to collaboration with Roger Hunwicks to create dbextend, a tool to automate database augmentation. A first submission was made and work continues on that front.

The latter effort raises other possibilities. For example, since yamltodb already knows how to create nearly all PG objects, it would be trivial to create a schemadump utility (equivalent to pg_dump -s). Another potential tool of interest to PostgreSQL advocates: dbtoyaml for other databases (mytoyaml, oratoyaml anyone?) together with a conversion utility that operates on the YAML specification so it can be accepted by the PG-only yamltodb (the YAML converter seems should be easier than editing SQL statements). The YAML/JSON output from dbtoyaml is amenable to other analysis or automation tasks.

I hope to get back to my database user interface “dream” … one of these days, but in the meantime, I’m glad for having taken these detours. I’d like to thank those who helped along the way: Josh Berkus, Robert Brewer, Adam Cornett, Ronan Dunklau, Peter Eisentraut, David Fetter, Dickson Guedes, Matthias Howell, Roger Hunwicks, Toon Koppelaars, Marko Kreen, Fabrízio Mello, Regina Obe, Filip Rembialkowski, Dariusz Suchojad, Daniele Varrazzo, Evgeni Vasilev, David Wheeler and others I may have missed.

1 Actually, Josh Berkus was the first one who mentioned (in a private email) that I ought to support all PG objects.
2 And just in time for PyCon, I’m happy to announce that it will support Python 3.


The Phantom of the Database – Part 1

Scenario: A row with “Seven  Samurai” (notice two spaces between the words) as the movie title and 1956 as the release year, in the film table.

Plot: User Alice updates the row to remove the extra space. In the meantime, user Bob updates it to change the title to its phonetic Japanese equivalent “Shichinin no Samurai” and user Carol updates the release date to 1954, the year the movie was first shown in Japan. Whose updates will survive?

If the updates take place sequentially and each user fetches the previously updated row prior to saving his or her change, we should end up with a row with “Shichinin no Samurai” as the title and 1954 as the year.

However, if each user starts viewing the original row in a web browser and the application is incorrectly designed, one or two changes may be overwritten.

For example, my minimalist user interface application even in its more robust incarnation, issues an UPDATE with both the title and the year as received from the user. This was done intentionally, for simplicity, to be corrected later. As a result, if the users press the “Save” button in the sequence shown, both Alice’s correction to the English title and Bob’s change to a Japanese title will be lost because they will be overwritten by Carol’s change.

If you’ve used a centralized VCS, like Subversion, you’re probably familiar with the issue. If our users were developers updating a source file with the movie information, they would (typically) be prevented from overwriting because the VCS would inform the second and third users that they had to merge the changes in prior commits.

So, is it enough if the application only updates the columns that were changed by the users, somewhat like a VCS automatically merging non-conflicting changed lines?

To be continued …


Quo vadis, Pyrseas?

When I found Andromeda, over a year ago, I was searching for a tool that would help me maintain PostgreSQL tables supporting web sites, somewhat like a Django admin app, but without Django itself since that was not part of the stack. Ideally, this tool would also facilitate table redesign. I had already written a maintenance app, using CherryPy, Mako and SQLAlchemy, but it was cumbersome to add or make changes to it, and database version control was limited to storing the latest CREATE TABLE statements in a Subversion repository.

Andromeda looked attractive because of its version control features and the capability of automatically generating admin apps. However, after experimenting, I found it lacked the ability to instropect an existing database, requiring you to input the schema in its own YAML format. Pyrseas was born as a result of thinking on how to improve this.

Fast forward to the present: Pyrseas 0.4.0 has been released, supporting about 70% of the PostgreSQL DDL features. This seems like a good time to reflect on where to go next, both short term and long term.

Database Version Control

On the version control front, the following tasks come to mind:

  • Add support for missing DDL features: TEXT SEARCH objects, USERs/ROLEs, TABLESPACEs, FDWs and EXTENSIONs.
  • Ensure existing DDL features are covered fully, i.e., all syntax options
  • Add further options to dbtoyaml/yamltodb for selective output or SQL generation
  • Package Pyrseas as a PG 9.1 EXTENSION, allowing access from other languages and tools
  • Support migration of data, i.e., when ALTER TABLE is not enough
  • Your favorite version control feature

Application Generation

Progress on this front has been slow. Pyrseas could generate minimalist web admin apps with CherryPy, Jinja2 and perhaps something like WTForms, but how many users would want to use them? Conversely, how many would also want support for HTML5, JQuery, or some other WSGI server or templating engine?

I’m not decided, but I still need that admin app, so perhaps I’ll build the capability and blog about it, for tutorial purposes, as I have done so far.

What do you think? What is your favorite database version control feature? What would you like to see in a database web app generation tool? Let me know …

P.S. I’ll be attending PGBR 2011, so hopefully I can hear from some of you personally, in English, em Português o en español.


Version Control for PostGIS

A year ago, I was only trying to “rescue” Andromeda

When I started the Pyrseas project, I was mainly interested in improving on the basic table/column version control of Andromeda. I never thought I’d be adding support for “exotic” features such as base types, operators or operator classes. However, the initial feedback led me to set my sights much higher: Pyrseas was to offer support for all PostgreSQL data definition features.

Over the past two and a half months, I’ve added coverage for those advanced features and more. I’m glad to report that, with the help of Leo Hsu and Regina Obe, the Pyrseas tools are now able to output all the objects1 in a database loaded with postgis.sql, spatial_ref_sys.sql and postgis_comments.sql from PostGIS 1.5, in YAML format. The tools are also able to take in that YAML and generate the SQL necessary to recreate all those objects, in the correct order, on an empty database, as well as to reverse the process.

The following is an update to the feature coverage table presented earlier. Although Pyrseas 0.4 has not been released yet, all these capabilities are present in the current HEAD GitHub repository.

DDL Feature 0.1 0.2 0.3 0.4
– ENUM  X  X
– composite  X
– base type  X

I’d like to thank Leo and Regina for their help, as well aquilax, mhow and acornett at GitHub for being early adopters and raising issues to improve Pyrseas functionality.

If you have a challenging PostgreSQL database, I’d like to encourage you to give Pyrseas a try, and let me know how it works out.

1 For those interested, this includes 2 tables, 1 view, 9 base types, 1 composite type, 777 functions, 17 aggregate functions, 19 casts, 4 operator classes/families, 23 operators, and 315 comments.


Database Redesign and User Interface Refactoring

In this series about database user interfaces, until now I’ve focused on a single table with just three attributes. As I did with version control, I’ll be introducing additional entities and attributes and exploring how they could be presented or updated via the interface.

Here are some of the attributes we could add to our lonely film table:

  • a single-valued attribute
  • a multi-valued attribute with few values per entity
  • a multi-valued attribute with a potentially large number of values per entity
  • an audit trail attribute

Single valued attributes

Practically every film in existence has a primary spoken language (or intertitle language in the case of silent films). Rather than using the language table I showed before (patterned after Pagila), i.e., a smallint language_id, it is preferable to use a standard coding scheme, i.e., ISO 639. For purposes of this tutorial, ISO 639-1 will suffice. Another attribute we’ll add is the film running time in minutes.

Multi-valued attribute, small fan-out

Films are classified into one or more categories, usually identifying a genre, e.g., action, comedy, science fiction. Although there is no definitive list of genres and there are disagreements on categorizing any given film, there is a limited set of genres and most films fall into one or two categories and rarely more than six. Instead of the serial integer and separate film_category table used before (also as in Pagila), since we are using PostgreSQL, perhaps we’ll experiment with an array of varchar (or text) to list the categories in the film table.

Multi-valued attribute, large fan-out

Making a movie is of course a collective endeavor, employing several individuals as cast and crew (even student films). We’ll need a table to record their names and possibly other information and a second table to connect each film to those involved in it.


Each of these kinds of attributes present different issues for the user interface.

Assuming we limit ourselves to a short list of genres, we could use a drop-down list to allow the user to choose the categories for each film, but we have to cater for multiple items being selected.

We may allow the user to enter the film language using a two-letter code, but we probably want to display the full language name as confirmation.

On the other hand, we don’t want to force the user to remember (or even be aware of) person identifiers to enter the cast and crew, so some search capability seems necessary.

Refactoring the Interface

To explore these additions, we’ll go back to the minimalist command line interface. However, we’ll first refactor it with the knowledge we gained up to this point.

Instead of the bl.py module we had earlier, we’ll copy the bl/film.py module from the CherryPy interface. We’ll also replace the dblib.py module by the more robust version, but adding a DbConnection.close() method.

The major refactoring occurs in dbapp.py. First, we add a top level “menu” (only two options for now: Films and Quit) and split film-related functionality to a separate film.py module. The latter is patterned after the corresponding module in the CherryPy version. A brief review of the result may be useful:

  • The Film class (in bl/film.py) is the part that knows how to fetch from and update the film table
  • The FilmForm class is the component that obtains valid data from the user
  • The FilmHandler, for the most part, acts a controller, sequencing the various operations

The code is available on GitHub, tagged as v0.4.0.


Audit Trails

Back in February, I defined the film table version 0.1 with just three columns:

    title VARCHAR(32) NOT NULL,
    release_year INTEGER NOT NULL CHECK (release_year >= 1888)

It was unclear then how far I’d go in supporting all of PostgreSQL data definition features. Now, Pyrseas is making progress towards covering all those capabilities.

Consequently, I’d like to add one column to the above and tag it as version 0.1.1. This is the column to be added (from the Pagila sample database):

   last_update timestamp with time zone NOT NULL DEFAULT now()

With this new column, I’ll be able to explore another kind of field in the database user interface: one that will not be present on new records (because the DEFAULT will take care of filling in that data), but will be displayed, in readonly mode, for updates. Note: In Pagila, last_update is defined as timestamp without time zone, but that seems too ambiguous for my taste.

To complete this, we need an ON UPDATE trigger together with a function to be called by the trigger. These are also provided by Pagila, but to highlight some of the capabilities of Pyrseas, I’ll first show the output of dbtoyaml against a database with all the necessary objects:

language plpgsql:
  trusted: true
schema public:
  description: standard public schema
  function last_updated():
    language: plpgsql
    returns: trigger
    source: "\nBEGIN\n    NEW.last_update = CURRENT_TIMESTAMP;\n    RETURN NEW;\n\
      END "
  table film:
        - release_year
        expression: (release_year >= 1888)
    - id:
        not_null: true
        type: integer
    - title:
        not_null: true
        type: character varying(32)
    - release_year:
        not_null: true
        type: integer
    - last_update:
        default: now()
        not_null: true
        type: timestamp with time zone
        access_method: btree
        - id
        - update
        level: row
        procedure: last_updated()
        timing: before

If you feed this YAML to yamltodb against a database that already has the film table version 0.1, it generates the following (against a PostgreSQL 8.4 DBMS—vs. 9.0 the creation of PL/pgSQL is bypassed):

SET check_function_bodies = false;
CREATE FUNCTION last_updated() RETURNS trigger
    LANGUAGE plpgsql
    AS $_$
    NEW.last_update = CURRENT_TIMESTAMP;
END $_$;
    ADD COLUMN last_update timestamp with time zone NOT NULL DEFAULT now();
CREATE TRIGGER last_updated
    EXECUTE PROCEDURE last_updated();

Thus we can successfully upgrade the version 0.1 database to version 0.1.1, with our audit trail column.

Happy trails to you, ’till we meet again …