Category Archives: Version control

The Phantom of the Database – Part 1

Scenario: A row with “Seven  Samurai” (notice two spaces between the words) as the movie title and 1956 as the release year, in the film table.

Plot: User Alice updates the row to remove the extra space. In the meantime, user Bob updates it to change the title to its phonetic Japanese equivalent “Shichinin no Samurai” and user Carol updates the release date to 1954, the year the movie was first shown in Japan. Whose updates will survive?

If the updates take place sequentially and each user fetches the previously updated row prior to saving his or her change, we should end up with a row with “Shichinin no Samurai” as the title and 1954 as the year.

However, if each user starts viewing the original row in a web browser and the application is incorrectly designed, one or two changes may be overwritten.

For example, my minimalist user interface application even in its more robust incarnation, issues an UPDATE with both the title and the year as received from the user. This was done intentionally, for simplicity, to be corrected later. As a result, if the users press the “Save” button in the sequence shown, both Alice’s correction to the English title and Bob’s change to a Japanese title will be lost because they will be overwritten by Carol’s change.

If you’ve used a centralized VCS, like Subversion, you’re probably familiar with the issue. If our users were developers updating a source file with the movie information, they would (typically) be prevented from overwriting because the VCS would inform the second and third users that they had to merge the changes in prior commits.

So, is it enough if the application only updates the columns that were changed by the users, somewhat like a VCS automatically merging non-conflicting changed lines?

To be continued …

Quo vadis, Pyrseas?

When I found Andromeda, over a year ago, I was searching for a tool that would help me maintain PostgreSQL tables supporting web sites, somewhat like a Django admin app, but without Django itself since that was not part of the stack. Ideally, this tool would also facilitate table redesign. I had already written a maintenance app, using CherryPy, Mako and SQLAlchemy, but it was cumbersome to add or make changes to it, and database version control was limited to storing the latest CREATE TABLE statements in a Subversion repository.

Andromeda looked attractive because of its version control features and the capability of automatically generating admin apps. However, after experimenting, I found it lacked the ability to instropect an existing database, requiring you to input the schema in its own YAML format. Pyrseas was born as a result of thinking on how to improve this.

Fast forward to the present: Pyrseas 0.4.0 has been released, supporting about 70% of the PostgreSQL DDL features. This seems like a good time to reflect on where to go next, both short term and long term.

Database Version Control

On the version control front, the following tasks come to mind:

  • Add support for missing DDL features: TEXT SEARCH objects, USERs/ROLEs, TABLESPACEs, FDWs and EXTENSIONs.
  • Ensure existing DDL features are covered fully, i.e., all syntax options
  • Add further options to dbtoyaml/yamltodb for selective output or SQL generation
  • Package Pyrseas as a PG 9.1 EXTENSION, allowing access from other languages and tools
  • Support migration of data, i.e., when ALTER TABLE is not enough
  • Your favorite version control feature

Application Generation

Progress on this front has been slow. Pyrseas could generate minimalist web admin apps with CherryPy, Jinja2 and perhaps something like WTForms, but how many users would want to use them? Conversely, how many would also want support for HTML5, JQuery, or some other WSGI server or templating engine?

I’m not decided, but I still need that admin app, so perhaps I’ll build the capability and blog about it, for tutorial purposes, as I have done so far.

What do you think? What is your favorite database version control feature? What would you like to see in a database web app generation tool? Let me know …

P.S. I’ll be attending PGBR 2011, so hopefully I can hear from some of you personally, in English, em Português o en español.

Version Control for PostGIS

A year ago, I was only trying to “rescue” Andromeda

When I started the Pyrseas project, I was mainly interested in improving on the basic table/column version control of Andromeda. I never thought I’d be adding support for “exotic” features such as base types, operators or operator classes. However, the initial feedback led me to set my sights much higher: Pyrseas was to offer support for all PostgreSQL data definition features.

Over the past two and a half months, I’ve added coverage for those advanced features and more. I’m glad to report that, with the help of Leo Hsu and Regina Obe, the Pyrseas tools are now able to output all the objects1 in a database loaded with postgis.sql, spatial_ref_sys.sql and postgis_comments.sql from PostGIS 1.5, in YAML format. The tools are also able to take in that YAML and generate the SQL necessary to recreate all those objects, in the correct order, on an empty database, as well as to reverse the process.

The following is an update to the feature coverage table presented earlier. Although Pyrseas 0.4 has not been released yet, all these capabilities are present in the current HEAD GitHub repository.

DDL Feature 0.1 0.2 0.3 0.4
AGGREGATE  X  X
CAST  X
CONSTRAINT TRIGGER  X
CONVERSION  X
DOMAIN  X  X
FOREIGN DATA WRAPPER
FUNCTION  X  X  X
GROUP
INDEX  X  X  X  X
LANGUAGE  X  X  X
OPERATOR  X
OPERATOR CLASS  X
OPERATOR FAMILY  X
ROLE
RULE  X  X
SCHEMA  X  X  X  X
SEQUENCE  X  X  X  X
SERVER
TABLE  X  X  X  X
– CHECK CONSTRAINT  X  X  X  X
– PRIMARY KEY  X  X  X  X
– FOREIGN KEY  X  X  X  X
– UNIQUE CONSTRAINT  X  X  X  X
– INHERIT  X  X  X
TABLESPACE
TEXT SEARCH CONFIGURATION
TEXT SEARCH DICTIONARY
TEXT SEARCH PARSER
TEXT SEARCH TEMPLATE
TRIGGER  X  X
TYPE  X  X
– ENUM  X  X
– composite  X
– base type  X
USER
USER MAPPING
VIEW  X  X  X
COMMENT  X  X  X

I’d like to thank Leo and Regina for their help, as well aquilax, mhow and acornett at GitHub for being early adopters and raising issues to improve Pyrseas functionality.

If you have a challenging PostgreSQL database, I’d like to encourage you to give Pyrseas a try, and let me know how it works out.


1 For those interested, this includes 2 tables, 1 view, 9 base types, 1 composite type, 777 functions, 17 aggregate functions, 19 casts, 4 operator classes/families, 23 operators, and 315 comments.

Database Redesign and User Interface Refactoring

In this series about database user interfaces, until now I’ve focused on a single table with just three attributes. As I did with version control, I’ll be introducing additional entities and attributes and exploring how they could be presented or updated via the interface.

Here are some of the attributes we could add to our lonely film table:

  • a single-valued attribute
  • a multi-valued attribute with few values per entity
  • a multi-valued attribute with a potentially large number of values per entity
  • an audit trail attribute

Single valued attributes

Practically every film in existence has a primary spoken language (or intertitle language in the case of silent films). Rather than using the language table I showed before (patterned after Pagila), i.e., a smallint language_id, it is preferable to use a standard coding scheme, i.e., ISO 639. For purposes of this tutorial, ISO 639-1 will suffice. Another attribute we’ll add is the film running time in minutes.

Multi-valued attribute, small fan-out

Films are classified into one or more categories, usually identifying a genre, e.g., action, comedy, science fiction. Although there is no definitive list of genres and there are disagreements on categorizing any given film, there is a limited set of genres and most films fall into one or two categories and rarely more than six. Instead of the serial integer and separate film_category table used before (also as in Pagila), since we are using PostgreSQL, perhaps we’ll experiment with an array of varchar (or text) to list the categories in the film table.

Multi-valued attribute, large fan-out

Making a movie is of course a collective endeavor, employing several individuals as cast and crew (even student films). We’ll need a table to record their names and possibly other information and a second table to connect each film to those involved in it.

Challenges

Each of these kinds of attributes present different issues for the user interface.

Assuming we limit ourselves to a short list of genres, we could use a drop-down list to allow the user to choose the categories for each film, but we have to cater for multiple items being selected.

We may allow the user to enter the film language using a two-letter code, but we probably want to display the full language name as confirmation.

On the other hand, we don’t want to force the user to remember (or even be aware of) person identifiers to enter the cast and crew, so some search capability seems necessary.

Refactoring the Interface

To explore these additions, we’ll go back to the minimalist command line interface. However, we’ll first refactor it with the knowledge we gained up to this point.

Instead of the bl.py module we had earlier, we’ll copy the bl/film.py module from the CherryPy interface. We’ll also replace the dblib.py module by the more robust version, but adding a DbConnection.close() method.

The major refactoring occurs in dbapp.py. First, we add a top level “menu” (only two options for now: Films and Quit) and split film-related functionality to a separate film.py module. The latter is patterned after the corresponding module in the CherryPy version. A brief review of the result may be useful:

  • The Film class (in bl/film.py) is the part that knows how to fetch from and update the film table
  • The FilmForm class is the component that obtains valid data from the user
  • The FilmHandler, for the most part, acts a controller, sequencing the various operations

The code is available on GitHub, tagged as v0.4.0.

Audit Trails

Back in February, I defined the film table version 0.1 with just three columns:

CREATE TABLE film (
    id INTEGER NOT NULL PRIMARY KEY,
    title VARCHAR(32) NOT NULL,
    release_year INTEGER NOT NULL CHECK (release_year >= 1888)
);

It was unclear then how far I’d go in supporting all of PostgreSQL data definition features. Now, Pyrseas is making progress towards covering all those capabilities.

Consequently, I’d like to add one column to the above and tag it as version 0.1.1. This is the column to be added (from the Pagila sample database):

   last_update timestamp with time zone NOT NULL DEFAULT now()

With this new column, I’ll be able to explore another kind of field in the database user interface: one that will not be present on new records (because the DEFAULT will take care of filling in that data), but will be displayed, in readonly mode, for updates. Note: In Pagila, last_update is defined as timestamp without time zone, but that seems too ambiguous for my taste.

To complete this, we need an ON UPDATE trigger together with a function to be called by the trigger. These are also provided by Pagila, but to highlight some of the capabilities of Pyrseas, I’ll first show the output of dbtoyaml against a database with all the necessary objects:

language plpgsql:
  trusted: true
schema public:
  description: standard public schema
  function last_updated():
    language: plpgsql
    returns: trigger
    source: "\nBEGIN\n    NEW.last_update = CURRENT_TIMESTAMP;\n    RETURN NEW;\n\
      END "
  table film:
    check_constraints:
      film_release_year_check:
        columns:
        - release_year
        expression: (release_year >= 1888)
    columns:
    - id:
        not_null: true
        type: integer
    - title:
        not_null: true
        type: character varying(32)
    - release_year:
        not_null: true
        type: integer
    - last_update:
        default: now()
        not_null: true
        type: timestamp with time zone
    primary_key:
      film_pkey:
        access_method: btree
        columns:
        - id
    triggers:
      last_updated:
        events:
        - update
        level: row
        procedure: last_updated()
        timing: before

If you feed this YAML to yamltodb against a database that already has the film table version 0.1, it generates the following (against a PostgreSQL 8.4 DBMS—vs. 9.0 the creation of PL/pgSQL is bypassed):

BEGIN;
CREATE LANGUAGE plpgsql;
SET check_function_bodies = false;
CREATE FUNCTION last_updated() RETURNS trigger
    LANGUAGE plpgsql
    AS $_$
BEGIN
    NEW.last_update = CURRENT_TIMESTAMP;
    RETURN NEW;
END $_$;
ALTER TABLE film
    ADD COLUMN last_update timestamp with time zone NOT NULL DEFAULT now();
CREATE TRIGGER last_updated
    BEFORE UPDATE ON film
    FOR EACH ROW
    EXECUTE PROCEDURE last_updated();
COMMIT;

Thus we can successfully upgrade the version 0.1 database to version 0.1.1, with our audit trail column.

Happy trails to you, ’till we meet again …

PostgreSQL Version Control Feature Coverage

In a comment to one of my early posts about version control, Peter Eisentraut stated he was “somewhat discouraged because [seeing how fast PostgreSQL develops and adds new features,] I don’t see how a tool like apgdiff can keep up and stay useful unless a lot more resources are put into it.” At the time, the first release of Pyrseas (0.1.0) was well over a month away. Currently, Pyrseas 0.3.0 is nearing release . This seems like a good time to summarize where Pyrseas stands vis-à-vis the list of PostgreSQL data definition features:

DDL Feature 0.1 0.2 0.3
AGGREGATE  X
CAST
CONSTRAINT TRIGGER
CONVERSION
DOMAIN  X
FOREIGN DATA WRAPPER
FUNCTION  X  X
GROUP
INDEX  X  X  X
LANGUAGE  X  X
OPERATOR
OPERATOR CLASS
OPERATOR FAMILY
ROLE
RULE X
SCHEMA X X X
SEQUENCE X X X
SERVER
TABLE  X  X  X
 – CHECK CONSTRAINT  X  X  X
 – PRIMARY KEY  X  X  X
 – FOREIGN KEY  X  X  X
 – UNIQUE CONSTRAINT  X  X  X
 – INHERIT  X  X
TABLESPACE
TEXT SEARCH CONFIGURATION
TEXT SEARCH DICTIONARY
TEXT SEARCH PARSER
TEXT SEARCH TEMPLATE
TRIGGER  X
TYPE  X
USER
USER MAPPING
VIEW  X  X
COMMENT  X  X

Note that PosgreSQL 9.1 will add EXTENSION to the list above. Note also that although the links in the matrix go to the corresponding CREATE object documentation page, Pyrseas supports CREATE, CREATE OR REPLACE, DROP and ALTER, as applicable. A final note: an “X” in a particular release doesn’t imply that the feature is supported in all its syntactic glory.

Caveats aside, progress has been made. Pyrseas 0.1.0 supported most of the basic relational database objects and concepts present in the SQL Standard, and usually covered by generic version control tools. Pyrseas 0.3.0 will support the features showcased in the Pagila sample database, which I hope is somewhat representative of the average PostgreSQL database (in terms of DDL features, of course).

The goal for the 0.4.0 release will be to cover the remaining features excluding Text Search, SQL/MED and ROLE/GROUP, which will be tackled in 0.5.0 or later. Is there a feature you or your organization is interested in? Let me know in the comments.

Gearing up for a second release

The first release of Pyrseas was aimed at addressing the essential version control issues described in my first posts. The second release, somewhat accidentally, has focused on being able to recreate the PostgreSQL autodoc regression database. In addition, Pyrseas will be released on the PostgreSQL Extension Network (PGXN).

Pyrseas already supports the basic schema, table, column, primary key, foreign key and index CREATE and ALTER operations provided by PostgreSQL (and every relational DBMS worthy of that designation). Here is a summary of PostgreSQL features that will be supported in the upcoming release:

  • COMMENTs on schemas, tables, columns and functions.
  • FOREIGN KEY ON UPDATE and ON DELETE actions.
  • ALTER TABLE RENAME COLUMN and enhanced support for other ALTER object RENAME statements.
  • VIEWs
  • INHERITed tables, and by extension, partitioned tables.
  • PROCEDURAL LANGUAGEs
  • FUNCTIONs

Support for some of these features is still elementary. However, the Pyrseas utilities are now able to recreate the autodoc database and revert back to an empty database, i.e., they also know how to drop each object in the correct order.

The Pyrseas unit tests have also been enhanced so they can be run against different PostgreSQL installations. For example, using the PYRSEAS_TEST_PORT environment variable I was able to run the tests against the 9.1 Beta 1 release, as well as 9.0 and 8.4.

Looking forward, the subsequent release will tackle being able to recreate the Pagila sample database, thus adding support for TYPEs, DOMAINs, AGGREGATEs, TRIGGERs and RULEs, and improving support for existing features.

Update: Pyrseas 0.2.0 has been released and is now available via PyPI, PGXN, PgFoundry and from the GitHub repository.