A Minimalist Command Line Database User Interface

To begin exploring the design of a generic database user interface as mentioned in my previous post, I’ve written a minimalist command line program. To try it, do this:

$ git clone git://github.com/jmafc/database-ui-tutorial.git dbui
$ cd dbui
$ createdb moviesdev
$ yamltodb moviesdev film.yaml | psql moviesdev

Of course, there are some pre-requisites: Git, PostgreSQL, Python, Psycopg2 and Pyrseas. If you don’t want to install the latter, create the film table according to the 0.1 version here. If you simply want to look at the code, you can find dbapp.py at GitHub here.

The following is a sample usage session:

$ cmdline/dbapp.py moviesdev
   A - Add
   L - List
   U - Update
   D - Delete
   Q - Quit
Command? l
    Id Title                            Year
 19777 The Cocoanuts                    1929
 20629 All Quiet on the Western Front   1930
 20640 Animal Crackers                  1930
   A - Add
   L - List
   U - Update
   D - Delete
   Q - Quit
Command? a
Id [0]: 1234
Title []: Testing
Release year [0]: 2010
Film 'Testing - 2010' added
   A - Add
   L - List
   U - Update
   D - Delete
   Q - Quit
Command? u
Id: 1234
Updating 'Testing - 2010'
Title [Testing]: A Test Movie
Release year [2010]:
Film 'A Test Movie - 2010' updated
   A - Add
   L - List
   U - Update
   D - Delete
   Q - Quit
Command? d
Id: 1234
Delete film 'A Test Movie - 2010' (y/n) [n]: y
Film 'A Test Movie - 2010' deleted
   A - Add
   L - List
   U - Update
   D - Delete
   Q - Quit
Command? q
Done

Caveat emptor: The code is very succint and has limited error checking.

Database User Interface Basics

Examining the sample application leads us to identify the following essential features:

  1. The application needs to present data both as single records and in list or tabular form.
  2. The data is entered and displayed as characters but possibly stored or interfaced to the DBMS in binary formats, hence conversion procedures between external and internal formats are necessary.
  3. Data items need an initialization or default value.
  4. Many or most data items need validation procedures, including a failure message. Some validations are generic to the data type, others are specific to the item.
  5. Depending on the operation mode, modification of some items may be prevented, e.g., primary key cannot be entered or changed in Update mode.
  6. One or more facilities are needed to select a record for Update or Delete, e.g., by the primary key, from a list, etc.
  7. When reporting actions taken, a short external representation of a record is desirable1.

Some of these may seem tautological, but they apply regardless of whether the user interface is line-oriented, character-oriented (like ncurses), graphical or web-based.

A program to develop database applications needs to assist the developer in specifying these features. For example, the program has to provide a means to design a single record or tabular format. The advantage of having access to the database catalogs is that the program can facilitate this by supplying useful defaults at various stages in the design process.

Once the design is finalized, we’ll probably want to store it in the database itself, but –mindful of version control– it’s recommended we keep the design in a YAML specification file that can be stored in a VCS.


1 The primary key for the film table is an integer, and in the sample session, the IMDb identifier was used as it were a “natural” key. In most applications, such an id would be auto-generated and possibly not shown to the user.

User Interfaces for Databases

Pyrseas was started to improve on Andromeda. In addition to the schema version control I’ve been discussing in these posts, Andromeda provides two other capabilities: automations and web application programming.

Andromeda’s web programming allows you to generate a database application with very little actual programming, since Andromeda takes into account the primary key, foreign key and other information present in the YAML specification to generate much of the application code for you. This is another area that I’d like to explore for implementation in Pyrseas.

Administrative Applications

Nearly any database needs a user interface to add, retrieve, update and delete records, known by the unattractive acronym CRUD. There are a number of admin apps that provide such generic facilties. For example, for PostgreSQL we have pgAdmin which offers a GUI, and phpPgAdmin and Adminer which offer PHP web interfaces. These have a couple of limitations.

Admin apps almost always operate on a single table at a time. Interfaces for end users often require interacting with or viewing two or more tables on a single screen or page, e.g., customers with their invoices, accounts with their transactions.

By definition, admin apps are “raw” and cater to the needs of DBAs and programmers. After retrieving a customer record searching by name, the admin user can update any column, including the primary key and other columns that end users would normally be prevented from updating.

Django Admin

Django includes an appealing admin site. With a moderate dose of programming, you can develop an application suitable for end users. However, you’re at the mercy of Django’s constraints on “modeling” a database. Take for example the film_category table I used in my second post on version control:

CREATE TABLE film_category (
    film_id INTEGER NOT NULL REFERENCES film (id),
    category_id INTEGER NOT NULL
        REFERENCES category (category_id),
    last_update TIMESTAMP WITH TIME ZONE NOT NULL,
    PRIMARY KEY (film_id, category_id)
);

Django cannot use this table “as is” because it insists in having a single column as the primary key. So you end up adding an unnecessary “id” column together with a SEQUENCE (and needless to say, you incur the wrath of the SSSKA).

We could turn to SQLAlchemy, a Python-based ORM that understands that a primary key –albeit singular– can encompass more than one column, and some other web framework or even Django itself, and attempt to recreate the nice Django admin app. However, aside from the additional effort that would represent, we wouldn’t be taking advantage of the YAML database specification we can now store in our VCS.

This is a very broad, and some may say, ambitious topic, but as I did with version control, in future posts I’ll survey existing “art” and progress towards designing and implementing a generic end user interface for PostgreSQL. Reader feedback will be much appreciated.

Gearing up for a second release

The first release of Pyrseas was aimed at addressing the essential version control issues described in my first posts. The second release, somewhat accidentally, has focused on being able to recreate the PostgreSQL autodoc regression database. In addition, Pyrseas will be released on the PostgreSQL Extension Network (PGXN).

Pyrseas already supports the basic schema, table, column, primary key, foreign key and index CREATE and ALTER operations provided by PostgreSQL (and every relational DBMS worthy of that designation). Here is a summary of PostgreSQL features that will be supported in the upcoming release:

  • COMMENTs on schemas, tables, columns and functions.
  • FOREIGN KEY ON UPDATE and ON DELETE actions.
  • ALTER TABLE RENAME COLUMN and enhanced support for other ALTER object RENAME statements.
  • VIEWs
  • INHERITed tables, and by extension, partitioned tables.
  • PROCEDURAL LANGUAGEs
  • FUNCTIONs

Support for some of these features is still elementary. However, the Pyrseas utilities are now able to recreate the autodoc database and revert back to an empty database, i.e., they also know how to drop each object in the correct order.

The Pyrseas unit tests have also been enhanced so they can be run against different PostgreSQL installations. For example, using the PYRSEAS_TEST_PORT environment variable I was able to run the tests against the 9.1 Beta 1 release, as well as 9.0 and 8.4.

Looking forward, the subsequent release will tackle being able to recreate the Pagila sample database, thus adding support for TYPEs, DOMAINs, AGGREGATEs, TRIGGERs and RULEs, and improving support for existing features.

Update: Pyrseas 0.2.0 has been released and is now available via PyPI, PGXN, PgFoundry and from the GitHub repository.

SQL Version Control Implementation Choices II

In the previous post, I discussed two alternative implementations for the yamltodb difference engine: SQL comparisons using extra dictionary tables, and comparing the YAML/JSON maps (Python dicts). Today I’ll review the last option: internal structure comparisons.

Internal Structures – Take 1

The original implementation for yamltodb was based on the implementation of dbtoyaml. Since the YAML maps output by the latter were hierarchical, the internal structures were also organized in layers: a Python class instance (Database) to represent the database, having a dict of classes representing the schemas, each in turn with a dict of “schema objects,” i.e., tables, sequences, etc. Each layer was populated by queries against the corresponding PostgreSQL catalogs.

This organization was also used for the difference engine. A second Database instance represented the YAML input. The two hierarchies were then traversed and compared at each layer, outputting SQL as differences were found.

This approach served to flesh out the SQL generation code and to deal with various feature tests, e.g., creation of a schema, creation of a table, adding columns to a table, and even RENAMEing schemas and tables. However, it couldn’t easily handle FOREIGN KEYs and SEQUENCEs owned by tables (resulting from a SERIAL data type).

Internal Structures – Take 2

To tackle the various dependencies, I refactored the code to use structures somewhat along the lines of Post Facto. The Database class now holds objects derived from class DbObjectDict which corresponds roughly to a single PostgreSQL catalog. Thus, SchemaDict is nearly equivalent to pg_namespace, ClassDict to pg_class and ConstraintDict to pg_constraint. Each one of those classes is also a Python dict and holds specialized objects, e.g., ClassDict holds objects (derived from DbClass) of class Sequence, Table or View (for the time being, class Index is handled separately).

Each XxxDict class is hashed by a Python tuple (an immutable sequence), paralleling the underlying catalog UNIQUE keys. After the dicts are populated, either from the catalogs or from the YAML map, methods are called to link related objects, e.g., columns to tables, foreign keys to the primary key tables, etc.

Note that Pyrseas implements dual sets of classes: collections (dicts) of objects and classes for individual object types. In constrast Post Facto has a single set of objects and uses a Python classmethod to fetch the catalog information. I found the dual sets more convenient to deal with certain implementation issues.

In order to generate the SQL in the correct sequence considering inter-object dependencies, Post Facto does a topological sort of its objects. Influenced by pg_dump, I preferred a lazier approach.  The diff_map methods of Database and lower level classes generate the SQL in a stratified manner with deferred actions.

For example, since one cannot drop tables or primary keys if foreign keys still refer to the former, ClassDict‘s diff_map, upon finding a table has been dropped (it doesn’t exist in the input YAML map), only marks it for dropping. Then it has a pass to drop foreign keys and views, followed by another pass to drop other constraints and indexes and finally the tables themselves.

As a second illustration, I recently added support for inherited tables (and by extension, partitioned tables). Since PostgreSQL allows for multiple inheritance, creating (and dropping) tables in an inheritance “network” requires special care. My solution: the first pass stacks children tables and only creates “root” tables, while the second pass iterates over the stack to create the rest in the right order.

I’d love to hear feedback on this or from anyone who has experimented with the Pyrseas utilities.

SQL Version Control Implementation Choices I

The Andromeda “data dictionary” is organized as multiple hierarchies, e.g., modules, generic columns, and tables with columns. Here is a very simple example:

module movies:
    description: Movies

column id:
    type_id: int

column title:
    type_id: vchar
    colprec: 32

column release_year:
    type_id: int

table film:
    module: movies
    description: Film information
    column film_id:
        primary_key: Y
    column title:
    column release_year:

This has to be created and maintained manually which may be OK if you’re starting off on a new database project, but not so cool if you have an existing database with a hundred tables (or hundreds of thousands of tables–see page 6).

Thus for Pyrseas I decided to create first a tool (dbtoyaml) to output an existing database schema in YAML format. Structurally, a single hierarchy, e.g., database → schemas → tables (somewhat like pgAdmin’s left panel), appeared preferable to Andromeda’s design. After a couple of iterations, the Pyrseas YAML specification ended as follows:

schema public:
  table film:
    columns:
    - id:
        not_null: true
        type: integer
    - title:
        type: character varying(32)
    - release_year:
        type: integer
    description: Film information
    primary_key:
      film_pkey:
        columns:
        - id

The above is the output from PyYAML’s dump function with default_flow_style=False. Internally, the structure is a Python dict or map which looks just like JSON, e.g.,

{
  'schema public':
  {
    'table film':
    {
      'primary_key':
      {
        'film_pkey':
        {
          'columns': ['id']
        }
      },
      'columns':
      [
        {
          'id':
          {
            'not_null': True,
            'type': 'integer'
          }
        },
        {
          'title':
          {
            'type': 'character varying(32)'
          }
        },
        {
          'release_year':
          {
            'type': 'integer'
          }
        }
      ],
      'description':
        'Film information'
    }
  }
}

PyYAML’s load function can read the YAML file and turn it into the Python dict, so the second tool (yamltodb) can deal with the structure above directly.

The most important implementation decision for yamltodb was how to compare the input map to the map created from the current database’s catalogs. A few options were possible:

  1. SQL-based comparisons
  2. Map comparisons
  3. Internal structure comparisons

SQL Comparisons

This is the approach taken by Andromeda. Its essence is described in step 4 of Dictionary Based Database Upgrades. It consists of storing the input specification into a set of tables and the existing catalogs into a parallel set (potentially one could use the information_schema views), and then issuing SQL queries to compare the two sets.

The presumed advantage of this method is that some of the operations can be done –as SQL set operations– in the DBMS. However, the input spec first has to be inserted into the tables, a row-at-a-time, thus negating the former benefit. In addition, the affected database has to carry at least one set of extraneous catalog-like tables.

Map Comparisons

This is the naive approach consisting of “walking the trees,” i.e., comparing the input map to an internal map created from the current database. This works, up to a point. For example, if a table in the input map is not present in the internal map, it’s easy enough to call a function to generate a CREATE TABLE statement.

However, as most readers know, a PostgreSQL database usually has many dependencies between the various system entities, e.g., foreign keys on primary keys, views on tables, etc. So comparing hierarchies isn’t ideal.

Internal Comparisons

In a forthcoming post, I’ll explore this method as well as other issues such as generating the SQL statements in the correct order.