Automated Database Augmentation

Suppose you have a PostgreSQL database like the Pagila sample with 14 tables, each with a last_update timestamp column to record the date and time each row was modified, and it is now a requirement to capture which user effected each change. Or perhaps you have several tables without such audit trail columns and need to add them quickly. Or maybe you have decided to denormalize your design by adding a calculated column, e.g., extended price = unit price times quantity ordered, or a derived column, e.g., carrying the customer name in the invoice table.

If you have some experience as a DBA, the word “drudgery” may have come to mind at the prospect of implementing the above features. It’s possible that, after a while, you’ve developed an approach for dealing with some of them but still wish there’d be some way to automate these thankless tasks.

You may have looked at the Andromeda project’s “automations” which provide some of these capabilities. However, in order to take advantage of the automations, you’ll first have to manually describe your database in a YAML format (and you’ll have to install Apache and PHP). Or you could have tried to use the follow-on project, Triangulum, but essentially you’d still have to create a YAML schema (no need for Apache, but you still need PHP).

Some relief is forthcoming. As a result of discussions resulting from my Business Logic in the Database post, I have been collaborating with Roger Hunwicks on a potential solution to these common DBA needs. The new Pyrseas tool is tentatively named dbextend1 and its initial documentation is available in the Pyrseas extender branch. This is how I envision dbextend being used.

Consider the opening example. The DBA would create a simple YAML file such as the (abbreviated) one below, listing the tables and the needed features:

schema public:
  table actor:
    audit_columns: default
  table category:
    audit_columns: default
...
  table store:
    audit_columns: default

The DBA would then use this file, say audext.yaml, as input to dbextend, e.g.,

dbextend pagiladb audext.yaml

dbextend reads the PostgreSQL catalogs (using code shared with dbtoyaml and yamltodb), building its internal representation. It also reads the YAML extensions file and builds a parallel (albeit much smaller) structure. Thirdly, it reads extension configuration information, e.g., a definition of what columns need to be added for “audit_columns: default“, for example, modified_timestamp and modified_by_user, what trigger(s) to add, and what function(s) to be created.

The output of dbextend is a YAML schema file, just like the one output by dbtoyaml, which can be piped directly to yamltodb to generate SQL to implement the desired features.

In case you’re wondering, dbextend —like other Pyrseas tools— will require Python, psycopg2 and pyyaml.

What features would you like to see automated? What are your suggested best practices for automating these common needs?


Picture credit: Thanks to Mr. O’Brien, a fourth-grade teacher in Minnesota.

1 We’re still receptive to some other suitable name.