Automated Database Augmentation

Suppose you have a PostgreSQL database like the Pagila sample with 14 tables, each with a last_update timestamp column to record the date and time each row was modified, and it is now a requirement to capture which user effected each change. Or perhaps you have several tables without such audit trail columns and need to add them quickly. Or maybe you have decided to denormalize your design by adding a calculated column, e.g., extended price = unit price times quantity ordered, or a derived column, e.g., carrying the customer name in the invoice table.

If you have some experience as a DBA, the word “drudgery” may have come to mind at the prospect of implementing the above features. It’s possible that, after a while, you’ve developed an approach for dealing with some of them but still wish there’d be some way to automate these thankless tasks.

You may have looked at the Andromeda project’s “automations” which provide some of these capabilities. However, in order to take advantage of the automations, you’ll first have to manually describe your database in a YAML format (and you’ll have to install Apache and PHP). Or you could have tried to use the follow-on project, Triangulum, but essentially you’d still have to create a YAML schema (no need for Apache, but you still need PHP).

Some relief is forthcoming. As a result of discussions resulting from my Business Logic in the Database post, I have been collaborating with Roger Hunwicks on a potential solution to these common DBA needs. The new Pyrseas tool is tentatively named dbextend1 and its initial documentation is available in the Pyrseas extender branch. This is how I envision dbextend being used.

Consider the opening example. The DBA would create a simple YAML file such as the (abbreviated) one below, listing the tables and the needed features:

schema public:
  table actor:
    audit_columns: default
  table category:
    audit_columns: default
...
  table store:
    audit_columns: default

The DBA would then use this file, say audext.yaml, as input to dbextend, e.g.,

dbextend pagiladb audext.yaml

dbextend reads the PostgreSQL catalogs (using code shared with dbtoyaml and yamltodb), building its internal representation. It also reads the YAML extensions file and builds a parallel (albeit much smaller) structure. Thirdly, it reads extension configuration information, e.g., a definition of what columns need to be added for “audit_columns: default“, for example, modified_timestamp and modified_by_user, what trigger(s) to add, and what function(s) to be created.

The output of dbextend is a YAML schema file, just like the one output by dbtoyaml, which can be piped directly to yamltodb to generate SQL to implement the desired features.

In case you’re wondering, dbextend —like other Pyrseas tools— will require Python, psycopg2 and pyyaml.

What features would you like to see automated? What are your suggested best practices for automating these common needs?


Picture credit: Thanks to Mr. O’Brien, a fourth-grade teacher in Minnesota.

1 We’re still receptive to some other suitable name.

About these ads

6 thoughts on “Automated Database Augmentation”

  1. My Approach, and its never failed me.

    Create a Table called Auditlog with the following columns
    eventid – Unique ID for every event
    timestamp – the timestamp the action was recorded
    recorduid – the TABLEs Primary KEY (I always uses unique integers)
    userid – Assuming youhave a users TABLE
    actionname – INSERT, UPDATE OR DELETE
    tablename – The name of the TABLE the event happened to
    fieldname – The column that was affected
    value – The value that was saved

    I then wrote a function that you give a tablename to and it dynamically creates triggers and trigger functions for INSERT UPDATE and DELETE for the given table. When you insert, update or delete the above table gets populated. It does slow down the performance when mass inserting / updating / deleteing but by a very acceptable amount. It serves me well.

    All you do then is use your database as normal.

    I also have a set of functions that will display the record history from Auditlog table so I can then to step through a records changes one at a time.

    BTW currently I have an 6Gb database and the above table contains

    count
    ———-
    20510439
    (1 row)

    20 Million not bad and performance is great.

    1. Hi Mike,

      I’m afraid that Alembic is an entirely different animal. The existing Pyrseas tools, dbtoyaml and yamltodb, are somewhat similar in purpose to Alembic, i.e., they assist with “migrating” a database from one version to another, although I prefer to refer to that as SQL or schema version control, and they are PostgreSQL-specific (and proud of it) because PostgreSQL has several features that other tools/DBMSs simply don’t provide (see the Schema VC page above).

      This new tool, dbextend, is intended to assist with implementing common database “patterns,” e.g., denormalizing tables, adding audit trail columns, creating and maintaining history tables, and similar features described in the Andromeda Automations link above.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s