Suppose you have a PostgreSQL database like the Pagila sample with 14 tables, each with a last_update timestamp column to record the date and time each row was modified, and it is now a requirement to capture which user effected each change. Or perhaps you have several tables without such audit trail columns and need to add them quickly. Or maybe you have decided to denormalize your design by adding a calculated column, e.g., extended price = unit price times quantity ordered, or a derived column, e.g., carrying the customer name in the invoice table.
If you have some experience as a DBA, the word “drudgery” may have come to mind at the prospect of implementing the above features. It’s possible that, after a while, you’ve developed an approach for dealing with some of them but still wish there’d be some way to automate these thankless tasks.
You may have looked at the Andromeda project’s “automations” which provide some of these capabilities. However, in order to take advantage of the automations, you’ll first have to manually describe your database in a YAML format (and you’ll have to install Apache and PHP). Or you could have tried to use the follow-on project, Triangulum, but essentially you’d still have to create a YAML schema (no need for Apache, but you still need PHP).
Some relief is forthcoming. As a result of discussions resulting from my Business Logic in the Database post, I have been collaborating with Roger Hunwicks on a potential solution to these common DBA needs. The new Pyrseas tool is tentatively named dbextend1 and its initial documentation is available in the Pyrseas extender branch. This is how I envision dbextend being used.
Consider the opening example. The DBA would create a simple YAML file such as the (abbreviated) one below, listing the tables and the needed features:
schema public: table actor: audit_columns: default table category: audit_columns: default ... table store: audit_columns: default
The DBA would then use this file, say audext.yaml, as input to dbextend, e.g.,
dbextend pagiladb audext.yaml
dbextend reads the PostgreSQL catalogs (using code shared with dbtoyaml and yamltodb), building its internal representation. It also reads the YAML extensions file and builds a parallel (albeit much smaller) structure. Thirdly, it reads extension configuration information, e.g., a definition of what columns need to be added for “audit_columns: default“, for example, modified_timestamp and modified_by_user, what trigger(s) to add, and what function(s) to be created.
The output of dbextend is a YAML schema file, just like the one output by dbtoyaml, which can be piped directly to yamltodb to generate SQL to implement the desired features.
In case you’re wondering, dbextend —like other Pyrseas tools— will require Python, psycopg2 and pyyaml.
What features would you like to see automated? What are your suggested best practices for automating these common needs?
Picture credit: Thanks to Mr. O’Brien, a fourth-grade teacher in Minnesota.
1 We’re still receptive to some other suitable name.
My Approach, and its never failed me.
Create a Table called Auditlog with the following columns
eventid – Unique ID for every event
timestamp – the timestamp the action was recorded
recorduid – the TABLEs Primary KEY (I always uses unique integers)
userid – Assuming youhave a users TABLE
actionname – INSERT, UPDATE OR DELETE
tablename – The name of the TABLE the event happened to
fieldname – The column that was affected
value – The value that was saved
I then wrote a function that you give a tablename to and it dynamically creates triggers and trigger functions for INSERT UPDATE and DELETE for the given table. When you insert, update or delete the above table gets populated. It does slow down the performance when mass inserting / updating / deleteing but by a very acceptable amount. It serves me well.
All you do then is use your database as normal.
I also have a set of functions that will display the record history from Auditlog table so I can then to step through a records changes one at a time.
BTW currently I have an 6Gb database and the above table contains
count
———-
20510439
(1 row)
20 Million not bad and performance is great.
Hi Mike,
What you’re describing is more or less a standard audit log, which is usually part of a facilty provided by the DBMS itself since it is typically a requirement to audit both successful and unsuccessful attempts, including SELECT operations and logins/logouts. Greg Sabino Mullane recently described a Perl-based approach to implementing this, but ideally IMHO this would be incorporated into PostgreSQL.
This seems to be exactly what [alembic autogenerations](http://alembic.readthedocs.org/en/latest/tutorial.html#auto-generating-migrations) does right now. The functionality is pretty portable too (but oh, needs that dreaded “sqlalchemy” thing installed ! )
Hi Mike,
I’m afraid that Alembic is an entirely different animal. The existing Pyrseas tools, dbtoyaml and yamltodb, are somewhat similar in purpose to Alembic, i.e., they assist with “migrating” a database from one version to another, although I prefer to refer to that as SQL or schema version control, and they are PostgreSQL-specific (and proud of it) because PostgreSQL has several features that other tools/DBMSs simply don’t provide (see the Schema VC page above).
This new tool, dbextend, is intended to assist with implementing common database “patterns,” e.g., denormalizing tables, adding audit trail columns, creating and maintaining history tables, and similar features described in the Andromeda Automations link above.
Pingback: Joe Abbate: Automated Database Augmentation | PostgreSQL | Syngu
Pingback: More Database Tools? | Taming Serpents and Pachyderms