The Future of Pyrseas: Part 1

In the early days of the Pyrseas project, I read about some open source projects being abandoned due to their developers losing interest or perhaps becoming involved elsewhere and thought to myself “That could never happen to me.”

Sadly, this blog has remained silent for over two years, and I haven’t done substantive development work since last September.

Still, some remain interested in Pyrseas, if gauged by the latest two issues: inquiring about consolidating/updating our documentation and porting dbtoyaml/yamltodb to other databases. So it’s appropriate that we discuss in what directions the project may turn.

Some background first.

Pyrseas: Origins

Pyrseas was born from my experience with making structural changes to SQL databases, with version control systems, with the typical incompatibility between the two, and with the solution suggested by Andromeda.

Andromeda had a fairly simple concept: describe your database using YAML and use that to drive the process of generating SQL to make structural modifications (also, use the YAML spec to produce a basic CRUD app to make content changes—see Part 3).

The Pyrseas innovation was: rather than manually edit a YAML spec, why not create it from the Postgres catalogs? In addition, instead of using the information_schema catalog views so that the process could be nominally portable to a few other SQL databases, we opted to query the internal catalogs directly.

The Imaginarium of Peter Eisentraut

In response to a post regarding another tool that diff’d database versions, Core Team member Peter Eisentraut commented that something that “doesn’t support all PostgreSQL features” is not helpful “to help you manage your database upgrades, because it might miss important details.”

That made us reconsider the scope of Pyrseas, which initially was to be limited to tables, views and perhaps functions. We decided to address the vast array of Postgres features and, some releases later, we managed to achieve that goal, for the most part.

A post about the proper place of business logic then led to a collaboration with Roger Hunwicks to augment the Pyrseas tools. Another discussion with Josep Martínez resulted in a preliminary capability to copy predefined data into the database.

Lilo & Sqitch (or Is diffing database versions sufficient?)

Although my Pyrseas development activity has been limited recently, I’ve continued to use Pyrseas to assist in my DBA and database development tasks. I’ve thus come to the conclusion that: no, in the most general case, diffing database versions is not sufficient to manage structural changes.

A simple example based on my original proof-of-concept schema can serve to validate this assertion. Suppose we wanted to rename the release_year column to release_date and change it to the DATE datatype. The RENAME may be made to work, but the datatype change will require some specialized handling (e.g., an external script) that cannot be codified in a generalized “difference” approach.
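To make the difficulty concrete, here is a minimal sketch of the SQL such a change might need. The table name film, the helper function and the choice of January 1 as the converted date are all assumptions for illustration (make_date also requires PostgreSQL 9.4 or later). The point is that the USING expression has to be supplied by a person, since nothing in the “before” and “after” definitions says how a year should become a date.

```python
def retype_and_rename(table, old_col, new_col, new_type, using_expr):
    """Return SQL to change a column's type and then rename it.

    using_expr is the hand-written conversion expression; a diff tool
    cannot derive it from the two column definitions alone.
    """
    return [
        f"ALTER TABLE {table} ALTER COLUMN {old_col} "
        f"TYPE {new_type} USING {using_expr}",
        f"ALTER TABLE {table} RENAME COLUMN {old_col} TO {new_col}",
    ]


# Assumption: the table is called "film" and every release is dated Jan 1.
statements = retype_and_rename(
    "film", "release_year", "release_date", "date",
    "make_date(release_year, 1, 1)",
)

for stmt in statements:
    print(stmt + ";")
```

A different project might prefer December 31, or a lookup against another table, which is exactly why this step resists a generalized approach.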

When I first reviewed the other SQL schema versioning tools, Liquibase was the main exponent of the non-differencing camp (and it still seems to be going strong). About a year after that, David Wheeler came out with the first version of Sqitch and thanks to David I was able to experiment with it.

My main objection to Sqitch and Liquibase is that for the most common use cases, e.g., add a table, add a column, rewrite a view or function, etc., it seems unproductive for a DBA or developer to do the work more than once. The change has already been made once in a development database, whether via psql, using a design tool such as pgAdmin or, in some simple cases, even by editing the YAML spec directly. Recreating the SQL DDL and applying it, in the correct order, to another development, test or production database should be automated. The generated SQL should take into consideration the state of the target and, if it becomes necessary, should include SQL to back out changes, e.g., drop table, alter table drop column, etc., without any extra DBA or developer intervention.

Conclusion

My proposal to address the insufficiency of diffing database versions is to incorporate some of the ideas of the non-differencing approaches into yamltodb. The solution may be similar to what was done for dbaugment, i.e., add specialized scripts or configuration files that can control the additional processing. Admittedly, this is still very vague and will probably be third in terms of priorities, although I chose to discuss it first.

Pyrseas/PostgreSQL Feature Matrix

In my last post, I wrote—referring to the state of Pyrseas after version 0.5 is released:

The only gaps left are TABLESPACE, GROUP/ROLE and the EXTENSIONs added in PG 9.1.

I’m afraid I should’ve double checked the list of 9.1 SQL CREATE statements. I missed COLLATIONs. I’ve created a new page, Feature Matrix, that shows the correct picture, which will be updated as subsequent releases are made.

More Database Tools?

It’s been over a year since I started blogging on these pages about Pyrseas and version control. In a month it will also be the first anniversary of the initial commit to GitHub. Much code and many words have flowed under these “bridges,” so this seems an appropriate time to reflect.

When I discovered Andromeda, I was looking for a framework to do simple (CRUD-type) database updates

  • with more flexibility (read, programmability) than pgAdmin or phpPgAdmin
  • without being tied to an object-relational mapper, either built-in as in Django or external as SQLAlchemy (Pylons/Pyramid)
  • without having to write repetitive code, either SQL or ORM.

Andromeda appeared to satisfy these objectives (although I wasn’t thrilled about having to customize it in PHP).

When I conceived dbtoyaml, I was being lazy: reacting to Andromeda’s requirement to handcraft a YAML description of a database before I could use it to manage SQL changes to it. I thought: why not create the YAML from the database catalogs?

Since my concept for a YAML database specification didn’t match well to Andromeda’s, that led to yamltodb, my attempt to recreate the SQL “diff’ing” features of Andromeda in Python. Andromeda did it using the information_schema catalogs, which made it portable to other DBMSs that had those. Andromeda also did the comparisons by issuing SQL queries (which didn’t perform well). I chose to use the pg_catalog tables and did the comparisons directly on Python structures.
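As a rough illustration of what comparing “directly on Python structures” can look like, the sketch below diffs two plain dictionaries of column definitions and emits ALTER TABLE statements. This is not the actual Pyrseas code: the function name, the flat {column: type} layout and the film table are simplifying assumptions.

```python
def diff_columns(table, current, desired):
    """Compare two {column: type} dicts and return ALTER TABLE statements."""
    stmts = []
    # Columns that are new, or whose type differs, in the desired state.
    for name, coltype in desired.items():
        if name not in current:
            stmts.append(f"ALTER TABLE {table} ADD COLUMN {name} {coltype}")
        elif current[name] != coltype:
            stmts.append(
                f"ALTER TABLE {table} ALTER COLUMN {name} TYPE {coltype}")
    # Columns that exist now but are absent from the desired state.
    for name in current:
        if name not in desired:
            stmts.append(f"ALTER TABLE {table} DROP COLUMN {name}")
    return stmts


# "current" stands in for what was read from the catalogs,
# "desired" for what was read from the YAML specification.
current = {"id": "integer", "title": "character varying(32)"}
desired = {"id": "integer", "title": "text", "release_year": "integer"}
changes = diff_columns("film", current, desired)
```

Because both sides are ordinary in-memory structures, the comparison needs no further round trips to the database once the catalogs have been read.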

At first, I had intended to only diff schemas and tables and not much more, since that sufficed for my purposes. However, Peter Eisentraut’s comment eventually convinced me that Pyrseas had to support ALL PostgreSQL DDL features[1]. I’m very pleased with what was accomplished. Pyrseas 0.5, to be released shortly[2], will add support for TEXTSEARCH and FOREIGN DATA WRAPPER related objects. The only gaps left are TABLESPACE, GROUP/ROLE and the EXTENSIONs added in PG 9.1.

2012 brought another turn of events. My post on the controversy between Chris Travers and Tony Marston on whether business logic ought to reside in the database led to collaboration with Roger Hunwicks to create dbextend, a tool to automate database augmentation. A first submission was made and work continues on that front.

The latter effort raises other possibilities. For example, since yamltodb already knows how to create nearly all PG objects, it would be trivial to create a schemadump utility (equivalent to pg_dump -s). Another potential tool of interest to PostgreSQL advocates: dbtoyaml for other databases (mytoyaml, oratoyaml anyone?) together with a conversion utility that operates on the YAML specification so it can be accepted by the PG-only yamltodb (converting the YAML seems like it should be easier than editing SQL statements). The YAML/JSON output from dbtoyaml is also amenable to other analysis or automation tasks.

I hope to get back to my database user interface “dream” … one of these days, but in the meantime, I’m glad for having taken these detours. I’d like to thank those who helped along the way: Josh Berkus, Robert Brewer, Adam Cornett, Ronan Dunklau, Peter Eisentraut, David Fetter, Dickson Guedes, Matthias Howell, Roger Hunwicks, Toon Koppelaars, Marko Kreen, Fabrízio Mello, Regina Obe, Filip Rembialkowski, Dariusz Suchojad, Daniele Varrazzo, Evgeni Vasilev, David Wheeler and others I may have missed.


[1] Actually, Josh Berkus was the first one who mentioned (in a private email) that I ought to support all PG objects.
[2] And just in time for PyCon, I’m happy to announce that it will support Python 3.

Automated Database Augmentation

Suppose you have a PostgreSQL database like the Pagila sample with 14 tables, each with a last_update timestamp column to record the date and time each row was modified, and it is now a requirement to capture which user effected each change. Or perhaps you have several tables without such audit trail columns and need to add them quickly. Or maybe you have decided to denormalize your design by adding a calculated column, e.g., extended price = unit price times quantity ordered, or a derived column, e.g., carrying the customer name in the invoice table.

If you have some experience as a DBA, the word “drudgery” may have come to mind at the prospect of implementing the above features. It’s possible that, after a while, you’ve developed an approach for dealing with some of them but still wish there’d be some way to automate these thankless tasks.

You may have looked at the Andromeda project’s “automations” which provide some of these capabilities. However, in order to take advantage of the automations, you’ll first have to manually describe your database in a YAML format (and you’ll have to install Apache and PHP). Or you could have tried to use the follow-on project, Triangulum, but essentially you’d still have to create a YAML schema (no need for Apache, but you still need PHP).

Some relief is forthcoming. Following the discussions prompted by my Business Logic in the Database post, I have been collaborating with Roger Hunwicks on a potential solution to these common DBA needs. The new Pyrseas tool is tentatively named dbextend[1] and its initial documentation is available in the Pyrseas extender branch. This is how I envision dbextend being used.

Consider the opening example. The DBA would create a simple YAML file such as the (abbreviated) one below, listing the tables and the needed features:

schema public:
  table actor:
    audit_columns: default
  table category:
    audit_columns: default
...
  table store:
    audit_columns: default

The DBA would then use this file, say audext.yaml, as input to dbextend, e.g.,

dbextend pagiladb audext.yaml

dbextend reads the PostgreSQL catalogs (using code shared with dbtoyaml and yamltodb), building its internal representation. It also reads the YAML extensions file and builds a parallel (albeit much smaller) structure. Thirdly, it reads extension configuration information, e.g., a definition of what “audit_columns: default” means: which columns need to be added (for example, modified_timestamp and modified_by_user), what trigger(s) to add, and what function(s) to create.

The output of dbextend is a YAML schema file, just like the one output by dbtoyaml, which can be piped directly to yamltodb to generate SQL to implement the desired features.
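The merging step described above might be sketched roughly as follows. This is only an illustration of the idea, not dbextend itself: the function name, the dictionary layout and the column types are assumptions, and the triggers and functions are left out entirely. Only the two column names come from the description above.

```python
import copy

# Hypothetical configuration: what "audit_columns: default" expands to.
# The column names come from the text; the types are assumptions.
AUDIT_COLUMNS = {
    "default": [
        {"modified_timestamp": {"type": "timestamp with time zone",
                                "not_null": True}},
        {"modified_by_user": {"type": "character varying(63)",
                              "not_null": True}},
    ],
}


def apply_extensions(schema_map, extensions):
    """Return a copy of schema_map with the requested audit columns added."""
    result = copy.deepcopy(schema_map)
    for schema_name, tables in extensions.items():
        for table_name, features in tables.items():
            columns = result[schema_name][table_name].setdefault("columns", [])
            existing = {name for col in columns for name in col}
            for col in AUDIT_COLUMNS[features["audit_columns"]]:
                name = next(iter(col))
                if name not in existing:  # do not add a column twice
                    columns.append(copy.deepcopy(col))
    return result


# Stand-ins for the catalog-derived structure and the extensions file.
db = {"schema public": {"table actor": {"columns": [
    {"actor_id": {"type": "integer", "not_null": True}}]}}}
ext = {"schema public": {"table actor": {"audit_columns": "default"}}}
augmented = apply_extensions(db, ext)
```

Serializing the augmented structure back to YAML would then give a file in the same general shape as dbtoyaml output, ready to be handed to yamltodb.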

In case you’re wondering, dbextend —like other Pyrseas tools— will require Python, psycopg2 and pyyaml.

What features would you like to see automated? What are your suggested best practices for automating these common needs?


Picture credit: Thanks to Mr. O’Brien, a fourth-grade teacher in Minnesota.

[1] We’re still receptive to some other suitable name.

Interesting database tool: Adminer

No, it’s not a tool to extract extra revenue from Google Ads. Adminer (which I guess is pronounced admin-er, rather than ad-minor) is a nifty little tool that competes with phpMyAdmin, the web-browser-based MySQL administration tool. What is interesting from my perspective is that it’s been so successful that it now also supports PostgreSQL, as well as SQLite, Oracle and SQL Server (so it also competes with phpPgAdmin).

Nifty, in the sense of “quick, agile,” is an apt description for Adminer: adminer-3.1.0.php weighs in at only 292kB and that single file is all you have to download and install. If you prefer an English-only version it’s just 188kB (there are also MySQL-only versions that are smaller yet).  It installs quickly, provided you have Apache and PHP already installed, and its responsiveness is admirable.

But perhaps what I found most impressive was its ability to display the database schema.  Here is Adminer showing the schema for the “movies” database I’ve been using in recent posts:

Its MySQL roots are visible in some places, e.g., in the table display below, the SERIAL column language_id is shown as Auto increment:

Overall, a very responsive, easy-to-use tool for administering a database.

Hat tip: Selena Marie.