The Future of Pyrseas: Part 3

The second Andromeda feature, aside from schema version control, that Pyrseas wanted to rescue was the automatic generation of search and detailed editing screens.

It is somewhat amazing that 40 years after Moshé Zloof’s “Query by Example” and over 30 years after “A Forms Application Development System” by Prof. Larry Rowe and Kurt Shoens, most applications for searching, entering and maintaining data are still reinventing the wheel.

There are of course apps such as PgAdmin and Adminer that can be used for general querying and editing, but their target audience is developers and DBAs, not end users. Furthermore, these tools typically only allow querying a single table at a time (see QBE queries #3 and subsequent, in the aforementioned paper, for examples that usually can’t be handled).

At first, I thought I would develop a typical Python WSGI application. I planned to use Werkzeug because the extra machinery in larger frameworks such as Django is unnecessary for the envisioned purpose. An additional consideration was that a smaller library or framework would make it easier for anyone wanting to extend the automatically generated pieces into a full-fledged app.

I started designing a prototype application and also exploring other WSGI contenders. Then Jacob Kaplan-Moss and Tarek Ziadé jolted my thought process: a JavaScript-based framework was the way forward for a web-based app with Python (and Werkzeug) relegated to a REST API server.

So I took a look at AngularJS, Ember.js, Backbone.js and several other JavaScript frameworks and libraries. At this time, it seems a simpler library such as Backbone.js, or perhaps the Marionette framework, may be more appropriate. It is comforting to see that PgAdmin4 has chosen a similar path for its web-deployed client.

This capability is ultimately quite ambitious, so perhaps Pyrseas ought to stick just to schema version control. Nevertheless, whether integrated with Pyrseas or not, this functionality ought to re-use the database-catalog-to-Python-structures-to-YAML (and vice versa) features developed for Pyrseas.

This concludes this three-part series. Please also see part 1 and part 2.

The Future of Pyrseas: Part 2

When I started working on Pyrseas, I reviewed several other products. Robert Brewer’s Post Facto was probably the one with the most unique design. Although it compared database schemas in order to generate SQL to synch them up, it did not store database object definitions in a standard VCS repository. Rather, it used a Postgres database as the repository analog.

While Post Facto’s design certainly influenced Pyrseas, there is one aspect of  the former that, unfortunately, I did not emulate.

The Dependables

As any developer knows, database objects have dependencies on each other: table A has a primary key PK1, table B is declared with a foreign key dependent on PK1, function C is dependent on type X, view D is based on table A and includes a call to function C.

Pyrseas currently deals with these dependencies in an object-specific manner. For example, it does at least two passes through pg_class objects (tables, views, sequences, etc.) in order to create, alter or drop these objects in the correct order. However, this ad hoc approach can result in incorrect sequencing of generated SQL statements in some cases, particularly those like view D above.

The missing feature from Post Facto that avoids this conundrum? If you answered topological sort you were obviously paying attention in your Algorithms class. If you didn’t, may I suggest chapter 15, “Devising and engineering an algorithm: Topological Sort” of Bertrand Meyer’s Touch of Class.

Daniele’s Quest

Over two years ago, someone opened an issue about the need to create primary keys before creating views. Later, Daniele Varrazzo reported another issue with dependencies.

Many of you Postgres users will recognize Daniele as the maintainer of Psycopg, the popular Python PG adapter, which of course is used by Pyrseas.  Daniele and I chatted online, I mentioned Post Facto’s solution and he, fortuitously and generously, started implementing a topological sort on a deptrack branch of Pyrseas.

We then collaborated for about eight months. He did most of the initial coding and I ran tests and fixed some issues. Unfortunately, Daniele is very busy, with a full-time job, Psycopg and other interests, so the work came to a near standstill.

Where We Stand

The last time changes were submitted to the deptrack branch, about six months ago, only four tests failed (out of over 600) running on both Python 2.7 and 3.4 against Postgres 9.3. Regrettably, three of those tests are integration and functional tests, so correcting those is critical to adding this feature.

In addition, although most tests complete successfully, the run times have been impacted severely. This will require an effort at re-optimizing performance before releasing the changes. Last but not least, the implementation needs internal documentation so that it can be properly maintained.

Sadly, I have not had much time or incentives to address these shortcomings. Are there any Pyrseas, Postgres or Python enthusiasts looking for a challenge?

The Future of Pyrseas: Part 1

In the early days of the Pyrseas project, I read about some open source projects being abandoned due to their developers losing interest or perhaps becoming involved elsewhere and thought to myself “That could never happen to me.”

Sadly, this blog has remained silent for over two years, and I haven’t done substantive development work since last September.

Still, some remain interested in Pyrseas, if gauged by the latest two issues: inquiring about consolidating/updating our documentation and porting dbtoyaml/yamltodb to other databases. So it’s appropriate that we discuss in what directions the project may turn.

Some background first.

Pyrseas: Origins

Pyrseas was born from my experience with making structural changes to SQL databases, version control systems, the typical incompatibility between the two and the solution suggested by Andromeda.

Andromeda had a fairly simple concept: describe your database using YAML and use that to drive the process of generating SQL to make structural modifications (also, use the YAML spec to produce a basic CRUD app to make content changes—see Part 3).

The Pyrseas innovation was: rather than manually edit a YAML spec, why not create it from the Postgres catalogs? In addition, instead of using the information_schema catalog views so that the process could be nominally portable to a few other SQL databases, we opted to query the internal catalogs directly.

The Imaginarium of Peter Eisentraut

In response to a post regarding another tool that diff’d database versions, Core Team member Peter Eisentraut commented that something that “doesn’t support all PostgreSQL features” is not helpful “to help you manage your database upgrades, because it might miss important details.”

That made us reconsider the scope of Pyrseas which initially was to be limited to tables, views and perhaps functions. We decided to address the vast array of Postgres features and some releases later we managed to achieve that goal, for the most part.

A post about the proper place of business logic then led to a collaboration with Roger Hunwicks to augment the Pyrseas tools. Another discussion with Josep Martínez resulted in a preliminary capability to copy predefined data into the database.

Lilo & Sqitch (or Is diffing database versions sufficient?)

Although my Pyrseas development activity has been limited recently, I’ve continued to use Pyrseas to assist in my DBA and database development tasks. I’ve thus come to the conclusion that: no, in the most general case, diffing database versions is not sufficient to manage structural changes.

A simple example based on my original proof-of-concept schema can serve to validate this assertion. Suppose we wanted to modify the release_year column to release_date and use a DATE datetype. The RENAME may be made to work but the datatype change will require some specialized handling (e.g., an external script) that cannot be codified in a generalized “difference” approach.

When I first reviewed the other SQL schema versioning tools, Liquibase was the main exponent of the non-differencing camp (and it still seems to be going strong). About a year after that, David Wheeler came out with the first version of Sqitch and thanks to David I was able to experiment with it.

My main objection to Sqitch and Liquibase is that for the most common use cases, e.g., add a table, add a column, rewrite a view or function, etc., it seems unproductive for a DBA or developer to do the work more than once, i.e., in a development database, either via psql, using a design tool such as PgAdmin or, in some simple cases, even by editing the YAML spec directly. Recreating the SQL DDL and applying it –in the correct order– to another development, test or production database should be automated. The generated SQL should take into consideration the state of the target and, if it becomes necessary, should include generating SQL to backout changes, e.g., drop table, alter table drop column, etc., without any extra DBA or developer intervention.

Conclusion

My proposal to address the insufficiency of diffing database versions is to incorporate some of the ideas of the non-differencing approaches into yamltodb. The solution may be similar to what was done for dbaugment, i.e., add specialized scripts or configuration files that can control the additional processing. Admittedly, this is still very vague and will probably be third in terms of priorities, although I chose to discuss it first.