Tag Archives: andromeda

The Future of Pyrseas: Part 3

The second Andromeda feature, aside from schema version control, that Pyrseas wanted to rescue was the automatic generation of search and detailed editing screens.

It is somewhat amazing that 40 years after Moshé Zloof’s “Query by Example” and over 30 years after “A Forms Application Development System” by Prof. Larry Rowe and Kurt Shoens, most applications for searching, entering and maintaining data are still reinventing the wheel.

There are of course apps such as PgAdmin and Adminer that can be used for general querying and editing, but their target audience is developers and DBAs, not end users. Furthermore, these tools typically only allow querying a single table at a time (see QBE queries #3 and subsequent, in the aforementioned paper, for examples that usually can’t be handled).

At first, I thought I would develop a typical Python WSGI application. I planned to use Werkzeug because the extra machinery in larger frameworks such as Django is unnecessary for the envisioned purpose. An additional consideration was that a smaller library or framework would make it easier for anyone wanting to extend the automatically generated pieces into a full-fledged app.

I started designing a prototype application and also exploring other WSGI contenders. Then Jacob Kaplan-Moss and Tarek Ziadé jolted my thought process: a JavaScript-based framework was the way forward for a web-based app with Python (and Werkzeug) relegated to a REST API server.

So I took a look at AngularJS, Ember.js, Backbone.js and several other JavaScript frameworks and libraries. At this time, it seems a simpler library such as Backbone.js, or perhaps the Marionette framework, may be more appropriate. It is comforting to see that PgAdmin4 has chosen a similar path for its web-deployed client.

This capability is ultimately quite ambitious, so perhaps Pyrseas ought to stick just to schema version control. Nevertheless, whether integrated with Pyrseas or not, this functionality ought to re-use the database-catalog-to-Python-structures-to-YAML (and vice versa) features developed for Pyrseas.

This concludes this three-part series. Please also see part 1 and part 2.

The Future of Pyrseas: Part 1

In the early days of the Pyrseas project, I read about some open source projects being abandoned due to their developers losing interest or perhaps becoming involved elsewhere and thought to myself “That could never happen to me.”

Sadly, this blog has remained silent for over two years, and I haven’t done substantive development work since last September.

Still, some remain interested in Pyrseas, if gauged by the latest two issues: inquiring about consolidating/updating our documentation and porting dbtoyaml/yamltodb to other databases. So it’s appropriate that we discuss in what directions the project may turn.

Some background first.

Pyrseas: Origins

Pyrseas was born from my experience with making structural changes to SQL databases, version control systems, the typical incompatibility between the two and the solution suggested by Andromeda.

Andromeda had a fairly simple concept: describe your database using YAML and use that to drive the process of generating SQL to make structural modifications (also, use the YAML spec to produce a basic CRUD app to make content changes—see Part 3).

The Pyrseas innovation was: rather than manually edit a YAML spec, why not create it from the Postgres catalogs? In addition, instead of using the information_schema catalog views so that the process could be nominally portable to a few other SQL databases, we opted to query the internal catalogs directly.

The Imaginarium of Peter Eisentraut

In response to a post regarding another tool that diff’d database versions, Core Team member Peter Eisentraut commented that something that “doesn’t support all PostgreSQL features” is not helpful “to help you manage your database upgrades, because it might miss important details.”

That made us reconsider the scope of Pyrseas which initially was to be limited to tables, views and perhaps functions. We decided to address the vast array of Postgres features and some releases later we managed to achieve that goal, for the most part.

A post about the proper place of business logic then led to a collaboration with Roger Hunwicks to augment the Pyrseas tools. Another discussion with Josep Martínez resulted in a preliminary capability to copy predefined data into the database.

Lilo & Sqitch (or Is diffing database versions sufficient?)

Although my Pyrseas development activity has been limited recently, I’ve continued to use Pyrseas to assist in my DBA and database development tasks. I’ve thus come to the conclusion that: no, in the most general case, diffing database versions is not sufficient to manage structural changes.

A simple example based on my original proof-of-concept schema can serve to validate this assertion. Suppose we wanted to modify the release_year column to release_date and use a DATE datetype. The RENAME may be made to work but the datatype change will require some specialized handling (e.g., an external script) that cannot be codified in a generalized “difference” approach.

When I first reviewed the other SQL schema versioning tools, Liquibase was the main exponent of the non-differencing camp (and it still seems to be going strong). About a year after that, David Wheeler came out with the first version of Sqitch and thanks to David I was able to experiment with it.

My main objection to Sqitch and Liquibase is that for the most common use cases, e.g., add a table, add a column, rewrite a view or function, etc., it seems unproductive for a DBA or developer to do the work more than once, i.e., in a development database, either via psql, using a design tool such as PgAdmin or, in some simple cases, even by editing the YAML spec directly. Recreating the SQL DDL and applying it –in the correct order– to another development, test or production database should be automated. The generated SQL should take into consideration the state of the target and, if it becomes necessary, should include generating SQL to backout changes, e.g., drop table, alter table drop column, etc., without any extra DBA or developer intervention.

Conclusion

My proposal to address the insufficiency of diffing database versions is to incorporate some of the ideas of the non-differencing approaches into yamltodb. The solution may be similar to what was done for dbaugment, i.e., add specialized scripts or configuration files that can control the additional processing. Admittedly, this is still very vague and will probably be third in terms of priorities, although I chose to discuss it first.

More Database Tools?

It’s been over year since I started blogging on these pages about Pyrseas and version control. In a month it will also be the first anniversary of the initial commit to GitHub. Much code and many words have flown under these “bridges,” so this seems an appropriate time to reflect.

When I discovered Andromeda, I was looking for a framework to do simple (CRUD-type) database updates

  • with more flexibility (read, programability) than pgAdmin or phpPgAdmin
  • without being tied to an object-relational mapper, either built-in as in Django or external as SQLAlchemy (Pylons/Pyramid)
  • without having to write repetitive code, either SQL or ORM.

Andromeda appeared to satisfy these objectives (although I wasn’t thrilled about having to customize it in PHP).

When I conceived dbtoyaml, I was being lazy: reacting to Andromeda’s requirement to handcraft a YAML description of a database before I could use it to manage SQL changes to it. I thought: why not create the YAML from the database catalogs?

Since my concept for a YAML database specification didn’t match well to Andromeda’s, that led to yamltodb, my attempt to recreate the SQL “diff’ing” features of Andromeda in Python. Andromeda did it using the information_schema catalogs, which made it portable to other DBMSs that had those. Andromeda also did the comparisons by issuing SQL queries (which didn’t perform well). I chose to use the pg_catalog tables and did the comparisons directly on Python structures.

At first, I had intended to only diff schemas and tables and not much more, since that sufficed for my purposes. However, Peter Eisentraut’s comment eventually convinced me that Pyrseas had to support ALL PostgreSQL DDL features1. I’m very pleased with what was accomplished. Pyrseas 0.5, to be released shortly2, will add support for TEXTSEARCH and FOREIGN DATA WRAPPER related objects. The only gaps left are TABLESPACE, GROUP/ROLE and the EXTENSIONs added in PG 9.1.

2012 brought another turn of events. My post on the controversy between Chris Travers and Tony Marston on whether business logic ought to reside in the database led to collaboration with Roger Hunwicks to create dbextend, a tool to automate database augmentation. A first submission was made and work continues on that front.

The latter effort raises other possibilities. For example, since yamltodb already knows how to create nearly all PG objects, it would be trivial to create a schemadump utility (equivalent to pg_dump -s). Another potential tool of interest to PostgreSQL advocates: dbtoyaml for other databases (mytoyaml, oratoyaml anyone?) together with a conversion utility that operates on the YAML specification so it can be accepted by the PG-only yamltodb (the YAML converter seems should be easier than editing SQL statements). The YAML/JSON output from dbtoyaml is amenable to other analysis or automation tasks.

I hope to get back to my database user interface “dream” … one of these days, but in the meantime, I’m glad for having taken these detours. I’d like to thank those who helped along the way: Josh Berkus, Robert Brewer, Adam Cornett, Ronan Dunklau, Peter Eisentraut, David Fetter, Dickson Guedes, Matthias Howell, Roger Hunwicks, Toon Koppelaars, Marko Kreen, Fabrízio Mello, Regina Obe, Filip Rembialkowski, Dariusz Suchojad, Daniele Varrazzo, Evgeni Vasilev, David Wheeler and others I may have missed.


1 Actually, Josh Berkus was the first one who mentioned (in a private email) that I ought to support all PG objects.
2 And just in time for PyCon, I’m happy to announce that it will support Python 3.

Automated Database Augmentation

Suppose you have a PostgreSQL database like the Pagila sample with 14 tables, each with a last_update timestamp column to record the date and time each row was modified, and it is now a requirement to capture which user effected each change. Or perhaps you have several tables without such audit trail columns and need to add them quickly. Or maybe you have decided to denormalize your design by adding a calculated column, e.g., extended price = unit price times quantity ordered, or a derived column, e.g., carrying the customer name in the invoice table.

If you have some experience as a DBA, the word “drudgery” may have come to mind at the prospect of implementing the above features. It’s possible that, after a while, you’ve developed an approach for dealing with some of them but still wish there’d be some way to automate these thankless tasks.

You may have looked at the Andromeda project’s “automations” which provide some of these capabilities. However, in order to take advantage of the automations, you’ll first have to manually describe your database in a YAML format (and you’ll have to install Apache and PHP). Or you could have tried to use the follow-on project, Triangulum, but essentially you’d still have to create a YAML schema (no need for Apache, but you still need PHP).

Some relief is forthcoming. As a result of discussions resulting from my Business Logic in the Database post, I have been collaborating with Roger Hunwicks on a potential solution to these common DBA needs. The new Pyrseas tool is tentatively named dbextend1 and its initial documentation is available in the Pyrseas extender branch. This is how I envision dbextend being used.

Consider the opening example. The DBA would create a simple YAML file such as the (abbreviated) one below, listing the tables and the needed features:

schema public:
  table actor:
    audit_columns: default
  table category:
    audit_columns: default
...
  table store:
    audit_columns: default

The DBA would then use this file, say audext.yaml, as input to dbextend, e.g.,

dbextend pagiladb audext.yaml

dbextend reads the PostgreSQL catalogs (using code shared with dbtoyaml and yamltodb), building its internal representation. It also reads the YAML extensions file and builds a parallel (albeit much smaller) structure. Thirdly, it reads extension configuration information, e.g., a definition of what columns need to be added for “audit_columns: default“, for example, modified_timestamp and modified_by_user, what trigger(s) to add, and what function(s) to be created.

The output of dbextend is a YAML schema file, just like the one output by dbtoyaml, which can be piped directly to yamltodb to generate SQL to implement the desired features.

In case you’re wondering, dbextend —like other Pyrseas tools— will require Python, psycopg2 and pyyaml.

What features would you like to see automated? What are your suggested best practices for automating these common needs?


Picture credit: Thanks to Mr. O’Brien, a fourth-grade teacher in Minnesota.

1 We’re still receptive to some other suitable name.

Quo vadis, Pyrseas?

When I found Andromeda, over a year ago, I was searching for a tool that would help me maintain PostgreSQL tables supporting web sites, somewhat like a Django admin app, but without Django itself since that was not part of the stack. Ideally, this tool would also facilitate table redesign. I had already written a maintenance app, using CherryPy, Mako and SQLAlchemy, but it was cumbersome to add or make changes to it, and database version control was limited to storing the latest CREATE TABLE statements in a Subversion repository.

Andromeda looked attractive because of its version control features and the capability of automatically generating admin apps. However, after experimenting, I found it lacked the ability to instropect an existing database, requiring you to input the schema in its own YAML format. Pyrseas was born as a result of thinking on how to improve this.

Fast forward to the present: Pyrseas 0.4.0 has been released, supporting about 70% of the PostgreSQL DDL features. This seems like a good time to reflect on where to go next, both short term and long term.

Database Version Control

On the version control front, the following tasks come to mind:

  • Add support for missing DDL features: TEXT SEARCH objects, USERs/ROLEs, TABLESPACEs, FDWs and EXTENSIONs.
  • Ensure existing DDL features are covered fully, i.e., all syntax options
  • Add further options to dbtoyaml/yamltodb for selective output or SQL generation
  • Package Pyrseas as a PG 9.1 EXTENSION, allowing access from other languages and tools
  • Support migration of data, i.e., when ALTER TABLE is not enough
  • Your favorite version control feature

Application Generation

Progress on this front has been slow. Pyrseas could generate minimalist web admin apps with CherryPy, Jinja2 and perhaps something like WTForms, but how many users would want to use them? Conversely, how many would also want support for HTML5, JQuery, or some other WSGI server or templating engine?

I’m not decided, but I still need that admin app, so perhaps I’ll build the capability and blog about it, for tutorial purposes, as I have done so far.

What do you think? What is your favorite database version control feature? What would you like to see in a database web app generation tool? Let me know …

P.S. I’ll be attending PGBR 2011, so hopefully I can hear from some of you personally, in English, em Português o en español.

Version Control for PostGIS

A year ago, I was only trying to “rescue” Andromeda

When I started the Pyrseas project, I was mainly interested in improving on the basic table/column version control of Andromeda. I never thought I’d be adding support for “exotic” features such as base types, operators or operator classes. However, the initial feedback led me to set my sights much higher: Pyrseas was to offer support for all PostgreSQL data definition features.

Over the past two and a half months, I’ve added coverage for those advanced features and more. I’m glad to report that, with the help of Leo Hsu and Regina Obe, the Pyrseas tools are now able to output all the objects1 in a database loaded with postgis.sql, spatial_ref_sys.sql and postgis_comments.sql from PostGIS 1.5, in YAML format. The tools are also able to take in that YAML and generate the SQL necessary to recreate all those objects, in the correct order, on an empty database, as well as to reverse the process.

The following is an update to the feature coverage table presented earlier. Although Pyrseas 0.4 has not been released yet, all these capabilities are present in the current HEAD GitHub repository.

DDL Feature 0.1 0.2 0.3 0.4
AGGREGATE  X  X
CAST  X
CONSTRAINT TRIGGER  X
CONVERSION  X
DOMAIN  X  X
FOREIGN DATA WRAPPER
FUNCTION  X  X  X
GROUP
INDEX  X  X  X  X
LANGUAGE  X  X  X
OPERATOR  X
OPERATOR CLASS  X
OPERATOR FAMILY  X
ROLE
RULE  X  X
SCHEMA  X  X  X  X
SEQUENCE  X  X  X  X
SERVER
TABLE  X  X  X  X
– CHECK CONSTRAINT  X  X  X  X
– PRIMARY KEY  X  X  X  X
– FOREIGN KEY  X  X  X  X
– UNIQUE CONSTRAINT  X  X  X  X
– INHERIT  X  X  X
TABLESPACE
TEXT SEARCH CONFIGURATION
TEXT SEARCH DICTIONARY
TEXT SEARCH PARSER
TEXT SEARCH TEMPLATE
TRIGGER  X  X
TYPE  X  X
– ENUM  X  X
– composite  X
– base type  X
USER
USER MAPPING
VIEW  X  X  X
COMMENT  X  X  X

I’d like to thank Leo and Regina for their help, as well aquilax, mhow and acornett at GitHub for being early adopters and raising issues to improve Pyrseas functionality.

If you have a challenging PostgreSQL database, I’d like to encourage you to give Pyrseas a try, and let me know how it works out.


1 For those interested, this includes 2 tables, 1 view, 9 base types, 1 composite type, 777 functions, 17 aggregate functions, 19 casts, 4 operator classes/families, 23 operators, and 315 comments.

User Interfaces for Databases

Pyrseas was started to improve on Andromeda. In addition to the schema version control I’ve been discussing in these posts, Andromeda provides two other capabilities: automations and web application programming.

Andromeda’s web programming allows you to generate a database application with very little actual programming, since Andromeda takes into account the primary key, foreign key and other information present in the YAML specification to generate much of the application code for you. This is another area that I’d like to explore for implementation in Pyrseas.

Administrative Applications

Nearly any database needs a user interface to add, retrieve, update and delete records, known by the unattractive acronym CRUD. There are a number of admin apps that provide such generic facilties. For example, for PostgreSQL we have pgAdmin which offers a GUI, and phpPgAdmin and Adminer which offer PHP web interfaces. These have a couple of limitations.

Admin apps almost always operate on a single table at a time. Interfaces for end users often require interacting with or viewing two or more tables on a single screen or page, e.g., customers with their invoices, accounts with their transactions.

By definition, admin apps are “raw” and cater to the needs of DBAs and programmers. After retrieving a customer record searching by name, the admin user can update any column, including the primary key and other columns that end users would normally be prevented from updating.

Django Admin

Django includes an appealing admin site. With a moderate dose of programming, you can develop an application suitable for end users. However, you’re at the mercy of Django’s constraints on “modeling” a database. Take for example the film_category table I used in my second post on version control:

CREATE TABLE film_category (
    film_id INTEGER NOT NULL REFERENCES film (id),
    category_id INTEGER NOT NULL
        REFERENCES category (category_id),
    last_update TIMESTAMP WITH TIME ZONE NOT NULL,
    PRIMARY KEY (film_id, category_id)
);

Django cannot use this table “as is” because it insists in having a single column as the primary key. So you end up adding an unnecessary “id” column together with a SEQUENCE (and needless to say, you incur the wrath of the SSSKA).

We could turn to SQLAlchemy, a Python-based ORM that understands that a primary key –albeit singular– can encompass more than one column, and some other web framework or even Django itself, and attempt to recreate the nice Django admin app. However, aside from the additional effort that would represent, we wouldn’t be taking advantage of the YAML database specification we can now store in our VCS.

This is a very broad, and some may say, ambitious topic, but as I did with version control, in future posts I’ll survey existing “art” and progress towards designing and implementing a generic end user interface for PostgreSQL. Reader feedback will be much appreciated.