PGBR 2011 recap

PGBR 2011 group photoThis is a summary of what I saw and heard at PGBR 2011.

The night before the conference I had the opportunity to greet Alvaro Herrera, Bruce Momjian, Dave Page and Greg Smith.

The conference opened on Thursday 3rd November with introductory remarks by Fábio Telles, Flavio Gurgel, Bruce and Euler Taveira. I then attended Dave’s talk which covered the new foreign data wrapper functionality in PG 9.1.  The presentation was given with “simultaneous” translation into Portuguese, which meant I listened to almost every point twice. Next I went to Rogerio Bassete‘s talk which dealt with the PostgreSQL windowing functions. Rogerio gave a multitude of examples, but unfortunately the time was relatively short.

At lunch, I talked with Rogerio, Fabrízio de Royes Mello and Dickson Guedes. It was personally satisfying to find out that Fabrizio had downloaded and tried Pyrseas, and was possibly going to “adopt” some of it. Dickson also mentioned that some years ago he had tried to implement something similar, so he was glad when he found out about Pyrseas.

The first slot after lunch was taken by the round table on the Brazilan market for PostgreSQL-based services. The participants included Euler (representing Timbira), Rodolfo Gobbi (for 4Linux–I hope I have the name right), Charly Batista (for XPort), Nabucodonosor Coutinho (for Mondrian Tecnologia) and Luis Dosso (for Dextra), with Fernando Ike acting as moderator. The overall take was that the Brazilian PostgreSQL market is mature yet still challenging, supporting variously sized provider companies serving a variety of public and private organizations. After a snack break, where I chatted with Coutinho, I attended Greg’s benchmarking talk (without translation).

The first day was topped off by the group photo followed by “happy hour” which consisted mostly of Brazilian “chopes” (allegedly 200 litres of it).

On Friday I had breakfast with Jaime Casanova and mentioned I had earlier been responsible for another product named “repmgr” (for another DBMS). I later attended Greg’s performance pitfalls talk, translated by Flavio, and Dickson’s presentation covering the PG 9.1 EXTENSION capabilities and PGXN. At lunch I spoke with Charly.

Probably the most interesting session was Diogo Biazus‘ PL/pgSQL programming dojo. Diogo sported a mohawk although the Brazilians called him a “mohicano,” but it wasn’t his hairstyle that was interesting: it was the concept of “dojo” as in “learn by doing.” Rather than giving a tutorial on PL/pgSQL, Diogo presented a programming challenge (the hard of hearing granny) and then assisted the attendees in developing a solution in PL/pgSQL using TDD and pair programming. Participants took five minute turns being the “pilot” and “co-pilot” of the pair. A most interesting experience!

For the last formal talk, I heard Flavio’s talk about myths and realities of Postgres DBA projects. To close, the Brazilian PG group presented their first ever awards to community members, followed by quite informal lightning talks, closing remarks and another group picture.

Overall, a very useful and enjoyable event. Thanks to all the organizers, speakers and others with whom I spoke! Thanks also to those who invited me back (no promises, but we’ll see).


A Multi-Layered Test Cake

Dobos layered cake*

A recurrent theme in software engineering are the multiple layers (or tiers) into which systems are subdivided. In the ideal case, a given layer only interacts with the immediate layers “above” and “below” it. In practice, that clean separation of responsibilities is not always possible or, some may argue, desirable.

Nevertheless, thinking in terms of layers is quite useful. For example, the database application I’ve been presenting in this series consists of a PostgreSQL database, psycopg2 (a Python DB-API adapter for PostgreSQL), a business logic module that calls on psycopg2, Werkzeug (a WSGI library), the application modules that call on the business logic and Werkzeug, the HTTP protocol, and the user’s web browser.

As application developers, we need not concern ourselves with the details of how PostgreSQL stores the data that we send to it, nor with testing its internal functions. We rely on PG hackers to cover that testing. Similarly, we rely on Psycopg2 programmers to test communications with PostgreSQL and the Werkzeug team to test its WSGI and HTTP-related functionality. We do have responsibility for testing the business logic, the application code, and the interface presented to the user via the browser.

Tag v0.4.1 of the Database UI Tutorial at GitHub adds a set of unit tests for exercising the business logic module (i.e., werkzeug/bl/film.py). The test module (werkzeug/tests/bl/test_film.py) is pretty straightforward: there is a test method for each method in the business logic class. In addition, there are methods for verifying failure conditions, such as duplicate inserts or deleting a missing record.

Although the unit tests were written after the business logic module (i.e., not in keeping with test-driven development), they helped to resolve an open design issue. Earlier, some methods in dblib.py (thin interface to psycopg2) took an “expected count” argument to check the number of rows affected by a database operation. That didn’t look right, so I had removed it.  While developing the unit tests, it became clear that the checks ought to live in the business logic layer.

Tag v0.4.2 adds unit tests for testing the WSGI or web interface. I was very pleased with how easy it was to use Werkzeug’s testing faciities, Client and BaseResponse, to create the tests: simply “drop” the database app instance into the Client and then call its get or post methods to get the response headers and data. I only had to add the standard library’s xml.dom.minidom to examine particular elements in the HTML body.

Both the business logic and web test modules depend on a utility module which is “reheated” from the Pyrseas version control testing. A nice capability is that the base DbAppTestCase, in its setUpClass method (a Python 2.7 feature), ensures the test database is in sync with the YAML spec.

Coming up: testing the user (browser) interface, via Selenium.

* Dobos cake photograph, courtesy of Bruce Tuten.


SQL Version Control System Design Issues

The initial release of Pyrseas is nearing completion. As noted elsewhere, Pyrseas was born to follow through on the concepts of the Andromeda Project. The idea of using a data dictionary specified in JSON or YAML was perhaps the most attractive feature of Andromeda. However, I felt there were several areas that could be improved upon. In this post I’ll go over some of the design issues that influenced the creation of the Pyrseas version control utilities.

Test Driven Development

One of the goals of Pyrseas is to encourage, support or otherwise assist in the development of test-driven applications. The tools should allow the DBA or developer to experiment with changes to database tables and other objects with almost as much ease as a file-based application programmer can change the file structures (granted, the latter has much more work overall due to the lack of a DBMS). ALTER TABLE is great to experiment and make changes, but you need something else if you also want (or need) to keep a current representation of the CREATE TABLE statement in your VCS.

In order to encourage TDD, I think it’s crucial for the tools to do their work speedily. Andromeda takes about 10 seconds to process a database change such as those I have been using as examples. The analogous Pyrseas tool completes the task in one to two orders of magnitude less time.

Team Development

Another objective is to make the tools convenient for use in a team environment. In other words, it should be easy for one DBA to make a change in her database and confidently share it –together with a set of related code changes– with her colleague via a distributed VCS. Note this does not preclude them being used by a single developer.


Andromeda insists on a single data dictionary file. Although it requires “modules” to be defined to group tables, all user tables are created in the ‘public’ schema. It appears that Andromeda “modules” are primarily for user interface and security constraint purposes.

The initial release of Pyrseas recognizes the existence of PostgreSQL schemas and allows the DBA to place the tables in schemas as desired. Currently, the Pyrseas dbtoyaml tool (see below) can output specifications for a single schema or for selected tables. I hope to refine this capability in the future, so that –if desired– the data dictionary file could be split and stored in the VCS in a modular fashion, i.e., side-by-side or close to related application code.

Existing System Support

One aspect of Andromeda that I found limiting when I started delving into it was what happens if you want to use if for an existing database, application or project. First, it requires you to create a YAML specification of your database manually, according to its rules, for example, columns need to defined before tables and then “re-used” to specify the tables. Secondly, you can only use a limited set of data types, e.g., a ‘Y’ or ‘N’ column for a BOOLEAN.

For Pyrseas, the first tool designed was dbtoyaml which connects to a database and outputs its schemas, tables and columns in a YAML/JSON format. This format can then be input directly into the second tool, yamltodb. The latter generates SQL statements to bring a second database to match the first.

Summarizing all of the above, this allows a DBA to make a database change either through psql, pgAdmin or some other tool, and then use dbtoyaml to quickly generate a YAML dictionary which can be submitted to the VCS together with related code changes. Another DBA or programmer can pull the changes and apply them promptly to their database to test or integrate them with their own database and code changes.

Platform Support

This an area that concerns implementation but nevertheless affects design. SQL database version control tools can opt to support multiple databases or just one. Presently, I’ve decided to concentrate on PostgreSQL. I believe this can provide complete or nearly complete coverage of the many PG features, and will keep my hands full. If Pyrseas is found useful by others, I’ll be glad to assist anyone with expanding support to other DBMSs.

Development of Pyrseas was done under Linux, but since it’s written in Python, hopefully it will encounter few problems on other OSs.

Comments and discussion welcome!