Python Development Tools of the Trade

Last week, Bruce Momjian commented that pgindent is one of the reasons for the clarity of the PostgreSQL source code. Today, Andrew Dunstan remarked on “how cleanly it compiles” and no doubt this is due in great part to the buildfarm. So I thought I’d reflect on the tools I’ve been using to ensure the clarity and quality of the Pyrseas code.

Editors

Early in my career, I had a brief encounter with TECO. I also encountered Brief (but for a longer time). Perhaps that explains why, after many years of using vi and vim, in the past few years I’ve gone back to my roots, if you will, and I’m using Emacs for programming.

Some of the Emacs features I find helpful or convenient in developing quality code include (in no particular order):

  • Syntax highlighting: Whether it’s Python, ReStructured Text for the Sphinx documentation, or C in the PostgreSQL code, syntax coloring is nearly always on, by default based on the filename extension.
  • Parentheses matching: Not only for parens, but also for brackets and braces (particularly helpful when creating or editing Python dicts and JSON/YAML maps).
  • which-func-mode: Being able to tell which function/method you’re editing, on the status line, when the head of the function may not be in view.
  • Git branch: Shows in what branch you’re editing, again on the status line. For Subversion, it shows SVN-nnn where nnn is the revision number that last affected the current file.

Unit Testing

I have used the Python unittest testing framework, almost religiously, to develop tests before writing code. I initially also looked at py.test and briefly at nose, but decided the standard library module was good enough. The unit tests have been invaluable when adding features or refactoring code.

Version Control

Git was my choice for a VCS. Two Git features are worth mentioning. One is the change coloring in git diff, but also the context information, e.g., it shows the Python class in which the difference occurs. Second is the ability to stash away work in progress.

Code Quality

I’ve been using pep8 to ensure the Pyrseas code follows as much as possible the Python Style Guide. This is the tool closest to pgindent, but it doesn’t reformat the code.

Lastly, before making the code available publicly, I ran pylint against it. Then I edited it accordingly to improve readability and to eliminate some warning categories.

SQL Database Version Control and RENAMEs

You’ve just watched Josh Berkus’ presentation “Ten Ways to Wreck Your Database” and bearing in mind point #2 “ranDom_naming(s),” your team has decided to get your act together and adopt a naming convention (although this may not be advantageous to your job security, per josh :-)).

PostgreSQL makes it fairly easy to rename schemas, tables and columns via an appropriate ALTER object RENAME statement. Aside: Other SQL implementations provide similar capabilities to various degrees of completeness. Some SQL version control tools support such changes. For example, Liquibase supports renaming tables, columns and views.

On the other hand, an ALTER statement –like a biological mutation– destroys information, which creates a problem for difference engines such as apgdiff, Andromeda and Pyrseas. It may be obvious to a human that the film.rating column and category table in the production database correspond to the renamed film.mpaa_rating column and categories table, respectively, in the development database. However, there is nothing in the latter’s system catalogs that tie the new names to the previous ones. As a result, difference engines will usually DROP the former objects and CREATE others with the new names.

The Pyrseas solution to this problem is to add an oldname field to the YAML spec. For example, if you have the following abbreviated spec (generated by dbtoyaml from the development database):

schema public:
  table film:
    check_constraints:
      ...
    columns:
    ...
    - mpaa_rating:
        type: character(5)
    ...
    primary_key:
    ...

By editing the spec and making the following change to the mpaa_rating column:

    - mpaa_rating:
        oldname: rating
        type: character(5)

Then yamltodb will, when run against a database that still uses the previous colum name, generate the following SQL script:

ALTER TABLE film RENAME COLUMN rating TO mpaa_rating;

Similarly, you can use oldname on a table to generate ALTER TABLE name RENAME TO new_name and on a schema to generate ALTER SCHEMA name RENAME TO new_name.

This is not a foolproof solution because the rename is a one-time action. You could commit the changed spec to your VCS, but if you ran yamltodb against a database after the RENAME has been applied, it will give an error because it cannot find the oldname object. Aside: I’m considering changing that to simply a warning. Another option would be to allow yamltodb to read more than one YAML spec or one that lists some objects more than once, so the extra information could be added when needed. For these types of changes, a complementary tool such as depesz Versioning can be used to introduce the RENAMEs into production.

Pyrseas 0.1.0 includes the RENAME capability for schemas and tables and I’ve recently committed the changes needed for RENAMEing columns.

In other project news, I’ve also added support for FOREIGN KEY ON UPDATE and ON DELETE actions, support for COMMENT statements on schemas, tables and columns, and corrected a problem with indexes being created in the wrong schema. With these changes, the autodoc regression database schemas, tables, columns, primary keys, foreign keys, indexes and comments on these objects can all be properly recreated using dbtoyaml and yamltodb, with the exception of the inheritance tables.

SQL Database Version Control – Pyrseas

Now that Pyrseas has been released, it’s time to see how the DBAs Carol and Dave would use it to manage the changes described in Version Control, Part 2: SQL Databases.

Version 0.1

Carol starts by creating the film table in her database, caroldev, using the CREATE TABLE script, version 0.1, which we saw in the original article. She then uses the Pyrseas dbtoyaml tool to ouput the database specification:

$ dbtoyaml caroldev
schema public:
  table film:
    check_constraints:
      film_release_year_check:
        columns:
        - release_year
        expression: (release_year >= 1888)
    columns:
    - id:
        not_null: true
        type: integer
    - title:
        not_null: true
        type: character varying(32)
    - release_year:
        not_null: true
        type: integer
    primary_key:
      film_pkey:
        access_method: btree
        columns:
        - id

This should be mostly self-explanatory, but let’s go over it. Indentation indicates structure (nesting) and hyphens are used for each element in a list. Thus, there is one schema (public) and one table (film) defined within it. The table has three columns (id, title and release_year) with the given data types and none of them are nullable. The PRIMARY KEY (film_pkey) is on the id column and there is a CHECK constraint on the release_year column. By the way, the alphabetical order within each level is simply the default output format of PyYAML.

Carol can redirect the output of dbtoyaml to a file, say, film.yaml and store it in a Git repository or some other VCS. Dave will then be able to pull the file from the repository and use it to generate the SQL statements needed to create the table in his database, or the development, test and production databases, using the yamltodb tool as follows:

$ yamltodb -1 davedev film.yaml
BEGIN;
CREATE TABLE film (
    id integer NOT NULL,
    title character varying(32) NOT NULL,
    release_year integer NOT NULL);
ALTER TABLE film ADD CONSTRAINT film_pkey PRIMARY KEY (id);
ALTER TABLE film ADD CONSTRAINT film_release_year_check CHECK (release_year >= 1888);
COMMIT;

If he is satisfied with the SQL, he can pipe the output to psql, e.g.,

$ yamltodb -1 davedev film.yaml | psql davedev

Version 0.2

For the second version, Dave issues an ALTER TABLE to add the length column, invokes dbtoyaml to create a new film.yaml and commits that to his repository. Carol uses another ALTER TABLE to add the rating column in her database, and stores a new film.yaml in her repository.

Manny, the integration manager, merges the new film.yaml versions into the project-wide repository. He uses yamltodb to output the statements needed to upgrade to version 0.2:

$ git diff 0.1 0.2
diff --git a/film.yaml b/film.yaml
index 0633c1b..98f19c4 100644
--- a/film.yaml
+++ b/film.yaml
@@ -1,6 +1,10 @@
 schema public:
   table film:
     check_constraints:
+      film_length_check:
+        columns:
+        - length
+        expression: ((length > 0) AND (length < 10000))
       film_release_year_check:
         columns:
         - release_year
@@ -15,6 +19,10 @@ schema public:
     - release_year:
         not_null: true
         type: integer
+    - length:
+        type: smallint
+    - rating:
+        type: character(5)
     primary_key:
       film_pkey:
         access_method: btree
$ yamltodb moviesdev film.yaml
ALTER TABLE film
    ADD COLUMN length smallint;
ALTER TABLE film
    ADD COLUMN rating character(5);
ALTER TABLE film ADD CONSTRAINT film_length_check CHECK ((length > 0) AND (length < 10000));

Version 0.3

For the next round, Carol and Dave issue the statements shown under Version 0.3 in the previous article and commit newer versions of film.yaml which Manny integrates into the project repository. The yamltodb output against the project database is as follows:

$ yamltodb -1 film.yaml
BEGIN;
CREATE SEQUENCE category_category_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MAXVALUE
    NO MINVALUE
    CACHE 1;
CREATE TABLE film_category (
    film_id integer NOT NULL,
    category_id integer NOT NULL,
    last_update timestamp with time zone NOT NULL);
CREATE TABLE category (
    category_id integer NOT NULL DEFAULT nextval('category_category_id_seq'::regclass),
    name character varying(25) NOT NULL,
    last_update timestamp with time zone NOT NULL);
ALTER SEQUENCE category_category_id_seq OWNED BY category.category_id;
ALTER TABLE category ADD CONSTRAINT category_pkey PRIMARY KEY (category_id);
ALTER TABLE film_category ADD CONSTRAINT film_category_pkey PRIMARY KEY (film_id, category_id);
ALTER TABLE film_category ADD CONSTRAINT film_category_category_id_fkey FOREIGN KEY (category_id) REFERENCES category (category_id);
ALTER TABLE film_category ADD CONSTRAINT film_category_film_id_fkey FOREIGN KEY (film_id) REFERENCES film (id);
COMMIT;

Summary

I cannot objectively review the tools I have created, but sincerely hope others will find them useful in managing changes to PostgreSQL databases in conjunction with a VCS.

Pyrseas dbtoyaml and yamltodb require Python, psycopg2 and the PyYAML library. They currently support the basic PostgreSQL features (schemas, tables, PRIMARY KEYs, FOREIGN KEYs, UNIQUE and CHECK constraints, indexes) but the intention is to expand them to cover all features.

The primary audience of the tools are DBAs and developers, but they can be used in conjunction with something like depesz Versioning to implement a stricter system for control over production databases.

I’d like to take this opportunity to thank Adam Cornett, an early adopter/tester, who found an issue with a FOREIGN KEY across schemas, which has been fixed in the master repository.

First release of Pyrseas

just a quick note to announce that the first release of Pyrseas is now available as follows:

Github

https://github.com/jmafc/Pyrseas: Git repository. You can report any issues there.

Python Package Index

http://pypi.python.org/pypi/Pyrseas:  Source tar archive.

PgFoundry

http://pgfoundry.org/projects/pyrseas/: Source tar archive.

Discussion

You can of course leave comments below, but there is also a general mailing list available at PgFoundry, which also hosts forums.

My deepest thanks to the PgFoundry team for all the support tools.

In addition there is a website which is currently mostly bare, but I hope to upload the documentation soon. Update: The initial documentation is now available here.

Pyrseas was developed on Debian Linux with PostgreSQL 8.4, and also tested against 9.0, using Python 2.6, psycopg 2.2 (also tested with 2.4), and PyYAML 3.09. I intend to test on Windows, but I’ll be thankful if some adventurous soul tries it out first.