All posts by jma

Attributes in the Pythonic, TTM-inspired interface to PostgreSQL

The Third Manifesto‘s Relational Model (RM) prescription 9 defines a relation heading as “a set of ordered pairs or attributes of the form <A,T>,” where A is the name of the attribute and T is the name of the declared type of the attribute. It then defines a tuple value (or tuple for short) as a set of ordered triples of form <A,T,v> where v is an arbitrary value of type T, called the attribute value.

Class Attribute of the TTM-inspired interface to PostgreSQL models the latter ordered triple rather than the ordered pair. I considered implementing the pair, say as AttribType, and deriving Attribute from it, but opted for the leaner, no-hierarchy solution for now.

An Attribute object should be initialized with a name, Python type and a value. However, only the name is required. The default type is str and for Python 2, unicode is treated almost as a synonym for str (I realize this goes against everything that your mother taught you, but I’m hoping that widespread Python 3 adoption will make this moot soon).

I wholehearteadly agree with TTM RM proscription 5 banning attributes that do not have a value, i.e., like SQL NULLs. However, I hope the interface is useful on existing databases, so in the interest of practicality, class Attribute has two additional, optional arguments: nullable and sysdefault. The former is to specify that an attribute allows NULLs (or None in Python). The latter can be used to indicate the corresponding table column has an SQL DEFAULT specification (this includes those defined as SERIAL or BIGSERIAL).

If an attribute is not nullable and not sysdefault, a value must be specified. If omitted, a suitable default is created based on the type (if possible), i.e., an empty string for str, 0 for int, 0.0 for float. If the attribute is nullable, empty string, 0 or 0.0 are converted to None. This approach is to facilitate dealing with empty HTML form fields, i.e., if the user skips a nullable field, the attribute should end up as a NULL in the database.

If a value is provided, the code raises ValueError if the value does not agree with the specified (or defaulted) Python type. The only exceptions are that an int value is cast to float, and a unicode value is allowed if the type is str and we’re working in Python 2.x.

A Pythonic, TTM-inspired interface to PostgreSQL – Requirements

Several moons ago, I started a series of posts about “designing and implementing a generic end user interface for PostgreSQL.” After a while, the series got sidetracked by other issues.

More recently, I have returned to the original endeavor. Partly from reading Database Explorations: Essays on The Third Manifesto and related topics by C.J. Date and Hugh Darwen, I decided to use relational concepts as presented in The Third Manifesto (TTM) in my implementation. This post provides an overview of the requirements.

Limited Scope

The interface is not a full-blown replacement for an object-relational mapper (ORM) (although in theory it could eventually grow in that direction). The interface is intended to assist with two typical needs of a database “admin” application: browsing and CRUD.

Browsing refers to presenting a subset of rows (tuples) of a table (relation variable or relvar) for subsequent editing. The relvar will typically be normalized so it may be necessary to join it to other relvars. Browsing will usually display a limited number of columns (attributes) so relational projection will be needed.

CRUD refers to the ability to create, read, update and delete single tuples in a relvar. The interface should only support relvars with a properly defined, possibly composite primary key.

Simplicity

The user (developer) should have to define only the attributes of each relvar together with the key, and for browsing, the projected attributes plus a JOIN specification if multiple relvars are involved. The definitions should be simple enough so that most of them could be (at a later date) derived automatically from the database catalogs.

From the definitions, the interface should generate all necessary SQL commands to INSERT a single tuple (possibly returning a generated key value), retrieve, UPDATE or DELETE a single tuple using the key, and fetch subsets of projected/joined tuples in a given order.

Optimistic Concurrency Control

The interface should take advantage of PostgreSQL features to implement optimistic locking when handling updates or deletes, as described in a previous series of posts.

Query by Example Support

The interface should facilitate querying of the browsed tuples using something similar to Query-By-Example. For example, when browsing movies if the argument release_year is passed as “>= 1969“, the results should only include films released on that year or later. This feature was not discussed in a post but had been committed to the tutorial repository.

TTM and SQL

The interface should follow the TTM guidelines when possible. For example, although implemented in Python, assignment to a relvar attribute defined as int should not be allowed if the value is of type str, and duplicate attribute names in a join expression should not be permitted. However, since the interface ought to be usable against existing SQL databases, allowance should be made for certain SQL features such as nullable attributes.

The implementation has been committed to the Pyrseas repository and changes were made to the DBUI tutorial to use the new interface. Subsequent posts will cover the interface in more detail.

A couple of Pyrseas enhancements

Based on feedback from users and contributors, Pyrseas now sports two enhancements.

Multi-line String Formatting

Up to Pyrseas 0.6, long textual elements such as view definitions, function source text and long object comments, would usually be shown in the YAML output as quoted strings with embedded newlines. Here are two examples from the autodoc database:

schema product:
  description: "This schema stores a list of products and information\n about the\
    \ product"
...
schema warehouse:
  view products:
    definition: " SELECT DISTINCT product.product_id, product.product_code, product.product_description\n\
      \   FROM warehouse.inventory\n   JOIN product.product USING (product_id);"

As you can imagine, this was particularly unsatisfactory for complex functions and views. Thanks to preliminary work by Andrey Popp, Pyrseas 0.7 will be able to format these elements in YAML block style. The above elements will be shown as follows:

schema product:
  description: |-
    This schema stores a list of products and information
     about the product
...
schema warehouse:
  view products:
    definition: |2-
       SELECT DISTINCT product.product_id, product.product_code, product.product_description
         FROM warehouse.inventory
         JOIN product.product USING (product_id);

Thanks to testing by Josep Martínez, 0.7 will also properly display and handle such strings even when they include non-ASCII characters such as accented characters. For example, in 0.6, “Martínez” would be shown as “Mart\xEDnez”. In 0.7, the output will be the original UTF-8 string.

Directory of Database Objects

Pyrseas 0.6 has a single format for output by dbtoyaml or input into yamltodb: a single YAML-formatted file. This becomes a problem when your database has hundreds or more tables, functions, etc (let alone 409,994 tables and counting!). Furthermore, as dbtoyaml and yamltodb are intended to assist with database version control, your team may want to store individual object specifications in your version control system, or you may want to diff individual objects.

The 0.7 --directory option to dbtoyaml and yamltodb allows you to split the YAML spec into multiple files in a directory (or folder) tree. For example, using the dbtoyaml -d option on the autodoc database results in the following tree (shown under Linux using ls -RF):

.:
schema.inherit/      schema.public/      schema.warehouse/
schema.inherit.yaml  schema.public.yaml  schema.warehouse.yaml
schema.product/      schema.store/
schema.product.yaml  schema.store.yaml

./schema.inherit:
table.tab1b.yaml  table.tab1.yaml  table.taba.yaml  table.tabb.yaml

./schema.product:
function.worker.yaml  sequence.product_product_id_seq.yaml  table.product.yaml

./schema.public:

./schema.store:
sequence.store_store_id_seq.yaml  table.inventory.yaml  table.store.yaml

./schema.warehouse:
function.worker.yaml                      table.warehouse.yaml
sequence.warehouse_warehouse_id_seq.yaml  view.products.yaml
table.inventory.yaml

As can be seen, each schema gets its own directory wherein are stored each object belonging to that schema. In addition to schemas, the root level also stores non-schema owned objects such as foreign data wrappers and extensions (the latter can be placed in a schema, but are not owned by it).

The directory tree and multi-line string formats are still under review, so I’d like to encourage you to test both enhancements and provide feedback.

Testing Python and PostgreSQL on Windows, Part 6

Alliterative locales, languages, collations.

A tox on all your houses (test combinations).

The last item to fix in the Pyrseas unit tests so that they run on Windows is related to the PostgreSQL 9.1 COLLATION feature. When creating the tests, I was influenced by the examples in the documentation, i.e., I created a collation with ‘fr_FR.utf8′ LC_COLLATE and LC_CTYPE. On Linux, it’s fairly straightforward to add such a locale to your system (although perhaps Windows users may disagree :-)), so the tests worked as expected.

On Windows, however, most collation tests failed with

DataError: could not create locale "fr_FR.utf8": No error

Unfortunately, the PG documentation doesn’t seem to provide any hints on what is the Windows equivalent of ‘fr_FR.utf8′ (or similar Linux locales). Eventually I figured it out by looking at the output of \l (list databases) in psql. This showed the Collation and Ctype (in my case) where ‘English_United States.1252′ so I assumed what was needed was ‘French.France.1252′. Here is the procedure to set that up:

Open the Control Panel, select Date, Time, Language, and Regional Options, then Regional and Language Options (or Add other languages), click on the Advanced tab in the dialog and then choose “French (France)” from the dropdown. Finally, click OK and respond to any subsequent prompts to install the locale, including rebooting the machine.

Aside: For comparison, on Debian Linux, the equivalent procedure involves running sudo dpkg-reconfigure locales, selecting fr_FR.UTF-8 UTF-8 from a list, accepting the default locale and waiting for the locales to be generated (no reboot necessary). Second aside: On Linux, you can deselect a locale to remove it from your system, but Windows doesn’t appear to allow for language removals.

To test, make sure you have the latest Pyrseas code from GitHub, which includes a change to fix the COLLATION tests to run on Windows.

Finally, we’re ready to install Tox and run all the unit tests with a single command. First, run pip install tox under both Python 2.7 and 3.2. Next, define (set) the environment variables PG84_PORT, PG90_PORT, PG91_PORT, and PG92_PORT to point to the corresponding PostgreSQL ports.

Then simply invoke tox from the Python 2.7 environment. Thanks to the Pyrseas tox.ini, this will install Python 2.7 and 3.2 virtualenvs, under a .tox subdirectory in the Pyrseas tree, install Psycopg2, PyYAML and Pyrseas into each virtualenv and run the unit tests eight times, once for each combination of Python and PostgreSQL.

If you have been following along, the only test failure should be in test_extension.py, in test_map_lang_extension, when attempting to CREATE EXTENSION plperl, due to the missing PERL514.DLL (see previous post). The error will only occur under PG 9.1 and 9.2.

The only problem I noticed with tox is that when there are errors it may get confused in its summary report.

___________________________________ summary ___________________________________
  py27pg90: commands succeeded
  py27pg91: commands succeeded
  py27pg92: commands succeeded
ERROR:   py32pg91: commands failed
  py27pg84: commands succeeded
ERROR:   py32pg84: commands failed
ERROR:   py32pg90: commands failed
ERROR:   py32pg92: commands failed

The errors actually occurred in the *pg91 and *pg92 environments but tox reports that all py32* tests failed, which was not the case. This is a minor issue considering all that tox accomplishes, with very little setup or configuration.

Testing Python and PostgreSQL on Windows, Part 5

I’ve got the Perl on Windows blues …

Aside from PL/pgSQL, the base distribution of PostgreSQL supports three procedural languages: Perl, Python and Tcl. When creating Pyrseas unit tests for languages (before they became EXTENSIONs), PL/Perl seemed like the “natural” choice. Perl is available on virtually all Linux and BSD base distributions, and (in contrast to Python) PL/Perl is available in both trusted and untrusted versions—a distinguishing attribute that the unit tests ought to check.

Alas, when testing on Windows I discovered that Perl isn’t as ubiquitous or as easy to deal with as it is on Linux.

The preliminary research prompted me to install Active Perl, specifically its Community Edition. Currently, the only two versions available are 5.14 and 5.16. Strangely enough, if you install one of those versions and then attempt to create the PL/Perl language under PG 8.4 or 9.0, the command succeeds, but when you try to create a function using plperl, you’ll see something like:

ERROR:  could not load library "C:/Program Files/PostgreSQL/8.4/lib/plperl.dll":
 The specified module could not be found.

Against PG 9.1 and 9.2, if you installed Perl 5.16, you’ll see the error message when issuing the CREATE EXTENSION (or LANGUAGE) statement. If you installed Perl 5.14, there should be no error.

When I first saw the error message, I was a bit puzzled since the plperl.dll libraries had all been installed and were located in the paths show in the messages. What “specified module” was missing?

Some web searching pointed me to Dependency Walker (depends.exe), a tool that appears to be indispensable if you’re going to be testing with multiple executable and DLL versions. It is analogous to Linux ldd. Depends.exe showed that plperl.dll in the 8.4 and 9.0 installations was linked with PERL510.DLL and in 9.1 and 9.2 with PER514.DLL.

Unfortunately, Active Perl has no Perl 5.10 Community Edition available, so off I was looking for an alternative. Thus I found Strawberry Perl.

The downside of Strawberry Perl’s installers is that they install it in C:\strawberry so you can’t have both Perl 5.10 and 5.14 at the same time. Someone on IRC explained that it is possible to install it in two separate paths (but it ain’t easy). For now, I chose to only install Perl 5.10. This allowed me to test Pyrseas using Python 2.7 and 3.2 against PostgreSQL 8.4, 9.0, 9.1 and 9.2, with only one Perl-related test faling (under PG 9.1 and 9.2, due to the absence of Perl 5.14).

A note of caution: Strawberry Perl installs GCC (3.4.5 in the Perl 5.10 version). If you have a pydistutils.cfg specifying a mingw32 compiler (as mentioned in my previous post), that may cause problems if you try to install or upgrade psycopg2 (or some other C extension module).

Testing Python and PostgreSQL on Windows, Part 4

At the end of Part 2, I suggested those who were anxious to start testing could try python tests\dbobject\test_schema.py right after installing psycopg2, and implied everything would work just fine by showing the output of a successful run.

That was deceptive because before running the Pyrseas tests you need to create a PostgreSQL user. So you first need to connect using psql or pgAdmin, as the postgres user and issue a command such as the following:

CREATE USER username CREATEDB SUPERUSER password;

Of course, username and password should be your user name and a suitable password, respectively. While you’re at it, you can also create the two testing roles:

CREATE ROLE user1;
CREATE ROLE user2;

The user username needs the CREATEDB privilege to create the Pyrseas test database, by default, pyrseas_testdb. Alternatively, the test database could be created first, e.g., by postgres and owned by username. Also, if username isn’t granted the SUPERUSER privilege then the tests requiring it will be skipped.

One last detail before being able to run the tests: create a PostgreSQL password file, i.e., %APPDATA%\postgresql\pgpass.conf. The format is straightforward:

*:*:*:username:password

Make sure you have the latest pyrseas/testutils.py. It includes a change to make the tests portable to Windows. You’re now ready to give all the tests a whirl:

python tests\dbobject\__init_py

or you may prefer to use test discovery:

python  -m unittest discover -s tests/dbobject

One of the first issues you’ll notice is when running test_tablespace.py, which exercises support for PostgreSQL tablespaces. According to the documentation, “tablespaces can be used only on systems that support symbolic links” (aside: thanks to Andres Freund and Merlin Moncure for answering questions about this on IRC).

It seems Windows Vista (or even XP) may have something akin to symbolic links (junctions?). Nevertheless for Pyrseas, it is sufficient to create directories for the tablespaces, e.g.,

C:\>md c:\somedir\pg\9.1\ts1
C:\>md c:\somedir\pg\9.1\ts2

However, when you try to define the tablespace from psql, you’ll probably see:

postgres=# CREATE TABLESPACE ts1 LOCATION E'C:\\somedir\\pg\\9.1\\ts1';
ERROR:  could not set permissions on directory "C:/somedir/pg/9.1/ts1": Permission denied

The problem is that the directory needs to be owned by the postgres user. Unfortunately, there is nothing equivalent to chown on native Windows XP Home. Andres had mentioned cacls and after some web searching and trying, I came up with the following commands as the nearest approximation to chown:

cacls c:\somedir\pg\9.1\ts1 /E /G postgres:F
cacls c:\somedir\pg\9.1\ts2 /E /G postgres:F

These give the postgres user FULL owner privileges over the directories. If you invoke the CREATE TABLESPACE statements again, then they should succeed … Well, at least on PostgreSQL 8.4, 9.0 and 9.1, the cacls commands allow the tablespaces to be defined. However, under 9.2 the “Permission denied” error persists (still haven’t figured out why—if you know why, do leave a comment).

Update: Thank to alemarko’s comment below, I figured out that for PG 9.2, assuming you installed with the defaults, the correct incantations  to use are:

cacls c:\somedir\pg\9.2\ts1 /E /G networkservice:F
cacls c:\somedir\pg\9.2\ts2 /E /G networkservice:F

Testing Python and PostgreSQL on Windows, Part 3

As a commenter mentioned in response to Part 2, an alternative to using pip install psycopg2, which requires that you first install VC++ 2008 Express, is to download and install the Windows port, aka win-psycopg. Jason Erickson makes these builds available for several versions of Python for both 32- and 64-bit Windows.

If you’re not planning to use virtualenvs or tox (which creates virtualenvs for you), then the win-psycopg installer is the easiest way to satisfy the psycopg2 dependency (in fact, that’s how I started after the initial failure with pip). Simply download the Python 2.7 and 3.2 installers, run them and you’re done. However, the installers don’t work in a virtualenv (there is a workaround to extract the files, but I didn’t explore it because I wanted to use tox).

Another option is to build psycopg2 with the MinGW compiler. Daniele Varrazzo has a post describing this. Daniele used a special MinGW package, but I chose to install the latest (mingw-get-inst-20120426.exe) from MinGW.org (click on the Navigation – Downloads link which will take you to SourceForge).

To get the psycopg2 sources, you can download the .tar.gz package, or, since you have Git, do this from Git Bash:

git clone git://luna.dndg.it/public/psycopg2.git
cd psycopg2
git checkout 2_4_5  # or latest tag

Create a pydistutils.cfg file in your home directory (%USERPROFILE%) with the following (an alternative is to use the --compiler option to the python setup.py command below):

[build]
compiler=mingw32

Make sure MinGW, Python, and PostgreSQL are in your PATH, e.g., set PATH=C:\MinGW\bin;C:\Python27;C:\Program Files\PostgreSQL\8.4\bin;%PATH%, and then run:

python setup.py build_ext build

With the latest MinGW and unless Python bug 12641 has been resolved, you’ll see the following error message:

cc1.exe: error: unrecognized command line option '-mno-cygwin'

The workaround is to edit the cygwincompiler.py file in your Python Lib\distutils directory and remove all instances of -mno-cygwin. Hopefully after that, the python setup.py above will work and then you can run the following to install it:

python setup.py install

Rinse and repeat for the Python 3.2 version and you should be ready to test connecting from both Python versions to PostgreSQL.

Aside: To remove the pip-installed psycopg2, you run pip uninstall psycopg2 from the corresponding Python environment. To remove win-psycopg, you use Control Panel’s Add or Remove Programs and click Remove on the desired version. To remove the versions installed with MinGW, I’m afraid you’ll have to resort to deleting the Lib\site-packages\psycopg2 directories and the related psycopg2-*.egg-info files.