Testing Python and PostgreSQL on Windows, Part 2

In the previous post, I covered installation of Git, PostgreSQL and Python under Windows in order to set up a Pyrseas testing and development environment. Today, we’ll explore installation of the Python dependencies.

The Hitchhiker’s Guide to Python recommends first downloading and running the distribute_setup.py script. This gives you the easy_install command, but the Guide then recommends installing pip (with easy_install pip) and using pip to install all other modules.
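From a Command Prompt in the directory where you saved the script (and assuming python is already on your PATH), that boils down to something like:

python distribute_setup.py
easy_install pip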

You can use pip to install pyyaml with the following command:

pip install pyyaml

However, if you try pip install psycopg2 (or even easy_install psycopg2), it’s very likely you’ll see the error:

error: Unable to find vcvarsall.bat

As best as I’ve been able to determine, the only way around this is to install Microsoft Visual C++ Express. According to this email and this post, for Python 2.7 it has to be the 2008 edition which, to make things interesting, is no longer available from the download links given. If you search enough you may find it here (download vcsetup.exe) (Update below). After installing VC++ 2008 Express (and provided you haven’t installed Strawberry Perl, a later installment in our saga), the pip install psycopg2 command should succeed.

However, if you try to import psycopg2 at the Python 2.7 prompt, you may be surprised by a traceback ending in:

    from psycopg2._psycopg import BINARY, NUMBER, STRING, DATETIME, ROWID
ImportError: DLL load failed: The specified module could not be found.

Ahh … the mysteries of Windows DLLs. Don’t despair: this probably means you don’t have the PostgreSQL DLLs (libpq.dll in particular) in your PATH. Add one of the postgres\x.x\bin directories to your PATH and (hopefully) you should then be able to connect from Python 2.7 to your PostgreSQL installations.
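For example, assuming the default EnterpriseDB installation directory for a 9.1 server:

set PATH=C:\Program Files\PostgreSQL\9.1\bin;%PATH%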

OK, let’s turn our attention to Python 3.2. If you followed the Hitchhiker’s Guide instructions previously and added C:\Python27 to your PATH, you’ll now have to change that to C:\Python32. Suggestion: create a couple of batch scripts, e.g., env27.bat and env32.bat, so you can easily switch between the two Python installations. And don’t forget to add the postgres\x.x\bin directory as well.
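For example, env27.bat could be as simple as the following (the paths are illustrative defaults; adjust them for your installation and preferred PostgreSQL version), with env32.bat differing only in the Python directories:

@echo off
rem illustrative defaults; adjust the paths and PG version to your setup
set PATH=C:\Python27;C:\Python27\Scripts;C:\Program Files\PostgreSQL\9.1\bin;%PATH%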

For 3.2, once again run the distribute_setup.py script, easy_install pip, and pip install pyyaml, as for 2.7 above. Then you can run pip install psycopg2, and if you installed VC++ previously, the gods may smile upon you and you may see the following message:

Successfully installed psycopg2
Cleaning up...

At this point, if you followed along, you’ll have four versions of PostgreSQL (8.4 through 9.2), two versions of Python (2.7 and 3.2), each with PyYAML and psycopg2, ready for testing. If you’re anxious to check things out, invoke one of the PATH setup scripts and try the following, from the Pyrseas source directory:

set PYTHONPATH=%USERPROFILE%\src\Pyrseas
C:\...\src\Pyrseas>python tests\dbobject\test_schema.py
............
----------------------------------------------------------------------
Ran 12 tests in 1.452s

OK

There are some alternatives to installing psycopg2 using pip and VC++ 2008.  I’ll cover those in a subsequent post.

Update: Microsoft seems to keep changing download URLs. Your best bet is to search for “Visual C++ 2008 Express download.” Currently, that should lead you to a working download link.

Testing Python and PostgreSQL on Windows – Basics

In my previous post, I wrote:

Although I have not yet personally run the [Pyrseas] unit tests on Windows …, I believe the tox setup should be quite portable …, since the tests only depend on Python and psycopg2 being able to connect to Postgres, i.e., they do not depend on running any PG utilities from the command line.

Several moons ago, I had done a cursory test of the Pyrseas utilities on Windows from a source zip file, but now I wanted to set up a full development environment (well, almost full—I used Notepad for minor editing) and run through all the unit tests on as many Python/PostgreSQL combinations as possible and ideally using tox.

This post describes what I found out during the install/test process. Hopefully others will find it useful.

Operating System

I chose to use Windows XP Home Edition running under VirtualBox. It’s not a professional solution, but I wasn’t prepared to pay for the “privilege” of using Windows, and it’s likely others also have a Home Edition CD or similar media from an earlier hardware purchase.

Version Control

Pyrseas sources are stored on GitHub. Installing Git and cloning the repository was probably the most uneventful step. The Git download page gives you an installer that offers three options. I chose “Use Git Bash only” as this appears to be the friendliest to someone coming from a Linux/Unix environment. It neither changes nor requires you to change the PATH; all you need to do is select “Git Bash” from the Start menu and a Bash shell opens for you.

DBMS

Installing PostgreSQL was fairly straightforward. The Windows download page leads to EnterpriseDB one-click installers for multiple platforms, and for the more recent versions you have to choose between 32-bit and 64-bit systems. The installer asks for an installation directory, data directory, postgres user password, port number and locale, offering defaults for everything except the password.

The installer installs both the DBMS and pgAdmin III. If you’re more comfortable with psql, you can select “SQL Shell (psql)” from the Start menu. With either of these, you won’t have to change the PATH unless you want to run psql or some other PostgreSQL utility from a Command Prompt window.

Python

Python can be installed from the Windows MSI installers available from Python.org for the latest releases of Python 2.7 and 3.2. Aside from specifying the installation directory, you can choose which additional components to install, e.g., Tcl/Tk and documentation.

The installers provide a “Python (command line)” option from the Start Menu, but for testing or development, you’ll probably want to open your own Command Prompt window, in order to customize your setup. This requires that you add, e.g., C:\Python27 and C:\Python27\Scripts, to your PATH. Alternatively, you could use the Git Bash window to stay within a Unix-like environment (in which case you’ll still have to add the equivalent directories, e.g., /c/Python27, to PATH).
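For example, in Git Bash something like this (again, the paths are only illustrative) sets things up for the current session:

export PATH=/c/Python27:/c/Python27/Scripts:$PATH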

So far so good. A forthcoming post will cover more, shall we say, entertaining topics.

Testing Python and PostgreSQL on Multiple Platforms

I’m working on making the Pyrseas functional tests portable enough so that they can be submitted to the repository.

Until now, these tests (which exercise dbtoyaml and yamltodb directly) existed as Linux shell scripts. Briefly, each test runs both dbtoyaml and pg_dump -s on a source database, creating YAML and SQL dump outputs, respectively. It then runs yamltodb against a second database to recreate the source tables, etc., and finishes by comparing the first pg_dump output to that of the target database, to verify that all database objects are present and identical.

The Pyrseas unit tests now use tox, which makes it fairly easy to add new platforms. For example, on the Python side, the tox.ini configuration includes 2.7 and 3.2, using a single virtualenv for each version. It would be easy to add 2.6 or 3.3 (when that is released, or from a 3.3.0 rc1 install). To test against Postgres 8.4, 9.0, 9.1 and, recently, 9.2rc1, the only requirement is to define environment variables PG(84|90|91|92)_PORT with the port numbers used for those Postgres installations. Then tox takes care of running the tests eight times, once for each Python/Postgres combination.
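On the Python side, a tox.ini along these lines is all that is needed (a simplified sketch, not the actual Pyrseas configuration; the test command in particular is illustrative):

# simplified sketch; the real Pyrseas tox.ini differs in its details
[tox]
envlist = py27, py32

[testenv]
deps =
    pyyaml
    psycopg2
commands = {envpython} -m unittest discover -s tests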

Although I have not yet personally run the unit tests on Windows or Linux/Unix variants other than Debian, I believe the tox setup should be quite portable (assuming multiple Postgres installations can be present on a given platform), since the tests only depend on Python and psycopg2 being able to connect to Postgres, i.e., they do not depend on running any PG utilities from the command line.

For the functional tests, running the Pyrseas utilities can be done in a fairly portable way thanks to the os.path, tempfile and subprocess modules. Even the diffing of the pg_dump output can be implemented without having to worry about the presence of a diff command, e.g., on Windows.
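For example, the dump-and-compare step could be written roughly as follows (a minimal sketch with illustrative names, not the actual test code; it assumes pg_dump is on the PATH):

import difflib
import os
import subprocess
import tempfile

def dump_schema(dbname, outfile):
    # -s dumps the schema only, -f writes the dump to a file
    subprocess.call(['pg_dump', '-s', '-f', outfile, dbname])

def schemas_match(source_db, target_db):
    tmpdir = tempfile.mkdtemp()
    src_out = os.path.join(tmpdir, 'source.sql')
    tgt_out = os.path.join(tmpdir, 'target.sql')
    dump_schema(source_db, src_out)
    dump_schema(target_db, tgt_out)
    with open(src_out) as f1, open(tgt_out) as f2:
        diff = list(difflib.unified_diff(f1.readlines(), f2.readlines()))
    return not diff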

However, executing pg_dump against multiple Postgres clusters is not so easy. On Debian (and presumably Ubuntu and all other Debian derivatives), if installed from Debian packages, Postgres utilities can be invoked, for example, as

$ pg_dump --cluster 9.1/main -s pyrseas_testdb

The --cluster option causes the correct executable, e.g., /usr/lib/postgresql/9.1/bin/pg_dump, to be run with the correct port. This translates to Python as:

subprocess.call(['pg_dump', '--cluster=9.1/main', '-s', '-f',
                 targdump, TEST_DBNAME])

The second element in the list could be provided programmatically to run the tests against various Postgres versions, but it would only work on Debian, Ubuntu, etc. (it also assumes the default cluster installation location).

For Red Hat variants, *BSD or Windows, the only solution I could come up with is to require a shell script or .bat file with a set name, e.g., pg_dumpXX, where XX is the PG version number, somewhere along the PATH, pointing to the right executable. That is not ideal, so I’d appreciate hearing from others who may have dealt with similar issues.
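To make the idea concrete, the tests might pick the command roughly like this (a sketch only; the pg_dumpXX wrapper convention is the hypothetical one described above, not something Pyrseas requires today):

import subprocess

def dump_cmd(pgver, dbname, outfile, use_cluster_option=False):
    # use_cluster_option=True relies on Debian/Ubuntu's pg_wrapper scripts;
    # otherwise a per-version wrapper such as pg_dump91 must be on the PATH
    if use_cluster_option:
        return ['pg_dump', '--cluster=%s/main' % pgver, '-s', '-f', outfile, dbname]
    return ['pg_dump' + pgver.replace('.', ''), '-s', '-f', outfile, dbname]

subprocess.call(dump_cmd('9.1', 'pyrseas_testdb', 'source.sql'))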

An Apology to Roberto Alsina

This post is off-topic for this blog. It is written to defend myself against an accusation.

Yesterday I read through Roberto Alsina’s post where he described the inequities caused by the government of the city of San Isidro, province of Buenos Aires, Argentina, imposing a municipal tax on its residents in order to pay for various “services”, and further rules/laws at that and other government levels. This was done in the context of exploring game theory using Python.

The scenario is mostly anchored in reality (aside: USD100 monthly tax per house is probably not close to actual facts considering that Argentine average gross salary is about USD10,000 per year, but that can be forgiven since 100 is a nice round number). Roberto gave me the impression of siding with the poorer homeowners who sometimes cannot afford the tax and have no other alternative than to sell their residences (or suffer less palatable consequences).

However, I thought Roberto’s characterization of the city government as an impersonal player (“the city” playing solitaire as if nothing mattered) was an inaccurate representation. That led to an exchange of comments which ended with Roberto accusing me of comparing him to the pigs in Animal Farm and blocking me from further participation.

Therefore, I’d like to say that I never intended any disrespect. Animal Farm is an allegory for society. There are some people in society that believe they are “more equal than others” and as a result believe that they have the “right” to tell others what to do, e.g., force them to pay for some collective good, when it would be to the benefit of those less well-off to find some other way to deal with those needs, e.g., set up a neighborhood-level cooperative, barter for services, etc.

In a private email, Roberto answered that he does not believe some people have that “right” and I have to take him at his word, but he declined to debate further. Although I respect Roberto’s technical insights, e.g., his work on Nikola, and apologize for the implications of my comment that led to the accusation, I still hold that those who side with government taxation are, unintentionally, unknowingly or unconsciously, agreeing with that “right”.

PostgreSQL Indexes on Expressions

Pyrseas had its first release a little over a year ago and we now have our first backward compatibility issue. The first release included basic support for traditional indexes, i.e., one or more key columns. For example, given a table test1 with columns col1, col2 and col3, and an index on the last two, dbtoyaml would show (some details omitted):

table test1:
  columns:
  - col1
  - col2
  - col3
  indexes:
    test1_idx:
      columns:
      - col2
      - col3

One of the first issues reported the lack of support for “functional” indexes. I added that but, unfortunately, didn’t realize that one can have more than one function or expression, and even mix regular columns with expressions. Thus the support was limited to a single expression. Given the first example in the “Indexes on Expressions” documentation, dbtoyaml would show:

table test1:
  columns:
  ...
  indexes:
    test1_lower_col1_idx:
      expression:
        lower(col1)

The original issue was recently re-opened (thanks, Roger) to point out the deficiencies. A fix has been pushed. Thus in the next release, dbtoyaml will support indexes with multiple expressions and even combinations of functions and regular columns. Here is a weird example using the first table. Given CREATE INDEX test1_idx ON test1 (btrim(col3, 'x') NULLS FIRST, col1, lower(col2) DESC), dbtoyaml now outputs:

table test1:
  indexes:
    test1_idx:
      access_method: btree
      keys:
      - btrim(col3, 'x'::text):
          nulls: first
          type: expression
      - col1
      - lower(col2):
          order: desc
          type: expression

So instead of ‘columns’ (or ‘expression’), dbtoyaml outputs ‘keys’. Any key that is an expression is marked with the ‘type’ qualifier. To allow for backward compatibility, yamltodb will continue to accept ‘columns’, so existing YAML specs with traditional indexes won’t need to be changed. However, if you have an index using an expression, you’ll have to edit it as shown above.

Do you have an unusual index?  Try dbtoyaml (from GitHub) on it and let us know if it works (or not).
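(The basic invocation just takes the database name as its argument, e.g., dbtoyaml moviesdb > moviesdb.yaml, and you can then inspect the indexes entries in the output.)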

PostgreSQL Extensions and Pyrseas

Prompted by Peter Eisentraut’s blog post, I’ve finished adding support for PG 9.1 EXTENSIONs to the Pyrseas dbtoyaml and yamltodb utilities. For now, this is only available on GitHub.

In order to deal with procedural languages, which are now created as extensions, the utilities now fetch the pg_catalog schema (previously deemed uninteresting for the purpose of version control).  The output of dbtoyaml from a freshly created 9.1 database (assuming no customizations via template1) is now:

schema pg_catalog:
  description: system catalog schema
  extension plpgsql:
    description: PL/pgSQL procedural language
    version: '1.0'
schema public:
  description: standard public schema

This could be changed easily to exclude pg_catalog (which will now also appear against 8.4 and 9.0 databases) before the next Pyrseas release. Update: The pg_catalog schema will now only be shown if it has something other than a description.

I’m hoping some brave, adventurous or simply interested souls will help test the additions.  Please report any issues on GitHub.

Pyrseas PostgreSQL features: feedback requested

I’ve been considering the missing features of dbtoyaml/yamltodb.  Two of those are PG 9.1 features:  COLLATIONs and EXTENSIONs.  I plan to cover them eventually, but I think I ought to deal first with the remaining pre-9.1 features.

ROLEs (as well as USERs and GROUPs) and TABLESPACEs are not output by pg_dump (the equivalent of dbtoyaml), only by pg_dumpall.  I’m thinking that if I were to add support for ROLEs and TABLESPACEs I’d probably do it with a --cluster option to dbtoyaml, and the output would be something like the following:

database postgres:
  role one:
    createdb: true
    login: true
  role grp:
    roles:
      - one
  tablespace dataspace:
    location: /data/db

This approach could, in theory, produce output for all databases in a cluster, i.e., the databases would be the top nodes in the YAML spec, rather than the schemas as is normally the case. In other words, it would be the equivalent of pg_dumpall --schema-only. However, I suspect that few persons would be interested in that, at least for version control purposes—since different databases may belong to different projects.

On the other hand, I believe DBAs may want dbtoyaml to include “owner” and privilege (GRANT) information. David Fetter specifically asked for GRANTs, saying they would be “handy for deployments.”

Owner and privilege information could be shown as follows:

schema public:
  table film:
    owner: jma
    privileges:
      admin:
        - insert
        - update
      jma:
        - all
      viewer:
        - select

An open question is whether some list of roles is necessary, aside from the object-level information.

I’d appreciate readers taking a couple of minutes to leave feedback on any of the above points, particularly on how important they think the features are in their day-to-day work.
