7

The Future of Pyrseas: Part 3

The second Andromeda feature, aside from schema version control, that Pyrseas wanted to rescue was the automatic generation of search and detailed editing screens.

It is somewhat amazing that 40 years after Moshé Zloof’s “Query by Example” and over 30 years after “A Forms Application Development System” by Prof. Larry Rowe and Kurt Shoens, most applications for searching, entering and maintaining data are still reinventing the wheel.

There are of course apps such as PgAdmin and Adminer that can be used for general querying and editing, but their target audience is developers and DBAs, not end users. Furthermore, these tools typically only allow querying a single table at a time (see QBE queries #3 and subsequent, in the aforementioned paper, for examples that usually can’t be handled).

At first, I thought I would develop a typical Python WSGI application. I planned to use Werkzeug because the extra machinery in larger frameworks such as Django is unnecessary for the envisioned purpose. An additional consideration was that a smaller library or framework would make it easier for anyone wanting to extend the automatically generated pieces into a full-fledged app.

I started designing a prototype application and also exploring other WSGI contenders. Then Jacob Kaplan-Moss and Tarek Ziadé jolted my thought process: a JavaScript-based framework was the way forward for a web-based app with Python (and Werkzeug) relegated to a REST API server.

So I took a look at AngularJS, Ember.js, Backbone.js and several other JavaScript frameworks and libraries. At this time, it seems a simpler library such as Backbone.js, or perhaps the Marionette framework, may be more appropriate. It is comforting to see that PgAdmin4 has chosen a similar path for its web-deployed client.

This capability is ultimately quite ambitious, so perhaps Pyrseas ought to stick just to schema version control. Nevertheless, whether integrated with Pyrseas or not, this functionality ought to re-use the database-catalog-to-Python-structures-to-YAML (and vice versa) features developed for Pyrseas.

This concludes this three-part series. Please also see part 1 and part 2.

2

The Future of Pyrseas: Part 2

When I started working on Pyrseas, I reviewed several other products. Robert Brewer’s Post Facto was probably the one with the most unique design. Although it compared database schemas in order to generate SQL to synch them up, it did not store database object definitions in a standard VCS repository. Rather, it used a Postgres database as the repository analog.

While Post Facto’s design certainly influenced Pyrseas, there is one aspect of  the former that, unfortunately, I did not emulate.

The Dependables

As any developer knows, database objects have dependencies on each other: table A has a primary key PK1, table B is declared with a foreign key dependent on PK1, function C is dependent on type X, view D is based on table A and includes a call to function C.

Pyrseas currently deals with these dependencies in an object-specific manner. For example, it does at least two passes through pg_class objects (tables, views, sequences, etc.) in order to create, alter or drop these objects in the correct order. However, this ad hoc approach can result in incorrect sequencing of generated SQL statements in some cases, particularly those like view D above.

The missing feature from Post Facto that avoids this conundrum? If you answered topological sort you were obviously paying attention in your Algorithms class. If you didn’t, may I suggest chapter 15, “Devising and engineering an algorithm: Topological Sort” of Bertrand Meyer’s Touch of Class.

Daniele’s Quest

Over two years ago, someone opened an issue about the need to create primary keys before creating views. Later, Daniele Varrazzo reported another issue with dependencies.

Many of you Postgres users will recognize Daniele as the maintainer of Psycopg, the popular Python PG adapter, which of course is used by Pyrseas.  Daniele and I chatted online, I mentioned Post Facto’s solution and he, fortuitously and generously, started implementing a topological sort on a deptrack branch of Pyrseas.

We then collaborated for about eight months. He did most of the initial coding and I ran tests and fixed some issues. Unfortunately, Daniele is very busy, with a full-time job, Psycopg and other interests, so the work came to a near standstill.

Where We Stand

The last time changes were submitted to the deptrack branch, about six months ago, only four tests failed (out of over 600) running on both Python 2.7 and 3.4 against Postgres 9.3. Regrettably, three of those tests are integration and functional tests, so correcting those is critical to adding this feature.

In addition, although most tests complete successfully, the run times have been impacted severely. This will require an effort at re-optimizing performance before releasing the changes. Last but not least, the implementation needs internal documentation so that it can be properly maintained.

Sadly, I have not had much time or incentives to address these shortcomings. Are there any Pyrseas, Postgres or Python enthusiasts looking for a challenge?

2

The Future of Pyrseas: Part 1

In the early days of the Pyrseas project, I read about some open source projects being abandoned due to their developers losing interest or perhaps becoming involved elsewhere and thought to myself “That could never happen to me.”

Sadly, this blog has remained silent for over two years, and I haven’t done substantive development work since last September.

Still, some remain interested in Pyrseas, if gauged by the latest two issues: inquiring about consolidating/updating our documentation and porting dbtoyaml/yamltodb to other databases. So it’s appropriate that we discuss in what directions the project may turn.

Some background first.

Pyrseas: Origins

Pyrseas was born from my experience with making structural changes to SQL databases, version control systems, the typical incompatibility between the two and the solution suggested by Andromeda.

Andromeda had a fairly simple concept: describe your database using YAML and use that to drive the process of generating SQL to make structural modifications (also, use the YAML spec to produce a basic CRUD app to make content changes—see Part 3).

The Pyrseas innovation was: rather than manually edit a YAML spec, why not create it from the Postgres catalogs? In addition, instead of using the information_schema catalog views so that the process could be nominally portable to a few other SQL databases, we opted to query the internal catalogs directly.

The Imaginarium of Peter Eisentraut

In response to a post regarding another tool that diff’d database versions, Core Team member Peter Eisentraut commented that something that “doesn’t support all PostgreSQL features” is not helpful “to help you manage your database upgrades, because it might miss important details.”

That made us reconsider the scope of Pyrseas which initially was to be limited to tables, views and perhaps functions. We decided to address the vast array of Postgres features and some releases later we managed to achieve that goal, for the most part.

A post about the proper place of business logic then led to a collaboration with Roger Hunwicks to augment the Pyrseas tools. Another discussion with Josep Martínez resulted in a preliminary capability to copy predefined data into the database.

Lilo & Sqitch (or Is diffing database versions sufficient?)

Although my Pyrseas development activity has been limited recently, I’ve continued to use Pyrseas to assist in my DBA and database development tasks. I’ve thus come to the conclusion that: no, in the most general case, diffing database versions is not sufficient to manage structural changes.

A simple example based on my original proof-of-concept schema can serve to validate this assertion. Suppose we wanted to modify the release_year column to release_date and use a DATE datetype. The RENAME may be made to work but the datatype change will require some specialized handling (e.g., an external script) that cannot be codified in a generalized “difference” approach.

When I first reviewed the other SQL schema versioning tools, Liquibase was the main exponent of the non-differencing camp (and it still seems to be going strong). About a year after that, David Wheeler came out with the first version of Sqitch and thanks to David I was able to experiment with it.

My main objection to Sqitch and Liquibase is that for the most common use cases, e.g., add a table, add a column, rewrite a view or function, etc., it seems unproductive for a DBA or developer to do the work more than once, i.e., in a development database, either via psql, using a design tool such as PgAdmin or, in some simple cases, even by editing the YAML spec directly. Recreating the SQL DDL and applying it –in the correct order– to another development, test or production database should be automated. The generated SQL should take into consideration the state of the target and, if it becomes necessary, should include generating SQL to backout changes, e.g., drop table, alter table drop column, etc., without any extra DBA or developer intervention.

Conclusion

My proposal to address the insufficiency of diffing database versions is to incorporate some of the ideas of the non-differencing approaches into yamltodb. The solution may be similar to what was done for dbaugment, i.e., add specialized scripts or configuration files that can control the additional processing. Admittedly, this is still very vague and will probably be third in terms of priorities, although I chose to discuss it first.

5

Most Liked and Disliked Programming Languages

Vasudev Ram pointed readers to a Hacker News poll on the subject. While the raw numbers per language are interesting, I think the percentages of Like and Dislike vs. the total votes cast for a given language are perhaps a better metric.

Thus the five most liked languages based on raw votes were:

  1. Python
  2. C
  3. JavaScript
  4. Ruby
  5. SQL

And the five most disliked languages were:

  1. PHP
  2. Java
  3. C++
  4. JavaScript
  5. Visual Basic

It’s rather interesting that JavaScript is on both lists (and I’m quite surprised that SQL had so many votes).  I haven’t included the actual numbers since the poll is still active.

Ranking the languages by number of Like votes as percentage of total votes for that language gives a perhaps more realistic picture:

C 88%
Python 86%
Scheme 84%
Lua 84%
Lisp * 81%
Haskell 80%
Rust 79%
Clojure 78%
Erlang 76%
Go 75%

(*) Dimitri Fontaine will appreciate this.

I believe those rankings will be more stable than the raw votes. Oh, and SQL ranks about 14 according to these percentages.

For completeness, here are the five most disliked languages based on ratio of Dislike votes to total votes for the language:

Cobol 94%
ColdFusion 94%
Visual Basic 89%
Actionscript 83%
PHP 76%

As they say, YMMV.

16

Multisets and the Relational Model

In a comment to my previous post, David Fetter challenged me to “find a case for multisets. That we’re stuck with them doesn’t mean they’re useless.” My response was that I couldn’t help him because multisets (or bags) are not part of the relational model (which was the point of my post) and asked David to show me an example of a multiset he’s stuck with so that we could discuss it.

While waiting for his response, I read an article titled “Toil and Trouble” by Chris Date, which was originally published in Database Programming and Design, January 19941, where he tackled the issue of duplicate rows and multisets. Chris opened by stating that duplicates “are, and always were, a mistake in SQL” (and nearly 20 years later the mistake has not been corrected).

In the article, Date makes a number of points against duplicates and multisets but perhaps two of the best are the following:

  1. When considering the collection (3, 6, 6, 8, 8, 8, 11) versus the set {3, 6, 8, 11} we have to distinguish between the two 6’s by saying “the first 6″ or “the second.” Date then points out that “we have now introduced a totally new concept, one that is quite deliberately omitted from the relational model: positional addressing. … we have moved quite outside the cozy framework of relational theory … [and] there is no guarantee whatsoever that any results that hold within that framework still apply.”
  2. In response to a claim by David Beech that “mathematicians deal with such collections, called multisets or … bags” and therefore that a model with duplicate rows is at least mathematically respectable, Date says:

“… all of the mathematical ‘bag theory’ treatments I’ve seen start off by assuming that there is a way to count duplicates! And that assumption, I contend, effectively means that bags are defined in terms of sets—each bag element has a hidden identifying tag that distinguishes it somehow, and the bag is really a set of tag/element pairs.”

I believe that as programmers it becomes second nature to deal with duplicate items in lists and sequences. Since it is so easy to code a loop to visit each item in turn and apply some processing—in Python you can even use built-ins or functions from itertools, that we frown on a system that, at least in theory, insists on removing duplicates and dealing only with proper (mathematical) sets. However, we should realize that the theory, as Date says, is practical: by keeping the duplicates we lose, for example, the benefits of relational normal forms and certain optimization techniques.

In closing, Date presents the following parts and shipments database:

P  pno │ pname        SP   sno │ pno 
  ─────┼────────          ─────┼─────
   P1  │ Screw             S1  │ P1 
   P1  │ Screw             S1  │ P1  
   P1  │ Screw             S1  │ P2 
   P2  │ Screw

And considers the query “List part numbers for parts that either are screws or are supplied by supplier S1, or both.” He then presents 12 candidate SQL formulations, which someone ran for him against SQL Server 4.2 on OS/2.  I thought it would be instructive to run them against Postgres 9.3, so here they are.

SELECT pno
FROM   p
WHERE  pname = 'Screw'
OR     pno IN
     ( SELECT pno
       FROM   sp
       WHERE  sno = 'S1');

Result: 3 P1, 1 P2

SELECT pno
FROM   sp
WHERE  sno = 'S1'
OR     pno IN
     ( SELECT pno
       FROM   p
       WHERE  pname = 'Screw');

Result: 2 P1, 1 P2

SELECT p.pno
FROM   p, sp
WHERE  ( sno = 'S1' AND
         p.pno = sp.pno)
OR       pname = 'Screw';

Result: 9 P1, 3 P2

SELECT sp.pno
FROM   p, sp
WHERE  ( sno = 'S1' AND
         p.pno = sp.pno)
OR       pname = 'Screw';

Result: 8 P1, 4 P2

SELECT pno
FROM   p
WHERE  pname = 'Screw'
UNION  ALL
SELECT pno
FROM   sp
WHERE  sno = 'S1';

Result: 5 P1, 2 P2

SELECT DISTINCT pno
FROM   p
WHERE  pname = 'Screw'
UNION  ALL
SELECT pno
FROM   sp
WHERE  sno = 'S1';

Result: 3 P1, 2 P2

SELECT pno
FROM   p
WHERE  pname = 'Screw'
UNION  ALL
SELECT DISTINCT pno
FROM   sp
WHERE  sno = 'S1';

Result: 4 P1, 2 P2

SELECT DISTINCT pno
FROM   p
WHERE  pname = 'Screw'
OR     pno IN
     ( SELECT pno
       FROM   sp
       WHERE  sno = 'S1');

Result: 1 P1, 1 P2

SELECT DISTINCT pno
FROM   sp
WHERE  sno = 'S1'
OR     pno IN
     ( SELECT pno
       FROM   p
       WHERE  pname = 'Screw');

Result: 1 P1, 1 P2

SELECT pno
FROM   p
GROUP  BY pno, pname
HAVING pname = 'Screw'
OR     pno IN
     ( SELECT pno
       FROM   sp
       WHERE  sno = 'S1');

Result: 1 P1, 1 P2

SELECT p.pno
FROM   p, sp
GROUP  BY p.pno, p.pname, sno, sp.pno
HAVING ( sno = 'S1' AND
         p.pno = sp.pno)
OR       pname = 'Screw';

Result: 2 P1, 2 P2

SELECT pno
FROM   p
WHERE  pname = 'Screw'
UNION
SELECT pno
FROM   sp
WHERE  sno = 'S1';

Result: 1 P1, 1 P2

As Date points out, 12 different formulations produce 9 different results!  And as he further states, those are not all the possible formulations. For example, a modern revision of the third query may be:

SELECT pno
FROM   p NATURAL JOIN sp
WHERE  sno = 'S1'
OR     pname = 'Screw';

and the result is yet again different (6 P1 parts and 1 P2).

The bottom line is to be very, very careful when dealing with multisets in SQL.


1 The article was republished in Relational Database Writings, 1991-1994, in Part I, “Theory Is Practical!”

9

Is This Relational?

This post was prompted by Hans-Juergen Schoenig’s Common mistakes: UNION vs. UNION ALL because it touches on one of my pet peeves: the claim that some feature of SQL exemplifies or conforms to the relational model. Schoenig does not make that claim explicitly, but he does state “What [most] people in many cases really want is UNION ALL” and shows the following query and result:

test=# SELECT 1 UNION ALL SELECT 1;
 ?column? 
----------
        1
        1

(2 rows)

There are two relational faults above*. First, UNION ALL is not a relational operator. This is an area where both Ted Codd and Chris Date (and Hugh Darwen), are fully in agreement. In the “Serious Flaws in SQL” chapter of The Relational Model for Database Management: Version 2 (1990) Codd listed duplicate rows as the first flaw and characterized “relations in which duplicate rows are permitted as corrupted relations.” Date concurs and wrote the essay “Why Duplicate Rows Are Prohibited”(in Relational Database Writings, 1985-1989) and (with Darwen) included RM Proscription 3: No Duplicate Tuples in their Third Manifesto, which reads:

D shall include no concept of a “relation” containing two distinct tuples t1 and t2 such that the comparison “t1 = t2” evaluates to TRUE. It follows that (as already stated in RM Proscription 2), for every relation r expressible in D, the tuples of r shall be distinguishable by value.

Needless to say, those two “1”s are not distinguishable unless you talk about “the first 1” and “the last 1,” i.e., ordering, which is also proscribed by the relational model because relations are sets.

Now, the example given is synthetic so I’ll present a more realistic example. Suppose a manager asks “which employees are in department 51 or work on the Skunk Works project?” Let’s assume we have a projects table with columns proj_no (primary key) and proj_name, an emp table with columns emp_no (primary key), last_name, first_name, and dept_no, and aassignments table with columns proj_no and emp_no (both forming the primary key and each referencing the previous two tables, respectively). We’ll first emulate this with a CTE, so we won’t have to create or populate any tables:

WITH emp AS (SELECT 'Ben Rich'::text AS emp_name,
                     51 AS dept_no),
     assignments AS (SELECT 'Ben Rich'::text AS emp_name,
                           'Skunk Works'::text AS proj_name)
SELECT emp_name
  FROM emp
 WHERE dept_no = 51
UNION ALL
SELECT emp_name 
  FROM assignments
 WHERE proj_name = 'Skunk Works';

If you run this in psql, you’ll see two rows with identical values and the manager is going to ask “Do we have two employees named Ben Rich?”  However, in practice the real query will be:

SELECT first_name, last_name
  FROM emp
 WHERE dept_no = 51
UNION ALL
SELECT first_name, last_name
  FROM emp JOIN assignments USING (emp_no)
           JOIN projects p USING (proj_no)
 WHERE p.proj_name = 'Skunk Works';

Unless you change UNION ALL to UNION your result wil contain duplicate rows for employees that satisfy both predicates. However, an alternative formulation without UNION would be

SELECT first_name, last_name
  FROM emp LEFT JOIN assignments USING (emp_no)
           LEFT JOIN projects p USING (proj_no)
 WHERE dept_no = 51
    OR p.proj_name = 'Skunk Works';

This query correctly returns only one row per employee. Admittedly, the query is still somewhat synthetic. In reality, the query may include multiple other columns and several hundred rows may be retrieved and thus the duplicate tuples and the logical error may not be so obvious.

UPDATE: Changed last query to use LEFT JOINs as correctly suggested by RobJ below.


* The second relational fault? The result column is unnamed (something Date and Darwen insist on much more than Codd).

2

ANFSCD: Revisiting the Web Server

Nearly two years ago, I was considering which Python web framework to use for a user interface to Postgres: CherryPy, Flask, Werkzeug? Not entirely satisfied with the choices, I started reviewing even more frameworks thinking I might want to write my own minimalist framework.

Several months later, somebody (through Planet Python, IIRC) referred me to a presentation by Jacob Kaplan-Moss on the history and future of Python on the web. Surprisingly, halfway through the talk Jacob started raving about Meteor, a pure JavaScript framework, saying “we’re deluding ourselves if we think this [something like Meteor] is not the future of web applications.” This prompted me to take a close look at Meteor and several other JS frameworks.

Tarek Ziadé’s “A new development era” essay reinforced this change in direction. Ultimately, I settled on AngularJS as the (client) framework. Two-way data binding, dependency injection and testability are some of the features that won me over.

Angular opened the door to the Node.js world—which appears somewhat chaotic compared to Python’s (and even more to the staidness of Postgres). Like Python, Node.js has an abundance of web frameworks, templating libraries and other tools to choose from (and master). Aside from that, are there any negatives in continuing down this path?

For one, although Angular is an open source project, unlike Python and PostgreSQL, its destiny is controlled by a behemoth. A saving grace is its large community of contributors. And perhaps some of Angular’s innovations may eventually become part of standard HTML.

Second, in spite of Selena Deckelmann’s recent comments on JS and PG, I’m strongly partial to Python and not fond of JavaScript as an implementation language. It’s liberating not to have to use braces (and semicolons) for code structure! To compensate, CoffeeScript appears to be the obvious alternative.

When it comes to interfacing to Postgres, although I haven’t explored it enough to do justice, node-postgres doesn’t seem to be up to par with psycopg, and I’m not about to throw away the work I’ve done on Pyrseas, in particular the TTM-inspired interface. So Werkzeug may still play a part, as a Postgres-Python-to-JSON service, particularly now that it support Python 3. However, for contrast I will use node-postgres in an early implementation.

Last, the Angular team’s choice for “workflow” tool (Yeoman) did not sit well with me: I don’t care for “scaffolding” and my first experience with Grunt rubbed me the wrong way. Fortunately, in the Node.js “chaos” I found Brunch, which although not without problems, looks suitable for my purposes.

Having addressed the negatives, I’ve started work on this at GitHub, and plan to post more about it later on.

Update: Due to the change in direction, I was wondering whether I should also change the title of this blog to something like “Taming Serpents, Pachyderms and White A’s in Red Shields”, but fortunately I discovered that at least O’Reilly uses a rhinoceros as the JavaScript mascot and rhinos are considered pachyderms. 🙂

1

Pyrseas contributions solicited

Do you use PostgreSQL and truly believe it’s “the world’s most advanced open source database” and that its upcoming 9.3 release will make it even more awesome?

Do you also use Python and believe it’s “an easy to learn, powerful programming language” with “elegant syntax” that makes it an ideal language for developing applications and tools around PostgreSQL, such as Pyrseas?

Then we could use your help. For starters, we want to add support for the MATERIALIZED VIEWs and EVENT TRIGGERs coming up in PG 9.3.

We have also been requested to add the capability to load and maintain “static data” (relatively small, unchanging tables) as part of yamltodb, so that it can be integrated more easily into database version control workflows.

And for the next release, Pyrseas 0.7, we’d like to include the first version of the database augmentation tool which will support declarative implementation of business logic in the database–starting off with audit trail columns. Some work has been done on this already, but it needs integration with the current code and tests.

Or perhaps coding is not your forte, but you’re really good at explaining and documenting technical “stuff”. Then you could give us a hand with revamping the docs, maybe writing a tutorial so that users have a smooth ride using our tools.

Or maybe you have your own ideas as to how improve the PostgreSQL version control experience. We’d love to hear those too.

If you’d like to help, you can fork the code on GitHub, join the mailing list and introduce yourself, or leave a comment below.

0

Tuples in the Pythonic, TTM-inspired interface to PostgreSQL

The Third Manifesto formally describes tuple types (RM prescription 6), tuple values (prescription 9), tuple variables (prescription 12) as well as other tuple-related elements. As mentioned in the previous post, a tuple value is a set of ordered triples each consisting of attribute name, type and value.

Class Tuple of the TTM-inspired interface to PostgreSQL models TTM tuples as Python lists of TTM Attribute objects. Lists were used rather than sets because for many practical purposes the order of the attributes is useful (or has “meaning”), e.g., the first attribute listed is most often –even in purist relational theory presentations– the primary key or part of the primary key.

The interface stores the Tuple heading as a (Python) n-tuple of name-type tuples, in the “internal use” _heading attribute. The n-tuple was chosen over a list due to its immutability. The interface also sets each Attribute as a Python attribute of the Tuple object. Thus, if you define a Tuple variable as follows:

film = Tuple([
    Attribute('id', int, sysdefault=True),
    Attribute('title'),
    Attribute('release_year', int)])

You can then assign or access an Attribute using simple Python syntax:

film.title = "Seven Samurai"
if film.year == 1954:
    do something

The interface also stores two other internal use lists, one for nullable attributes and another for attributes that allow default values. These are to be used by upstream classes such as RelVar.

Class Tuple has a __setattr__ method tailored to deal with assignment to TTM Attributes. It disallows assignment to internal attributes after initialization, with one exception: the _tuple_version attribute (used by RelVar for optimistic concurrency). It also doesn’t allow assignment to undefined Attributes, e.g., given the film variable above, attempting to assign to film.length will raise an AttributeError. Finally, the assignment is “filtered” through class Attribute, so that an attempt such as film.title = 8.5 will result in a ValueError from that class.

The pyrseas.relation.tuple module defines a standalone function: tuple_values_dict. This is used to generate a dictionary of attribute values suitable for passing to Psycopg’s cursor.execute method. For INSERT, a single currtuple argument is expected. For UPDATE, the modified Tuple is passed as a second argument and tuple_values_dict will return a dictionary solely for the attribute and values that have changed.

1

Attributes in the Pythonic, TTM-inspired interface to PostgreSQL

The Third Manifesto‘s Relational Model (RM) prescription 9 defines a relation heading as “a set of ordered pairs or attributes of the form <A,T>,” where A is the name of the attribute and T is the name of the declared type of the attribute. It then defines a tuple value (or tuple for short) as a set of ordered triples of form <A,T,v> where v is an arbitrary value of type T, called the attribute value.

Class Attribute of the TTM-inspired interface to PostgreSQL models the latter ordered triple rather than the ordered pair. I considered implementing the pair, say as AttribType, and deriving Attribute from it, but opted for the leaner, no-hierarchy solution for now.

An Attribute object should be initialized with a name, Python type and a value. However, only the name is required. The default type is str and for Python 2, unicode is treated almost as a synonym for str (I realize this goes against everything that your mother taught you, but I’m hoping that widespread Python 3 adoption will make this moot soon).

I wholehearteadly agree with TTM RM proscription 5 banning attributes that do not have a value, i.e., like SQL NULLs. However, I hope the interface is useful on existing databases, so in the interest of practicality, class Attribute has two additional, optional arguments: nullable and sysdefault. The former is to specify that an attribute allows NULLs (or None in Python). The latter can be used to indicate the corresponding table column has an SQL DEFAULT specification (this includes those defined as SERIAL or BIGSERIAL).

If an attribute is not nullable and not sysdefault, a value must be specified. If omitted, a suitable default is created based on the type (if possible), i.e., an empty string for str, 0 for int, 0.0 for float. If the attribute is nullable, empty string, 0 or 0.0 are converted to None. This approach is to facilitate dealing with empty HTML form fields, i.e., if the user skips a nullable field, the attribute should end up as a NULL in the database.

If a value is provided, the code raises ValueError if the value does not agree with the specified (or defaulted) Python type. The only exceptions are that an int value is cast to float, and a unicode value is allowed if the type is str and we’re working in Python 2.x.