Design Notes on Database Application Development

19 December 2011 | jma

I recently took a brief look at web2py since its Database Abstraction Layer (DAL) sounded interesting. After following the tutorials in the Overview chapter, I ended up not delving into DAL. Instead, I revisited Ruby on Rails which I had last looked circa 2005 when its support for PostgreSQL was rather weak. Although that has improved, the basic approach of Rails hasn’t and is very similar to web2py’s as well as Django.

The essence of these full stack frameworks is that the developer —or should I say programmer— knows how to design classes and that that design can be transformed into a relational database design mechanically and flawlessly. Maybe I’m biased because my first involvements in software engineering were in the areas of IT system design, but the latter premise seems like a very weak foundation for developing robust systems.

Maybe this also explains, in part, why NoSQL solutions have cropped up and found a following. It’s much easier —albeit not reliable or sound— to design a system when you “liberate” yourself from the “straitjacket” constraints of a database design (aka schema). However, in my experience, most systems beyond the blog/wiki/toy database examples used in tutorials benefit from specialized data analysis and modeling that result in a formal database design.

If the database precedes the application development, it appears the Rails/Django/web2py approach is back asswards. If the database is available during development, doesn’t it make more sense to re-use the database design effort in creating the application rather than try to repeat it with Python or Ruby classes?

The Pyrseas project, in its quest to rescue Andromeda, plans to implement a utility to help in database application development (tentatively named dbappgen but that may change). The utility will start off by connecting to a database and will allow the developer to create the application based on the tables and other objects present in it, not the other way around. If database changes are needed, they can be made there first, not by imposing some application-induced “migration” (as web2py does).

Since Pyrseas’ dbtoyaml utility already outputs a YAML/JSON description of a database, it may even be possible to conduct an application development exercise, at least in part, without connecting to the database. For example, if the developer invokes dbappgen moviesdb, the utility could first check whether a file named moviesdb.yaml (or with similar extensions) exists in the current directory and if so, read it and recreate in memory the PostgreSQL catalogs, thus allowing “offline” development.

In other news, due to the recent discussions regarding Python 3, my hopes that Werkzeug will be ported soon have been dashed. It appears it’s either use WebOb or write my own framework (which ironically is what Armin recommended).

26 thoughts on “Design Notes on Database Application Development”

preciousbodilyfluids says:

19 December 2011 at 02:55

Before you give up on Ruby, try replacing ActiveRecord with Sequel (http://sequel.rubyforge.org/). It’s an absolutely wonderful library. It supports composite primary keys, window functions, schemas, CASE statements, all seamlessly. The author/maintainer is a big fan of Postgres, too, and also maintains a C extension that greatly speeds up the conversions of Postgres results to ruby objects (https://github.com/jeremyevans/sequel_pg).

Reply
- jma says:
  
  19 December 2011 at 06:38
  
  Thanks for your thoughts (I will take a look) but I’m afraid I gave up on Ruby a long time ago (it seems). My short re-incursion into Rails was simply to have a current, informed opinion. I have invested several years on Python and feel much more comfortable with it than with Ruby. Plus, I’d have to change the blog title: “Taming Rubies (Gems, Manga Girls?, Foxes?) and Pachyderms” don’t sound right 🙂
  
  Reply
ionrock says:

19 December 2011 at 04:19

CherryPy works on Python 3! It beats developing your own framework.

Reply
- jma says:
  
  19 December 2011 at 06:20
  
  I was going to use CherryPy and I may still suggest or support use of its standalone server, but I’ve become disenchanted with some aspects of it. However, I’m still using it for another (personal) project.
  
  Reply
  - jma says:
    
    19 December 2011 at 07:26
    
    … but maybe Rene’s panegyric will change my mind 🙂
rorycl says:

19 December 2011 at 06:01

I don’t believe that the objects in a web framework should map to the database, at least, not in a static way. The ability of SQL to generate novel result sets from a small number of tables means that there is an intrinsic mismatch between objects and relational result sets.

There can be some useful mapping between objects and the database when data is inserted into a database (at least, in the simplest cases). When data is returned from the database (other than to populate forms for the abovementioned) objects can be constructed at runtime to help render them via templating. An introduction to this approach is here: http://campbell-lange.net/company/articles/dbwrapper/

Reply
mike bayer says:

19 December 2011 at 10:26

You’re spot on that relational database design should be done up front, and the application be designed to talk to it, rather than the other way around. This is the entire philosophy and approach of SQLAlchemy, and it’s how I write applications with it. Limitations and pre-defined notions of the Python tooling in use should not be driving your schema design.

Reply
- jma says:
  
  19 December 2011 at 11:31
  
  Hi zzzeek, I almost added a comment about SQLAlchemy in the original post in the context of Pyramid/Pylons. What I like about SQLA is that it provides introspection of existing tables, albeit because of its generic nature it can’t directly support PG-specific types like tsvector. However, I’m reluctant to use its ORM capabilities (while admitting they’re an order better than other alternatives I’ve seen).
  
  Reply
  - mike bayer says:
    
    19 December 2011 at 11:53
    
    The reflection process today pulls in the types in the most specific form allowed, and we have lots of database-specific types within each dialect, as well as database-specific extensions to existing types (such as MySQL’s “collation” keyword on VARCHAR). Where applicable, these are subclasses of a more generic type, but in many cases are standalone types all their own. For PG, we currently have support for such types as ARRAY, BYTEA, INET, CIDR, MACADDR, INTERVAL. tsvector would be a pretty easy add (like two lines to our type registry).
sergiodlc says:

19 December 2011 at 14:04

If the database precedes the application development, it appears the Rails/Django/web2py approach is back asswards. If the database is available during development, doesn’t it make more sense to re-use the database design effort in creating the application rather than try to repeat it with Python or Ruby classes?

For the sake of completeness I must add that at least Django does support working with pre-existent databases. You wouldn’t need to create the clases “by hand”. From Django docs:

Django comes with a utility called inspectdb that can create models by introspecting an existing database. You can view the output by running this command:

python manage.py inspectdb

Save this as a file by using standard Unix output redirection:

python manage.py inspectdb > models.py

Reply
- jma says:
  
  19 December 2011 at 14:39
  
  Hi Sergio, that’s interesting. I see it derives ForeignKey information correctly, but it stlll doesn’t handle mutiple column keys. It maps tsvector to TextField, adding a “# This field type is a guess.” and also doesn’t know about fixed length CHAR types (again mapping them to TextField).
  
  Reply
  - sergiodlc says:
    
    19 December 2011 at 15:30
    
    Off course, nothing is perfect, but that’s a pretty good start point
Pingback: Joe Abbate: Design Notes on Database... | PostgreSQL | Syngu
Pingback: Weekly Link Roundup (weekly) | SiliconChaos
Anthony says:

5 January 2012 at 14:43

“The essence of these full stack frameworks is that the developer —or should I say programmer— knows how to design classes and that that design can be transformed into a relational database design mechanically and flawlessly.”

Note, web2py has a database abstraction layer (DAL), not an ORM. DAL models are not classes designed by the programmer, but instances of a single Table class. There is a one-to-one correspondence between a DAL table and the database table it represents.

“If the database is available during development, doesn’t it make more sense to re-use the database design effort in creating the application rather than try to repeat it with Python or Ruby classes?”

In many cases, the database is developed specifically for the web application. In that case, using the web2py DAL need not involve repeating anything in Python. You simply define your database tables using the DAL, and it creates those tables in the database for you — no need to write any SQL.

If you have an existing database, you may be repeating some schema definition, but web2py provides scripts to automate that process for MySQL (http://code.google.com/p/web2py/source/browse/scripts/extract_mysql_models.py) and PostgreSQL (http://code.google.com/p/web2py/source/browse/scripts/extract_pgsql_models.py).

Reply
- jma says:
  
  5 January 2012 at 15:06
  
  “In many cases, the database is developed specifically for the web application.”
  
  In that case, maybe web2py (or Django or Rails) are OK, but unless the organization using the database doesn’t grow or other areas within the organization don’t want/need to use it, the database will tend to want to “extend its horizons,” at which point the application-specific characteristics will need to be redesigned or will make other usage difficult.
  
  Reply
  - Anthony says:
    
    5 January 2012 at 16:00
    
    Perhaps. What are some examples?
  - jma says:
    
    5 January 2012 at 16:22
    
    If you start, say, with some of the tables in the Pagila database, building a web app to display the movies in your store for selection by customers, and use some external system for order processing and another system for your accounting. And then you want to start integrating some of these into your original movie/customer database, or you acquire/merge with another store that uses a different app, or you start adding other products or services (book, VOD, etc).
- rorycl says:
  
  5 January 2012 at 16:34
  
  I agree that database design can (and often should be) independent of the web application connecting to it. Objects and SQL queries are quite different things and aren’t mappable except in the simplest cases.
  
  I strongly believe that we need to find ways of providing SQL management interfaces, probably based on procedural sql, as the way of negotiating between web apps and the database.
  
  Reply
  - mike bayer says:
    
    5 January 2012 at 17:02
    
    procedural SQL when used in conjunction with a domain model implies that the persistence layer must be hand-coded twice – once within the procedural layer as hand-tailored INSERT/UPDATE/DELETE statements, and once within the application, as hand-tailored marshalling between each individual domain object or group of objects and the stored procedures that happen to be responsible for those objects, or worse among the multiple procedures that interact with those particular objects based on use cases.
    
    Contrast with using a modern object relational mapper such as SQLAlchemy. Here, the schema is designed, a domain model is designed, then an interaction layer based on SQLAlchemy constructs mediates between them. Then the persistence layer is done, with not a single INSERT or line of object marshaling code needing to be written.
rorycl says:

5 January 2012 at 17:19

Simple INSERT/UPDATE/DELETE is fine to be handled by an ORM. Complex INSERT/UPDATE/DELETEs won’t map to objects (for instance may affect several tables) and may have implications for integrity beyond the scope of the particular web app.

A practical solution is to provide certain “operating system” level functions in PL around the database, and for other simple or gui-centric stuff to be handled by the app.

The constraints of object-relational mappers can very easily be out-maneuvered by simple database query return sets. For example, a web page showing a tier of car owners, the cars they own, and the number of previous owners for each car will require most ORMs to to do lots of queries for the inner loops. A single, simple wide result set can be intelligently rendered into such a page using nested runtime objects. This is not ORM, but — to coin a phrase — a middleware to relational “conjoining” which I don’t believe should be automated.

I appreciate from my point of view the data and data structure _is_ the central application. This does not appear to be a common view.

Reply
- mike bayer says:
  
  5 January 2012 at 17:56
  
  > Complex INSERT/UPDATE/DELETEs won’t map to objects (for instance may affect several tables) and may have implications for integrity beyond the scope of the particular web app.
  
  Modern unit of work implementations insert/update/delete from any number of tables at once, provided they are represented by the domain model in some way. Domain representations can be automated – such as if every table “XXX” also needs to INSERT into a table “XXX_history”, that can be part of the mapping configuration and an INSERT will go out to “XXX_history” as well. Triggers can also be used for patterns like this, and the ORM can also be instructed to call out to SPs upon various events if needed.
  
  > require most ORMs to to do lots of queries for the inner loops
  
  this is called the “N+1” problem and is solved by eager loading (note Hibernate has similar features):
  
  http://www.sqlalchemy.org/docs/orm/loading.html
  
  >A single, simple wide result set can be intelligently rendered into such a page using nested runtime objects.
  
  This approach relies upon database views which join multiple tables together. I used these heavily at major league baseball – a serious shortcoming is that you need to rebuild the view, or just add new ones, for even slight alterations, and that typically means more hand-coded SQL. A refactoring to the database schema then means you have to revisit every single view affected and rewrite it. The temptation to JOIN various views together is also very strong, which leads to terrible performance, as the views themselves are already often very awkwardly joining among many tables and their combination grinds the query to a halt. A good query among many tables can be written more efficiently by referring to the actual tables directly.
  
  Reply
  - rorycl says:
    
    6 January 2012 at 08:26
    
    One can just write PLSQL as one would write in one’s standard web app language of choice. If one does that one doesn’t have to worry about the niceties of mapper logic or capabilities. By the way views tend not to be used much in my experience of PLSQL. Left outer joins, triggers, and window functions etc. are just a normal part of day-to-day RDMS work.
    
    I looked at the “eager loading” reference. It is clear SQLAlchemy is very capable. However the caveats on using eager loading suggest that using the SQLAlchemy approach means that one is close to writing a new language for querying databases. There are also straightforward technical issues, such as “When multiple levels of depth are used with joined or subquery loading, loading collections-within- collections will multiply the total number of rows fetched in a cartesian fashion. Both forms of eager loading always join from the original parent class.” You can get a cartesian product using a simple N+1? What about N+N+N+1. Why not do it in SQL directly?
  - rorycl says:
    
    6 January 2012 at 08:27
    
    Ran out of space. I wanted to add:
    
    There are some major drawbacks to working directly in the database. Most PLSQL
    cannot be easily debugged, and it is more difficult to integrate into a test suite. However since I and the rest of our development team have to know at least some SQL, we seem to benefit from doing all of our database-related work in it.
  - mike bayer says:
    
    6 January 2012 at 10:35
    
    > You can get a cartesian product using a simple N+1? What about N+N+N+1. Why not do it in SQL directly?
    
    The use case of subquery eager loading more than two levels is an edge case. If you need to optimize that particular edge on those rare occasions it occurs and the number of rows being received warrants it, then you can write that particular statement in SQL. The vast majority of statements that need no personalized attention can remain automated.
    
    The point is we’re trying to write large applications that don’t require thousands of hand-coded SQL statements, hardwired towards a specific backend, repeating identifier names and idioms throughout, linked to reams of hand-coded data marshaling code. It’s no different than why we use object oriented scripting languages rather than coding directly in C or machine instructions. The goal of SQL abstraction layers and ORMs (which are two different things) is about automation of repetitive tasks, producing less code, producing code that is much more amenable to simple and complex refactorings – increasing the ratio of intent to execution within source code. More intent and less execution (execution meaning, the series of steps in order to tell the computer to do something) is the goal of scripting languages in the first place.
Anthony says:

5 January 2012 at 18:29

“If you start, say, with some of the tables in the Pagila database, building a web app to display the movies in your store for selection by customers, and use some external system for order processing and another system for your accounting. And then you want to start integrating some of these into your original movie/customer database, or you acquire/merge with another store that uses a different app, or you start adding other products or services (book, VOD, etc).”

But in that case, the “application-specific characteristics” of your database would probably have to be redesigned regardless of whether your app uses an ORM/DAL.

Reply