Design Notes on Database Application Development

I recently took a brief look at web2py since its Database Abstraction Layer (DAL) sounded interesting. After following the tutorials in the Overview chapter, I ended up not delving into DAL.  Instead, I revisited Ruby on Rails which I had last looked circa 2005 when its support for PostgreSQL was rather weak. Although that has improved, the basic approach of Rails hasn’t and is very similar to web2py’s as well as Django.

The essence of these full stack frameworks is that the developer —or should I say programmer— knows how to design classes and that that design can be transformed into a relational database design mechanically and flawlessly. Maybe I’m biased because my first involvements in software engineering were in the areas of IT system design, but the latter premise seems like a very weak foundation for developing robust systems.

Maybe this also explains, in part, why NoSQL solutions have cropped up and found a following. It’s much easier —albeit not reliable or sound— to design a system when you “liberate” yourself from the “straitjacket” constraints of a database design (aka schema). However, in my experience, most systems beyond the blog/wiki/toy database examples used in tutorials benefit from specialized data analysis and modeling that result in a formal database design.

If the database precedes the application development, it appears the Rails/Django/web2py approach is back asswards. If the database is available during development, doesn’t it make more sense to re-use the database design effort in creating the application rather than try to repeat it with Python or Ruby classes?

The Pyrseas project, in its quest to rescue Andromeda, plans to implement a utility to help in database application development (tentatively named dbappgen but that may change). The utility will start off by connecting to a database and will allow the developer to create the application based on the tables and other objects present in it, not the other way around. If database changes are needed, they can be made there first, not by imposing some application-induced “migration” (as web2py does).

Since Pyrseas’ dbtoyaml utility already outputs a YAML/JSON description of a database, it may even be possible to conduct an application development exercise, at least in part, without connecting to the database. For example, if the developer invokes dbappgen moviesdb, the utility could first check whether a file named moviesdb.yaml (or with similar extensions) exists in the current directory and if so, read it and recreate in memory the PostgreSQL catalogs, thus allowing “offline” development.

In other news, due to the recent discussions regarding Python 3, my hopes that Werkzeug will be ported soon have been dashed. It appears it’s either use WebOb or write my own framework (which ironically is what Armin recommended).

About these ads

26 thoughts on “Design Notes on Database Application Development”

    1. Thanks for your thoughts (I will take a look) but I’m afraid I gave up on Ruby a long time ago (it seems). My short re-incursion into Rails was simply to have a current, informed opinion. I have invested several years on Python and feel much more comfortable with it than with Ruby. Plus, I’d have to change the blog title: “Taming Rubies (Gems, Manga Girls?, Foxes?) and Pachyderms” don’t sound right :-)

  1. I don’t believe that the objects in a web framework should map to the database, at least, not in a static way. The ability of SQL to generate novel result sets from a small number of tables means that there is an intrinsic mismatch between objects and relational result sets.

    There can be some useful mapping between objects and the database when data is inserted into a database (at least, in the simplest cases). When data is returned from the database (other than to populate forms for the abovementioned) objects can be constructed at runtime to help render them via templating. An introduction to this approach is here: http://campbell-lange.net/company/articles/dbwrapper/

  2. You’re spot on that relational database design should be done up front, and the application be designed to talk to it, rather than the other way around. This is the entire philosophy and approach of SQLAlchemy, and it’s how I write applications with it. Limitations and pre-defined notions of the Python tooling in use should not be driving your schema design.

    1. Hi zzzeek, I almost added a comment about SQLAlchemy in the original post in the context of Pyramid/Pylons. What I like about SQLA is that it provides introspection of existing tables, albeit because of its generic nature it can’t directly support PG-specific types like tsvector. However, I’m reluctant to use its ORM capabilities (while admitting they’re an order better than other alternatives I’ve seen).

      1. The reflection process today pulls in the types in the most specific form allowed, and we have lots of database-specific types within each dialect, as well as database-specific extensions to existing types (such as MySQL’s “collation” keyword on VARCHAR). Where applicable, these are subclasses of a more generic type, but in many cases are standalone types all their own. For PG, we currently have support for such types as ARRAY, BYTEA, INET, CIDR, MACADDR, INTERVAL. tsvector would be a pretty easy add (like two lines to our type registry).

  3. If the database precedes the application development, it appears the Rails/Django/web2py approach is back asswards. If the database is available during development, doesn’t it make more sense to re-use the database design effort in creating the application rather than try to repeat it with Python or Ruby classes?

    For the sake of completeness I must add that at least Django does support working with pre-existent databases. You wouldn’t need to create the clases “by hand”. From Django docs:

    Django comes with a utility called inspectdb that can create models by introspecting an existing database. You can view the output by running this command:

    python manage.py inspectdb

    Save this as a file by using standard Unix output redirection:

    python manage.py inspectdb > models.py

    1. Hi Sergio, that’s interesting. I see it derives ForeignKey information correctly, but it stlll doesn’t handle mutiple column keys. It maps tsvector to TextField, adding a “# This field type is a guess.” and also doesn’t know about fixed length CHAR types (again mapping them to TextField).

  4. “The essence of these full stack frameworks is that the developer —or should I say programmer— knows how to design classes and that that design can be transformed into a relational database design mechanically and flawlessly.”

    Note, web2py has a database abstraction layer (DAL), not an ORM. DAL models are not classes designed by the programmer, but instances of a single Table class. There is a one-to-one correspondence between a DAL table and the database table it represents.

    “If the database is available during development, doesn’t it make more sense to re-use the database design effort in creating the application rather than try to repeat it with Python or Ruby classes?”

    In many cases, the database is developed specifically for the web application. In that case, using the web2py DAL need not involve repeating anything in Python. You simply define your database tables using the DAL, and it creates those tables in the database for you — no need to write any SQL.

    If you have an existing database, you may be repeating some schema definition, but web2py provides scripts to automate that process for MySQL (http://code.google.com/p/web2py/source/browse/scripts/extract_mysql_models.py) and PostgreSQL (http://code.google.com/p/web2py/source/browse/scripts/extract_pgsql_models.py).

    1. “In many cases, the database is developed specifically for the web application.”

      In that case, maybe web2py (or Django or Rails) are OK, but unless the organization using the database doesn’t grow or other areas within the organization don’t want/need to use it, the database will tend to want to “extend its horizons,” at which point the application-specific characteristics will need to be redesigned or will make other usage difficult.

      1. If you start, say, with some of the tables in the Pagila database, building a web app to display the movies in your store for selection by customers, and use some external system for order processing and another system for your accounting. And then you want to start integrating some of these into your original movie/customer database, or you acquire/merge with another store that uses a different app, or you start adding other products or services (book, VOD, etc).

    2. I agree that database design can (and often should be) independent of the web application connecting to it. Objects and SQL queries are quite different things and aren’t mappable except in the simplest cases.

      I strongly believe that we need to find ways of providing SQL management interfaces, probably based on procedural sql, as the way of negotiating between web apps and the database.

      1. procedural SQL when used in conjunction with a domain model implies that the persistence layer must be hand-coded twice – once within the procedural layer as hand-tailored INSERT/UPDATE/DELETE statements, and once within the application, as hand-tailored marshalling between each individual domain object or group of objects and the stored procedures that happen to be responsible for those objects, or worse among the multiple procedures that interact with those particular objects based on use cases.

        Contrast with using a modern object relational mapper such as SQLAlchemy. Here, the schema is designed, a domain model is designed, then an interaction layer based on SQLAlchemy constructs mediates between them. Then the persistence layer is done, with not a single INSERT or line of object marshaling code needing to be written.

  5. Simple INSERT/UPDATE/DELETE is fine to be handled by an ORM. Complex INSERT/UPDATE/DELETEs won’t map to objects (for instance may affect several tables) and may have implications for integrity beyond the scope of the particular web app.

    A practical solution is to provide certain “operating system” level functions in PL around the database, and for other simple or gui-centric stuff to be handled by the app.

    The constraints of object-relational mappers can very easily be out-maneuvered by simple database query return sets. For example, a web page showing a tier of car owners, the cars they own, and the number of previous owners for each car will require most ORMs to to do lots of queries for the inner loops. A single, simple wide result set can be intelligently rendered into such a page using nested runtime objects. This is not ORM, but — to coin a phrase — a middleware to relational “conjoining” which I don’t believe should be automated.

    I appreciate from my point of view the data and data structure _is_ the central application. This does not appear to be a common view.

    1. > Complex INSERT/UPDATE/DELETEs won’t map to objects (for instance may affect several tables) and may have implications for integrity beyond the scope of the particular web app.

      Modern unit of work implementations insert/update/delete from any number of tables at once, provided they are represented by the domain model in some way. Domain representations can be automated – such as if every table “XXX” also needs to INSERT into a table “XXX_history”, that can be part of the mapping configuration and an INSERT will go out to “XXX_history” as well. Triggers can also be used for patterns like this, and the ORM can also be instructed to call out to SPs upon various events if needed.

      > require most ORMs to to do lots of queries for the inner loops

      this is called the “N+1″ problem and is solved by eager loading (note Hibernate has similar features):

      http://www.sqlalchemy.org/docs/orm/loading.html

      >A single, simple wide result set can be intelligently rendered into such a page using nested runtime objects.

      This approach relies upon database views which join multiple tables together. I used these heavily at major league baseball – a serious shortcoming is that you need to rebuild the view, or just add new ones, for even slight alterations, and that typically means more hand-coded SQL. A refactoring to the database schema then means you have to revisit every single view affected and rewrite it. The temptation to JOIN various views together is also very strong, which leads to terrible performance, as the views themselves are already often very awkwardly joining among many tables and their combination grinds the query to a halt. A good query among many tables can be written more efficiently by referring to the actual tables directly.

      1. One can just write PLSQL as one would write in one’s standard web app language of choice. If one does that one doesn’t have to worry about the niceties of mapper logic or capabilities. By the way views tend not to be used much in my experience of PLSQL. Left outer joins, triggers, and window functions etc. are just a normal part of day-to-day RDMS work.

        I looked at the “eager loading” reference. It is clear SQLAlchemy is very capable. However the caveats on using eager loading suggest that using the SQLAlchemy approach means that one is close to writing a new language for querying databases. There are also straightforward technical issues, such as “When multiple levels of depth are used with joined or subquery loading, loading collections-within- collections will multiply the total number of rows fetched in a cartesian fashion. Both forms of eager loading always join from the original parent class.” You can get a cartesian product using a simple N+1? What about N+N+N+1. Why not do it in SQL directly?

      2. Ran out of space. I wanted to add:

        There are some major drawbacks to working directly in the database. Most PLSQL
        cannot be easily debugged, and it is more difficult to integrate into a test suite. However since I and the rest of our development team have to know at least some SQL, we seem to benefit from doing all of our database-related work in it.

      3. > You can get a cartesian product using a simple N+1? What about N+N+N+1. Why not do it in SQL directly?

        The use case of subquery eager loading more than two levels is an edge case. If you need to optimize that particular edge on those rare occasions it occurs and the number of rows being received warrants it, then you can write that particular statement in SQL. The vast majority of statements that need no personalized attention can remain automated.

        The point is we’re trying to write large applications that don’t require thousands of hand-coded SQL statements, hardwired towards a specific backend, repeating identifier names and idioms throughout, linked to reams of hand-coded data marshaling code. It’s no different than why we use object oriented scripting languages rather than coding directly in C or machine instructions. The goal of SQL abstraction layers and ORMs (which are two different things) is about automation of repetitive tasks, producing less code, producing code that is much more amenable to simple and complex refactorings – increasing the ratio of intent to execution within source code. More intent and less execution (execution meaning, the series of steps in order to tell the computer to do something) is the goal of scripting languages in the first place.

  6. “If you start, say, with some of the tables in the Pagila database, building a web app to display the movies in your store for selection by customers, and use some external system for order processing and another system for your accounting. And then you want to start integrating some of these into your original movie/customer database, or you acquire/merge with another store that uses a different app, or you start adding other products or services (book, VOD, etc).”

    But in that case, the “application-specific characteristics” of your database would probably have to be redesigned regardless of whether your app uses an ORM/DAL.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s