SQL Version Control Implementation Choices I

The Andromeda “data dictionary” is organized as multiple hierarchies, e.g., modules, generic columns, and tables with columns. Here is a very simple example:

module movies:
    description: Movies

column id:
    type_id: int

column title:
    type_id: vchar
    colprec: 32

column release_year:
    type_id: int

table film:
    module: movies
    description: Film information
    column film_id:
        primary_key: Y
    column title:
    column release_year:

This has to be created and maintained manually which may be OK if you’re starting off on a new database project, but not so cool if you have an existing database with a hundred tables (or hundreds of thousands of tables–see page 6).

Thus for Pyrseas I decided to create first a tool (dbtoyaml) to output an existing database schema in YAML format. Structurally, a single hierarchy, e.g., database → schemas → tables (somewhat like pgAdmin’s left panel), appeared preferable to Andromeda’s design. After a couple of iterations, the Pyrseas YAML specification ended as follows:

schema public:
  table film:
    columns:
    - id:
        not_null: true
        type: integer
    - title:
        type: character varying(32)
    - release_year:
        type: integer
    description: Film information
    primary_key:
      film_pkey:
        columns:
        - id

The above is the output from PyYAML’s dump function with default_flow_style=False. Internally, the structure is a Python dict or map which looks just like JSON, e.g.,

{
  'schema public':
  {
    'table film':
    {
      'primary_key':
      {
        'film_pkey':
        {
          'columns': ['id']
        }
      },
      'columns':
      [
        {
          'id':
          {
            'not_null': True,
            'type': 'integer'
          }
        },
        {
          'title':
          {
            'type': 'character varying(32)'
          }
        },
        {
          'release_year':
          {
            'type': 'integer'
          }
        }
      ],
      'description':
        'Film information'
    }
  }
}

PyYAML’s load function can read the YAML file and turn it into the Python dict, so the second tool (yamltodb) can deal with the structure above directly.

The most important implementation decision for yamltodb was how to compare the input map to the map created from the current database’s catalogs. A few options were possible:

  1. SQL-based comparisons
  2. Map comparisons
  3. Internal structure comparisons

SQL Comparisons

This is the approach taken by Andromeda. Its essence is described in step 4 of Dictionary Based Database Upgrades. It consists of storing the input specification into a set of tables and the existing catalogs into a parallel set (potentially one could use the information_schema views), and then issuing SQL queries to compare the two sets.

The presumed advantage of this method is that some of the operations can be done –as SQL set operations– in the DBMS. However, the input spec first has to be inserted into the tables, a row-at-a-time, thus negating the former benefit. In addition, the affected database has to carry at least one set of extraneous catalog-like tables.

Map Comparisons

This is the naive approach consisting of “walking the trees,” i.e., comparing the input map to an internal map created from the current database. This works, up to a point. For example, if a table in the input map is not present in the internal map, it’s easy enough to call a function to generate a CREATE TABLE statement.

However, as most readers know, a PostgreSQL database usually has many dependencies between the various system entities, e.g., foreign keys on primary keys, views on tables, etc. So comparing hierarchies isn’t ideal.

Internal Comparisons

In a forthcoming post, I’ll explore this method as well as other issues such as generating the SQL statements in the correct order.

About these ads

3 thoughts on “SQL Version Control Implementation Choices I”

  1. I looked at Andromeda years ago, from the perspective of “generate the client from the schema”. I was, and remain, puzzled that a Relational database tool would be implemented IMS style.

    1. Not sure what you mean by “IMS style.” From what I’ve seen Andromeda is implemented with SQL queries to do all the diff’ing, hardly like IMS. The YAML specs for Andromeda and Pyrseas are hierarchical because the system entities, albeit stored in tables, are related –more or less– in that manner. That also explains pgAdmin’s left panel.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s