The Andromeda “data dictionary” is organized as multiple hierarchies, e.g., modules, generic columns, and tables with columns. Here is a very simple example:
module movies:
description: Movies
column id:
type_id: int
column title:
type_id: vchar
colprec: 32
column release_year:
type_id: int
table film:
module: movies
description: Film information
column film_id:
primary_key: Y
column title:
column release_year:
This has to be created and maintained manually which may be OK if you’re starting off on a new database project, but not so cool if you have an existing database with a hundred tables (or hundreds of thousands of tables–see page 6).
Thus for Pyrseas I decided to create first a tool (dbtoyaml) to output an existing database schema in YAML format. Structurally, a single hierarchy, e.g., database → schemas → tables (somewhat like pgAdmin’s left panel), appeared preferable to Andromeda’s design. After a couple of iterations, the Pyrseas YAML specification ended as follows:
schema public:
table film:
columns:
- id:
not_null: true
type: integer
- title:
type: character varying(32)
- release_year:
type: integer
description: Film information
primary_key:
film_pkey:
columns:
- id
The above is the output from PyYAML’s dump function with default_flow_style=False. Internally, the structure is a Python dict or map which looks just like JSON, e.g.,
{
'schema public':
{
'table film':
{
'primary_key':
{
'film_pkey':
{
'columns': ['id']
}
},
'columns':
[
{
'id':
{
'not_null': True,
'type': 'integer'
}
},
{
'title':
{
'type': 'character varying(32)'
}
},
{
'release_year':
{
'type': 'integer'
}
}
],
'description':
'Film information'
}
}
}
PyYAML’s load function can read the YAML file and turn it into the Python dict, so the second tool (yamltodb) can deal with the structure above directly.
The most important implementation decision for yamltodb was how to compare the input map to the map created from the current database’s catalogs. A few options were possible:
- SQL-based comparisons
- Map comparisons
- Internal structure comparisons
SQL Comparisons
This is the approach taken by Andromeda. Its essence is described in step 4 of Dictionary Based Database Upgrades. It consists of storing the input specification into a set of tables and the existing catalogs into a parallel set (potentially one could use the information_schema views), and then issuing SQL queries to compare the two sets.
The presumed advantage of this method is that some of the operations can be done –as SQL set operations– in the DBMS. However, the input spec first has to be inserted into the tables, a row-at-a-time, thus negating the former benefit. In addition, the affected database has to carry at least one set of extraneous catalog-like tables.
Map Comparisons
This is the naive approach consisting of “walking the trees,” i.e., comparing the input map to an internal map created from the current database. This works, up to a point. For example, if a table in the input map is not present in the internal map, it’s easy enough to call a function to generate a CREATE TABLE statement.
However, as most readers know, a PostgreSQL database usually has many dependencies between the various system entities, e.g., foreign keys on primary keys, views on tables, etc. So comparing hierarchies isn’t ideal.
Internal Comparisons
In a forthcoming post, I’ll explore this method as well as other issues such as generating the SQL statements in the correct order.
I looked at Andromeda years ago, from the perspective of “generate the client from the schema”. I was, and remain, puzzled that a Relational database tool would be implemented IMS style.
Not sure what you mean by “IMS style.” From what I’ve seen Andromeda is implemented with SQL queries to do all the diff’ing, hardly like IMS. The YAML specs for Andromeda and Pyrseas are hierarchical because the system entities, albeit stored in tables, are related –more or less– in that manner. That also explains pgAdmin’s left panel.
Pingback: SQL Version Control Implementation Choices II | Taming Serpents and Pachyderms