In my first review of database VC tools, I mentioned Kenneth Down’s article where he recommended storing a single metadata file in a VCS and using a difference engine to do database upgrades. In a subsequent post, Kenneth expanded on this, providing a step-by-step description of how to implement such a tool. In fact, he had already gone further: he created an open source project named Andromeda that attempted to deliver on that vision.
Let’s see how our DBAs Carol and Dave can use Andromeda to implement the database changes specified in Version Control, Part 2: SQL Databases.
Andromeda is much more than a version control tool. Its SourceForge page advertises it as a “database development system” and a “complete tool for building database-centric” applications. So, to start off, Carol, Dave and Manny will have to install Apache, PHP and other dependencies before installing Andromeda. Then they’ll be able to access it via web browser at a URL such as http://localhost/~manny/andro_root/andro. After an initialization step, which includes creating a PostgreSQL database named andro, Manny can create the “movies” application as described here.
In order to create the first version of the film table, Carol has to create the YAML spec file movies.dd.yaml as follows (this is the file that will be placed under version control):
group admin: description: Administrators module movies: permsel: Y permins: Y permupd: Y permdel: Y group users: description: Employees module movies: permsel: Y module movies: description: Film Database uisort: 100 column id: description: Unique film ID type_id: int column film_title: description: Film title type_id: vchar colprec: 32 column release_year: description: Release year type_id: int value_min: 1888 table film: description: Films module: movies uisort: 30 spaceafter: Y column id: uisearch: Y primary_key: Y column film_title: uisearch: Y column release_year:
The group definitions are so that Andromeda can implement security constraints. The module definition is used to group tables logically, e.g., in the full Pagila database we may group tables into movies, customers and accounting. The column definitions precede the table definitions since the former can be used in multiple tables. The YAML spec is processed by selecting a “Build” step from Andromeda. If successful, it will create a “movies” database including the film table.
Note that Carol had to use film_title instead of title because the latter conflicts with an undocumented predefined column. Using title causes the following error:
ERROR >> Duplicate Column definition: ERROR >> column_id => title
Notice also that she used value_min instead of valuemin for the release_year constraint. Andromeda documents the latter. However, no CHECK constraint is actually created and the trigger functions (see below) don’t implement a validation.
If you look at the created table, you may be surprised:
Table "public.film" Column | Type | Modifiers --------------+-----------------------+----------- _agg | character(1) | skey_quiet | character(1) | film_title | character varying(32) | skey | integer | release_year | integer | id | integer | Indexes: "film_idx_film_skey_idx" btree (skey) "film_pk" btree (id) Triggers: film_ins_bef_r_t BEFORE INSERT ON film FOR EACH ROW EXECUTE PROCEDURE film_ins_bef_r_f() film_upd_bef_r_t BEFORE UPDATE ON film FOR EACH ROW EXECUTE PROCEDURE film_upd_bef_r_f()
As you can see, the order of the columns is not what you’d expect from the spec1 and three internal columns were added. Note that although there is an film_pk index, PostgreSQL doesn’t recognize it as the PRIMARY KEY, and all columns are nullable. The “magic” is all in the generated trigger functions.
1 Of course, according to the relational model, the order of the attributes is not significant, but in practice most people examining a relation expect some order.
Carol and Dave implement the second set of changes by editing movies.dd.yaml. The combined edits are as follows:
@@ -30,4 +30,15 @@ value_min: 1888 +column length: + description: Length in minutes + type_id: int + value_min: 1 + value_max: 9999 + +column rating: + description: Film rating + type_id: char + colprec: 5 + table film: description: Films @@ -44,2 +55,6 @@ column release_year: + + column length: + + column rating:
The departure from the original is minor: length was defined as int instead of smallint because Andromeda only allows a limited set of types. And although Dave specified the CHECK constraint on length, it’s not functional.
For the third iteration, Carol and Dave again edit movies.dd.yaml as shown below:
@@ -16,4 +16,44 @@ uisort: 100 +column language_id: + description: Language ID + type_id: int + +table language: + description: Spoken language + module: movies + uisort: 10 + + column language_id: + primary_key: Y + uisearch: Y + + column name: + description: Language name + uisearch: Y + + column ts_upd: + +column category_id: + description: Film category ID + type_id: int + +table category: + description: Categories + module: movies + uisort: 20 + spaceafter: Y + + column category_id: + auto: sequence + primary_key: Y + uisearch: Y + + column name: + description: Film category + uisearch: Y + + column ts_upd: + column id: description: Unique film ID @@ -59,2 +99,17 @@ column rating: + + foreign_key language: + +table film_category: + description: Film categories + module: movies + uisort: 40 + + foreign_key film: + primary_key: Y + + foreign_key category: + primary_key: Y + + column ts_upd:
- The foreign_key specification names the table only. Generated column names match those of the referenced table, which may be confusing.
- Each column of a PRIMARY KEY needs a primary_key: Y line.
- The predefined column name was used instead of table-specific columns as in the original. The former creates varchar(100) columns (although the documentation says varchar(40)).
- The predefined column ts_upd creates timestamp without time zone rather than with time zone as in the original. The latter is not supported by Andromeda.
- The sequence category_seq_category_id is created, as a result of auto: sequence on category_id but is not linked directly to the category table (it’s used in the trigger functions).
Andromeda is written in PHP (comments indicate it was earlier implemented in Java) and supports PostgreSQL databases. It also requires Apache, PEAR and PL/Perl.
Andromeda implements its difference engine by querying the information_schema views. It saves the data from the views and from the YAML spec into its own tables (it creates 105 tables in a schema named zdd and 77 tables in the public schema –not all of these are for version control or database automation tasks). It then uses SQL queries to determine what actions are needed for the upgrade. It applies the changes immediately.
As pointed out previously, Andromeda supports a limited set of data types. For example, booleans are only implemented as CHAR(1)’s that accept ‘Y’ or ‘N’. A follow-on project, Triangulum, appears to expand on data type support.
Andromeda can load data from CSV files and is able to retain data in an existing table whose structure is changing. However, it cannot deal with RENAMEs of columns or tables and it doesn’t seem to be able to DROP a column.
The last official release of Andromeda occurred in 2009. The guide for its successor, which made a third alpha release available in Feb. 2011, implies Andromeda will be superseded by Triangulum.
The concept of using a YAML specification for SQL database version control is very appealing (in fact, it’s what led to the Pyrseas project) because it disassociates SQL syntax from the definition and is not verbose like the XML used by other tools. Andromeda (and Triangulum) thus implement a useful mechanism for schema versioning.