Tag Archives: concurrency control

The Phantom of the Database – Part 4

At the end of November, I finished the third episode with mild suspense: I suggested that the problem of optimistic “locking” could perhaps be solved in PostgreSQL with something other than extra qualifications, row version numbers or timestamps.

Let’s start this episode with action!

moviesdb=> INSERT INTO film VALUES (47478, 'Seven  Samurai', 1956);
INSERT 0 1
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |     title      | release_year
--------+------+-------+----------------+--------------
 853969 |    0 | (0,1) | Seven  Samurai |         1956
(1 row)

moviesdb=> UPDATE film SET title = 'Seven Samurai' WHERE id = 47478;
UPDATE 1
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |     title     | release_year
--------+------+-------+---------------+--------------
 853970 |    0 | (0,2) | Seven Samurai |         1956
(1 row)

moviesdb=> UPDATE film SET title = 'Sichinin Samurai' WHERE id = 47478;
UPDATE 1
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
--------+------+-------+------------------+--------------
 853971 |    0 | (0,3) | Sichinin Samurai |         1956
(1 row)

moviesdb=> UPDATE film SET release_year = 1954 WHERE id = 47478;
UPDATE 1
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
--------+------+-------+------------------+--------------
 853972 |    0 | (0,4) | Sichinin Samurai |         1954
(1 row)

moviesdb=> VACUUM FULL film;
VACUUM
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
--------+------+-------+------------------+--------------
 853972 |    0 | (0,1) | Sichinin Samurai |         1954
(1 row)

moviesdb=> BEGIN; DELETE FROM film WHERE id = 47478; ROLLBACK;
BEGIN
DELETE 1
ROLLBACK
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  |  xmax  | ctid  |      title       | release_year
--------+--------+-------+------------------+--------------
 853972 | 853974 | (0,1) | Sichinin Samurai |         1954
(1 row)

moviesdb=> BEGIN; UPDATE film SET release_year = 1956 WHERE id = 47478;
BEGIN
UPDATE 1
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  | xmax | ctid  |      title       | release_year
--------+------+-------+------------------+--------------
 853975 |    0 | (0,2) | Sichinin Samurai |         1956
(1 row)

moviesdb=> ROLLBACK;
ROLLBACK
moviesdb=> SELECT xmin, xmax, ctid, title, release_year FROM film;
  xmin  |  xmax  | ctid  |      title       | release_year
--------+--------+-------+------------------+--------------
 853972 | 853975 | (0,1) | Sichinin Samurai |         1954
(1 row)

What element in the queries above could be used as a surrogate “row version identifier?”  If you examine the changes carefully, you’ll notice that the xmin system column provides that capability. The ctid, a row locator, on the other hand, does not survive the VACUUM operation, and xmax is only used when a row is deleted (or updated, causing it to move).

So my suggestion, in terms of web user interfaces, is that fetching a row for a possible update should include the xmin value, e.g., in the get() method of the Film class, use the following:

    def get(self, db):
        try:
            row = db.fetchone(
                "SELECT xmin, title, release_year FROM film WHERE id = %s",
                (self.id,))

The xmin value can then be sent as a hidden field to the web client, and used in the update() method to implement optimistic concurrency control.

Update: The DB UI Tutorial tag v0.4.3 now has an implementation of this concept for the command line client.

The Phantom of the Database – Part 3

In the previous episode: Alice and Bob were trying to simultaneously update the title of the following row of the film table:

  id   |     title      | release_year
-------+----------------+--------------
 47478 | Seven  Samurai |         1956

User Transaction Correctness

In the first episode, Carol also wanted to concurrently change the release_year to 1954. Let us now consider her update in conjunction with one of the others.

If Bob modifies the title and Carol changes the year —provided our SQL statements only include the updated fields— there will be no problem. In other words, the changes can be merged, just as when Alice updates one section of a Wikipedia page and Bob changes another section of the same page (Note to self: have to talk to them about this moonlighting).

So the problem with Alice’s and Bob’s updates to the title (one to remove the extra space, the other to change it to Japanese) is that they cannot be merged. How can we implement a correct user transaction without incurring the penalty of locking the row in question?

Optimistic Concurrency Control

Richard Stephan was on the right track when he asked in a comment to the previous post: Should the application be using a pessimistic or an optimistic locking technique?

Optimistic concurrency control (aka optimistic “locking”) is a transaction design approach that avoids the overhead of locking. As stated in the Wikipedia article, the “stateless nature of HTTP makes locking infeasible for web user interfaces,” so it’s almost a given that we should use OCC for our interface.

OCC involves validating that the record we updated was the correct one. A simple way to do this is to add a qualification to the original UPDATE that does the validation for us. For example, the DB-API cursor execute call could be coded as follows:

cursor.execute("UPDATE film SET title = %s "
               "WHERE id = %s AND title = %s",
               (newrec.title, oldrec.id, oldrec.title))

Above, newrec refers to the new record instance and oldrec to the old one. So both Alice and Bob would send UPDATEs with “AND title = ‘Seven  Samurai'”. Naturally, since they’re both modifying the title, only one of the statements can succeed.

The downside of having to add the extra qualification —in the general case— is that the user could’ve changed several fields so each UPDATE would have to be customized accordingly. This will have a negative impact, among other things, on the query planner. In addition, the web interface would have to deal with two values for each field.

Alternatives to the extra qualification are to add either a version number for each row or a sufficiently granular timestamp that also acts as a versioning scheme. However, since we’re using PostgreSQL, perhaps we can use something else …

Stay tuned.

The Phantom of the Database – Part 2

In the previous episode: Alice, Bob and Carol were trying to simultaneously update the following row of the film table:

  id   |     title      | release_year
-------+----------------+--------------
 47478 | Seven  Samurai |         1956

Alice wanted to remove the extra space in the title, Bob was trying to change the title to the phonetic Japanese version and Carol was correcting the release year to 1954. To simplify the analysis we’ll now limit ourselves to Alice’s and Bob’s updates.

The Lost Update Problem

If Alice’s statement executes first, Bob’s change will overwrite her update. Similarly, if Bob’s statement takes precedence, his change will be overwritten. Appropriately, this is conventionally known as the lost update problem. The updates are known as blind writes or dirty writes. How can our Python user interface prevent this problem?

The traditional solution to the lost update problem is to use two-phase locking. You can use PostgreSQL’s psql application to verify how this works (I used the PROMPT1 variable to show which user is issuing the statements).

Alice’s session starts as follows:

alice@moviesdb=> begin;
BEGIN
alice@moviesdb=> select title from film where id = 47478;
     title      
----------------
 Seven  Samurai
(1 row)

Bob’s session is identical but then he issues the UPDATE statement:

bob@moviesdb=> begin;
BEGIN
bob@moviesdb=> select title from film where id = 47478;
     title      
----------------
 Seven  Samurai
(1 row)

bob@moviesdb=> update film set title = 'Sichinin no Samurai' where id = 47478;
UPDATE 1

When Alice tries her UPDATE, her session hangs:

alice@moviesdb=> update film set title = 'Seven Samurai' where id = 47478;

You can examine the situation from another psql session (I cheated and excluded that session’s data). I won’t try to explain (or understand) all this but you can see that Alice’s session is waiting due to an ungranted lock.

moviesdb=# select procpid, waiting, current_query from pg_stat_activity;
 procpid | waiting |                       current_query
---------+---------+-----------------------------------------------------------
   25747 | t       | update film set title = 'Seven Samurai' where id = 47478;
   25900 | f       | <IDLE> in transaction
(2 rows)

moviesdb=# select pid, relation::regclass, locktype, page, tuple, mode, granted
moviesdb-# from pg_locks order by pid, relation, locktype;
  pid  | relation  |   locktype    | page | tuple |       mode       | granted
-------+-----------+---------------+------+-------+------------------+---------
 25747 | film      | relation      |      |       | AccessShareLock  | t
 25747 | film      | relation      |      |       | RowExclusiveLock | t
 25747 | film      | tuple         |    0 |    37 | ExclusiveLock    | t
 25747 | film_pkey | relation      |      |       | AccessShareLock  | t
 25747 | film_pkey | relation      |      |       | RowExclusiveLock | t
 25747 |           | transactionid |      |       | ShareLock        | f
 25747 |           | transactionid |      |       | ExclusiveLock    | t
 25747 |           | virtualxid    |      |       | ExclusiveLock    | t
 25900 | film      | relation      |      |       | RowExclusiveLock | t
 25900 | film      | relation      |      |       | AccessShareLock  | t
 25900 | film_pkey | relation      |      |       | RowExclusiveLock | t
 25900 | film_pkey | relation      |      |       | AccessShareLock  | t
 25900 |           | transactionid |      |       | ExclusiveLock    | t
 25900 |           | virtualxid    |      |       | ExclusiveLock    | t
(14 rows)

Bob’s COMMIT releases his locks …

bob@moviesdb=> commit;
COMMIT

and Alice’s UPDATE now goes through:

UPDATE 1
alice@moviesdb=> commit;
COMMIT
alice@moviesdb=> select title from film where id = 47478;
     title     
---------------
 Seven Samurai
(1 row)

Hey, what happened? Alice’s UPDATE overwrote Bob’s! Wasn’t that supposed to be prevented?

Here is the rub: if it is important for the application to update the row as was presented to the user, then we need to add another qualification to the UPDATE, i.e., we need something like “and title = 'Seven  Samurai'“. We’ll discuss this in a future installment.