An Introduction to PostgreSQL - Part 2

In Part 1 we looked at how PostgreSQL handles data concurrency, MVCC, and learned that understanding MVCC is critical for understanding PostgreSQL. Part 2 continues with MVCC and focuses on concurrent access.

Let's start by reviewing the four core characteristics of a database transaction (which gives us the acronym ACID):

  1. Atomicity - a transaction, the entire transaction, either completes fully or fails completely, with regard to other transactions.
  2. Consistency - a transaction cannot violate database integrity. i.e. referential integrity or other constraints.
  3. Isolation - individual transactions operate in a vacuum - transactions are not affected by concurrent-running transactions.
  4. Durability - Committed transactions are valid and lasting - implemented in PostgreSQL wal(write-ahead log).

For now, we will focus on the I (Isolation) attribute. To start, we will look at transaction anomalies:

  1. Dirty Reads - a transaction reads modified data from another transaction prior to commit. Impossible in PostgreSQL
  1. Non-Repeatable Reads - a transaction's multiple reads fail to maintain consistency due to changes made by another transaction
  1. we create an invoice table
select * from invoices;

 invoice_id |  amt  |         invoice_date
------------+-------+-------------------------------
          1 |  1000 | 2020-05-30 00:00:00-04
          2 |  1500 | 2020-06-01 00:00:00-04
          3 |   500 | 2020-06-02 12:23:03.058966-04
          4 | 20000 | 2020-05-26 12:23:37.257485-04
          5 |   400 | 2020-06-03 07:24:21.30561-04
          6 |  2300 | 2020-06-03 19:24:47.019604-04
          7 | 15000 | 2020-06-06 17:26:25.641259-04

 

  1. user test1 starts a transaction
(test1@btpg13) 12:26:27> begin;

BEGIN

(test1@btpg13) 12:28:19> select * from invoices where invoice_id = 4;

 invoice_id |  amt  |         invoice_date
------------+-------+-------------------------------
          4 | 20000 | 2020-05-26 12:23:37.257485-04
(1 row)

 

  1. user test2 starts a transaction
(test2@btpg13) 12:31:07> begin;

BEGIN

(test2@btpg13) 12:31:34> select * from invoices where amt > 15000;

 invoice_id |  amt  |         invoice_date
------------+-------+-------------------------------
          4 | 20000 | 2020-05-26 12:23:37.257485-04
(1 row)

 

  1. user test1 updates row
(test1@btpg13) 12:28:42> update invoices set amt = 12000 where invoice_id = 4;

UPDATE 1

(test1@btpg13) 12:34:21> commit;

COMMIT

 

  1. user test2 runs query again - as data has changed and is committed by user1 this is non-repeatable read.

This behavior is expected in the default isolation level (read committed) but can be an issue if the original data was required by test2.

(test2@btpg13) 12:32:22> select * from invoices where amt > 15000;

 invoice_id | amt | invoice_date
------------+-----+--------------
(0 rows)

 

  1. Phantom Rows - somewhat similar in concept to non-repeatable read, except it applies to a collection of rows
  1. user test2 executes select
(test2@btpg13) 12:36:56> select * from invoices where amt > 10000;

 invoice_id |  amt  |         invoice_date
------------+-------+-------------------------------
          7 | 15000 | 2020-06-06 17:26:25.641259-04
          4 | 12000 | 2020-05-26 12:23:37.257485-04
(2 rows)

 

  1. test1 inserts a row
(test1@btpg13) 12:51:34> insert into invoices (amt, invoice_date) values ((select 25000) ,(select current_timestamp - interval '31 hours'));

INSERT 0 1

(test1@btpg13) 12:51:36> commit;

COMMIT

 

  1. test2 re-runs same query
(test2@btpg13) 12:51:57> select * from invoices where amt > 10000;

 invoice_id |  amt  |         invoice_date
------------+-------+-------------------------------
          7 | 15000 | 2020-06-06 17:26:25.641259-04
          4 | 12000 | 2020-05-26 12:23:37.257485-04
          8 | 25000 | 2020-06-08 05:53:27.427788-04   <-- Phantom Row (3 rows)

 

These conditions may or may not be an problem, depending on application. To account for this, rdbms systems offer transaction isolation levels. The durability of the snapshot depends on the level.

 

Currently, there are 4 standard isolation levels in order of strictness:

 

  1. Read uncommitted - allows dirty reads/non-repeatable reads/rows - Not allowed in PostgreSQL
  1. Read Committed     - default level in PostgreSQL - non-repeatable reads/phantom rows possible - snapshot lasts for the duration of the select
  1. Repeatable read     - non-repeatable reads/phantom rows not possible - snapshot lasts for duration of transaction
  1. users test1 and test2 begin transactions - test 2 uses repeatable read
(test1@btpg13) 13:28:48> begin;

BEGIN

(test2@btpg13) 13:29:02> begin transaction isolation level repeatable read;

BEGIN

 

  1. test2 issues select
(test2@btpg13) 13:29:25> select * from invoices where amt > 10000;

 invoice_id |  amt  |         invoice_date
------------+-------+-------------------------------
          7 | 15000 | 2020-06-06 17:26:25.641259-04
          4 | 12000 | 2020-05-26 12:23:37.257485-04
          8 | 25000 | 2020-06-08 05:53:27.427788-04
(3 rows)

 

  1. test1 deletes invoice id 8
(test1@btpg13) 13:28:50> delete from invoices

btpg13-*> where invoice_id = 8;

DELETE 1

(test1@btpg13) 13:32:07> commit;

COMMIT

 

  1. test2 re-runs select - invoice_id 8 still exists in this snapshot - no chance of non-repeatable read
(test2@btpg13) 13:30:47> select * from invoices where amt > 10000;

 invoice_id |  amt  |         invoice_date
------------+-------+-------------------------------
          7 | 15000 | 2020-06-06 17:26:25.641259-04
          4 | 12000 | 2020-05-26 12:23:37.257485-04
          8 | 25000 | 2020-06-08 05:53:27.427788-04
(3 rows)

 

  1. test1 adds another invoice > 10000
(test1@btpg13) 13:32:09> insert into invoices (amt, invoice_date) values ((select 17000), (select current_timestamp - interval '51 hours'));

INSERT 0 1

(test1@btpg13) 13:34:13> commit;

COMMIT

(test1@btpg13) 13:34:56> select * from invoices where amt > 10000;

 invoice_id |  amt  |         invoice_date
------------+-------+-------------------------------
          7 | 15000 | 2020-06-06 17:26:25.641259-04
          4 | 12000 | 2020-05-26 12:23:37.257485-04
          9 | 17000 | 2020-06-07 10:34:13.693769-04
(3 rows)

 

  1. test2 re-runs select  - phantom row #9 does not exist in this current snapshot
(test2@btpg13) 13:35:34> select * from invoices where amt > 10000;

 invoice_id |  amt  |         invoice_date
------------+-------+-------------------------------
          7 | 15000 | 2020-06-06 17:26:25.641259-04
          4 | 12000 | 2020-05-26 12:23:37.257485-04
          8 | 25000 | 2020-06-08 05:53:27.427788-04
(3 rows)

 

  1. Serializable - non-repeatable reads/phantom rows not possible - most restricted - transactions are treated serially, not concurrent

Works as repeatable read with additional locking to enforce serializability.

  1. users test1 and test2 begin serializable transactions
(test1@btpg13) 15:05:00> select * from ser_test;

 status_id | status
-----------+--------
         1 | t
         2 | f
         3 | t
         4 | f
         5 | t
         6 | f
         7 | t
         8 | f
         9 | t
        10 | f
(10 rows)

 

  1. test1 updates table sets status = TRUE where status = FALSE
(test1@btpg13) 15:06:17> update ser_test

btpg13-*> set status = FALSE

btpg13-*> where status  = TRUE;

UPDATE 5

 

  1. test2 updates table sets status = FALSE where status = TRUE and commits;
(test2@btpg13) 15:06:41> update ser_test

btpg13-*> set status = TRUE;

btpg13-*> where status = FALSE;

UPDATE 5

(test2@btpg13) 15:06:57> commit;

COMMIT

 

  1. test1 attempts to commit;
(test1@btpg13) 15:06:32> commit;

ERROR:  could not serialize access due to read/write dependencies among transactions.

DETAIL:  Reason code: Canceled on identification as a pivot, during commit attempt.

HINT:  The transaction might succeed if retried.

 

Changing default transaction level to repeatable read/serializable comes with increased performance cost and should be tested accordingly - this can be set at the transaction, session, and database level.

UPDATE: June 14th 2020

On Friday, June 12 2020, Jepsen, a distributed systems research group, published a study on isolation levels in PostgreSQL.

They discovered issues with both Repeatable Read and Serializable Isolation. Both exhibited G2-item (anti-dependency cycle) characteristics. The consensus is that this is possibly allowable in RR but definitely in error for Serializable.

A patch has been written for the v12.3 release (August 2020)