An Introduction to PostgreSQL - Part 1

Originally published back in October 2012, this classic article series is undergoing a facelift and update!

We often get asked about PostgreSQL, and especially how it compares with proprietary enterprise-class Databases like Oracle or MS-SQL. In this classic series, revised and updated for today's PostgreSQL, we get right under the hood and explain, in plain English, just how the world's greatest Open Source Database works... You ask, we answer!

This first article will look at how PostgreSQL handles data concurrency, MVCC (Multi-Version Concurrency Control), which is quite different from other RDBMS platforms such as Oracle, MSSQL or MySQL.

PostgreSQL MVCC (Multi-version Concurrency Control) works in a similar way to making notes on a piece of paper. Imagine you note something ;down. Then you change your mind. You cross it out and draw a line on the paper to wherever there is a space further down, and write a new version. After a while, the sheet of paper will comprise mostly deletions and it will become difficult to assemble a current set of notes from it. Bear this analogy in mind in while you read the following explanations.

Let's start with a simple visual explanation of MVCC:

  1. CREATE TABLE
create table teams
(team_id serial primary key not null,
team_name varchar(10));

 

  1. CREATE RECORD

First, let's get the current transaction id:

select txid_current();
 txid_current
--------------
          561

insert into teams (team_name) values ('Eagles');

 

  1. EXAMINE DATA BLOCK

In PostgreSQL tables are stored as files under the PGDATA directory. File numbers can be retrieved using the following simple SQL. The Linux hexdump command will dump a block so we can take a close look at what's going on (we'll cover data encryption in a future post).

The following query tells us the physical location of the table.

select pg_relation_filepath('teams');
 pg_relation_filepath
----------------------
 base/19346/19373
hexdump  -C /data/pg13/data/base/19346/19373
00001fd0  00 00 00 00 00 00 00 00  32 02 00 00 00 00 00 00  |........2.......|
00001fe0  00 00 00 00 00 00 00 00  01 00 02 00 02 08 18 00  |................|
00001ff0  01 00 00 00 0f 45 61 67  6c 65 73 00 00 00 00 00  |.....Eagles.....|

 

  1. ADD ANOTHER RECORD
insert into teams (team_name) values ('Philles');
select * from teams;
team_id | team_name
---------+-----------
    1 | Eagles
    2 | Phillies

 

  1. GET TRANSACTIONAL ID OF DML STATEMENT

PostgreSQL adds system columns to tables:

  • xmin – transaction id of origin record
  • xmax – transaction id that deleted record
select xmin, xmax, * from teams;
 xmin | xmax | team_id | team_name
------+------+---------+-----------
  562 |    0 |       1 | Eagles
  563 |    0 |       2 | Phillies

 

  1. EXAMINE DATA BLOCK
hexdump  -C /data/pg13/data/base/19346/19373

00001fb0 33 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00  |3...............|
00001fc0 02 00 02 00 02 08 18 00 02 00 00 00 11 50 68 69  |.............Phi|
00001fd0 6c 6c 65 73 00 00 00 00 32 02 00 00 00 00 00 00  |lles....2.......|
00001fe0 00 00 00 00 00 00 00 00 01 00 02 00 02 08 18 00  |................|
00001ff0 01 00 00 00 0f 45 61 67 6c 65 73 00 00 00 00 00  |.....Eagles.....|

 

  1. UPDATE 1 RECORD

PostgreSQL will create a new row for the update, leaving the old row in the table, but not visible:

update teams set team_name = 'EAGLES'
where team_name = 'Eagles';

 select xmin, xmax, * from teams;
xmin | xmax | team_id | team_name
-----+------+---------+----------
 563 |    0 |       2 | Philles
 564 |    0 |       1 | EAGLES

 hexdump  -C /data/pg13/data/base/19346/19373

00001f80  00 00 00 00 00 00 00 00  34 02 00 00 00 00 00 00  |........4.......|
00001f90  00 00 00 00 00 00 00 00  03 00 02 80 02 29 18 00  |.............)..|
00001fa0  01 00 00 00 0f 45 41 47  4c 45 53 00 00 00 00 00  |.....EAGLES.....|  <-- UPDATE
00001fb0  33 02 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |3...............|
00001fc0  02 00 02 00 02 09 18 00  02 00 00 00 11 50 68 69  |.............Phi|
00001fd0  6c 6c 65 73 00 00 00 00  32 02 00 00 34 02 00 00  |lles....2...4...|
00001fe0  00 00 00 00 00 00 00 00  03 00 02 40 02 05 18 00  |...........@....|
00001ff0  01 00 00 00 0f 45 61 67  6c 65 73 00 00 00 00 00  |.....Eagles.....|  <-- ORIGINAL ROW

 

  1. INSPECT DATABASE PAGE

The pageinspect extension provides several useful functions to examine low level storage.

Get all rows information - Query provided by Bruce Momjian

Flag states – from source code:

* lp_flags has these possible states.  An UNUSED line pointer is available
 * for immediate re-use, the other states are not.
 */███████████████████████████████████████████████████████████████████████
#define LP_UNUSED       0       /* unused (should always have lp_len=0) */
#define LP_NORMAL       1       /* used (should always have lp_len>0) */
#define LP_REDIRECT     2       /* HOT redirect (should have lp_len=0) */
#define LP_DEAD         3       /* dead, may or may not have storage

 

 ctid  |  case  | xmin | xmax | t_ctid
-------+--------+------+------+-------
 (0,1) | Normal |  562 |  564 | (0,3)  <- ORIGINAL ROW – note xmax col – id of update session
 (0,2) | Normal |  563 |    0 | (0,2)
 (0,3) | Normal |  564 |    0 | (0,3)  <- UPDATED ROW
(3 rows)

 

  1. ADD MORE ROWS
insert into teams (team_name) values ('Flyers');

insert into teams (team_name) values ('Sixers');

 select xmin, xmax, * from teams;
 xmin | xmax | team_id | team_name
------+------+---------+----------
  563 |    0 |       2 | Philles
  564 |    0 |       1 | EAGLES
  565 |    0 |       3 | Flyers
  566 |    0 |       4 | Sixers

All transactions still exist in the block!

hexdump  -C /data/pg13/data/base/19346/19373

00001f30  00 00 00 00 00 00 00 00  36 02 00 00 00 00 00 00  |........6.......|
00001f40  00 00 00 00 00 00 00 00  05 00 02 00 02 08 18 00  |................|
00001f50  04 00 00 00 0f 53 69 78  65 72 73 00 00 00 00 00  |.....Sixers.....|
00001f60  35 02 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |5...............|
00001f70  04 00 02 00 02 08 18 00  03 00 00 00 0f 46 6c 79  |.............Fly|
00001f80  65 72 73 00 00 00 00 00  34 02 00 00 00 00 00 00  |ers.....4.......|
00001f90  00 00 00 00 00 00 00 00  03 00 02 80 02 29 18 00  |.............)..|
00001fa0  01 00 00 00 0f 45 41 47  4c 45 53 00 00 00 00 00  |.....EAGLES.....|
00001fb0  33 02 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |3...............|
00001fc0  02 00 02 00 02 09 18 00  02 00 00 00 11 50 68 69  |.............Phi|
00001fd0  6c 6c 65 73 00 00 00 00  32 02 00 00 34 02 00 00  |lles....2...4...|
00001fe0  00 00 00 00 00 00 00 00  03 00 02 40 02 05 18 00  |...........@....|
00001ff0  01 00 00 00 0f 45 61 67  6c 65 73 00 00 00 00 00  |.....Eagles.....|

 

  1. UPDATE 1 ROW
update teams set team_name = 'PHILLIES'
where team_name = 'Philles';

 select xmin, xmax, * from teams;
 xmin | xmax | team_id | team_name
------+------+---------+-----------
  564 |    0 |       1 | EAGLES
  565 |    0 |       3 | Flyers
  566 |    0 |       4 | Sixers
  567 |    0 |       2 | PHILLIES

hexdump  -C /data/pg13/data/base/19346/19373

00001f10  37 02 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |7...............|
00001f20  06 00 02 80 02 28 18 00  02 00 00 00 13 50 48 49  |.....(.......PHI|
00001f30  4c 4c 49 45 53 00 00 00  36 02 00 00 00 00 00 00  |LLIES...6.......|
00001f40  00 00 00 00 00 00 00 00  05 00 02 00 02 09 18 00  |................|
00001f50  04 00 00 00 0f 53 69 78  65 72 73 00 00 00 00 00  |.....Sixers.....|
00001f60  35 02 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |5...............|
00001f70  04 00 02 00 02 09 18 00  03 00 00 00 0f 46 6c 79  |.............Fly|
00001f80  65 72 73 00 00 00 00 00  34 02 00 00 00 00 00 00  |ers.....4.......|
00001f90  00 00 00 00 00 00 00 00  03 00 02 80 02 29 18 00  |.............)..|
00001fa0  01 00 00 00 0f 45 41 47  4c 45 53 00 00 00 00 00  |.....EAGLES.....|
00001fb0  33 02 00 00 37 02 00 00  00 00 00 00 00 00 00 00  |3...7...........|
00001fc0  06 00 02 40 02 01 18 00  02 00 00 00 11 50 68 69  |...@.........Phi|
00001fd0  6c 6c 65 73 00 00 00 00  32 02 00 00 34 02 00 00  |lles....2...4...|
00001fe0  00 00 00 00 00 00 00 00  03 00 02 40 02 05 18 00  |...........@....|
00001ff0  01 00 00 00 0f 45 61 67  6c 65 73 00 00 00 00 00  |.....Eagles.....|

 

  1. EXAMINE DATABASE PAGE WITH PAGEINSPECT

now 2 rows are updated, originals remain

 ctid  |  case  | xmin | xmax | t_ctid
-------+--------+------+------+--------
 (0,1) | Normal |  562 |  564 | (0,3)
 (0,2) | Normal |  563 |  567 | (0,6)
 (0,3) | Normal |  564 |    0 | (0,3)
 (0,4) | Normal |  565 |    0 | (0,4)
 (0,5) | Normal |  566 |    0 | (0,5)
 (0,6) | Normal |  567 |    0 | (0,6)

 

  1. DELETE ONE ROW
delete from teams
where team_name = 'EAGLES';

 select xmin, xmax, * from teams;
xmin | xmax | team_id | team_name
------+------+---------+-----------
  565 |    0 |       3 | Flyers
  566 |    0 |       4 | Sixers
  567 |    0 |       2 | PHILLIES
(3 rows)

hexdump  -C /data/pg13/data/base/19346/19373

00001f10  37 02 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |7...............|
00001f20  06 00 02 80 02 29 18 00  02 00 00 00 13 50 48 49  |.....).......PHI|
00001f30  4c 4c 49 45 53 00 00 00  36 02 00 00 00 00 00 00  |LLIES...6.......|
00001f40  00 00 00 00 00 00 00 00  05 00 02 00 02 09 18 00  |................|
00001f50  04 00 00 00 0f 53 69 78  65 72 73 00 00 00 00 00  |.....Sixers.....|
00001f60  35 02 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |5...............|
00001f70  04 00 02 00 02 09 18 00  03 00 00 00 0f 46 6c 79  |.............Fly|
00001f80  65 72 73 00 00 00 00 00  34 02 00 00 38 02 00 00  |ers.....4...8...|
00001f90  00 00 00 00 00 00 00 00  03 00 02 a0 02 21 18 00  |.............!..|
00001fa0  01 00 00 00 0f 45 41 47  4c 45 53 00 00 00 00 00  |.....EAGLES.....|
00001fb0  33 02 00 00 37 02 00 00  00 00 00 00 00 00 00 00  |3...7...........|
00001fc0  06 00 02 40 02 05 18 00  02 00 00 00 11 50 68 69  |...@.........Phi|
00001fd0  6c 6c 65 73 00 00 00 00  32 02 00 00 34 02 00 00  |lles....2...4...|
00001fe0  00 00 00 00 00 00 00 00  03 00 02 40 02 05 18 00  |...........@....|
00001ff0  01 00 00 00 0f 45 61 67  6c 65 73 00 00 00 00 00  |.....Eagles.....|

 ctid  |  case  | xmin | xmax | t_ctid
-------+--------+------+------+--------
 (0,1) | Normal |  562 |  564 | (0,3)
 (0,2) | Normal |  563 |  567 | (0,6)
 (0,3) | Normal |  564 |  568 | (0,3)
 (0,4) | Normal |  565 |    0 | (0,4)
 (0,5) | Normal |  566 |    0 | (0,5)
 (0,6) | Normal |  567 |    0 | (0,6)
(6 rows)

This is a simple example to illustrate what is going on. Imagine this problem if you have a table with millions of rows, all with frequent updates and deletes. For performance, as well as storage and transaction id limitations, PostgreSQL needs a mechanism to maintain this. In a future post, we will examine that utility, Vacuum.

The next post An Introduction to PostgreSQL - Part 2 explores more around MVCC, focusing on concurrent access and indexes. Understanding MVCC is critical for understanding PostgreSQL, so make sure you join us!