PostgreSQL: New features in Version 13 - Index Deduplication

With the second beta release of PostgreSQL 13, we now have first sight of the full range of new features the PostgreSQL and Open Source world is getting excited about! Naturally, Sirius engineers are getting deep into them, and we'd like to share these exciting new features with you!

Today’s post will examine one of the new features of PostgreSQL 13 – B-tree index de-duplication.

Index de-duplication allows for the grouping of duplicate tuples into a single key with an assocated array identifying the individual tuples.

The benefits are:

  1. Decreased storage
  2. Potential query throughput gain
  3. Reduced index maintenance overhead

 

We will test with a simple table of countries and isocodes

\d idx_test

              Table "geoname.idx_test"

 Column  |  Type   | Collation | Nullable | Default
---------+---------+-----------+----------+---------
 isocode | integer |           |          |
 name    | text    |           |          |

Indexes:
    "name2_ind" btree (name)

 

The table has 10M rows with the following distribution histogram (script provided by the Faraday Blog):

 bucket |   range   |  freq   |       bar
--------+-----------+---------+-----------------
      1 | [0,44]    |  479700 | ***
      2 | [48,86]   |  280620 | **
      3 | [90,132]  |  425070 | ***
      4 | [136,178] |  887670 | ******
      5 | [180,222] |  219510 | **
      6 | [226,268] |  301590 | **
      7 | [270,312] |  230670 | **
      8 | [316,356] |  665910 | *****
      9 | [360,400] |  870660 | ******
     10 | [404,446] |  287460 | **
     11 | [450,484] |  535410 | ****
     12 | [492,535] |  346320 | **
     13 | [540,580] |  693630 | *****
     14 | [583,624] |  486450 | ***
     15 | [626,670] |  387990 | ***
     16 | [674,710] |  250470 | **
     17 | [716,756] |  285750 | **
     18 | [760,804] |  436770 | ***
     19 | [807,840] | 2102760 | ***************
     20 | [850,887] |  151020 | *
     21 | [894,894] |   25470 |
(21 rows)

 

We created the same table in prior versions and a copy in v12 with the storage parameter ‘deduplicate_items’ set to off ( default is on)

The index storage difference is quite significant:

DB Version 13.2

13.2 No Deduplication

12.3 9.5.22
Index Size 69MB 266MB 266MB 265MB

 

 

 

If we run the following in all instances

explain (analyze, verbose ,buffers on )
select isocode from idx_test where name = 'United States';

 

DB Version 13.2

13.2 No Deduplication

12.3 9.5.22
Buffers Shared Hit 1695 7589 7589 7588

 

 

 

Buffers: shared hit are blocks that are read from cache.

 

Vacuum performance - update 2M rows / vacuum

DB VERSION Update Time Index Pages Vacuumed Vacuum Time
9.5.22 21 Sec 7583 1.97 Sec
12.3 19 Sec 7630 2.32 Sec
13.2 No Deduplication 21 Sec 7630 1.9 Sec
13.2 18 Sec 1701 1.5 Sec

 

 

 

 

 

Re-index performance - postgresql 9.5 is included for completeness as concurrent index rebuilds is not available in that version.

DB VERSION Pre Re-index Size Re-index Time Post Re-index Size
9.5.22 376 MB 39 Sec 300 MB
12.3 354 MB 24 Sec 300 MB
13.2 No Deduplication 354 MB 16 Sec 300 MB
13.2 83 MB 12 Sec 69 MB

 

 

 

 

 

This is a very useful implementation, and is enabled by default. To disable this (per index) you would use the following command:

create index test_nodedup on idx_test2(name) with (deduplicate_items = off);