Revitalize PostgreSQL with pg_repack: A Gentle Vacuum Solution

Revitalize PostgreSQL with pg_repack: A Gentle Vacuum Solution

Overview:

pg_repack is a PostgreSQL extension that lets you remove bloat from tables and indexes, and optionally restore the physical order of clustered indexes. Unlike CLUSTER and VACUUM FULL it works online, without holding an exclusive lock on the processed tables during processing. pg_repack is efficient to boot, with performance comparable to using CLUSTER directly.

What is pg_repack?

pg_repack is a powerful PostgreSQL extension designed to alleviate bloat in tables and indexes, with the added capability to restore the physical order of clustered indexes. By facilitating online table rebuilding, it offers a seamless solution for PostgreSQL users.

Advantages of Using pg_repack:

  • Release Storage: Reclaim disk space by efficiently releasing storage from a table.

  • Rebuild Tables: Significantly reduce I/O by rebuilding tables.

  • Bloat Removal: Clear storage piled due to dead tuples, often left uncleared by the auto-vacuum process.

Getting Started:

Step 1: Installation in different OS

RHEL/Fedora/Almalinux:

# Install pg_repack12
yum install pg_repack14

# Configure shared_preload_libraries
sudo su postgres
psql
psql # set shared_preload_libraries = 'pg_repack';
/usr/pgsql-14/bin/pg_ctl -D $PGDATA restart -mf

Ubuntu/Debian:

sudo apt-get update -y
sudo apt-get install postgresql-14-repack

Step 2: Create the Extension in the Target Database

Let's assume the database name is IMDB

sudo su postgres
psql
\c imdb;

imdb# CREATE EXTENSION pg_repack;
CREATE EXTENSION

imdb=# SELECT * FROM pg_available_extensions WHERE name = 'pg_repack';
   name    | default_version | installed_version |                           comment                            
-----------+-----------------+-------------------+--------------------------------------------------------------
 pg_repack | 1.5.0           | 1.5.0             | Reorganize tables in PostgreSQL databases with minimal locks

Step 3: Using pg_repack to Rebuild Tables Online

# Dry run for table 'thriller'
pg_repack --dry-run -d imdb --table thriller

# Actual run for table 'actor'
pg_repack -d imdb --table thriller

Rebuilding an Entire Database:

For RHEL/Almalinux/Rockylinux

# Dry run for the entire database
pg_repack --dry-run -d imdb

# Actual run for the entire database with parallel jobs (e.g., 2 jobs)
pg_repack -d imdb -j 2

For Ubuntu/Debian

/usr/lib/postgresql/14/bin$ pg_repack -d imdb -j 2

-j 2 option specifies the number of parallel jobs to use during the repack process. In this case, it's set to 2, indicating that the repack operation can be performed using two parallel jobs.

Note: Remote execution is possible, with the condition that the pg_repack version matches.

Appendix: Removing pg_repack Extension:

DROP EXTENSION pg_repack;