Overview:
pg_repack is a PostgreSQL extension that lets you remove bloat from tables and indexes, and optionally restore the physical order of clustered indexes. Unlike CLUSTER and VACUUM FULL it works online, without holding an exclusive lock on the processed tables during processing. pg_repack is efficient to boot, with performance comparable to using CLUSTER directly.
What is pg_repack?
pg_repack
is a powerful PostgreSQL extension designed to alleviate bloat in tables and indexes, with the added capability to restore the physical order of clustered indexes. By facilitating online table rebuilding, it offers a seamless solution for PostgreSQL users.
Advantages of Using pg_repack:
Release Storage: Reclaim disk space by efficiently releasing storage from a table.
Rebuild Tables: Significantly reduce I/O by rebuilding tables.
Bloat Removal: Clear storage piled due to dead tuples, often left uncleared by the auto-vacuum process.
Getting Started:
Step 1: Installation in different OS
RHEL/Fedora/Almalinux:
# Install pg_repack12
yum install pg_repack14
# Configure shared_preload_libraries
sudo su postgres
psql
psql # set shared_preload_libraries = 'pg_repack';
/usr/pgsql-14/bin/pg_ctl -D $PGDATA restart -mf
Ubuntu/Debian:
sudo apt-get update -y
sudo apt-get install postgresql-14-repack
Step 2: Create the Extension in the Target Database
Let's assume the database name is IMDB
sudo su postgres
psql
\c imdb;
imdb# CREATE EXTENSION pg_repack;
CREATE EXTENSION
imdb=# SELECT * FROM pg_available_extensions WHERE name = 'pg_repack';
name | default_version | installed_version | comment
-----------+-----------------+-------------------+--------------------------------------------------------------
pg_repack | 1.5.0 | 1.5.0 | Reorganize tables in PostgreSQL databases with minimal locks
Step 3: Using pg_repack to Rebuild Tables Online
# Dry run for table 'thriller'
pg_repack --dry-run -d imdb --table thriller
# Actual run for table 'actor'
pg_repack -d imdb --table thriller
Rebuilding an Entire Database:
For RHEL/Almalinux/Rockylinux
# Dry run for the entire database
pg_repack --dry-run -d imdb
# Actual run for the entire database with parallel jobs (e.g., 2 jobs)
pg_repack -d imdb -j 2
For Ubuntu/Debian
/usr/lib/postgresql/14/bin$ pg_repack -d imdb -j 2
-j 2
option specifies the number of parallel jobs to use during the repack process. In this case, it's set to 2, indicating that the repack operation can be performed using two parallel jobs.
Note: Remote execution is possible, with the condition that the pg_repack version matches.
Appendix: Removing pg_repack Extension:
DROP EXTENSION pg_repack;