I wanted to figure out how big (or approximately how big) my dump file would be. In QA, I have (relatively) large dbs (compared to prod). Enter the pgstattuple extension to help me determine how much free space is in my tables.
Step 1: Installing the pgstattuple Extension
First, you need to install the pgstattuple
extension. This extension allows you to gather statistics about the physical storage of tables and indexes, including free space and dead tuples.
CREATE EXTENSION IF NOT EXISTS pgstattuple;
This will enable the pgstattuple
extension, allowing you to analyze your tables’ physical layout. The function pgstattuple_approx()
is handy for approximating free space and dead tuple size.
Step 2: Analyzing Table Bloat
Now, let’s dive into the query that helps us understand bloat and free space in tables:
SELECT
n.nspname AS schemaname,
c.relname AS tablename,
pg_total_relation_size(c.oid) AS total_size,
(pgstattuple_approx(c.oid)).approx_free_space AS free_space_bytes,
(pgstattuple_approx(c.oid)).dead_tuple_len AS dead_tuples_bytes,
pg_total_relation_size(c.oid) - (pgstattuple_approx(c.oid)).approx_tuple_len AS bloat_size,
CASE
WHEN pg_total_relation_size(c.oid) > 0
THEN (100.0 * (pg_total_relation_size(c.oid) - (pgstattuple_approx(c.oid)).approx_tuple_len) / pg_total_relation_size(c.oid))::NUMERIC(5, 2)
ELSE 0.00
END AS bloat_ratio
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE
c.relkind = 'r' -- Only tables
AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY bloat_ratio DESC;
Explanation of the Query
- Table and Schema Names:
n.nspname AS schemaname
: Retrieves the schema name.c.relname AS tablename
: Retrieves the table name.
- Table Size:
pg_total_relation_size(c.oid) AS total_size
: This gives the total size of the table, including data, indexes, and TOAST (out-of-line storage for large values).
- Free Space and Dead Tuples:
(pgstattuple_approx(c.oid)).approx_free_space AS free_space_bytes
: This returns the approximate free space within the table in bytes.(pgstattuple_approx(c.oid)).dead_tuple_len AS dead_tuples_bytes
: This returns the size of dead tuples (rows marked for deletion but not yet vacuumed).
- Bloat Size:
pg_total_relation_size(c.oid) - (pgstattuple_approx(c.oid)).approx_tuple_len AS bloat_size
: The difference between the total table size and the live tuple size represents the “bloat” caused by unused space within the table.
- Bloat Ratio:
- The
CASE
statement calculates the bloat ratio as a percentage:
CASE
WHEN pg_total_relation_size(c.oid) > 0
THEN (100.0 * (pg_total_relation_size(c.oid) - (pgstattuple_approx(c.oid)).approx_tuple_len) / pg_total_relation_size(c.oid))::NUMERIC(5, 2)
ELSE 0.00
END AS bloat_ratio
This gives you the percentage of space in the table wasted due to bloat. A higher ratio indicates a higher level of bloat. - The
- Filtering and Sorting:
WHERE c.relkind = 'r'
: Filters the results only to include regular tables (ignores indexes, views, etc.).AND n.nspname NOT IN ('pg_catalog', 'information_schema')
: Excludes system schemas.ORDER BY bloat_ratio DESC
: Orders the tables by their bloat ratio, with the most bloated tables appearing at the top.
Step 3: Interpreting the Results
The result of this query gives you a clear view of table bloat in your PostgreSQL database:
schemaname
: The schema the table resides in.tablename
: The name of the table.total_size
: The total size of the table, including data, indexes, and TOAST.free_space_bytes
: The approximate amount of unused space in the table.dead_tuples_bytes
: The size of dead tuples (rows marked for deletion but not yet cleaned up).bloat_size
: The amount of space wasted due to dead tuples and fragmentation.bloat_ratio
: The percentage of the table’s size that is consumed by bloat.
Step 4: Addressing Table Bloat
If you find that some tables have a significant amount of bloat, it may be time to run a VACUUM
or REINDEX
operation. This is a topic for another time. Right now, I just wanted to find out how much free space there is to determine how big my dump file will be.
VACUUM
: This cleans up dead tuples, allowing space to be reused.REINDEX
: Rebuilds the indexes to reclaim space consumed by fragmentation.