How to Detect and Clean Up Garbage in Greenplum - Alibaba Cloud HybridDB for PG Best Practices
Tags
PostgreSQL , Greenplum , HDB for PG
Background
Greenplum uses multi-version concurrency control (MVCC) to support concurrent DELETE/UPDATE and rollback. When a row is deleted (with DELETE), the xmax field in the row header is marked; when a row is updated, the old version's header is marked the same way and a new version is inserted.
This creates a problem: if data is frequently deleted, inserted, or updated, both tables and indexes bloat.
PostgreSQL avoids or reduces this garbage through HOT updates and autovacuum, but Greenplum has no automatic garbage-collection worker process, so collection must be triggered manually.
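Because there is no autovacuum worker, sites typically schedule garbage collection externally. A crontab sketch (the path, schedule, and log location are illustrative; run as gpadmin during a low-traffic window):

```
# vacuum and analyze all databases nightly at 02:30
30 2 * * * /usr/local/greenplum-db/bin/vacuumdb -a -z >> /home/gpadmin/vacuumdb.log 2>&1
```

vacuumdb's -a flag covers all databases and -z runs analyze at the same time, which also keeps the bloat views below accurate.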
How to find bloated tables and indexes
Greenplum
1. First, update the table statistics, because the two views below compute bloat ratios from statistics.
Connect to every target database and run:
analyze;
2. Query gp_toolkit.gp_bloat_diag to find the most heavily bloated tables.
Use the gp_toolkit administrative schema:
gp_toolkit.gp_bloat_diag - this view shows tables with moderate and significant amount of bloat
Columns:
bdirelid - Object ID of the table (pg_class.oid)
bdinspname - table schema name
bdirelname - table name
bdirelpages - number of pages currently in table data files
bdiexppages - number of pages expected according to current statistics
bdidiag - diagnosis of bloat (ratio from 1 to 3 -> no bloat, ratio from 4 to 10 -> moderate bloat, ratio > 10 -> significant bloat)
Example:
In this example the table "t1" is severely bloated (the calculated data size for data currently in table is 1 page, but table consists of 97 pages).
gpadmin=# select * from gp_toolkit.gp_bloat_diag;
bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
----------+------------+------------+-------------+-------------+---------------------------------------
21488 | public | t1 | 97 | 1 | significant amount of bloat suspected
(1 row)
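The bdidiag thresholds can be mirrored in a short Python sketch (the function name and return messages are illustrative, not the view's source):

```python
def diagnose_bloat(relpages, exppages):
    """Classify bloat using the gp_bloat_diag thresholds described above:
    ratio 1-3 -> no bloat, 4-10 -> moderate bloat, > 10 -> significant bloat."""
    ratio = relpages / exppages
    if ratio > 10:
        return "significant amount of bloat suspected"
    if ratio >= 4:
        return "moderate amount of bloat suspected"
    return "no bloat detected"

# The t1 example above: 97 actual pages vs 1 expected page
print(diagnose_bloat(97, 1))  # significant amount of bloat suspected
```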
3. Query gp_toolkit.gp_bloat_expected_pages for bloat details of all objects.
gp_toolkit.gp_bloat_expected_pages - this view shows the page data (current pages/expected pages) for all tables in the database
Columns:
btdrelid - Object ID of the table (pg_class.oid)
btdrelpages - number of pages currently in table data files
btdexppages - number of pages expected according to current statistics
Example:
In this example the tables shown all have calculated data size of 1 page and actual data file size 1 page. No bloat is detected.
gpadmin=# select * from gp_toolkit.gp_bloat_expected_pages limit 5;
btdrelid | btdrelpages | btdexppages
----------+-------------+-------------
10789 | 1 | 1
10794 | 1 | 1
10799 | 1 | 1
5004 | 1 | 1
7175 | 1 | 1
(5 rows)
4. Greenplum's system catalogs can also accumulate garbage, for example through heavy use of temporary tables (temp tables are session-level, so each use creates them anew, which generates inserts and deletes in pg_class, pg_attribute, and other catalogs, bloating the catalogs themselves).
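A sketch for keeping an eye on catalog bloat (if your version of gp_bloat_diag excludes catalog tables, track pg_relation_size of the hot catalogs over time instead):

```sql
select * from gp_toolkit.gp_bloat_diag where bdinspname = 'pg_catalog';
select pg_size_pretty(pg_relation_size('pg_catalog.pg_attribute'));
```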
PostgreSQL
1. First, make sure the table statistics are up to date, because the bloat query below is computed from statistics.
If autovacuum is enabled, PostgreSQL maintains statistics automatically and no manual intervention is needed.
2. Check for bloated tables and indexes
《PostgreSQL garbage collection and how to prevent bloat - How to prevent object bloat in PostgreSQL》
Connect to the target database and run:
SELECT
current_database() AS db, schemaname, tablename, reltuples::bigint AS tups, relpages::bigint AS pages, otta,
ROUND(CASE WHEN otta=0 OR sml.relpages=0 OR sml.relpages=otta THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat,
CASE WHEN relpages < otta THEN 0 ELSE relpages::bigint - otta END AS wastedpages,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes,
CASE WHEN relpages < otta THEN $$0 bytes$$::text ELSE (bs*(relpages-otta))::bigint || $$ bytes$$ END AS wastedsize,
iname, ituples::bigint AS itups, ipages::bigint AS ipages, iotta,
ROUND(CASE WHEN iotta=0 OR ipages=0 OR ipages=iotta THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat,
CASE WHEN ipages < iotta THEN 0 ELSE ipages::bigint - iotta END AS wastedipages,
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes,
CASE WHEN ipages < iotta THEN $$0 bytes$$ ELSE (bs*(ipages-iotta))::bigint || $$ bytes$$ END AS wastedisize,
CASE WHEN relpages < otta THEN
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta::bigint) END
ELSE CASE WHEN ipages < iotta THEN bs*(relpages-otta::bigint)
ELSE bs*(relpages-otta::bigint + ipages-iotta::bigint) END
END AS totalwastedbytes
FROM (
SELECT
nn.nspname AS schemaname,
cc.relname AS tablename,
COALESCE(cc.reltuples,0) AS reltuples,
COALESCE(cc.relpages,0) AS relpages,
COALESCE(bs,0) AS bs,
COALESCE(CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)),0) AS otta,
COALESCE(c2.relname,$$?$$) AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols
FROM
pg_class cc
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname <> $$information_schema$$
LEFT JOIN
(
SELECT
ma,bs,foo.nspname,foo.relname,
(datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
(maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
SELECT
ns.nspname, tbl.relname, hdr, ma, bs,
SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS datawidth,
MAX(coalesce(null_frac,0)) AS maxfracsum,
hdr+(
SELECT 1+count(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = ns.nspname AND s2.tablename = tbl.relname
) AS nullhdr
FROM pg_attribute att
JOIN pg_class tbl ON att.attrelid = tbl.oid
JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
LEFT JOIN pg_stats s ON s.schemaname=ns.nspname
AND s.tablename = tbl.relname
AND s.inherited=false
AND s.attname=att.attname,
(
SELECT
(SELECT current_setting($$block_size$$)::numeric) AS bs,
CASE WHEN SUBSTRING(SPLIT_PART(v, $$ $$, 2) FROM $$#"[0-9]+.[0-9]+#"%$$ for $$#$$)
IN ($$8.0$$,$$8.1$$,$$8.2$$) THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ $$mingw32$$ OR v ~ $$64-bit$$ THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
WHERE att.attnum > 0 AND tbl.relkind=$$r$$
GROUP BY 1,2,3,4,5
) AS foo
) AS rs
ON cc.relname = rs.relname AND nn.nspname = rs.nspname
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml order by wastedbytes desc limit 5
To run the same check against every database, wrap the query in a shell loop like the following (-t -A makes psql print bare database names):
for db in `psql -t -A -c 'select datname from pg_database where datname not in ($$template0$$, $$template1$$)'`
do
psql -d $db --pset=pager=off -q -x -c 'SELECT
current_database() AS db, schemaname, tablename, reltuples::bigint AS tups, relpages::bigint AS pages, otta,
ROUND(CASE WHEN otta=0 OR sml.relpages=0 OR sml.relpages=otta THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat,
CASE WHEN relpages < otta THEN 0 ELSE relpages::bigint - otta END AS wastedpages,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes,
CASE WHEN relpages < otta THEN $$0 bytes$$::text ELSE (bs*(relpages-otta))::bigint || $$ bytes$$ END AS wastedsize,
iname, ituples::bigint AS itups, ipages::bigint AS ipages, iotta,
ROUND(CASE WHEN iotta=0 OR ipages=0 OR ipages=iotta THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat,
CASE WHEN ipages < iotta THEN 0 ELSE ipages::bigint - iotta END AS wastedipages,
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes,
CASE WHEN ipages < iotta THEN $$0 bytes$$ ELSE (bs*(ipages-iotta))::bigint || $$ bytes$$ END AS wastedisize,
CASE WHEN relpages < otta THEN
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta::bigint) END
ELSE CASE WHEN ipages < iotta THEN bs*(relpages-otta::bigint)
ELSE bs*(relpages-otta::bigint + ipages-iotta::bigint) END
END AS totalwastedbytes
FROM (
SELECT
nn.nspname AS schemaname,
cc.relname AS tablename,
COALESCE(cc.reltuples,0) AS reltuples,
COALESCE(cc.relpages,0) AS relpages,
COALESCE(bs,0) AS bs,
COALESCE(CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)),0) AS otta,
COALESCE(c2.relname,$$?$$) AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols
FROM
pg_class cc
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname <> $$information_schema$$
LEFT JOIN
(
SELECT
ma,bs,foo.nspname,foo.relname,
(datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
(maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
SELECT
ns.nspname, tbl.relname, hdr, ma, bs,
SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS datawidth,
MAX(coalesce(null_frac,0)) AS maxfracsum,
hdr+(
SELECT 1+count(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = ns.nspname AND s2.tablename = tbl.relname
) AS nullhdr
FROM pg_attribute att
JOIN pg_class tbl ON att.attrelid = tbl.oid
JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
LEFT JOIN pg_stats s ON s.schemaname=ns.nspname
AND s.tablename = tbl.relname
AND s.inherited=false
AND s.attname=att.attname,
(
SELECT
(SELECT current_setting($$block_size$$)::numeric) AS bs,
CASE WHEN SUBSTRING(SPLIT_PART(v, $$ $$, 2) FROM $$#"[0-9]+.[0-9]+#"%$$ for $$#$$)
IN ($$8.0$$,$$8.1$$,$$8.2$$) THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ $$mingw32$$ OR v ~ $$64-bit$$ THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
WHERE att.attnum > 0 AND tbl.relkind=$$r$$
GROUP BY 1,2,3,4,5
) AS foo
) AS rs
ON cc.relname = rs.relname AND nn.nspname = rs.nspname
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml order by wastedibytes desc limit 5'
done
Based on the wasted byte counts, set a suitable autovacuum_vacuum_scale_factor. For large tables with frequent updates, deletes, and inserts, a smaller autovacuum_vacuum_scale_factor is recommended to reduce wasted space.
Also make sure autovacuum is enabled and, based on the server's memory size and CPU core count, set a sufficiently large autovacuum_work_mem (or maintenance_work_mem) and autovacuum_max_workers, and a sufficiently small autovacuum_naptime.
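A postgresql.conf sketch of these settings; the values are illustrative starting points for a write-heavy machine, not universal recommendations:

```
autovacuum = on
autovacuum_max_workers = 5             # scale with CPU core count
autovacuum_naptime = 15s               # wake the launcher frequently
autovacuum_work_mem = 512MB            # falls back to maintenance_work_mem if unset
autovacuum_vacuum_scale_factor = 0.02  # smaller for large, frequently updated tables
```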
Also check whether logical backups (pg_dump) are taken of large databases, and whether long SQL statements or long transactions are common in the system; all of these can cause bloat.
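Long transactions can be spotted in pg_stat_activity; a sketch (column names as of PostgreSQL 9.2+; older versions use procpid and current_query):

```sql
select pid, usename, now() - xact_start as xact_age, state, query
from pg_stat_activity
where xact_start is not null
order by xact_age desc
limit 5;
```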
pg_reorg (now pg_repack) or vacuum full can reclaim the bloated space.
Reference: https://blog.163.com/digoal@126/blog/static/1638770402015329115636287/
otta is the estimated number of pages the table actually needs; iotta is the same estimate for the index;
bs is the database block size;
tbloat is the table bloat ratio and ibloat the index bloat ratio; wastedpages and wastedipages are the numbers of wasted blocks in the table and index;
wastedbytes and wastedibytes are the numbers of wasted bytes in the table and index.
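These metrics reduce to simple arithmetic; a minimal Python sketch whose names mirror the query's output columns (the function itself is illustrative):

```python
def bloat_metrics(relpages, otta, bs=8192):
    """Compute the table bloat ratio and wasted space the way the query above does.
    bs defaults to the common 8 KB block size; check current_setting('block_size')."""
    if otta == 0 or relpages == 0 or relpages == otta:
        tbloat = 0.0
    else:
        tbloat = round(relpages / otta, 1)
    wastedpages = 0 if relpages < otta else relpages - otta
    wastedbytes = bs * wastedpages
    return tbloat, wastedpages, wastedbytes

# e.g. a table occupying 1000 pages that should only need 100:
print(bloat_metrics(1000, 100))  # (10.0, 900, 7372800)
```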
How to reclaim the garbage
Greenplum
1. vacuum full
Note that vacuum full cannot reclaim bloated index space. The lock vacuum full takes is similar to a DDL lock: an exclusive lock. Run it when there is no business traffic so it does not block the workload.
Recommended procedure for reclaiming garbage with vacuum full:
1 Record the table's index definitions
2 Drop the indexes
3 vacuum full the table
4 Recreate the indexes in parallel
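The four steps can be sketched as follows (t and idx_t are placeholder names; pg_get_indexdef prints the CREATE INDEX statement to save in step 1):

```sql
select pg_get_indexdef('idx_t'::regclass);  -- 1. record the index definition
drop index idx_t;                           -- 2. drop the index
vacuum full t;                              -- 3. vacuum full the table
create index idx_t on t (id);               -- 4. recreate; with many indexes, run the
                                            --    CREATE INDEX statements in parallel sessions
```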
Example
postgres=# create table bloat_test(id int , info text);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
postgres=# insert into bloat_test select generate_series(1,100000000), 'test';
INSERT 0 100000000
postgres=# create index idx_bloat_test on bloat_test(id);
CREATE INDEX
postgres=# update bloat_test set info='new';
UPDATE 100000000
postgres=# select * from gp_toolkit.gp_bloat_expected_pages where btdrelid='bloat_test'::Regclass;
btdrelid | btdrelpages | btdexppages
----------+-------------+-------------
629640 | 244453 | 102753
(1 row)
postgres=# select pg_size_pretty(pg_relation_size('bloat_test'));
pg_size_pretty
----------------
7639 MB
(1 row)
postgres=# select pg_size_pretty(pg_relation_size('idx_bloat_test'));
pg_size_pretty
----------------
6380 MB
(1 row)
An incorrect way to reclaim the garbage
postgres=# vacuum full bloat_test ;
VACUUM
postgres=# select pg_size_pretty(pg_relation_size('bloat_test'));
pg_size_pretty
----------------
3526 MB
(1 row)
The index is not reclaimed:
postgres=# select pg_size_pretty(pg_relation_size('idx_bloat_test'));
pg_size_pretty
----------------
6380 MB
(1 row)
The correct way to reclaim the garbage
postgres=# drop index idx_bloat_test ;
DROP INDEX
postgres=# vacuum full bloat_test ;
VACUUM
postgres=# create index idx_bloat_test on bloat_test(id);
CREATE INDEX
postgres=# select pg_size_pretty(pg_relation_size('bloat_test'));
pg_size_pretty
----------------
3526 MB
(1 row)
postgres=# select pg_size_pretty(pg_relation_size('idx_bloat_test'));
pg_size_pretty
----------------
2174 MB
(1 row)
2. alter table set distributed
alter table <table_name> set with (reorganize=true) distributed randomly;
or
alter table <table_name> set with (reorganize=true) distributed by (<column_name1>,<column_name2>,...);
set distributed can reclaim bloated index space. The lock it takes is similar to a DDL lock: an exclusive lock. Run it when there is no business traffic so it does not block the workload.
Moreover, as long as the distribution key is unchanged, set distributed completes within each segment and does not redistribute data across nodes.
Recommended procedure:
1 Record the table's distribution columns
2 Run set distributed with (REORGANIZE=true)
If the table is randomly distributed, set it to randomly distributed again
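A sketch using placeholder names (gp_distribution_policy records each table's distribution policy in Greenplum 4.x; attrnums lists pg_attribute.attnum values and is NULL for randomly distributed tables):

```sql
select attrnums from gp_distribution_policy
where localoid = 't'::regclass;                               -- 1. record the distribution columns
alter table t set with (reorganize=true) distributed by (id); -- 2. rewrite in place with the same key
```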
Example
postgres=# update bloat_test set info='test';
UPDATE 100000000
postgres=# select pg_size_pretty(pg_relation_size('bloat_test'));
pg_size_pretty
----------------
7727 MB
(1 row)
postgres=# select pg_size_pretty(pg_relation_size('idx_bloat_test'));
pg_size_pretty
----------------
4411 MB
(1 row)
Redistribute according to the current distribution key:
Garbage in both the table and the index is reclaimed.
postgres=# alter table bloat_test set with (reorganize=true) distributed by (id);
ALTER TABLE
postgres=# select pg_size_pretty(pg_relation_size('bloat_test'));
pg_size_pretty
----------------
4201 MB
(1 row)
postgres=# select pg_size_pretty(pg_relation_size('idx_bloat_test'));
pg_size_pretty
----------------
2130 MB
(1 row)
PostgreSQL
PostgreSQL is typically used in OLTP systems, where the availability requirements on the database are much higher than in OLAP systems, so VACUUM FULL, which holds an exclusive lock for a long time, should be used sparingly.
With autovacuum enabled, PostgreSQL does not normally bloat badly unless long transactions, or deliberate settings (for example, settings that prevent standby queries from conflicting with VACUUM), get in the way of garbage collection. All of these have solutions; see:
《Why beer and diapers go together - querying the best product combinations with HybridDB/PostgreSQL》
《The PostgreSQL "snapshot too old" patch, preventing database bloat》
Examples of PostgreSQL garbage-collection methods:
1. The preferred method is the reorg approach: build a new object, incrementally sync the data into it, and finally swap the data files of the new object and the old (bloated) one.
An exclusive lock is needed only while the files are swapped, which is very brief.
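A command-line sketch, assuming pg_repack is installed both as a server extension in the target database and as a client binary (connection options mirror psql's; bloat_test is the example table from above):

```shell
# repack one bloated table; only the brief file-swap step takes an exclusive lock
pg_repack -d postgres -t bloat_test
# repack all eligible tables in the database
pg_repack -d postgres
```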
Reference
《PostgreSQL shrinking bloated tables or indexes - pg_squeeze or pg_repack》
2. If you have not installed the pg_squeeze or pg_repack extension, you can reclaim bloat with vacuum full. Since PostgreSQL 9.0, VACUUM FULL also reclaims index garbage, which makes it a step ahead of GP here.
Recommended approach:
set lock_timeout='2s';
vacuum full tablename;
Caveats
1. vacuum full and alter table reclaim garbage under an exclusive lock. Run them in a maintenance window, or at the very least set a lock timeout before starting.
2. PG does not usually bloat unless it is configured or used improperly; see the details above.
3. For PG garbage collection, prefer the reorg approach and avoid vacuum full where possible.
4. When checking GP for garbage, finding none is not necessarily good news: the statistics may simply not have been collected yet, so a regular analyze cadence is a good habit.
See also this issue, which estimates the garbage ratio by sampling, without depending on ANALYZE:
https://github.com/greenplum-db/gpdb/issues/706
References
1. alter table syntax
https://gpdb.docs.pivotal.io/4370/ref_guide/sql_commands/ALTER_TABLE.html
2. Performance troubleshooting
https://gpdb.docs.pivotal.io/4330/admin_guide/perf_issues.html
3. Routine maintenance
https://gpdb.docs.pivotal.io/4330/admin_guide/managing/maintain.html
4. Table bloat
https://discuss.pivotal.io/hc/en-us/articles/202873573-FAQ-Bloat-in-HEAP-tables
5. How to eliminate table bloat
6. How to eliminate table bloat via redistribution
Last updated: 2017-08-20 17:06:17