537 阿裏雲技術社區[雲棲]

阿裏雲HybridDB for PG實踐 - 列存儲加字段和默認值

標簽

PostgreSQL , Greenplum , 相對偏移 , 列存儲 , appendonly , AO表

背景

Greenplum的Append only table支持更新、刪除。通過什麼支持呢？bitmap文件，標記被刪除的行。

因此在更新，刪除後，數據可能膨脹。

另一方麵，列存儲每列一個文件，同一行通過偏移對應起來。例如INT8的兩個字段，通過偏移很快能找到某一行的A列對應的B列。

接下來談談加字段，在加字段時，AO表示不會REWRITE TABLE的。如果AO表以及有一些垃圾（被刪除的數據）記錄後，加的字段會使用相對偏移嗎（換句話說，需不需要填充已刪除的記錄），答案是要填充。

例子

1、創建3張AO列存表。

postgres=# create table tbl1 (id int, info text) with (appendonly=true, blocksize=8192, compresstype=none, orientation=column);  
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.  
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.  
CREATE TABLE  
  
postgres=# create table tbl2 (id int, info text) with (appendonly=true, blocksize=8192, compresstype=none, orientation=column);  
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.  
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.  
CREATE TABLE  
  
postgres=# create table tbl3 (id int, info text) with (appendonly=true, blocksize=8192, compresstype=none, orientation=column);  
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.  
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.  
CREATE TABLE

2、前兩張分別插入1000萬記錄，最後一張插入2000萬記錄。

postgres=# insertinto tbl1 select generate_series(1,10000000),'test';  
INSERT 0 10000000  
postgres=# insert into tbl2 select generate_series(1,10000000),'test';  
INSERT 0 10000000  
postgres=# insert into tbl3 select generate_series(1,20000000),'test';  
INSERT 0 20000000

3、分析表，並記錄它們的大小

postgres=# analyze tbl1;  
ANALYZE  
postgres=# analyze tbl2;  
ANALYZE  
postgres=# analyze tbl3;  
ANALYZE  
  
postgres=# select pg_size_pretty(pg_relation_size('tbl1'));  
 pg_size_pretty   
----------------  
 88 MB  
(1 row)  
postgres=# select pg_size_pretty(pg_relation_size('tbl2'));  
 pg_size_pretty   
----------------  
 88 MB  
(1 row)  
postgres=# select pg_size_pretty(pg_relation_size('tbl3'));  
 pg_size_pretty   
----------------  
 173 MB  
(1 row)

4、更新第一張表，全表更新。並記錄更新後的大小，翻了一倍。

postgres=# update tbl1 set info='test';  
UPDATE 10000000  
postgres=# analyze tbl1;  
ANALYZE  
postgres=# select pg_size_pretty(pg_relation_size('tbl1'));  
 pg_size_pretty   
----------------  
 173 MB  
(1 row)

5、對三個表添加字段，設置默認值。

postgres=# alter table tbl1 add column c1 int8 default 1;  
ALTER TABLE  
postgres=# alter table tbl2 add column c1 int8 default 1;  
ALTER TABLE  
postgres=# alter table tbl3 add column c1 int8 default 1;  
ALTER TABLE

6、分析表，查看表的大小。

postgres=# analyze tbl1;  
ANALYZE  
postgres=# analyze tbl2;  
ANALYZE  
postgres=# analyze tbl3;  
ANALYZE  
  
postgres=# select pg_size_pretty(pg_relation_size('tbl1'));  
 pg_size_pretty   
----------------  
 325 MB  
(1 row)  
  
postgres=# select pg_size_pretty(pg_relation_size('tbl2'));  
 pg_size_pretty   
----------------  
 163 MB  
(1 row)  
  
postgres=# select pg_size_pretty(pg_relation_size('tbl3'));  
 pg_size_pretty   
----------------  
 325 MB  
(1 row)

很顯然，AO表在添加字段時，以已有文件的記錄數為準（即使全部都刪除了，也需要在新增字段上初始化這個值），。

如果有相對偏移量的概念，至少tbl2可以忽略前1000萬行的偏移。

堆表不存在這個問題，因為堆表加字段會REWRITE全表。

表膨脹的處理

《如何檢測、清理Greenplum垃圾 - 阿裏雲HybridDB for PG最佳實踐》

最後更新：2017-08-20 17:06:23

阿裏雲HybridDB for PG實踐 - 列存儲加字段和默認值

標簽

背景

例子

表膨脹的處理

上一篇：空間索引(GiST、BRIN、R-Tree)選擇、優化 - 阿裏雲RDS PostgreSQL最佳實踐

下一篇：阿裏雲HybridDB for PG實踐 - 行存、列存，堆表、AO表的原理和選擇

相關內容

熱門內容

最新內容

阿裏雲HybridDB for PG實踐 - 列存儲加字段和默認值

標簽

背景

例子

表膨脹的處理

上一篇： 空間索引(GiST、BRIN、R-Tree)選擇、優化 - 阿裏雲RDS PostgreSQL最佳實踐

下一篇： 阿裏雲HybridDB for PG實踐 - 行存、列存，堆表、AO表的原理和選擇

相關內容

熱門內容

最新內容

上一篇：空間索引(GiST、BRIN、R-Tree)選擇、優化 - 阿裏雲RDS PostgreSQL最佳實踐

下一篇：阿裏雲HybridDB for PG實踐 - 行存、列存，堆表、AO表的原理和選擇