阿裏雲HybridDB for PG實踐 - 列存儲加字段和默認值
標簽
PostgreSQL , Greenplum , 相對偏移 , 列存儲 , appendonly , AO表
背景
Greenplum的Append only table支持更新、刪除。通過什麼支持呢?bitmap文件,標記被刪除的行。
因此在更新,刪除後,數據可能膨脹。
另一方麵,列存儲每列一個文件,同一行通過偏移對應起來。例如INT8的兩個字段,通過偏移很快能找到某一行的A列對應的B列。
接下來談談加字段,在加字段時,AO表示不會REWRITE TABLE的。如果AO表以及有一些垃圾(被刪除的數據)記錄後,加的字段會使用相對偏移嗎(換句話說,需不需要填充已刪除的記錄),答案是要填充。
例子
1、創建3張AO列存表。
postgres=# create table tbl1 (id int, info text) with (appendonly=true, blocksize=8192, compresstype=none, orientation=column);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
postgres=# create table tbl2 (id int, info text) with (appendonly=true, blocksize=8192, compresstype=none, orientation=column);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
postgres=# create table tbl3 (id int, info text) with (appendonly=true, blocksize=8192, compresstype=none, orientation=column);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
2、前兩張分別插入1000萬記錄,最後一張插入2000萬記錄。
postgres=# insertinto tbl1 select generate_series(1,10000000),'test';
INSERT 0 10000000
postgres=# insert into tbl2 select generate_series(1,10000000),'test';
INSERT 0 10000000
postgres=# insert into tbl3 select generate_series(1,20000000),'test';
INSERT 0 20000000
3、分析表,並記錄它們的大小
postgres=# analyze tbl1;
ANALYZE
postgres=# analyze tbl2;
ANALYZE
postgres=# analyze tbl3;
ANALYZE
postgres=# select pg_size_pretty(pg_relation_size('tbl1'));
pg_size_pretty
----------------
88 MB
(1 row)
postgres=# select pg_size_pretty(pg_relation_size('tbl2'));
pg_size_pretty
----------------
88 MB
(1 row)
postgres=# select pg_size_pretty(pg_relation_size('tbl3'));
pg_size_pretty
----------------
173 MB
(1 row)
4、更新第一張表,全表更新。並記錄更新後的大小,翻了一倍。
postgres=# update tbl1 set info='test';
UPDATE 10000000
postgres=# analyze tbl1;
ANALYZE
postgres=# select pg_size_pretty(pg_relation_size('tbl1'));
pg_size_pretty
----------------
173 MB
(1 row)
5、對三個表添加字段,設置默認值。
postgres=# alter table tbl1 add column c1 int8 default 1;
ALTER TABLE
postgres=# alter table tbl2 add column c1 int8 default 1;
ALTER TABLE
postgres=# alter table tbl3 add column c1 int8 default 1;
ALTER TABLE
6、分析表,查看表的大小。
postgres=# analyze tbl1;
ANALYZE
postgres=# analyze tbl2;
ANALYZE
postgres=# analyze tbl3;
ANALYZE
postgres=# select pg_size_pretty(pg_relation_size('tbl1'));
pg_size_pretty
----------------
325 MB
(1 row)
postgres=# select pg_size_pretty(pg_relation_size('tbl2'));
pg_size_pretty
----------------
163 MB
(1 row)
postgres=# select pg_size_pretty(pg_relation_size('tbl3'));
pg_size_pretty
----------------
325 MB
(1 row)
很顯然,AO表在添加字段時,以已有文件的記錄數為準(即使全部都刪除了,也需要在新增字段上初始化這個值),。
如果有相對偏移量的概念,至少tbl2可以忽略前1000萬行的偏移。
堆表不存在這個問題,因為堆表加字段會REWRITE全表。
表膨脹的處理
《如何檢測、清理Greenplum垃圾 - 阿裏雲HybridDB for PG最佳實踐》
最後更新:2017-08-20 17:06:23