Spatial | temporal | object crowd selection + target-audience profiling - comparing PostgreSQL 10 and Greenplum, and how to choose
Tags
PostgreSQL , PostGIS , geohash , brin , GiST index , Greenplum , HybridDB for PostgreSQL
Background
A person's frequent locations typically include: home, children's homes, both sets of parents' homes, a partner's home, the workplace, a few commercial districts, and so on.
Operating on this data enables many business scenarios. For example:
1. Finding people
"Real-time positioning and crowd selection at massive scale - a public-interest reunion system (location-based person search and selection)"
2. Audience selection for offline advertising, site selection, and commercial-district crowd profiling.
"Spatial cluster analysis - Alibaba Cloud RDS PostgreSQL best practices"
"(New retail) merchant grid-based (location/GIS) operations - Alibaba Cloud RDS PostgreSQL and HybridDB for PostgreSQL best practices"
3. Location-based user profiling, location profiling for a given user group, and either of these further filtered by a time range. For example:
Using a map as the base layer, render each BOX/GRID cell (say 100 meters on a side; width_bucket or a custom UDF can produce these box IDs) with its average income and average spending, encoding the values as color intensity.
Going further, spending can be broken down by category (food, clothing, electronics, books, ...), making it obvious at a glance where to open which kind of store. With e-commerce as developed as it is today, online and offline should of course be considered together.
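The grid box IDs described above can be sketched in a few lines. This is a minimal illustration, not the production UDF: the `box_id` name and the 0.001-degree cell size (roughly 100 m of longitude at the equator) are assumptions for the example.

```python
import math

def box_id(lon: float, lat: float, cell_deg: float = 0.001) -> tuple:
    """Map a lon/lat point to an integer grid cell, width_bucket style.

    cell_deg is the cell edge in degrees; 0.001 deg of longitude is
    roughly 100 m at the equator (an assumption for this sketch).
    """
    ix = math.floor((lon + 180.0) / cell_deg)  # column index, 0-based
    iy = math.floor((lat + 90.0) / cell_deg)   # row index, 0-based
    return (ix, iy)

# Points less than one cell apart collapse to the same box id,
# so aggregates can simply GROUP BY the id:
print(box_id(120.00010, 30.00020))
print(box_id(120.00090, 30.00080))
```

Averages per cell then become a plain `GROUP BY box_id` aggregation, which is what the map rendering consumes.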
For this kind of spatial crowd selection plus profiling workload, Greenplum is undoubtedly a good fit (simple and direct, with the features and performance to match), but PostgreSQL 10 also works. So which should you choose?
Characteristics of Greenplum and PostgreSQL, and guidance on choosing between them
1. PostgreSQL 10 suits 10 TB to 100 TB of data and OLTP-dominated workloads with some OLAP. Its coverage is very similar to Oracle's.
SQL:2011 compatibility; 1,000,000+ tpmC.
Multi-core parallel query execution.
Readable and writable external tables on OSS object storage.
Common and extended data types: JSON(B), hstore (KV), PostGIS spatial types, pgrouting (routing, graph search), arrays, the ltree tree type, the HLL cardinality-estimation type, smlar, imgsmlr, and more.
SQL stream-computing plugin.
Time-series plugin.
btree, hash, GIN, GiST, SP-GiST, bloom, BRIN, and other index types.
Server-side programming in PL/pgSQL and SQL.
Analytical syntax (multi-dimensional aggregation, window functions) and recursive queries (tree traversal, graph search, and similar). Full-text search, fuzzy matching, similarity search, and regular-expression search. Array-similarity and image-similarity queries.
1.1 Suitable workloads:
TB+ scale OLTP (online transaction processing) plus OLAP (real-time analytics)
Fuzzy, similarity, and regular-expression search
Full-text search
IoT
Stream processing
Social applications
Graph search
Independent event analysis
Hot/cold data separation
Asynchronous messaging
Similarity search over multi-valued types and image feature vectors
Real-time data cleansing
GIS applications
Real-time search on any column
... ...
1.2 Key strengths: features, stability, performance, high availability, reliability, Oracle compatibility, HTAP.
2. HybridDB for PostgreSQL (derived from the open-source Greenplum, GPDB) suits PB-scale real-time OLAP - the classic massive data warehouse.
SQL:2008 compatibility; TPC-H and TPC-DS compatibility; decades of commercial hardening.
Readable and writable external tables on OSS object storage.
Common and extended data types: JSON, PostGIS spatial types, arrays, the HLL cardinality-estimation type.
bitmap, hash, and btree indexes.
Server-side programming in PL/Java.
Analytical syntax (multi-dimensional aggregation, window functions, MADlib machine learning) and full-text search syntax.
Column store, row store, compression, and hybrid storage.
Four-phase aggregation and automatic inter-node redistribution.
Horizontal scale-out.
2.1 Suitable workloads:
PB+ scale real-time analytics. (Traditional statistics; multi-dimensional time/space/attribute profiling and crowd selection; joins between any tables on any dimensions.)
2.2 Key strengths: analytical SQL compatibility, features, stability, performance, high availability, scalability.
Spatial crowd selection + profiling DEMO
Schema design
1. Design 1: a wide table (works when there are at most 1600 tag columns, PostgreSQL's per-table column limit)
create table tbl_pos (
uid int8, -- user ID
att1 int8, -- user tag 1
att2 int8, -- user tag 2
att3 int8, -- user tag 3
....
pos1 geometry, -- home location
pos2 geometry, -- work location
pos3 geometry, -- location xx
pos4 geometry, -- location xxx
...
);
Or
create table tbl_tag (
uid int8, -- user ID
att1 int8, -- user tag 1
att2 int8, -- user tag 2
att3 int8, -- user tag 3
....
);
create table tbl_pos (
uid int8,
pos_att int2, -- location type (home, work, ...)
pos geometry -- location
);
2. Design 2: a JSONB tag column, for when there are more than 1600 tag types.
create table tbl_pos (
uid int8, -- user ID
att jsonb, -- user tags as JSONB
....
pos1 geometry, -- home location
pos2 geometry, -- work location
pos3 geometry, -- location xx
pos4 geometry, -- location xxx
...
);
3. Design 3: tags stored in an array (covers the same range as design 2). This design was used in:
"Trillion-scale marketing (crowd selection) moving gracefully into the millisecond era - database design for a trillion-row user_tags real-time recommendation system"
create table tbl_tag (
uid int8,
tag text[],
...
pos1 geometry, -- home location
pos2 geometry, -- work location
pos3 geometry, -- location xx
pos4 geometry, -- location xxx
...
);
4. Design 4: an inverted tag index (for more than 1600 tag types where each tag is a yes/no flag; changing tags requires merging through UDFs; suitable for PostgreSQL only). This design was used in:
"Building a real-time user-profile recommendation system on Alibaba Cloud RDS PostgreSQL"
create table tbl_pos (
uid int8,
pos1 geometry, -- home location
pos2 geometry, -- work location
pos3 geometry, -- location xx
pos4 geometry, -- location xxx
...
);
create table tbl_tag (
tag int,
userbits varbit
);
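The inverted design keeps one bitmap per tag with one bit per user, so selecting users who carry several tags is a bitwise AND of the tag bitmaps. A minimal sketch using Python integers as bitmaps (the class and method names are invented for illustration; the real design stores a varbit per tag and merges updates with UDFs, as the linked article describes):

```python
class TagBitmaps:
    """One integer per tag; bit i set means user i has the tag."""

    def __init__(self):
        self.userbits = {}  # tag id -> int used as a bitmap over uids

    def set_tag(self, tag: int, uid: int) -> None:
        self.userbits[tag] = self.userbits.get(tag, 0) | (1 << uid)

    def users_with_all(self, *tags: int) -> list:
        """Users carrying every given tag: AND of the tag bitmaps."""
        bm = ~0  # all-ones; AND-ing narrows it down
        for t in tags:
            bm &= self.userbits.get(t, 0)
        return [i for i in range(bm.bit_length()) if (bm >> i) & 1]

idx = TagBitmaps()
for uid in (1, 3, 5):
    idx.set_tag(100, uid)
for uid in (3, 4, 5):
    idx.set_tag(200, uid)
print(idx.users_with_all(100, 200))  # users 3 and 5 carry both tags
```

The appeal of the layout is that a selection over millions of users reduces to a handful of bitmap ANDs/ORs; the cost is that flipping one user's tag means rewriting bits inside the stored bitmaps.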
Each design has its own strengths, weaknesses, and coverage; choose based on the scenario.
The comparison below pits PG 10 against GPDB using design 1, with 200 numeric tag columns and 10 frequent locations per user.
PostgreSQL 10
Data preparation
1. Create the tag and position tables; load 1 billion tag rows and 10 billion position rows.
create extension postgis;
create or replace function ct1 () returns void as $$
declare
sql text := '';
begin
sql := 'create table tbl_tag(uid int8,';
for i in 1..200 loop
sql := sql||'c'||i||' int2 default random()*32767,';
end loop;
sql := rtrim(sql, ',');
sql := sql||')';
execute sql;
end;
$$ language plpgsql strict;
create table tbl_pos(
uid int8 primary key,
pos_att int2,
pos geometry default st_setsrid(st_makepoint(73+random()*62, 3+random()*50), 4326)
);
create table tbl_pos1 (like tbl_pos including all) inherits (tbl_pos);
create table tbl_pos2 (like tbl_pos including all) inherits (tbl_pos);
create table tbl_pos3 (like tbl_pos including all) inherits (tbl_pos);
create table tbl_pos4 (like tbl_pos including all) inherits (tbl_pos);
create table tbl_pos5 (like tbl_pos including all) inherits (tbl_pos);
create table tbl_pos6 (like tbl_pos including all) inherits (tbl_pos);
create table tbl_pos7 (like tbl_pos including all) inherits (tbl_pos);
create table tbl_pos8 (like tbl_pos including all) inherits (tbl_pos);
create table tbl_pos9 (like tbl_pos including all) inherits (tbl_pos);
create table tbl_pos10 (like tbl_pos including all) inherits (tbl_pos);
select ct1();
nohup psql -c "insert into tbl_tag select generate_series(1,1000000000)" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos1 select generate_series(1,1000000000),1" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos2 select generate_series(1,1000000000),2" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos3 select generate_series(1,1000000000),3" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos4 select generate_series(1,1000000000),4" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos5 select generate_series(1,1000000000),5" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos6 select generate_series(1,1000000000),6" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos7 select generate_series(1,1000000000),7" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos8 select generate_series(1,1000000000),8" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos9 select generate_series(1,1000000000),9" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos10 select generate_series(1,1000000000),10" >/dev/null 2>&1 &
2. Reorganize the position tables by location. (A user's locations are derived from behavior and change very little, so the data can be treated as semi-static and is worth clustering.)
create table tbl_pos_1 (like tbl_pos including all, check (pos_att=1)) inherits(tbl_pos);
create table tbl_pos_2 (like tbl_pos including all, check (pos_att=2)) inherits(tbl_pos);
create table tbl_pos_3 (like tbl_pos including all, check (pos_att=3)) inherits(tbl_pos);
create table tbl_pos_4 (like tbl_pos including all, check (pos_att=4)) inherits(tbl_pos);
create table tbl_pos_5 (like tbl_pos including all, check (pos_att=5)) inherits(tbl_pos);
create table tbl_pos_6 (like tbl_pos including all, check (pos_att=6)) inherits(tbl_pos);
create table tbl_pos_7 (like tbl_pos including all, check (pos_att=7)) inherits(tbl_pos);
create table tbl_pos_8 (like tbl_pos including all, check (pos_att=8)) inherits(tbl_pos);
create table tbl_pos_9 (like tbl_pos including all, check (pos_att=9)) inherits(tbl_pos);
create table tbl_pos_10 (like tbl_pos including all, check (pos_att=10)) inherits(tbl_pos);
-- 10 geohash characters is meter-level precision, which is plenty here
nohup psql -c "insert into tbl_pos_1 select * from tbl_pos1 order by pos_att, st_geohash(pos, 10); " >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos_2 select * from tbl_pos2 order by pos_att, st_geohash(pos, 10); " >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos_3 select * from tbl_pos3 order by pos_att, st_geohash(pos, 10); " >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos_4 select * from tbl_pos4 order by pos_att, st_geohash(pos, 10); " >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos_5 select * from tbl_pos5 order by pos_att, st_geohash(pos, 10); " >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos_6 select * from tbl_pos6 order by pos_att, st_geohash(pos, 10); " >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos_7 select * from tbl_pos7 order by pos_att, st_geohash(pos, 10); " >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos_8 select * from tbl_pos8 order by pos_att, st_geohash(pos, 10); " >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos_9 select * from tbl_pos9 order by pos_att, st_geohash(pos, 10); " >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos_10 select * from tbl_pos10 order by pos_att, st_geohash(pos, 10);" >/dev/null 2>&1 &
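The ORDER BY st_geohash(pos, 10) above works because geohash interleaves longitude and latitude bits: points that are close together share a long common prefix, so after the sort spatial neighbors land in adjacent heap blocks, which is exactly the physical correlation a BRIN index needs. A simplified pure-Python re-implementation of the encoding, for intuition only (PostGIS's st_geohash does this natively):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash(lat: float, lon: float, precision: int = 10) -> str:
    """Encode lat/lon by bisecting the ranges, interleaving lon/lat bits."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    bits = []
    is_lon_bit = True  # even bit positions refine longitude
    while len(bits) < precision * 5:
        rng, val = (lon_rng, lon) if is_lon_bit else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half
        is_lon_bit = not is_lon_bit
    # pack each group of 5 bits into one base32 character
    return "".join(
        BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
        for i in range(0, len(bits), 5)
    )

# Nearby points share a long common prefix:
print(geohash(39.9042, 116.4074))   # a point in Beijing
print(geohash(39.9043, 116.4075))   # a point a few meters away
```

Each additional character narrows the cell by a factor of 32, which is why 10 characters reaches meter scale, as the comment in the DDL above notes.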
3. Create geohash BRIN block-range indexes on the position tables to speed up spatial selection.
create index idx_tbl_pos1_pos on tbl_pos1 using brin( pos );
create index idx_tbl_pos2_pos on tbl_pos2 using brin( pos );
create index idx_tbl_pos3_pos on tbl_pos3 using brin( pos );
create index idx_tbl_pos4_pos on tbl_pos4 using brin( pos );
create index idx_tbl_pos5_pos on tbl_pos5 using brin( pos );
create index idx_tbl_pos6_pos on tbl_pos6 using brin( pos );
create index idx_tbl_pos7_pos on tbl_pos7 using brin( pos );
create index idx_tbl_pos8_pos on tbl_pos8 using brin( pos );
create index idx_tbl_pos9_pos on tbl_pos9 using brin( pos );
create index idx_tbl_pos10_pos on tbl_pos10 using brin( pos );
Detach the unclustered tables from the inheritance tree (only the clustered data needs to be kept).
alter table tbl_pos1 no inherit tbl_pos;
alter table tbl_pos2 no inherit tbl_pos;
alter table tbl_pos3 no inherit tbl_pos;
alter table tbl_pos4 no inherit tbl_pos;
alter table tbl_pos5 no inherit tbl_pos;
alter table tbl_pos6 no inherit tbl_pos;
alter table tbl_pos7 no inherit tbl_pos;
alter table tbl_pos8 no inherit tbl_pos;
alter table tbl_pos9 no inherit tbl_pos;
alter table tbl_pos10 no inherit tbl_pos;
4. Create btree indexes on the tag columns.
for ((i=1;i<=200;i++))
do
nohup psql -c "set maintenance_work_mem='512MB'; create index idx_tbl_tag_$i on tbl_tag (c$i);" >/dev/null 2>&1 &
done
5. Space usage
Data: 424 GB for the 1-billion-row tag table; 640 GB for the 10-billion-row position tables.
postgres=# \dt+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+-----------------+-------+----------+------------+-------------
public | tbl_pos | table | postgres | 8192 bytes |
public | tbl_pos_1 | table | postgres | 64 GB |
public | tbl_pos_10 | table | postgres | 64 GB |
public | tbl_pos_2 | table | postgres | 64 GB |
public | tbl_pos_3 | table | postgres | 64 GB |
public | tbl_pos_4 | table | postgres | 64 GB |
public | tbl_pos_5 | table | postgres | 64 GB |
public | tbl_pos_6 | table | postgres | 64 GB |
public | tbl_pos_7 | table | postgres | 64 GB |
public | tbl_pos_8 | table | postgres | 64 GB |
public | tbl_pos_9 | table | postgres | 64 GB |
public | tbl_tag | table | postgres | 424 GB |
Indexes:
Each single-column btree index on the tag table is 21 GB, about 4200 GB for all 200.
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+----------------------+-------+----------+-----------------+--------+-------------
public | idx_tbl_tag_1 | index | postgres | tbl_tag | 21 GB |
Each single-column BRIN index on a position table is 2.7 MB, 27 MB in total.
postgres=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+----------------------+-------+----------+-----------------+---------+-------------
public | idx_tbl_pos10_pos | index | postgres | tbl_pos10 | 2728 kB |
public | idx_tbl_pos1_pos | index | postgres | tbl_pos1 | 2728 kB |
public | idx_tbl_pos2_pos | index | postgres | tbl_pos2 | 2728 kB |
public | idx_tbl_pos3_pos | index | postgres | tbl_pos3 | 2728 kB |
public | idx_tbl_pos4_pos | index | postgres | tbl_pos4 | 2728 kB |
public | idx_tbl_pos5_pos | index | postgres | tbl_pos5 | 2728 kB |
public | idx_tbl_pos6_pos | index | postgres | tbl_pos6 | 2728 kB |
public | idx_tbl_pos7_pos | index | postgres | tbl_pos7 | 2728 kB |
public | idx_tbl_pos8_pos | index | postgres | tbl_pos8 | 2728 kB |
public | idx_tbl_pos9_pos | index | postgres | tbl_pos9 | 2728 kB |
Spatial/attribute crowd selection + profiling performance tests
1. From 10 billion position rows, select roughly 10 million people inside a region: 400 milliseconds.
(In practice the result set can be streamed back through a cursor.)
postgres=# select count(*) from tbl_pos where pos_att=1 and st_within(pos, st_setsrid(st_makebox2d(st_makepoint(120,5), st_makepoint(125.5,10.5)),4326));
count
---------
9757154
(1 row)
Time: 399.846 ms
The execution plan shows a parallel BRIN index bitmap scan:
postgres=# explain select count(*) from tbl_pos where pos_att=1 and st_within(pos, st_setsrid(st_makebox2d(st_makepoint(120,5), st_makepoint(125.5,10.5)),4326));
QUERY PLAN
-------------------------------------------------------------------------------------------
Finalize Aggregate (cost=10016617202.13..10016617202.14 rows=1 width=8)
-> Gather (cost=10016617202.04..10016617202.05 rows=32 width=8)
Workers Planned: 32
-> Partial Aggregate (cost=10016617202.04..10016617202.05 rows=1 width=8)
-> Parallel Append (cost=0.00..10016617175.99 rows=10418 width=0)
-> Parallel Seq Scan on tbl_pos (cost=10000000000.00..10000000000.00 rows=1 width=0)
Filter: (('0103000020E610000001000000050000000000000000005E4000000000000014400000000000005E4000000000000025400000000000605F4000000000000025400000000000605F4000000000000014400000000000005E400000000000001440'::geometry ~ pos) AND (pos_att = 1) AND _st_contains('0103000020E610000001000000050000000000000000005E4000000000000014400000000000005E4000000000000025400000000000605F4000000000000025400000000000605F4000000000000014400000000000005E400000000000001440'::geometry, pos))
-> Parallel Bitmap Heap Scan on tbl_pos_1 (cost=2591.99..16617175.99 rows=10417 width=0)
Recheck Cond: ('0103000020E610000001000000050000000000000000005E4000000000000014400000000000005E4000000000000025400000000000605F4000000000000025400000000000605F4000000000000014400000000000005E400000000000001440'::geometry ~ pos)
Filter: ((pos_att = 1) AND _st_contains('0103000020E610000001000000050000000000000000005E4000000000000014400000000000005E4000000000000025400000000000605F4000000000000025400000000000605F4000000000000014400000000000005E400000000000001440'::geometry, pos))
-> Bitmap Index Scan on idx_tbl_pos1_posbn (cost=0.00..2508.66 rows=1000000000 width=0)
Index Cond: ('0103000020E610000001000000050000000000000000005E4000000000000014400000000000005E4000000000000025400000000000605F4000000000000025400000000000605F4000000000000014400000000000005E400000000000001440'::geometry ~ pos)
(12 rows)
2. From 10 billion position rows, select roughly 10 million people inside a region, JOIN against 1 billion tag rows, and profile their tags: 7 seconds.
The plan uses a parallel BRIN scan and a parallel nested-loop join:
postgres=# explain analyze select c1,count(*),avg(c2),max(c3) from tbl_tag t2 join (select uid from tbl_pos where pos_att=1 and st_within(pos, st_setsrid(st_makebox2d(st_makepoint(120,5), st_makepoint(125.5,10.5)),4326))) t1 on (t1.uid=t2.uid) group by c1;
QUERY PLAN
--------------------------------------------------------------------------
Finalize GroupAggregate (cost=10016650419.09..10016663638.78 rows=32873 width=44) (actual time=5417.105..6404.328 rows=32768 loops=1)
Group Key: t2.c1
-> Gather Merge (cost=10016650419.09..10016659894.42 rows=333344 width=44) (actual time=5417.071..6212.057 rows=1081163 loops=1)
Workers Planned: 32
Workers Launched: 32
-> Partial GroupAggregate (cost=10016650418.26..10016650652.64 rows=10417 width=44) (actual time=5392.695..5506.923 rows=32763 loops=33)
Group Key: t2.c1
-> Sort (cost=10016650418.26..10016650444.30 rows=10417 width=6) (actual time=5392.676..5442.197 rows=295671 loops=33)
Sort Key: t2.c1
Sort Method: quicksort Memory: 30914kB
-> Nested Loop (cost=0.57..10016649723.09 rows=10417 width=6) (actual time=8.413..5277.270 rows=295671 loops=33)
-> Parallel Append (cost=0.00..10016617175.99 rows=10418 width=8) (actual time=8.342..407.141 rows=295671 loops=33)
-> Parallel Seq Scan on tbl_pos (cost=10000000000.00..10000000000.00 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=33)
Filter: (('0103000020E610000001000000050000000000000000005E4000000000000014400000000000005E4000000000000025400000000000605F4000000000000025400000000000605F4000000000000014400000000000005E400000000000001440'::geometry ~ pos) AND (pos_att = 1) AND _st_contains('0103000020E610000001000000050000000000000000005E4000000000000014400000000000005E4000000000000025400000000000605F4000000000000025400000000000605F4000000000000014400000000000005E400000000000001440'::geometry, pos))
-> Parallel Bitmap Heap Scan on tbl_pos_1 (cost=2591.99..16617175.99 rows=10417 width=8) (actual time=8.341..381.660 rows=295671 loops=33)
Recheck Cond: ('0103000020E610000001000000050000000000000000005E4000000000000014400000000000005E4000000000000025400000000000605F4000000000000025400000000000605F4000000000000014400000000000005E400000000000001440'::geometry ~ pos)
Rows Removed by Index Recheck: 32474
Filter: ((pos_att = 1) AND _st_contains('0103000020E610000001000000050000000000000000005E4000000000000014400000000000005E4000000000000025400000000000605F4000000000000025400000000000605F4000000000000014400000000000005E400000000000001440'::geometry, pos))
Heap Blocks: lossy=3688
-> Bitmap Index Scan on idx_tbl_pos1_posbn (cost=0.00..2508.66 rows=1000000000 width=0) (actual time=19.182..19.182 rows=902400 loops=1)
Index Cond: ('0103000020E610000001000000050000000000000000005E4000000000000014400000000000005E4000000000000025400000000000605F4000000000000025400000000000605F4000000000000014400000000000005E400000000000001440'::geometry ~ pos)
-> Index Scan using idx on tbl_tag t2 (cost=0.57..3.11 rows=1 width=14) (actual time=0.016..0.016 rows=1 loops=9757154)
Index Cond: (uid = tbl_pos.uid)
Planning time: 0.690 ms
Execution time: 7098.662 ms
(25 rows)
3. From 1 billion tag rows, select roughly 10 million people by tag with a parallel BitmapOr scan: 14.5 seconds.
postgres=# select count(*) from tbl_tag where c1=1 or c2 between 1 and 100 or c13=100 or c4 between 1 and 200;
count
---------
9196602
(1 row)
Time: 14491.705 ms (00:14.492)
The execution plan:
postgres=# explain analyze select count(*) from tbl_tag where c1=1 or c2 between 1 and 100 or c13=100 or c4 between 1 and 200;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=10080648.93..10080648.94 rows=1 width=8) (actual time=16170.482..16170.482 rows=1 loops=1)
-> Gather (cost=10080648.84..10080648.85 rows=32 width=8) (actual time=16148.118..16170.475 rows=33 loops=1)
Workers Planned: 32
Workers Launched: 32
-> Partial Aggregate (cost=10080648.84..10080648.85 rows=1 width=8) (actual time=16135.754..16135.754 rows=1 loops=33)
-> Parallel Bitmap Heap Scan on tbl_tag (cost=132192.09..10079940.37 rows=283386 width=0) (actual time=2174.588..16107.422 rows=278685 loops=33)
Recheck Cond: ((c1 = 1) OR ((c2 >= 1) AND (c2 <= 100)) OR (c13 = 100) OR ((c4 >= 1) AND (c4 <= 200)))
Rows Removed by Index Recheck: 4311619
Heap Blocks: exact=3516 lossy=293793
-> BitmapOr (cost=132192.09..132192.09 rows=9087533 width=0) (actual time=2094.773..2094.773 rows=0 loops=1)
-> Bitmap Index Scan on idx_tbl_tag_1 (cost=0.00..333.62 rows=30020 width=0) (actual time=9.718..9.718 rows=30332 loops=1)
Index Cond: (c1 = 1)
-> Bitmap Index Scan on idx_tbl_tag_2 (cost=0.00..43418.50 rows=3200783 width=0) (actual time=787.952..787.952 rows=3053594 loops=1)
Index Cond: ((c2 >= 1) AND (c2 <= 100))
-> Bitmap Index Scan on idx_tbl_tag_13 (cost=0.00..332.99 rows=29936 width=0) (actual time=3.662..3.662 rows=30554 loops=1)
Index Cond: (c13 = 100)
-> Bitmap Index Scan on idx_tbl_tag_4 (cost=0.00..79038.62 rows=5826795 width=0) (actual time=1293.437..1293.437 rows=6101279 loops=1)
Index Cond: ((c4 >= 1) AND (c4 <= 200))
Planning time: 0.289 ms
Execution time: 16733.719 ms
(20 rows)
Alibaba Cloud's multi-dimensional metascan feature eliminates most of this heap-block scanning and brings the query under 500 milliseconds.
4. From 1 billion tag rows, select roughly 10 million people by tag, then profile their locations: parallel hash join, 203 seconds.
explain (analyze,verbose,timing,costs,buffers)
select st_geohash(t1.pos,6), count(*) from
tbl_pos_1 t1 join (select uid from tbl_tag where c1=1 or c2 between 1 and 100 or c13=100 or c4 between 1 and 200) t2
on (t1.uid=t2.uid)
group by st_geohash(t1.pos, 6);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=292262288.63..292678044.97 rows=9068342 width=40) (actual time=187392.324..197157.012 rows=8421096 loops=1)
Output: (st_geohash(t1.pos, 6)), count(*)
Group Key: (st_geohash(t1.pos, 6))
Buffers: shared hit=38723554 read=635308
-> Gather Merge (cost=292262288.63..292519348.94 rows=9068352 width=40) (actual time=187392.317..194293.700 rows=9171920 loops=1)
Output: (st_geohash(t1.pos, 6)), (PARTIAL count(*))
Workers Planned: 32
Workers Launched: 32
Buffers: shared hit=38723554 read=635308
-> Partial GroupAggregate (cost=292262287.80..292267955.52 rows=283386 width=40) (actual time=187348.727..187501.151 rows=277937 loops=33)
Output: (st_geohash(t1.pos, 6)), PARTIAL count(*)
Group Key: (st_geohash(t1.pos, 6))
Buffers: shared hit=1272770058 read=11675191
Worker 0: actual time=187342.771..187498.250 rows=282452 loops=1
Buffers: shared hit=39055272 read=348022
...... parallel aggregate
-> Sort (cost=292262287.80..292262996.26 rows=283386 width=32) (actual time=187348.715..187401.757 rows=278685 loops=33)
Output: (st_geohash(t1.pos, 6))
Sort Key: (st_geohash(t1.pos, 6))
Sort Method: quicksort Memory: 25570kB
Buffers: shared hit=1272770058 read=11675191
Worker 0: actual time=187342.758..187396.443 rows=283206 loops=1
Buffers: shared hit=39055272 read=348022
...... parallel sort
-> Hash Join (cost=10413383.91..292236623.78 rows=283386 width=32) (actual time=79890.153..186716.320 rows=278685 loops=33)
Output: st_geohash(t1.pos, 6)
Hash Cond: (t1.uid = tbl_tag.uid)
Buffers: shared hit=1272769802 read=11675191
Worker 0: actual time=81406.113..186712.149 rows=283206 loops=1
Buffers: shared hit=39055264 read=348022
...... parallel index scan
-> Parallel Index Scan using idx1 on public.tbl_pos_1 t1 (cost=0.57..281390010.62 rows=31250000 width=40) (actual time=0.040..92949.279 rows=30303030 loops=33)
Output: t1.uid, t1.pos_att, t1.pos
Buffers: shared hit=991056941 read=11675191
Worker 0: actual time=0.078..91228.338 rows=30782430 loops=1
Buffers: shared hit=30518510 read=348022
...... parallel hash
-> Hash (cost=10300029.06..10300029.06 rows=9068342 width=8) (actual time=77789.991..77789.991 rows=9196602 loops=33)
Output: tbl_tag.uid
Buckets: 16777216 Batches: 1 Memory Usage: 490315kB
Buffers: shared hit=281712413
Worker 0: actual time=79153.913..79153.913 rows=9196602 loops=1
Buffers: shared hit=8536740
...... parallel bitmap scan
-> Bitmap Heap Scan on public.tbl_tag (cost=132192.09..10300029.06 rows=9068342 width=8) (actual time=44896.981..74587.551 rows=9196602 loops=33)
Output: tbl_tag.uid
Recheck Cond: ((tbl_tag.c1 = 1) OR ((tbl_tag.c2 >= 1) AND (tbl_tag.c2 <= 100)) OR (tbl_tag.c13 = 100) OR ((tbl_tag.c4 >= 1) AND (tbl_tag.c4 <= 200)))
Heap Blocks: exact=8511538
Buffers: shared hit=281712413
Worker 0: actual time=45358.544..75896.906 rows=9196602 loops=1
Buffers: shared hit=8536740
...... parallel bitmap scan
-> BitmapOr (cost=132192.09..132192.09 rows=9087533 width=0) (actual time=38429.522..38429.522 rows=0 loops=33)
Buffers: shared hit=831659
Worker 0: actual time=38869.151..38869.151 rows=0 loops=1
Buffers: shared hit=25202
...... parallel bitmap scan
-> Bitmap Index Scan on idx_tbl_tag_1 (cost=0.00..333.62 rows=30020 width=0) (actual time=9.922..9.922 rows=30332 loops=33)
Index Cond: (tbl_tag.c1 = 1)
Buffers: shared hit=2999
Worker 0: actual time=10.045..10.045 rows=30332 loops=1
Buffers: shared hit=91
...... parallel bitmap scan
-> Bitmap Index Scan on idx_tbl_tag_2 (cost=0.00..43418.50 rows=3200783 width=0) (actual time=9529.886..9529.886 rows=3053594 loops=33)
Index Cond: ((tbl_tag.c2 >= 1) AND (tbl_tag.c2 <= 100))
Buffers: shared hit=275483
Worker 0: actual time=9710.639..9710.639 rows=3053594 loops=1
Buffers: shared hit=8348
...... parallel bitmap scan
-> Bitmap Index Scan on idx_tbl_tag_13 (cost=0.00..332.99 rows=29936 width=0) (actual time=9019.691..9019.691 rows=30554 loops=33)
Index Cond: (tbl_tag.c13 = 100)
Buffers: shared hit=2903
Worker 0: actual time=9143.024..9143.024 rows=30554 loops=1
Buffers: shared hit=88
...... parallel bitmap scan
-> Bitmap Index Scan on idx_tbl_tag_4 (cost=0.00..79038.62 rows=5826795 width=0) (actual time=19870.013..19870.013 rows=6101279 loops=33)
Index Cond: ((tbl_tag.c4 >= 1) AND (tbl_tag.c4 <= 200))
Buffers: shared hit=550274
Worker 0: actual time=20005.432..20005.432 rows=6101279 loops=1
Buffers: shared hit=16675
......
Planning time: 0.302 ms
Execution time: 203637.896 ms
(754 rows)
Greenplum
Data preparation
create or replace function ct1 () returns void as $$
declare
sql text := '';
begin
sql := 'create table tbl_tag(uid int8,';
for i in 1..200 loop
sql := sql||'c'||i||' int2 default random()*32767,';
end loop;
sql := rtrim(sql, ',');
sql := sql||') with (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zlib, CHECKSUM=false)';
execute sql;
end;
$$ language plpgsql strict;
select ct1();
create table tbl_pos(
uid int8,
pos_att int2,
pos geometry default st_setsrid(st_makepoint(73+random()*62, 3+random()*50), 4326)
)
with (APPENDONLY=true, ORIENTATION=row, COMPRESSTYPE=zlib, CHECKSUM=false)
partition by list (pos_att)
(
partition p1 values (1),
partition p2 values (2),
partition p3 values (3),
partition p4 values (4),
partition p5 values (5),
partition p6 values (6),
partition p7 values (7),
partition p8 values (8),
partition p9 values (9),
partition p10 values (10)
)
;
nohup psql -c "insert into tbl_tag select generate_series(1,1000000000)" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos select generate_series(1,1000000000),1" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos select generate_series(1,1000000000),2" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos select generate_series(1,1000000000),3" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos select generate_series(1,1000000000),4" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos select generate_series(1,1000000000),5" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos select generate_series(1,1000000000),6" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos select generate_series(1,1000000000),7" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos select generate_series(1,1000000000),8" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos select generate_series(1,1000000000),9" >/dev/null 2>&1 &
nohup psql -c "insert into tbl_pos select generate_series(1,1000000000),10" >/dev/null 2>&1 &
nohup psql -c "copy (select uid,pos_att,st_geohash(pos, 10) from tbl_pos) to stdout"|psql -c "copy tbl_pos from stdin" >/dev/null 2>&1 &
Alibaba Cloud's metascan feature (similar to PostgreSQL's BRIN index) accelerates filtering on tbl_pos.pos.
Spatial/attribute crowd selection + profiling performance tests
Environment constraints ruled out the PostGIS extension for the Greenplum tests, so geohash text codes stand in for geometry. Results below.
1. From 10 billion position rows, select roughly 10 million people inside a region: 21 seconds.
select count(*) from tbl_pos where pos_att=1 and st_within(pos, st_setsrid(st_makebox2d(st_makepoint(120,5), st_makepoint(125.5,10.5)),4326));
postgres=# select count(*) from tbl_pos where pos_att=1 and pos between 't9m' and 'tbbd' ;
count
---------
9635855
(1 row)
Time: 21371.543 ms
2. From 10 billion position rows, select roughly 10 million people inside a region, JOIN against 1 billion tag rows, and profile their tags: 29.3 seconds.
explain analyze select c1,count(*),avg(c2),max(c3) from tbl_tag t2 join (select uid from tbl_pos where pos_att=1 and st_within(pos, st_setsrid(st_makebox2d(st_makepoint(120,5), st_makepoint(125.5,10.5)),4326))) t1 on (t1.uid=t2.uid) group by c1;
postgres=# explain analyze select c1,count(*),avg(c2),max(c3) from tbl_tag t2 join (select uid from tbl_pos where pos_att=1 and pos between 't9m' and 'tbbd') t1 on (t1.uid=t2.uid) group by c1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice2; segments: 48) (cost=43547383.92..43547955.26 rows=32648 width=44)
Rows out: 32768 rows at destination with 28854 ms to end, start offset by 448 ms.
-> HashAggregate (cost=43547383.92..43547955.26 rows=681 width=44)
Group By: t2.c1
Rows out: Avg 682.7 rows x 48 workers. Max 689 rows (seg6) with 0.001 ms to first row, 11625 ms to end, start offset by 466 ms.
-> Redistribute Motion 48:48 (slice1; segments: 48) (cost=43546078.00..43546730.96 rows=681 width=44)
Hash Key: t2.c1
Rows out: Avg 32694.9 rows x 48 workers at destination. Max 33008 rows (seg31) with 17172 ms to end, start offset by 494 ms.
-> HashAggregate (cost=43546078.00..43546078.00 rows=681 width=44)
Group By: t2.c1
Rows out: Avg 32694.9 rows x 48 workers. Max 32719 rows (seg22) with 0.009 ms to first row, 82 ms to end, start offset by 491 ms.
-> Hash Join (cost=18492191.00..43506178.00 rows=83125 width=6)
Hash Cond: t2.uid = postgres.tbl_pos.uid
Rows out: Avg 200747.0 rows x 48 workers. Max 201863 rows (seg25) with 0.044 ms to first row, 25419 ms to end, start offset by 494 ms.
Executor memory: 6274K bytes avg, 6309K bytes max (seg25).
Work_mem used: 6274K bytes avg, 6309K bytes max (seg25). Workfile: (0 spilling, 0 reused)
-> Append-only Columnar Scan on tbl_tag t2 (cost=0.00..22464112.00 rows=20833334 width=14)
Rows out: 0 rows (seg0) with 0.004 ms to end, start offset by 501 ms.
-> Hash (cost=17993441.00..17993441.00 rows=831251 width=8)
Rows in: (No row requested) 0 rows (seg0) with 0 ms to end.
-> Append (cost=0.00..17993441.00 rows=831251 width=8)
Rows out: 0 rows (seg0) with 15 ms to end, start offset by 503 ms.
-> Append-only Scan on tbl_pos_1_prt_p1 tbl_pos (cost=0.00..17993441.00 rows=831251 width=8)
Filter: pos_att = 1 AND pos >= 't9m'::text AND pos <= 'tbbd'::text
Rows out: Avg 200747.0 rows x 48 workers. Max 201863 rows (seg25) with 48 ms to end, start offset by 494 ms.
Slice statistics:
(slice0) Executor memory: 501K bytes.
(slice1) Executor memory: 1613K bytes avg x 48 workers, 1613K bytes max (seg0). Work_mem: 6309K bytes max.
(slice2) Executor memory: 524K bytes avg x 48 workers, 524K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 29302.351 ms
(34 rows)
Time: 29306.897 ms
3. From 1 billion tag rows, select roughly 10 million people by tag: 3.4 seconds.
select count(*) from tbl_tag where c1=1 or c2 between 1 and 100 or c13=100 or c4 between 1 and 200;
postgres=# select count(*) from tbl_tag where c1=1 or c2 between 1 and 100 or c13=100 or c4 between 1 and 200;
count
---------
9198749
(1 row)
Time: 3426.337 ms
4. From 1 billion tag rows, select roughly 10 million people by tag, then profile their locations: 26.2 seconds.
explain (analyze,verbose,timing,costs,buffers)
select st_geohash(t1.pos,6), count(*) from
tbl_pos_1 t1 join (select uid from tbl_tag where c1=1 or c2 between 1 and 100 or c13=100 or c4 between 1 and 200) t2
on (t1.uid=t2.uid)
group by st_geohash(t1.pos, 6);
postgres=# explain analyze
postgres-# select substring(pos,1,6), count(*) from
postgres-# tbl_pos t1 join (select uid from tbl_tag where c1=1 or c2 between 1 and 100 or c13=100 or c4 between 1 and 200) t2
postgres-# on (t1.uid=t2.uid and t1.pos_att=1)
postgres-# group by substring(pos,1,6);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice2; segments: 48) (cost=53124325.91..53135748.71 rows=913824 width=40)
Rows out: 8421444 rows at destination with 25714 ms to end, start offset by 449 ms.
-> HashAggregate (cost=53124325.91..53135748.71 rows=19038 width=40)
Group By: "?column1?"
Rows out: Avg 175446.8 rows x 48 workers. Max 176265 rows (seg2) with 0.001 ms to first row, 8243 ms to end, start offset by 466 ms.
-> Redistribute Motion 48:48 (slice1; segments: 48) (cost=53090057.51..53110618.55 rows=19038 width=40)
Hash Key: unnamed_attr_1
Rows out: Avg 191284.2 rows x 48 workers at destination. Max 192297 rows (seg37) with 15634 ms to end, start offset by 529 ms.
-> HashAggregate (cost=53090057.51..53092342.07 rows=19038 width=40)
Group By: "substring"(t1.pos, 1, 6)
Rows out: Avg 191284.2 rows x 48 workers. Max 191966 rows (seg1) with 0.006 ms to first row, 134 ms to end, start offset by 468 ms.
-> Hash Join (cost=37578340.02..53085488.39 rows=19039 width=11)
Hash Cond: t1.uid = tbl_tag.uid
Rows out: Avg 191640.6 rows x 48 workers. Max 192331 rows (seg1) with 0.039 ms to first row, 18171 ms to end, start offset by 468 ms.
Executor memory: 5989K bytes avg, 6011K bytes max (seg1).
Work_mem used: 5989K bytes avg, 6011K bytes max (seg1). Workfile: (0 spilling, 0 reused)
-> Append (cost=0.00..12993441.00 rows=20833334 width=19)
Rows out: 0 rows (seg0) with 1228 ms to end, start offset by 531 ms.
-> Append-only Scan on tbl_pos_1_prt_p1 t1 (cost=0.00..12993441.00 rows=20833334 width=19)
Filter: pos_att = 1
Rows out: Avg 20833333.3 rows x 48 workers. Max 20833547 rows (seg37) with 0.005 ms to first row, 0.006 ms to end, start offset by 531 ms.
-> Hash (cost=37464112.00..37464112.00 rows=190381 width=8)
Rows in: (No row requested) 0 rows (seg0) with 0 ms to end.
-> Append-only Columnar Scan on tbl_tag (cost=0.00..37464112.00 rows=190381 width=8)
Filter: c1 = 1 OR (c2 >= 1 AND c2 <= 100) OR c13 = 100 OR (c4 >= 1 AND c4 <= 200)
Rows out: 0 rows (seg0) with 57 ms to end, start offset by 528 ms.
Slice statistics:
(slice0) Executor memory: 487K bytes.
(slice1) Executor memory: 1725K bytes avg x 48 workers, 1725K bytes max (seg0). Work_mem: 6011K bytes max.
(slice2) Executor memory: 524K bytes avg x 48 workers, 524K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 26166.164 ms
(35 rows)
Time: 26170.031 ms
Summary
Comparison:
Space

Database | Table | Rows | Size
---|---|---|---
PostgreSQL 10 | tag table (201 columns) | 1 billion | 424 GB
Greenplum | tag table (201 columns) | 1 billion | 369 GB
PostgreSQL 10 | position table (12 columns) | 10 billion | 640 GB
Greenplum | position table (12 columns) | 10 billion | 150 GB

Database | Index | Index type | Size
---|---|---|---
PostgreSQL 10 | tag table | btree | 4200 GB
PostgreSQL 10 | position table | brin | 27 MB
Performance

Database | Workload | Time
---|---|---
PostgreSQL 10 | select ~10 million people by region from 10 billion position rows | 400 ms
Greenplum | select ~10 million people by region from 10 billion position rows | 21 s
PostgreSQL 10 | region selection, JOIN 1 billion tag rows, profile tags | 7 s
Greenplum | region selection, JOIN 1 billion tag rows, profile tags | 29.3 s
PostgreSQL 10 | select ~10 million people by tag from 1 billion tag rows | 14.5 s (metascan can bring this under 500 ms)
Greenplum | select ~10 million people by tag from 1 billion tag rows | 3.4 s
PostgreSQL 10 | tag selection, then profile locations | 203 s (PG 11's merge join partial scan should improve this substantially)
Greenplum | tag selection, then profile locations | 26.2 s
Future PostgreSQL 10 optimizations
With BRIN indexes and bitmap scans, PG 10 already beats Greenplum on some of these workloads at equal resources.
A column-store engine plus the parallel hash and range merge join patches should bring the remaining workloads to parity with Greenplum at equal resources, or better:
1. parallel hash join
https://commitfest.postgresql.org/14/871/
2. range merge join
https://commitfest.postgresql.org/14/1106/
3. parallel append scan
https://commitfest.postgresql.org/14/987/
Greenplum's strength remains sheer scale: data distribution and parallelism make PB-scale real-time analytics practical.
PG, by contrast, suits TP-dominated workloads with some AP, the same territory Oracle covers.
For choosing between Greenplum and PostgreSQL, see the guidance earlier in this article.
Last updated: 2017-09-24 16:33:10