284
技術社區[雲棲]
PostgreSQL 按需切片的實現(TimescaleDB插件自動切片功能的plpgsql schemaless實現)
標簽
PostgreSQL , schemaless , 自動切片 , track_count , 計數器 , udf , plpgsql , timescaledb
背景
TimescaleDB是PostgreSQL的一款時序數據庫插件,其中自動切片是一個非常深入人心的功能。
實際上PostgreSQL plpgsql也可以實現類似的功能,當然,前提是需要使用schemaless的模式。
schemaless的設計思路和應用舉例:
《PostgreSQL 在鐵老大訂單係統中的schemaless設計和性能壓測》
《PostgreSQL schemaless 的實現(類mongodb collection)》
《PostgreSQL 時序最佳實踐 - 證券交易係統數據庫設計 - 阿裏雲RDS PostgreSQL最佳實踐》
下麵,就自動切片這個功能,我們看看schemaless的實現例子。
1、首先要監測寫入量,通過track_counts參數,可以給數據的寫入計數(默認track_counts參數是開啟的)。
《PostgreSQL pg_stat_reset清除track_counts的隱患》
postgres=# select * from pg_stat_all_tables where relname='test1';
-[ RECORD 1 ]-------+-------
relid | 31129
schemaname | public
relname | test1
seq_scan | 0
seq_tup_read | 0
idx_scan |
idx_tup_fetch |
n_tup_ins | 1000
n_tup_upd | 0
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 1000
n_dead_tup | 0
n_mod_since_analyze | 1000
last_vacuum |
last_autovacuum |
last_analyze |
last_autoanalyze |
vacuum_count | 0
autovacuum_count | 0
analyze_count | 0
autoanalyze_count | 0
2、當數據寫入到一定量時,自動寫下一張表。
schemaless自動切片例子
1、設計:
寫入時,通過UDF寫入,並且實時監控每種數據流的寫入速度,並動態做數據分片。
2、測試表:
create table log(id int, info text, crt_time timestamp default now());
create index idx_log_crt_time on log(crt_time);
create table log_tmp (like log including all) inherits(log);
3、分片規則:
當記錄數超過100000時,自動切換分區。
4、UDF定義:
create or replace function f(v_id int, v_info text) returns void as $$
declare
suffix int;
v_rows int8;
min_time timestamp;
max_time timestamp;
begin
-- 插入
insert into log_tmp(id,info) values (v_id, v_info);
-- 判斷記錄數是否達到切換閾值
select n_live_tup into v_rows from pg_stat_all_tables where relname='log_tmp' and schemaname='public';
-- 達到閾值,切換表
if v_rows >=100000 then
select count(*) into suffix from pg_inherits where inhparent='log'::regclass;
select min(crt_time), max(crt_time) into min_time, max_time from log_tmp ;
execute 'alter table log_tmp add constraint ck_log_'||suffix||'_1 check (crt_time>='''||min_time||''' and crt_time<='''||max_time||''')';
execute 'alter table log_tmp rename to log_'||suffix;
create table log_tmp (like log including all) inherits(log);
end if;
return;
exception when others then
return;
end;
$$ language plpgsql strict;
5、壓測
vi test.sql
select f(1, 'test');
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 120
NOTICE: merging column "id" with inherited definition
NOTICE: merging column "info" with inherited definition
NOTICE: merging column "crt_time" with inherited definition
progress: 1.0 s, 25350.5 tps, lat 2.487 ms stddev 0.986
progress: 2.0 s, 26309.0 tps, lat 2.432 ms stddev 0.688
progress: 3.0 s, 26251.9 tps, lat 2.438 ms stddev 0.741
progress: 4.0 s, 26451.0 tps, lat 2.420 ms stddev 0.737
NOTICE: merging column "id" with inherited definition
NOTICE: merging column "info" with inherited definition
NOTICE: merging column "crt_time" with inherited definition
progress: 5.0 s, 29471.0 tps, lat 2.172 ms stddev 0.844
progress: 6.0 s, 32971.0 tps, lat 1.941 ms stddev 0.670
progress: 7.0 s, 33028.0 tps, lat 1.938 ms stddev 0.661
progress: 8.0 s, 33101.0 tps, lat 1.933 ms stddev 0.656
NOTICE: merging column "id" with inherited definition
NOTICE: merging column "info" with inherited definition
NOTICE: merging column "crt_time" with inherited definition
progress: 9.0 s, 32805.0 tps, lat 1.951 ms stddev 0.752
......
自動切片成功:
postgres=# \d log_1
Table "public.log_1"
Column | Type | Collation | Nullable | Default
----------+-----------------------------+-----------+----------+---------
id | integer | | |
info | text | | |
crt_time | timestamp without time zone | | | now()
Indexes:
"log_tmp_crt_time_idx" btree (crt_time)
Check constraints:
"ck_log_1_1" CHECK (crt_time >= '2017-11-02 23:52:34.264264'::timestamp without time zone AND crt_time <= '2017-11-02 23:54:05.939958'::timestamp without time zone)
Inherits: log
postgres=# \d log_2
Table "public.log_2"
Column | Type | Collation | Nullable | Default
----------+-----------------------------+-----------+----------+---------
id | integer | | |
info | text | | |
crt_time | timestamp without time zone | | | now()
Indexes:
"log_tmp_crt_time_idx1" btree (crt_time)
Check constraints:
"ck_log_2_1" CHECK (crt_time >= '2017-11-02 23:54:05.948796'::timestamp without time zone AND crt_time <= '2017-11-02 23:54:10.946987'::timestamp without time zone)
Inherits: log
小結
timescaleDB插件還有很多其他的功能,在使用方便也更加的邊界,待TimescaleDB插件成熟,還是推薦使用TimescaleDB。
對於阿裏雲RDS PG,使用本文提到的方法,還可以實現實時寫入RDS PG,同時根據設置的閾值,批量寫入OSS外部表(寫OSS外部表可以使用DBLINK的異步接口)。
《阿裏雲RDS PostgreSQL OSS 外部表 - (dblink異步調用封裝)並行寫提速案例》
最後更新:2017-11-12 01:34:44