ALICloudDB for PostgreSQL 試用報告 - 4 水平分庫 之 節點擴展
RDS現在還欠缺一個功能,就是數據庫克隆,你可以這麼理解,給現有的數據庫創建STANDBY,然後將這個STANDBY激活,就完成了對數據庫的克隆。
為什麼我們需要數據庫克隆功能呢?
這會使得數據庫的擴容變得非常簡單,比如我們這裏的應用場景,如果要將16個RDS,變成32個RDS,那麼克隆無疑是最好的辦法。因為不需要做邏輯數據遷移的事情,隻需要刪除不需要的數據庫,以及調整plproxy的cluster配置即可。
我們先假設RDS有創建STANDBYD的功能(相信未來會增加),看看如何來實現RDS的擴容。
假設主RDS1包含db0,db16兩個庫,現在要拆分為兩個RDS,RDS1(db0), RDS2(db16),分別包含db0和db16。
1. 為需要拆分的主RDS創建standby RDS, 確認流複製正常,確認standby RDS到達catchup狀態。
2. 配置密碼文件~/.pgpass,包含克隆庫的認證信息。
3. 調整plproxy cluster配置。使用隻讀事務連接需要遷移的數據庫,避免寫操作帶來的主備不一致。
例如 :
alter server rds_pg_cluster options (set p16 'host=old_rds_hostname dbname=db16 port=3433 user=digoal keepalives_idle=30 keepalives_interval=10 keepalives_count=10 application_name=16 options=''-c default_transaction_read_only=true'' ');
4. 確認主RDS需遷移的庫(db16)沒有連接,確認standby處於catchup狀態。
5. 激活standby。
6. 調整plproxy cluster配置。原連接RDS1(db16),修改為RDS2(db16)。
例如 :
alter server rds_pg_cluster options (set p16 'host=new_rds_hostname dbname=db16 port=3433 user=digoal keepalives_idle=30 keepalives_interval=10 keepalives_count=10 application_name=16');
7. 刪除主RDS節點已遷移的數據庫(db16),刪除standby節點多餘的數據庫(db0)。
循環1-7,將所有RDS拆分。
回到現實,現實是目前RDS沒有提供克隆功能。那麼我們需要人工實現數據遷移,需遷移的東西還挺多,包括表,視圖,函數,。。。。。可以用pg_dump,但是怎麼實現增量呢?可以通過PostgreSQL的logical decoding來實現增量複製。現在阿裏雲RDS PG已經支持了邏輯複製的功能。
下麵是人力擴容的例子:
源:
postgres=> select datname,pg_database_size(oid)/1024/1024 from pg_database;
datname | ?column?
-----------+----------
template1 | 6
template0 | 6
postgres | 3618
digoal | 6
db7 | 179
db23 | 179
(6 rows)
目標,把db23遷移到以下RDS:
postgres=> select datname,pg_database_size(oid)/1024/1024 from pg_database;
datname | ?column?
-----------+----------
template1 | 6
template0 | 6
postgres | 6
digoal | 6
db8 | 179
db24 | 179
(6 rows)
在目標RDS創建db23數據庫:
postgres=> create database db23;
CREATE DATABASE
postgres=> \c db23
psql (9.4.3, server 9.4.1)
You are now connected to database "db23" as user "digoal".
創建schema和需要遷移的函數:
db23=> create schema digoal;
CREATE SCHEMA
db23=> CREATE OR REPLACE FUNCTION digoal.dy(sql text)
db23-> RETURNS SETOF record
db23-> LANGUAGE plpgsql
db23-> STRICT
db23-> AS $function$
db23$> declare
db23$> rec record;
db23$> begin
db23$> for rec in execute sql loop
db23$> return next rec;
db23$> end loop;
db23$> return;
db23$> end;
db23$> $function$;
CREATE FUNCTION
db23=> CREATE OR REPLACE FUNCTION digoal.dy_ddl(sql text)
db23-> RETURNS VOID
db23-> LANGUAGE plpgsql
db23-> STRICT
db23-> AS $function$
db23$> declare
db23$> begin
db23$> execute sql;
db23$> return;
db23$> exception when others then return;
db23$> end;
db23$> $function$;
CREATE FUNCTION
準備需要遷移的數據的外部表:
db23=> create extension postgres_fdw;
CREATE EXTENSION
CREATE SERVER db23
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'old.pg.rds.aliyuncs.com', port '3433', dbname 'db23');
CREATE USER MAPPING FOR digoal -- locale user
SERVER db23
OPTIONS (user 'digoal', password 'digoal'); -- remote user/password
CREATE FOREIGN TABLE digoal.ft_userinfo (
dbid int default 23,
userid int,
info text
)
SERVER db23
OPTIONS (schema_name 'digoal', table_name 'userinfo');
CREATE FOREIGN TABLE digoal.ft_session (
dbid int default 23,
userid int,
last_login timestamp without time zone
)
SERVER db23
OPTIONS (schema_name 'digoal', table_name 'session');
CREATE FOREIGN TABLE digoal.ft_tbl_small (
userid int,
info text
)
SERVER db23
OPTIONS (schema_name 'digoal', table_name 'tbl_small');
CREATE FOREIGN TABLE digoal.ft_login_log (
dbid int default 23,
userid int,
db_user name,
client_addr inet,
client_port int,
server_addr inet,
server_port int,
login_time timestamp without time zone
)
SERVER db23
OPTIONS (schema_name 'digoal', table_name 'login_log');
創建物化視圖,日誌表(日誌數據不遷移)
CREATE MATERIALIZED VIEW digoal.userinfo (
dbid ,
userid ,
info
) as select * from digoal.ft_userinfo;
set maintenance_work_mem='10GB'; -- 超出RDS內存限製,可能會被殺掉
create unique index pk_userinfo on digoal.userinfo (userid);
CREATE MATERIALIZED VIEW digoal.session (
dbid ,
userid ,
last_login
) as select * from digoal.ft_session;
set maintenance_work_mem='10GB'; -- 超出RDS內存限製,可能會被殺掉
create unique index pk_session on digoal.session (userid);
CREATE MATERIALIZED VIEW digoal.tbl_small (
userid ,
info
) as select * from digoal.ft_tbl_small;
set maintenance_work_mem='10GB'; -- 超出RDS內存限製,可能會被殺掉
create unique index pk_tbl_small on digoal.tbl_small (userid);
CREATE TABLE digoal.login_log (
dbid int default 23,
userid int,
db_user name,
client_addr inet,
client_port int,
server_addr inet,
server_port int,
login_time timestamp without time zone
);
創建需要遷移的函數:
CREATE OR REPLACE FUNCTION digoal.dy_generate_test_ddl()
RETURNS VOID
LANGUAGE plpgsql
STRICT
AS $function$
declare
node int;
sql text;
begin
select application_name::int into node from pg_stat_activity where pid=pg_backend_pid();
sql := $a$insert into digoal.userinfo select $a$||node||$a$,generate_series($a$||node||$a$,32000000,32)$a$;
execute sql;
sql := $a$insert into digoal.session select dbid,userid from digoal.userinfo$a$;
execute sql;
return;
exception when others then return;
end;
$function$;
CREATE OR REPLACE FUNCTION digoal.query_pk(IN i_userid int, OUT dbid int, OUT userid int, OUT info text)
RETURNS record
LANGUAGE plpgsql
STRICT
AS $function$
declare
begin
select t.dbid,t.userid,t.info into dbid,userid,info from digoal.userinfo t where t.userid=i_userid;
return;
end;
$function$;
CREATE OR REPLACE FUNCTION digoal.insert_log(IN i_userid int)
RETURNS void
LANGUAGE plpgsql
STRICT
AS $function$
declare
begin
insert into digoal.login_log (userid,db_user,client_addr,client_port,server_addr,server_port,login_time)
values (i_userid,current_user,inet_client_addr(),inet_client_port(),inet_server_addr(),inet_server_port(),now());
end;
$function$;
CREATE OR REPLACE FUNCTION digoal.query_insert(IN i_userid int, OUT dbid int, OUT userid int, OUT info text)
RETURNS record
LANGUAGE plpgsql
STRICT
AS $function$
declare
begin
select t.dbid,t.userid,t.info into dbid,userid,info from digoal.userinfo t where t.userid=i_userid;
insert into digoal.login_log (userid,db_user,client_addr,client_port,server_addr,server_port,login_time)
values (i_userid,current_user,inet_client_addr(),inet_client_port(),inet_server_addr(),inet_server_port(),now());
return;
end;
$function$;
CREATE OR REPLACE FUNCTION digoal.update_pk(IN i_userid int)
RETURNS void
LANGUAGE plpgsql
STRICT
AS $function$
declare
begin
update digoal.session t set last_login=now() where t.userid=i_userid;
end;
$function$;
CREATE OR REPLACE FUNCTION digoal.query_update_insert(IN i_userid int, OUT dbid int, OUT userid int, OUT info text)
RETURNS record
LANGUAGE plpgsql
STRICT
AS $function$
declare
begin
select t.dbid,t.userid,t.info into dbid,userid,info from digoal.userinfo t where t.userid=i_userid;
insert into digoal.login_log (userid,db_user,client_addr,client_port,server_addr,server_port,login_time)
values (i_userid,current_user,inet_client_addr(),inet_client_port(),inet_server_addr(),inet_server_port(),now());
update digoal.session t set last_login=now() where t.userid=i_userid;
return;
end;
$function$;
CREATE OR REPLACE FUNCTION digoal.query_smalltbl(IN i_userid int, OUT userid int, OUT info text)
RETURNS record
LANGUAGE plpgsql
STRICT
AS $function$
declare
begin
select t.userid,t.info into userid,info from digoal.tbl_small t where t.userid=i_userid;
return;
end;
$function$;
PL/Proxy節點操作如下:
配置.pgpass, 新增:
new.pg.rds.aliyuncs.com:3433:*:digoal:digoal
使用default_transaction_read_only默認讀事務,屏蔽寫操作.
在遷移時間段內,用戶可以正常執行讀請求,但是如果執行寫請求會失敗,這樣確保數據遷移的一致性,同時降低遷移過程對業務的影響。
postgres# alter server rds_pg_cluster options (set p23 'host=old.pg.rds.aliyuncs.com dbname=db23 port=3433 user=digoal keepalives_idle=30 keepalives_interval=10 keepalives_count=10 application_name=23 options=''-c default_transaction_read_only=true'''); -- 注意裏麵是兩個單引號
測試讀正常,寫失敗:
postgres=# select query_pk(23);
query_pk
----------
(23,23,)
(1 row)
postgres=# select insert_log(23);
ERROR: public.insert_log(1): [db23] REMOTE ERROR: cannot execute INSERT in a read-only transaction
CONTEXT: Remote context: SQL statement "insert into digoal.login_log (userid,db_user,client_addr,client_port,server_addr,server_port,login_time)
values (i_userid,current_user,inet_client_addr(),inet_client_port(),inet_server_addr(),inet_server_port(),now())"
PL/pgSQL function insert_log(integer) line 4 at SQL statement
new RDS執行操作如下,刷新物化視圖:
refresh materialized view CONCURRENTLY digoal.userinfo;
REFRESH MATERIALIZED VIEW
Time: 10953.220 ms
refresh materialized view CONCURRENTLY digoal.session;
REFRESH MATERIALIZED VIEW
Time: 11013.860 ms
refresh materialized view CONCURRENTLY digoal.tbl_small;
REFRESH MATERIALIZED VIEW
Time: 5084.118 ms
這裏卡住,因為要修改數據字典需要超級用戶。而RDS提供的用戶是普通用戶,無法修改數據字典(雖然有風險,這裏隻為演示)。所以這樣遷移行不通。
如果是超級用戶,那麼操作請參考我前期寫的BLOG
https://blog.163.com/digoal@126/blog/static/163877040201559105235803/
為了演示下去,我隻能選擇全量遷移。(其他增量方法也有,本文不演示)
db23=> drop materialized view digoal.session ;
DROP MATERIALIZED VIEW
Time: 16.528 ms
db23=> drop materialized view digoal.userinfo;
DROP MATERIALIZED VIEW
Time: 15.781 ms
db23=> drop materialized view digoal.tbl_small;
DROP MATERIALIZED VIEW
Time: 9.458 ms
為了提高遷移速度,用了一些手段。
set synchronous_commit=off;
set maintenance_work_mem='10GB'; -- 超出RDS內存限製,可能會被殺掉
CREATE TABLE digoal.userinfo (
dbid int default 23,
userid int,
info text
) with (autovacuum_enabled=off, toast.autovacuum_enabled=off);
CREATE TABLE digoal.session (
dbid int default 23,
userid int,
last_login timestamp without time zone
) with (autovacuum_enabled=off, toast.autovacuum_enabled=off);
CREATE TABLE digoal.tbl_small (
userid int,
info text
) with (autovacuum_enabled=off, toast.autovacuum_enabled=off);
insert into digoal.userinfo select * from digoal.ft_userinfo;
INSERT 0 1000000
Time: 45290.701 ms
insert into digoal.session select * from digoal.ft_session;
INSERT 0 1000000
Time: 42212.278 ms
insert into digoal.tbl_small select * from digoal.ft_tbl_small;
INSERT 0 500000
Time: 22885.456 ms
alter table digoal.userinfo add constraint pk_userinfo primary key (userid);
ALTER TABLE
Time: 16962.174 ms
alter table digoal.session add constraint pk_session primary key (userid);
ALTER TABLE
Time: 20809.422 ms
alter table digoal.tbl_small add constraint pk_tbl_small primary key (userid);
ALTER TABLE
Time: 17484.201 ms
vacuum analyze digoal.userinfo;
Time: 65.790 ms
vacuum analyze digoal.session;
Time: 65.427 ms
vacuum analyze digoal.tbl_small;
Time: 45.453 ms
alter table digoal.userinfo set (autovacuum_enabled=on, toast.autovacuum_enabled=on);
alter table digoal.session set (autovacuum_enabled=on, toast.autovacuum_enabled=on);
alter table digoal.tbl_small set (autovacuum_enabled=on, toast.autovacuum_enabled=on);
PL/Proxy, 修改集群,db23的目標主機為新的RDS,並且開放讀寫權限:
postgres# alter server rds_pg_cluster options (set p23 'host=old.pg.rds.aliyuncs.com dbname=db23 port=3433 user=digoal keepalives_idle=30 keepalives_interval=10 keepalives_count=10 application_name=23');
刪除老RDS上的db23.
psql -h old.pg.rds.aliyuncs.com -p 3433 -U digoal postgres
postgres=> drop database db23;
DROP DATABASE
測試plproxy分發是否正常分發到新的數據庫:
postgres=# select * from query_pk(23);
dbid | userid | info
------+--------+------
23 | 23 |
(1 row)
vi test.sql
\setrandom id 1 32000000
select insert_log(:id);
pgbench -M prepared -n -r -f ./test.sql -P 1 -c 30 -j 30 -T 30
progress: 1.0 s, 7853.2 tps, lat 3.340 ms stddev 6.056
progress: 2.0 s, 10766.4 tps, lat 2.432 ms stddev 5.433
progress: 3.0 s, 11395.0 tps, lat 2.277 ms stddev 4.590
progress: 4.0 s, 11622.1 tps, lat 2.216 ms stddev 4.493
progress: 5.0 s, 10519.9 tps, lat 2.454 ms stddev 5.600
progress: 6.0 s, 11153.4 tps, lat 2.317 ms stddev 4.795
progress: 7.0 s, 11474.3 tps, lat 2.312 ms stddev 4.802
progress: 8.0 s, 11398.5 tps, lat 2.253 ms stddev 4.308
progress: 9.0 s, 12106.7 tps, lat 2.174 ms stddev 3.302
progress: 10.0 s, 12567.8 tps, lat 2.068 ms stddev 2.075
連接到新的db23:
psql -h new_rds -p 3433 -U digoal
postgres=> \c db23
psql (9.4.3, server 9.4.1)
You are now connected to database "db23" as user "digoal".
db23=> select count(*) from login_log ;
count
-------
10547
(1 row)
[補充]
1. 如果實際使用內存超出了RDS內存限製,會被殺掉。
postgres=> set maintenance_work_mem='10GB';
SET
postgres=> alter table session add constraint pk_session primary key(userid);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q
[參考]
1. https://blog.163.com/digoal@126/blog/static/163877040201559105235803/
2. https://www.postgresql.org/docs/9.4/static/libpq-connect.html
最後更新:2017-04-01 13:44:33