Deepgreen & Greenplum 高可用(一) - Segment節點故障轉移
尚書中雲:惟事事,乃其有備,有備無患。這教導我們做事一定要有準備,做事尚且如此,在企事業單位發展中處於基礎地位的數據倉庫軟件在運行過程中,何嚐不需要有備無患呢?
今天別的不表,主要來談談企業級數據倉庫軟件Deepgreen和Greenplum的高可用特性之一:計算節點鏡像。
一、首先從理論上來講,正常Segment節點和他的Mirror是分布在不同主機上的,以防止單點故障導致的數據庫訪問異常。當正常Segment節點出現故障時,Mirror節點可以自動接管Segment節點的服務,數據庫仍然可以正常使用。這個過程對前台應用來說是透明的。下麵我們來看實操例子:
1.測試環境
本人筆記本電腦;
4核心,8G內存,1T硬盤
已安裝1個主節點和2個計算節點,無鏡像
2.首先查看集群狀態:1 Master,2 Segments
## 執行命令:gpstate
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:---------------------------
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Master instance parameters
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:---------------------------
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Database = template1
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Master Port = 15432
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Master directory = /dgdata/master/dg-1
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Timeout = 600 seconds
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Master standby = Off
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:---------------------------------------
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Segment instances that will be started
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:---------------------------------------
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:- Host Datadir Port
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:- flash /dgdata/primary/dg0 25432
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:- flash /dgdata/primary/dg1 25433
3.創建鏡像節點目錄並添加節點
## 創建mirror目錄
dgadmin@flash:~$ mkdir /dgdata/mirror
## 執行命令:gpaddmirrors添加節點
dgadmin@flash:~$ gpaddmirrors
20170628:06:41:50:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Starting gpaddmirrors with args:
20170628:06:41:50:004294 gpaddmirrors:flash:dgadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build Deepgreen DB'
20170628:06:41:50:004294 gpaddmirrors:flash:dgadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.99.00 build Deepgreen DB) on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4 compiled on May 18 2017 05:19:19'
20170628:06:41:50:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Obtaining Segment details from master...
Enter mirror segment data directory location 1 of 2 >
/dgdata/mirror
Enter mirror segment data directory location 2 of 2 >
/dgdata/mirror
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Greenplum Add Mirrors Parameters
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Greenplum master data directory = /dgdata/master/dg-1
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Greenplum master port = 15432
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Parallel batch limit = 16
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Mirror 1 of 2
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance host = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance address = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance directory = /dgdata/primary/dg0
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance port = 25432
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance replication port = 28432
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance host = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance address = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance directory = /dgdata/mirror/dg0
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance port = 26432
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance replication port = 27432
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Mirror 2 of 2
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance host = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance address = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance directory = /dgdata/primary/dg1
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance port = 25433
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance replication port = 28433
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance host = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance address = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance directory = /dgdata/mirror/dg1
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance port = 26433
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance replication port = 27433
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:----------------------------------------------------------
Continue with add mirrors procedure Yy|Nn (default=N):
> y
20170628:06:42:06:004294 gpaddmirrors:flash:dgadmin-[INFO]:-2 segment(s) to add
20170628:06:42:06:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Building template directory
20170628:06:42:07:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Validating remote directories
......
20170628:06:42:13:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Copying template directory file
.
20170628:06:42:14:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Configuring new segments
......
20170628:06:42:20:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Cleaning files
.
20170628:06:42:21:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Starting file move procedure for flash:/dgdata/mirror/dg0:content=0:dbid=4:mode=r:status=u
20170628:06:42:21:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Starting file move procedure for flash:/dgdata/mirror/dg1:content=1:dbid=5:mode=r:status=u
20170628:06:42:21:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Updating configuration with new mirrors
20170628:06:42:22:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Updating mirrors
......
20170628:06:42:28:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Starting mirrors
20170628:06:42:28:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
.......
20170628:06:42:35:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Process results...
20170628:06:42:35:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Updating configuration to mark mirrors up
20170628:06:42:35:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Updating primaries
20170628:06:42:35:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Commencing parallel primary conversion of 2 segments, please wait...
......
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Process results...
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Done updating primaries
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-******************************************************************
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Mirror segments have been added; data synchronization is in progress.
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Data synchronization will continue in the background.
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Use gpstate -s to check the resynchronization progress.
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-*****************************************************************
4.重新查看集群狀態:確定2個Segments都已經添加好Mirror
dgadmin@flash:~$ gpstate
20170628:06:59:19:005589 gpstate:flash:dgadmin-[INFO]:-Starting gpstate with args:
20170628:06:59:19:005589 gpstate:flash:dgadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build Deepgreen DB'
20170628:06:59:20:005589 gpstate:flash:dgadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.99.00 build Deepgreen DB) on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4 compiled on May 18 2017 05:19:19'
20170628:06:59:20:005589 gpstate:flash:dgadmin-[INFO]:-Obtaining Segment details from master...
20170628:06:59:20:005589 gpstate:flash:dgadmin-[INFO]:-Gathering data from segments...
....
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:-Greenplum instance status summary
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Master instance = Active
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Master standby = No master standby configured
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total segment instance count from metadata = 4
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Primary Segment Status
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total primary segments = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total primary segment valid (at master) = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total primary segment failures (at master) = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Mirror Segment Status
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total mirror segments = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total mirror segment valid (at master) = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total mirror segment failures (at master) = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number mirror segments acting as primary segments = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number mirror segments acting as mirror segments = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
dgadmin@flash:~$
5.登錄數據庫進行查詢測試
dgadmin@flash:~$ psql -d postgres
psql (8.2.15)
Type "help" for help.
postgres=# \c
You are now connected to database "postgres" as user "dgadmin".
postgres=# \l
List of databases
Name | Owner | Encoding | Access privileges
-----------+---------+----------+---------------------
postgres | dgadmin | UTF8 |
template0 | dgadmin | UTF8 | =c/dgadmin
: dgadmin=CTc/dgadmin
template1 | dgadmin | UTF8 | =c/dgadmin
: dgadmin=CTc/dgadmin
tpch | dgadmin | UTF8 |
tpch_1g | dgadmin | UTF8 |
(5 rows)
postgres=# \c tpch
You are now connected to database "tpch" as user "dgadmin".
tpch=# \dt
List of relations
Schema | Name | Type | Owner | Storage
--------+----------+-------+---------+----------------------
public | customer | table | dgadmin | append only columnar
public | lineitem | table | dgadmin | append only columnar
public | nation | table | dgadmin | append only columnar
public | orders | table | dgadmin | append only columnar
public | part | table | dgadmin | append only columnar
public | partsupp | table | dgadmin | append only columnar
public | region | table | dgadmin | append only columnar
public | supplier | table | dgadmin | append only columnar
(8 rows)
tpch=# select * from region;
r_regionkey | r_name | r_comment
-------------+---------------------------+---------------------------------------------------------------------------------------------------------------------
1 | AMERICA | hs use ironic, even requests. s
3 | EUROPE | ly final courts cajole furiously final excuse
0 | AFRICA | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
2 | ASIA | ges. thinly even pinto beans ca
4 | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
(5 rows)
6.模擬實驗:Segment1實例異常
## 查詢dg1的後台進程號:3898
dgadmin@flash:~$ ps -ef | grep postgres
postgres 1089 1 0 Jun27 ? 00:00:01 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c config_file=/etc/postgresql/9.3/main/postgresql.conf
postgres 1099 1089 0 Jun27 ? 00:00:00 postgres: checkpointer process
postgres 1100 1089 0 Jun27 ? 00:00:00 postgres: writer process
postgres 1101 1089 0 Jun27 ? 00:00:00 postgres: wal writer process
postgres 1102 1089 0 Jun27 ? 00:00:00 postgres: autovacuum launcher process
postgres 1103 1089 0 Jun27 ? 00:00:02 postgres: stats collector process
dgadmin 3898 1 0 06:26 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/primary/dg1 -p 25433 -b 3 -z 2 --silent-mode=true -i -M mirrorless -C 1
dgadmin 3899 1 0 06:26 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/primary/dg0 -p 25432 -b 2 -z 2 --silent-mode=true -i -M mirrorless -C 0
dgadmin 3915 3898 0 06:26 ? 00:00:00 postgres: port 25433, sweeper process
dgadmin 3927 1 0 06:27 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/master/dg-1 -p 15432 -b 1 -z 2 --silent-mode=true -i -M master -C -1 -x 0 -E
dgadmin 3928 3927 0 06:27 ? 00:00:00 postgres: port 15432, master logger process
dgadmin 3931 3927 0 06:27 ? 00:00:00 postgres: port 15432, stats collector process
dgadmin 3932 3927 0 06:27 ? 00:00:00 postgres: port 15432, writer process
dgadmin 3933 3927 0 06:27 ? 00:00:00 postgres: port 15432, checkpoint process
dgadmin 3934 3927 0 06:27 ? 00:00:00 postgres: port 15432, seqserver process
dgadmin 3935 3927 0 06:27 ? 00:00:00 postgres: port 15432, ftsprobe process
dgadmin 3936 3927 0 06:27 ? 00:00:00 postgres: port 15432, sweeper process
dgadmin 5012 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg1 -p 26433 -b 5 -z 2 --silent-mode=true -i -M quiescent -C 1
dgadmin 5013 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg0 -p 26432 -b 4 -z 2 --silent-mode=true -i -M quiescent -C 0
dgadmin 5014 5013 0 06:42 ? 00:00:00 postgres: port 26432, logger process
dgadmin 5015 5012 0 06:42 ? 00:00:00 postgres: port 26433, logger process
dgadmin 5022 5013 0 06:42 ? 00:00:00 postgres: port 26432, mirror process
dgadmin 5023 5012 0 06:42 ? 00:00:00 postgres: port 26433, mirror process
dgadmin 5106 5103 0 06:42 ? 00:00:00 postgres: port 25433, primary consumer ack process
dgadmin 5107 5103 0 06:42 ? 00:00:01 postgres: port 25433, primary recovery process
dgadmin 5108 5103 0 06:42 ? 00:00:00 postgres: port 25433, primary verification process
dgadmin 5114 3899 0 06:42 ? 00:00:01 postgres: port 25432, primary process
dgadmin 5115 5114 0 06:42 ? 00:00:00 postgres: port 25432, primary receiver ack process
dgadmin 5116 5114 1 06:42 ? 00:00:20 postgres: port 25432, primary sender process
dgadmin 5117 5114 0 06:42 ? 00:00:00 postgres: port 25432, primary consumer ack process
dgadmin 5118 5114 0 06:42 ? 00:00:01 postgres: port 25432, primary recovery process
dgadmin 5119 5114 0 06:42 ? 00:00:00 postgres: port 25432, primary verification process
dgadmin 5919 3733 0 07:08 pts/0 00:00:00 grep --color=auto postgres
## 殺死進程,模擬進程異常
dgadmin@flash:~$ ps -ef | grep postgres
postgres 1089 1 0 Jun27 ? 00:00:01 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c config_file=/etc/postgresql/9.3/main/postgresql.conf
postgres 1099 1089 0 Jun27 ? 00:00:00 postgres: checkpointer process
dgadmin 3898 1 0 06:26 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/primary/dg1 -p 25433 -b 3 -z 2 --silent-mode=true -i -M mirrorless -C 1
dgadmin 3899 1 0 06:26 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/primary/dg0 -p 25432 -b 2 -z 2 --silent-mode=true -i -M mirrorless -C 0
dgadmin 3927 1 0 06:27 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/master/dg-1 -p 15432 -b 1 -z 2 --silent-mode=true -i -M master -C -1 -x 0 -E
dgadmin 3936 3927 0 06:27 ? 00:00:00 postgres: port 15432, sweeper process
dgadmin 5012 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg1 -p 26433 -b 5 -z 2 --silent-mode=true -i -M quiescent -C 1
dgadmin 5013 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg0 -p 26432 -b 4 -z 2 --silent-mode=true -i -M quiescent -C 0
dgadmin 5919 3733 0 07:08 pts/0 00:00:00 grep --color=auto postgres
## 殺死進程,模擬異常
dgadmin@flash:~$ kill -9 3898
## 重新查看,dg1的進程3989已不存在
dgadmin@flash:~$ ps -ef | grep postgres
postgres 1089 1 0 Jun27 ? 00:00:01 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c config_file=/etc/postgresql/9.3/main/postgresql.conf
dgadmin 3899 1 0 06:26 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/primary/dg0 -p 25432 -b 2 -z 2 --silent-mode=true -i -M mirrorless -C 0
dgadmin 3927 1 0 06:27 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/master/dg-1 -p 15432 -b 1 -z 2 --silent-mode=true -i -M master -C -1 -x 0 -E
dgadmin 5012 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg1 -p 26433 -b 5 -z 2 --silent-mode=true -i -M quiescent -C 1
dgadmin 5013 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg0 -p 26432 -b 4 -z 2 --silent-mode=true -i -M quiescent -C 0
dgadmin 5014 5013 0 06:42 ? 00:00:00 postgres: port 26432, logger process
dgadmin 5119 5114 0 06:42 ? 00:00:00 postgres: port 25432, primary verification process
dgadmin 5939 3733 0 07:09 pts/0 00:00:00 grep --color=auto postgres
7.訪問數據庫測試連通性:仍然可以正常查詢
dgadmin@flash:~$ psql -d tpch
psql (8.2.15)
Type "help" for help.
tpch=# select * from region;
r_regionkey | r_name | r_comment
-------------+---------------------------+---------------------------------------------------------------------------------------------------------------------
1 | AMERICA | hs use ironic, even requests. s
3 | EUROPE | ly final courts cajole furiously final excuse
0 | AFRICA | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
2 | ASIA | ges. thinly even pinto beans ca
4 | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
(5 rows)
8.查看集群狀態:顯示有1個primary節點異常
dgadmin@flash:~$ gpstate
20170628:07:15:28:006079 gpstate:flash:dgadmin-[INFO]:-Starting gpstate with args:
20170628:07:15:28:006079 gpstate:flash:dgadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build Deepgreen DB'
20170628:07:15:28:006079 gpstate:flash:dgadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.99.00 build Deepgreen DB) on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4 compiled on May 18 2017 05:19:19'
20170628:07:15:28:006079 gpstate:flash:dgadmin-[INFO]:-Obtaining Segment details from master...
20170628:07:15:28:006079 gpstate:flash:dgadmin-[INFO]:-Gathering data from segments...
.
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:-Greenplum instance status summary
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Master instance = Active
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Master standby = No master standby configured
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total segment instance count from metadata = 4
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Primary Segment Status
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total primary segments = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total primary segment valid (at master) = 1
20170628:07:15:29:006079 gpstate:flash:dgadmin-[WARNING]:-Total primary segment failures (at master) = 1 <<<<<<<<
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[WARNING]:-Total number postmaster processes missing = 1 <<<<<<<<
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 1
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Mirror Segment Status
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total mirror segments = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total mirror segment valid (at master) = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total mirror segment failures (at master) = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes missing = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[WARNING]:-Total number mirror segments acting as primary segments = 1 <<<<<<<<
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number mirror segments acting as mirror segments = 1
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
dgadmin@flash:~$
至此本實驗完成,可以看到,在Segment主實例異常情況下,Mirror會立即接管服務,不會對前台應用產生停機影響。
實驗完成了,但是事情還沒有結束哦,因為這是故障,所以故障產生後,需要進行修複,那麼基於整個集群,我們都需要做什麼呢?
要點一:及時恢複故障節點
要點二:雖然Mirror可以臨時接管服務,保持服務的連續性,但是在實際生產過程中,由於節點及其Mirror的分散性,長期使用Mirror會導致數據分布不均勻,所以故障修複後,建議及時切換回原來的架構。
二、故障節點恢複
節點在故障以後,可以通過gprecoverseg命令恢複故障節點,如下:
dgadmin@flash:~$ gprecoverseg
20170628:22:01:34:001650 gprecoverseg:flash:dgadmin-[INFO]:-Starting gprecoverseg with args:
20170628:22:01:34:001650 gprecoverseg:flash:dgadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build Deepgreen DB'
20170628:22:01:34:001650 gprecoverseg:flash:dgadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.99.00 build Deepgreen DB) on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4 compiled on May 18 2017 05:19:19'
20170628:22:01:34:001650 gprecoverseg:flash:dgadmin-[INFO]:-Checking if segments are ready
20170628:22:01:34:001650 gprecoverseg:flash:dgadmin-[INFO]:-Obtaining Segment details from master...
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:-Obtaining Segment details from master...
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:-Greenplum instance recovery parameters
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:-Recovery type = Standard
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:-Recovery 1 of 1
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Synchronization mode = Incremental
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Failed instance host = flash
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Failed instance address = flash
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Failed instance directory = /dgdata/primary/dg1
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Failed instance port = 25433
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Failed instance replication port = 28433
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Source instance host = flash
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Source instance address = flash
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Source instance directory = /dgdata/mirror/dg1
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Source instance port = 26433
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Source instance replication port = 27433
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Target = in-place
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:----------------------------------------------------------
Continue with segment recovery procedure Yy|Nn (default=N):
> y
20170628:22:01:38:001650 gprecoverseg:flash:dgadmin-[INFO]:-1 segment(s) to recover
20170628:22:01:38:001650 gprecoverseg:flash:dgadmin-[INFO]:-Ensuring 1 failed segment(s) are stopped
20170628:22:01:39:001650 gprecoverseg:flash:dgadmin-[INFO]:-Ensuring that shared memory is cleaned up for stopped segments
20170628:22:01:45:001650 gprecoverseg:flash:dgadmin-[INFO]:-Updating configuration with new mirrors
20170628:22:01:45:001650 gprecoverseg:flash:dgadmin-[INFO]:-Updating mirrors
......
20170628:22:01:51:001650 gprecoverseg:flash:dgadmin-[INFO]:-Starting mirrors
20170628:22:01:51:001650 gprecoverseg:flash:dgadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
.......
20170628:22:01:58:001650 gprecoverseg:flash:dgadmin-[INFO]:-Process results...
20170628:22:01:58:001650 gprecoverseg:flash:dgadmin-[INFO]:-Updating configuration to mark mirrors up
20170628:22:01:58:001650 gprecoverseg:flash:dgadmin-[INFO]:-Updating primaries
20170628:22:01:58:001650 gprecoverseg:flash:dgadmin-[INFO]:-Commencing parallel primary conversion of 1 segments, please wait...
.......
20170628:22:02:05:001650 gprecoverseg:flash:dgadmin-[INFO]:-Process results...
20170628:22:02:05:001650 gprecoverseg:flash:dgadmin-[INFO]:-Done updating primaries
20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-******************************************************************
20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-Updating segments for resynchronization is completed.
20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-For segments updated successfully, resynchronization will continue in the background.
20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-
20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-Use gpstate -s to check the resynchronization progress.
20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-******************************************************************
查看集群狀態
dgadmin@flash:~$ gpstate
20170628:22:19:45:003527 gpstate:flash:dgadmin-[INFO]:-Starting gpstate with args:
20170628:22:19:45:003527 gpstate:flash:dgadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build Deepgreen DB'
20170628:22:19:45:003527 gpstate:flash:dgadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.99.00 build Deepgreen DB) on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4 compiled on May 18 2017 05:19:19'
20170628:22:19:45:003527 gpstate:flash:dgadmin-[INFO]:-Obtaining Segment details from master...
20170628:22:19:45:003527 gpstate:flash:dgadmin-[INFO]:-Gathering data from segments...
.
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:-Greenplum instance status summary
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Master instance = Active
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Master standby = No master standby configured
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total segment instance count from metadata = 4
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Primary Segment Status
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total primary segments = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total primary segment valid (at master) = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total primary segment failures (at master) = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Mirror Segment Status
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total mirror segments = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total mirror segment valid (at master) = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total mirror segment failures (at master) = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[WARNING]:-Total number mirror segments acting as primary segments = 1 <<<<<<<<
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number mirror segments acting as mirror segments = 1
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
三、切換回原集群狀態
從第二部分的最後狀態代碼也可以看出,目前是一個Mirror節點接管了Primary節點的服務,我們本節要講Mirror的服務交還給Primary。
下麵執行:gprecoverseg -r 命令進行節點切換,切換完成後執行gpstate查看狀態,代碼略。
另外,我們也可以在數據庫裏麵通過字典表查看切換信息:
postgres=# select * from gp_configuration_history;
time | dbid | desc
-------------------------------+------+-----------------------------------------------------------------------------------------------
2017-06-28 06:42:21.789641+08 | 4 | gpaddmirrors: segment config for resync: inserted mirror segment configuration
2017-06-28 06:42:21.789641+08 | 5 | gpaddmirrors: segment config for resync: inserted mirror segment configuration
2017-06-28 06:42:21.789641+08 | 2 | gpaddmirrors: segment config for resync: segment mode, status, and replication port
2017-06-28 06:42:21.789641+08 | 3 | gpaddmirrors: segment config for resync: segment mode, status, and replication port
2017-06-28 06:42:35.327377+08 | 2 | gpaddmirrors: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 06:42:35.327377+08 | 3 | gpaddmirrors: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 06:42:35.327377+08 | 4 | gpaddmirrors: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 06:42:35.327377+08 | 5 | gpaddmirrors: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 06:56:03.98806+08 | 2 | FTS: changed segment to insync from resync.
2017-06-28 06:56:03.98806+08 | 4 | FTS: changed segment to insync from resync.
2017-06-28 06:56:04.045572+08 | 3 | FTS: changed segment to insync from resync.
2017-06-28 06:56:04.045572+08 | 5 | FTS: changed segment to insync from resync.
2017-06-28 07:10:07.745568+08 | 3 | FTS: content 1 fault marking status DOWN role m
2017-06-28 07:10:07.745849+08 | 5 | FTS: content 1 fault marking status UP mode: change-tracking role p
2017-06-28 22:01:45.341958+08 | 3 | gprecoverseg: segment config for resync: segment mode and status
2017-06-28 22:01:58.732622+08 | 3 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 22:01:58.732622+08 | 5 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 22:02:57.53962+08 | 5 | FTS: changed segment to insync from resync.
2017-06-28 22:02:57.53962+08 | 3 | FTS: changed segment to insync from resync.
2017-06-28 22:31:02.680013+08 | 5 | FTS: content 1 fault marking status DOWN role m
2017-06-28 22:31:02.680309+08 | 3 | FTS: content 1 fault marking status UP mode: change-tracking role p
2017-06-28 22:31:12.655695+08 | 5 | gprecoverseg: segment config for resync: segment mode and status
2017-06-28 22:31:25.848857+08 | 3 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 22:31:25.848857+08 | 5 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 22:31:58.752335+08 | 3 | FTS: changed segment to insync from resync.
2017-06-28 22:31:58.752335+08 | 5 | FTS: changed segment to insync from resync.
(26 rows)
這樣,整個今天的分享就結束了,最後再囉嗦一句。從本文例子看出,數據庫主/備切換相當的簡單,gprecoverseg命令相當的智能,在Primary的主機出現故障之後,Mirror會自動切換為Primary,不影響數據庫的正常工作,但是對監控不是很到位的係統來說,不建議使用這個功能,首先這個功能存在一定的BUG,其次,監控不到位,一旦發現切換,並不能及時發現,如果再有節點出現故障,可能對數據恢複造成影響,而且如果單個節點的數據量非常大的時候,gprecoverseg同步數據的過程將會很漫長。
最後,祝大家再Deepgreen & Greenplum的路上一去不複返^_^
最後更新:2017-06-29 09:02:40