

Greenplum: repairing the cluster after a failed standby master activation

After a failed standby master activation, neither the master nor the standby would start.

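First, the MASTER_DATA_DIRECTORY and PGPORT environment variables were changed to point at the new master (the former standby, /disk1/digoal/gpdata/gpseg-2 on port 1922), and the master was started.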
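Switching between the two masters comes down to two exports. A minimal sketch, assuming a POSIX shell; `use_master` is a hypothetical helper name, and the paths and ports are the ones that appear in this article's logs. As the gpstart debug output further down shows, gpstart reads the port from postgresql.conf in MASTER_DATA_DIRECTORY, while PGPORT is the default port for libpq clients such as psql.

```shell
# Hypothetical helper: point the Greenplum management utilities at one
# master instance by exporting the two variables this article switches.
use_master() {
  if [ "$#" -ne 2 ]; then
    echo "usage: use_master DATA_DIRECTORY PORT" >&2
    return 1
  fi
  export MASTER_DATA_DIRECTORY="$1"
  export PGPORT="$2"
}

# New master (the former standby), as in this section:
use_master /disk1/digoal/gpdata/gpseg-2 1922

# Later in the article the variables go back to the old master:
# use_master /disk1/digoal/gpdata/gpseg-1 1921
```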
$gpstart -a
20151222:16:49:41:073138 gpstart:digoal_host:digoal-[INFO]:-Starting gpstart with args: -a
20151222:16:49:41:073138 gpstart:digoal_host:digoal-[INFO]:-Gathering information and validating the environment...
20151222:16:49:41:073138 gpstart:digoal_host:digoal-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.6.1 build 2'
20151222:16:49:41:073138 gpstart:digoal_host:digoal-[INFO]:-Greenplum Catalog Version: '201310150'
20151222:16:49:41:073138 gpstart:digoal_host:digoal-[INFO]:-Starting Master instance in admin mode
20151222:16:49:43:073138 gpstart:digoal_host:digoal-[CRITICAL]:-Failed to start Master instance in admin mode
20151222:16:49:43:073138 gpstart:digoal_host:digoal-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /disk1/digoal/gpdata/gpseg-2 -l /disk1/digoal/gpdata/gpseg-2/pg_log/startup.log -w -t 600 -o " -p 1922 -b 48 -z 0 --silent-mode=true -i -M master -C -1 -x 0 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start...... stopped waiting
', stderr='pg_ctl: PID file "/disk1/digoal/gpdata/gpseg-2/postmaster.pid" does not exist
pg_ctl: could not start server
Examine the log output.
'
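It fails. The pg_ctl message only says that the server did not come up (no postmaster.pid was written); the actual reason has to be looked up in the log, as pg_ctl itself suggests.
Running the same command by hand naturally fails as well: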
env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /disk1/digoal/gpdata/gpseg-2 -l /disk1/digoal/gpdata/gpseg-2/pg_log/startup.log -w -t 600 -o " -p 1922 -b 48 -z 0 --silent-mode=true -i -M master -C -1 -x 0 -c gp_role=utility " start

Starting in master-only mode (-m) fails too, of course:
$gpstart -m
20151222:16:58:05:077478 gpstart:digoal_host:digoal-[INFO]:-Starting gpstart with args: -m
20151222:16:58:05:077478 gpstart:digoal_host:digoal-[INFO]:-Gathering information and validating the environment...
20151222:16:58:05:077478 gpstart:digoal_host:digoal-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.6.1 build 2'
20151222:16:58:05:077478 gpstart:digoal_host:digoal-[INFO]:-Greenplum Catalog Version: '201310150'
20151222:16:58:05:077478 gpstart:digoal_host:digoal-[INFO]:-Master-only start requested in configuration without a standby master.

Continue with master-only startup Yy|Nn (default=N):
> y
20151222:16:58:06:077478 gpstart:digoal_host:digoal-[INFO]:-Starting Master instance in admin mode
20151222:16:58:08:077478 gpstart:digoal_host:digoal-[CRITICAL]:-Failed to start Master instance in admin mode
20151222:16:58:08:077478 gpstart:digoal_host:digoal-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /disk1/digoal/gpdata/gpseg-2 -l /disk1/digoal/gpdata/gpseg-2/pg_log/startup.log -w -t 600 -o " -p 1922 -b 48 -z 0 --silent-mode=true -i -M master -C -1 -x 0 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start...... stopped waiting
', stderr='pg_ctl: PID file "/disk1/digoal/gpdata/gpseg-2/postmaster.pid" does not exist
pg_ctl: could not start server
Examine the log output.
'

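Starting in restricted mode (-R) fails as well. Note that, according to its log, this attempt ran against /disk1/digoal/gpdata/gpseg-1 on port 1921, i.e. with the environment already pointing at the old master: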
$gpstart -R
20151222:16:57:21:076997 gpstart:digoal_host:digoal-[INFO]:-Starting gpstart with args: -R
20151222:16:57:21:076997 gpstart:digoal_host:digoal-[INFO]:-Gathering information and validating the environment...
20151222:16:57:21:076997 gpstart:digoal_host:digoal-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.6.1 build 2'
20151222:16:57:21:076997 gpstart:digoal_host:digoal-[INFO]:-Greenplum Catalog Version: '201310150'
20151222:16:57:21:076997 gpstart:digoal_host:digoal-[INFO]:-Starting Master instance in admin mode
20151222:16:57:24:076997 gpstart:digoal_host:digoal-[CRITICAL]:-Failed to start Master instance in admin mode
20151222:16:57:24:076997 gpstart:digoal_host:digoal-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /disk1/digoal/gpdata/gpseg-1 -l /disk1/digoal/gpdata/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 1921 -b 1 -z 0 --silent-mode=true -i -M master -C -1 -x 48 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start...... stopped waiting
', stderr='pg_ctl: PID file "/disk1/digoal/gpdata/gpseg-1/postmaster.pid" does not exist
pg_ctl: could not start server
Examine the log output.
'

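Next, MASTER_DATA_DIRECTORY and PGPORT were switched back to the old master (/disk1/digoal/gpdata/gpseg-1, port 1921), and an attempt was made to activate the original master with gpactivatestandby -f. That fails too: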
$gpactivatestandby  -f
20151222:16:51:28:074293 gpactivatestandby:digoal_host:digoal-[INFO]:------------------------------------------------------
20151222:16:51:28:074293 gpactivatestandby:digoal_host:digoal-[INFO]:-Standby data directory    = /disk1/digoal/gpdata/gpseg-1
20151222:16:51:28:074293 gpactivatestandby:digoal_host:digoal-[INFO]:-Standby port              = 1921
20151222:16:51:28:074293 gpactivatestandby:digoal_host:digoal-[INFO]:-Standby running           = no
20151222:16:51:28:074293 gpactivatestandby:digoal_host:digoal-[INFO]:-Force standby activation  = yes
20151222:16:51:28:074293 gpactivatestandby:digoal_host:digoal-[INFO]:------------------------------------------------------
Do you want to continue with standby master activation? Yy|Nn (default=N):
> y
20151222:16:51:29:074293 gpactivatestandby:digoal_host:digoal-[INFO]:-Starting standby master database in utility mode...
20151222:16:51:31:074293 gpactivatestandby:digoal_host:digoal-[CRITICAL]:-Error activating standby master: ExecutionError: 'non-zero rc: 2' occured.  Details: 'GPSTART_INTERNAL_MASTER_ONLY=1 $GPHOME/bin/gpstart -a -m -v'  cmd had rc=2 completed=True halted=False
  stdout='20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Starting gpstart with args: -a -m -v
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Setting level of parallelism to: 64
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Gathering information and validating the environment...
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Checking if GPHOME env variable is set.
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set.
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Checking if LOGNAME or USER env variable is set.
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:---Checking that current user can use GP binaries
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Obtaining master's port from master data directory
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Read from postgresql.conf port=1921
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Read from postgresql.conf max_connections=48
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-gp_external_grant_privileges is None
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Reading the gp_dbid file - /disk1/digoal/gpdata/gpseg-1/gp_dbid...
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Parsing : # Greenplum Database identifier for this master/segment. ...
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Parsing : # Do not change the contents of this file. ...
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Parsing : dbid = 1 ...
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Found match for dbid: 1.
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Parsing : standby_dbid = 48 ...
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Found match for standby_dbid: 48.
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.6.1 build 2'
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Greenplum Catalog Version: '201310150'
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Check if Master is already running...
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Master-only start requested for management utilities.
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Starting Master instance in admin mode
20151222:16:51:31:074365 gpstart:digoal_host:digoal-[CRITICAL]:-Failed to start Master instance in admin mode
20151222:16:51:31:074365 gpstart:digoal_host:digoal-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /disk1/digoal/gpdata/gpseg-1 -l /disk1/digoal/gpdata/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 1921 -b 1 -z 0 --silent-mode=true -i -M master -C -1 -x 48 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start...... stopped waiting
', stderr='pg_ctl: PID file "/disk1/digoal/gpdata/gpseg-1/postmaster.pid" does not exist
pg_ctl: could not start server
Examine the log output.
'
'
  stderr=''


Starting the old master in master-only mode also fails:
$gpstart -m
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[INFO]:-Starting gpstart with args: -m
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[INFO]:-Gathering information and validating the environment...
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.6.1 build 2'
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[INFO]:-Greenplum Catalog Version: '201310150'
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[WARNING]:-****************************************************************************
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[WARNING]:-Master-only start requested in a configuration with a standby master.
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[WARNING]:-This is advisable only under the direct supervision of Greenplum support. 
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[WARNING]:-This mode of operation is not supported in a production environment and 
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[WARNING]:-may lead to a split-brain condition and possible unrecoverable data loss.
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[WARNING]:-****************************************************************************

Continue with master-only startup Yy|Nn (default=N):
> y
20151222:16:57:44:077229 gpstart:digoal_host:digoal-[INFO]:-Starting Master instance in admin mode
20151222:16:57:46:077229 gpstart:digoal_host:digoal-[CRITICAL]:-Failed to start Master instance in admin mode
20151222:16:57:46:077229 gpstart:digoal_host:digoal-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /disk1/digoal/gpdata/gpseg-1 -l /disk1/digoal/gpdata/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 1921 -b 1 -z 0 --silent-mode=true -i -M master -C -1 -x 48 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start...... stopped waiting
', stderr='pg_ctl: PID file "/disk1/digoal/gpdata/gpseg-1/postmaster.pid" does not exist
pg_ctl: could not start server
Examine the log output.
'

At startup, the old master logs the following error:
2015-12-22 16:57:45.959837 CST,,,p77246,th273340192,,,,0,,,seg-1,,,,,"LOG","00000","Found recovery.conf file, checking appropriate parameters  for recovery in standby mode",,,,,,,0,,"xlog.c",5663,
2015-12-22 16:57:46.010953 CST,,,p77246,th273340192,,,,0,,,seg-1,,,,,"FATAL","XX000","recovery command file ""recovery.conf"" request for standby mode not specified (xlog.c:5756)",,,,,,,0,,"xlog.c",5756,"Stack trace:
1    0xb04cde postgres errstart (elog.c:502)
2    0x54afe7 postgres XLogReadRecoveryCommandFile (xlog.c:5754)
3    0x560a84 postgres StartupXLOG (xlog.c:6441)
4    0x564966 postgres StartupProcessMain (xlog.c:10970)
5    0x5f4675 postgres AuxiliaryProcessMain (bootstrap.c:463)
6    0x8eacd4 postgres <symbol not found> (postmaster.c:7589)
7    0x8eaefd postgres StartMasterOrPrimaryPostmasterProcesses (postmaster.c:1576)
8    0x8fce76 postgres doRequestedPrimaryMirrorModeTransitions (primary_mirror_mode.c:1735)
9    0x8f4122 postgres <symbol not found> (postmaster.c:2272)
10   0x8f76f0 postgres PostmasterMain (postmaster.c:7589)
11   0x7fa58f postgres main (main.c:206)
12   0x7f1b11cb4cdd libc.so.6 __libc_start_main (??:0)
13   0x4c2cf9 postgres <symbol not found> (??:0)
"
2015-12-22 16:57:46.012709 CST,,,p77240,th273340192,,,,0,,,seg-1,,,,,"LOG","00000","startup process (PID 77246) exited with exit code 1",,,,,,,0,,"postmaster.c",5854,
2015-12-22 16:57:46.012735 CST,,,p77240,th273340192,,,,0,,,seg-1,,,,,"LOG","00000","aborting startup due to startup process failure",,,,,,,0,,"postmaster.c",4706,
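Since pg_ctl only reports the missing PID file, entries like the FATAL above have to be dug out of the server log. A minimal sketch for doing that, assuming the server writes CSV logs with a `.csv` suffix into `pg_log` (as the format of the excerpt suggests); `show_startup_errors` is a hypothetical helper, not a Greenplum utility. The `startup.log` file passed to pg_ctl with `-l` is worth checking as well.

```shell
# Hypothetical helper: print the FATAL/PANIC entries from the newest CSV
# server log under DATA_DIRECTORY/pg_log. Only the first physical line of
# each entry is printed; it carries the severity and the message, while
# stack traces continue on further lines.
show_startup_errors() {
  datadir="$1"
  newest=$(ls -t "$datadir"/pg_log/*.csv 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "no CSV logs found under $datadir/pg_log" >&2
    return 1
  fi
  echo "== $newest" >&2
  grep -E '"(FATAL|PANIC)"' "$newest"
}

# For the old master in this article:
# show_startup_errors /disk1/digoal/gpdata/gpseg-1
```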

The old master's data directory contains two extra files:
cd /disk1/digoal/gpdata/gpseg-1
-rw-r--r-- 1 digoal users     0 Dec 22 16:51 promote
-rw-r--r-- 1 digoal users     0 Dec 22 16:51 recovery.conf
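promote is the trigger file asking for the instance to be activated, and recovery.conf serves no purpose here. In fact, this empty recovery.conf is what aborts startup: the server finds it, checks it for standby-mode parameters, and exits with the FATAL error above because the file does not request standby mode. Both files are timestamped 16:51, the time of the failed gpactivatestandby -f run.
Delete both files.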
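The author simply deletes the two files. A slightly more cautious sketch moves them aside instead, so they can still be inspected later. It assumes the instance is stopped and refuses to run if postmaster.pid is present; `clear_activation_markers` and the `.markers.<timestamp>` backup location are hypothetical choices, not part of Greenplum.

```shell
# Hypothetical helper: move the leftover "promote" and "recovery.conf"
# files out of a master data directory, keeping a copy instead of
# deleting them. Refuses to touch a directory that has a postmaster.pid.
clear_activation_markers() {
  datadir="$1"
  backup="${2:-$datadir.markers.$(date +%Y%m%d%H%M%S)}"
  if [ -e "$datadir/postmaster.pid" ]; then
    echo "$datadir/postmaster.pid exists; stop the instance first" >&2
    return 1
  fi
  mkdir -p "$backup" || return 1
  for f in promote recovery.conf; do
    if [ -e "$datadir/$f" ]; then
      mv "$datadir/$f" "$backup/" || return 1
      echo "moved $datadir/$f to $backup/"
    fi
  done
}

# For the old master in this article:
# clear_activation_markers /disk1/digoal/gpdata/gpseg-1
```

Keeping the backup next to the data directory rather than inside it leaves the data directory exactly as it was before the failed activation.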

What remains is to bring the old master back up and then remove the standby master that cannot be started.
$gpstart -m
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[INFO]:-Starting gpstart with args: -m
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[INFO]:-Gathering information and validating the environment...
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.6.1 build 2'
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[INFO]:-Greenplum Catalog Version: '201310150'
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[WARNING]:-****************************************************************************
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[WARNING]:-Master-only start requested in a configuration with a standby master.
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[WARNING]:-This is advisable only under the direct supervision of Greenplum support. 
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[WARNING]:-This mode of operation is not supported in a production environment and 
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[WARNING]:-may lead to a split-brain condition and possible unrecoverable data loss.
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[WARNING]:-****************************************************************************

Continue with master-only startup Yy|Nn (default=N):
> y
20151222:18:20:29:116706 gpstart:digoal_host:digoal-[INFO]:-Starting Master instance in admin mode
20151222:18:20:32:116706 gpstart:digoal_host:digoal-[INFO]:-Obtaining Greenplum Master catalog information
20151222:18:20:32:116706 gpstart:digoal_host:digoal-[INFO]:-Obtaining Segment details from master...
20151222:18:20:32:116706 gpstart:digoal_host:digoal-[INFO]:-Setting new master era
20151222:18:20:32:116706 gpstart:digoal_host:digoal-[INFO]:-Master Started...

Remove the standby master:
$gpinitstandby -r
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:------------------------------------------------------
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Warm master standby removal parameters
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:------------------------------------------------------
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Greenplum master hostname               = digoal_host.sqa.zmf
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Greenplum master data directory         = /disk1/digoal/gpdata/gpseg-1
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Greenplum master port                   = 1921
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Greenplum standby master hostname       = digoal_host.sqa.zmf
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Greenplum standby master port           = 1922
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Greenplum standby master data directory = /disk1/digoal/gpdata/gpseg-2
Do you want to continue with deleting the standby master? Yy|Nn (default=N):
> y
20151222:18:20:52:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Removing standby master from catalog...
20151222:18:20:52:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Database catalog updated successfully.
20151222:18:20:52:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Removing standby entry from gp_transaction_files_filespace flat file
20151222:18:20:52:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Removing standby entry from gp_temporary_files_filespace flat file
20151222:18:20:52:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Removing filespace directories on standby master...
20151222:18:20:52:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Successfully removed standby master
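To double-check the catalog before restarting, one option is to look at gp_segment_configuration directly. A sketch, assuming psql from the same Greenplum installation, PGPORT pointing at the old master (1921 here), and the utility-mode option gp_session_role=utility, which is needed to connect to a master started with gpstart -m; `list_master_entries` is a hypothetical name.

```shell
# Hypothetical helper: show the master-level rows (content = -1) of
# gp_segment_configuration. The master was started with "gpstart -m",
# so the session connects in utility mode.
list_master_entries() {
  PGOPTIONS='-c gp_session_role=utility' psql -d template1 -At -c \
    "SELECT dbid, role, hostname, port FROM gp_segment_configuration WHERE content = -1 ORDER BY dbid;"
}

# After a successful "gpinitstandby -r", only the master's own row
# (dbid 1 in this article) is expected to remain.
```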

The master can now be stopped and the whole cluster started normally:
$gpstop -M fast -a
$gpstart -a


Last updated: 2017-04-01 13:44:32
