閱讀678 返回首頁    go 阿裏雲 go 技術社區[雲棲]


MySQL · 捉蟲動態 · 5.7 mysql_upgrade 元數據鎖等待

問題描述

如下圖,mysql_upgrade 過程中,執行 DROP DATABASE IF EXISTS performance_schema 一直在等待 metadata lock

屏幕快照 2017-04-01 14.30.03.png

問題排查

簡單粗暴的方法

有一種簡單的解決方法,把其他連接kill掉,釋放 metadata lock

對於這個案例,占用元數據鎖的是 Id = 107768,User = xx1 的連接

但是這種方法指標不治本,案例中占用元數據鎖的連接,是一個agent服務建立的

mysql_upgrade也是程序執行,不能每次都手工kill連接,需要查明為什麼占用鎖

詳細查明問題原因

據業務方反饋,agent服務和調用mysql_upgrade的代碼和5.6也在用,沒有出現問題。

懷疑是5.7引入的bug

根據上述現象,顯然是agent占了metadata lock,大概率不是mysql的bug

為了說服業務方,我們繼續排查是在等待什麼鎖

查詢 performance_schema.metadata_locks

首先想到5.7的 performance_schema.metadata_locks ,很遺憾這張表裏並沒有記錄

screenshot.png

gdb 獲取元數據鎖信息

我們嚐試使用 gdb 獲取鎖等待信息

ps aux | grep 端口號,找出mysqld進程號 pid,pstack pid > stack.log

在stack.log中搜索 acquire_lock(請求mdl鎖的函數),可以看出是 thread 3 在請求元數據鎖

screenshot.png

gdb -p pid
thread 3
切換到目標線程

#0  0x0000003fe940ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000bd3fb2 in native_cond_timedwait (this=0x7eff640e05d8, owner=0x7eff640e0540, abs_timeout=0x7effa83b2ce0, set_status_on_timeout=Unhandled dwarf expression opcode 0xf3
) 
#2  my_cond_timedwait (this=0x7eff640e05d8, owner=0x7eff640e0540, abs_timeout=0x7effa83b2ce0, set_status_on_timeout=Unhandled dwarf expression opcode 0xf3
) 
#3  inline_mysql_cond_timedwait (this=0x7eff640e05d8, owner=0x7eff640e0540, abs_timeout=0x7effa83b2ce0, set_status_on_timeout=Unhandled dwarf expression opcode 0xf3
) 
#4  MDL_wait::timed_wait (this=0x7eff640e05d8, owner=0x7eff640e0540, abs_timeout=0x7effa83b2ce0, set_status_on_timeout=Unhandled dwarf expression opcode 0xf3
) 
#5  0x0000000000bd6048 in MDL_context::acquire_lock (this=0x7eff640e05d8, mdl_request=0x7eff640aa870, lock_wait_timeout=Unhandled dwarf expression opcode 0xf3
) 

f 5
跳轉到  MDL_context::acquire_lock
acquire_lock 函數參數中有 MDL_request
MDL_request::MDL_key 中有詳細的鎖信息

p mdl_request->key

{m_length = 34, m_db_name_length = 18, m_ptr = "\003performance_schema\000global_status", '\000' <repeats 353 times>, static m_namespace_to_wait_state_name = \{ \{m_key = 0, 
  m_name = 0x130aa9b "Waiting for global read lock", m_flags = 0}, {m_key = 0, m_name = 0x130abb0 "Waiting for tablespace metadata lock", m_flags = 0}, {m_key = 0, 
  m_name = 0x130abd8 "Waiting for schema metadata lock", m_flags = 0}, {m_key = 0, m_name = 0x130ac00 "Waiting for table metadata lock", m_flags = 0}, {m_key = 0, 
  m_name = 0x130ac20 "Waiting for stored function metadata lock", m_flags = 0}, {m_key = 0, m_name = 0x130ac50 "Waiting for stored procedure metadata lock", m_flags = 0}, {m_key = 0, 
  m_name = 0x130ac80 "Waiting for trigger metadata lock", m_flags = 0}, {m_key = 0, m_name = 0x130aca8 "Waiting for event metadata lock", m_flags = 0}, {m_key = 0, 
  m_name = 0x130aab8 "Waiting for commit lock", m_flags = 0}, {m_key = 0, m_name = 0x130aad0 "User lock", m_flags = 0}, {m_key = 0, m_name = 0x130acc8 "Waiting for locking service lock", 
  m_flags = 0} } }

上述信息可以看出,正在請求performance_schema.global_status這張表的鎖

排查業務代碼

和業務方確認,agent中確實執行了 “show global status” , 但是已經設置了autocommit

簡化邏輯後,agent代碼如下

import MySQLdb
from time import sleep
conn = MySQLdb.connect(host='47.93.49.119', port=3001, user='xx1')
conn.autocommit = True
cur=conn.cursor()
cur.execute("show global status")
while 1:
    sleep(1)

代碼中確實設置了autocommit,但是並沒有生效(如果執行了commit,不可能不釋放元數據鎖)

MySQLdb.connect 返回 Connection 類,根據上述代碼,autocommit是 Connection的成員屬性

class Connection(_mysql.connection):

Connection 繼承自_mysql.connection,_mysql 是c語言實現的python庫,查看_mysql.c

static PyMethodDef _mysql_ConnectionObject_methods[] = {
    {
        "affected_rows",
        (PyCFunction)_mysql_ConnectionObject_affected_rows,
        METH_VARARGS,
        _mysql_ConnectionObject_affected_rows__doc__
    },
    {
        "autocommit",
        (PyCFunction)_mysql_ConnectionObject_autocommit,
        METH_VARARGS,
        _mysql_ConnectionObject_autocommit__doc__
    },
    {
        "commit",
        (PyCFunction)_mysql_ConnectionObject_commit,
        METH_VARARGS,
        _mysql_ConnectionObject_commit__doc__
    },

autommit 並不是成員屬性,而是一個成員方法

結論

conn.autocommit = True 強行將 autocommit 的函數指針賦值為 True,並沒有真正設置autocommit

5.6中沒有發現這個問題

一是 agent 中隻有查詢語句,不設autocommit也能返回查詢結果

二是 5.6中 “show global status” 查詢的是 information_shcema,5.7中是performance_schema,5.6中不會影響 drop database performance_schema

最後更新:2017-04-21 09:30:47

  上一篇:go linux性能分析工具
  下一篇:go HybridDB · 穩定性 · HybridDB如何優雅的處理Out Of Memery問題