

A Detailed Look at Why P_S Data Is Incomplete

In this post, we’ll examine why in an initial flushing analysis we find that Performance Schema data is incomplete.


Having shown the performance impact of the Percona Server 5.7 patches, we can now discuss their technical reasoning and details. Let’s revisit the MySQL 5.7.11 performance schema synch wait graph from the previous post (Percona Server 5.7 performance improvements), for the case of unlimited InnoDB concurrency (no innodb_thread_concurrency limit):


[Figure: MySQL 5.7.11 performance schema synch wait graph, unlimited InnoDB concurrency]

First of all, this graph is a little “nicer” than reality, which limits its diagnostic value. There are two reasons for this. The first one is that page cleaner worker threads are invisible to Performance Schema (see bug 79894). This alone limits the value of PFS in 5.7 if, for example, one tries to select only the events in the page cleaner threads, or monitors low concurrency where the cleaner thread count is a non-negligible part of the total thread count.

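A quick way to see this effect for yourself is to compare the page cleaner threads that Performance Schema knows about with the number the server is configured to run. The sketch below does that with the mysql-connector-python driver; the connection parameters and the LIKE pattern on the thread name are illustrative assumptions, not part of the original analysis.

```python
# Sketch: compare the page cleaner threads visible to the Performance Schema
# with the number the server is configured to run (innodb_page_cleaners, 5.7+).
# Connection parameters are placeholders - adjust for your environment.
import mysql.connector

cnx = mysql.connector.connect(host="127.0.0.1", user="root", password="")
cur = cnx.cursor()

# Threads instrumented by the Performance Schema; the LIKE pattern is an
# assumption about the InnoDB thread instrument naming.
cur.execute(
    "SELECT NAME, TYPE, INSTRUMENTED "
    "FROM performance_schema.threads "
    "WHERE NAME LIKE '%page%clean%'"
)
visible = cur.fetchall()

cur.execute("SELECT @@GLOBAL.innodb_page_cleaners")
(configured,) = cur.fetchone()

print("page cleaner threads visible to P_S:", len(visible))
for name, ttype, instrumented in visible:
    print(" ", name, ttype, instrumented)
print("innodb_page_cleaners configured:", configured)

cur.close()
cnx.close()
```

On a 5.7 build affected by bug 79894, the worker threads do not show up, so the first number comes out lower than innodb_page_cleaners would suggest.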

To understand the second reason, let’s look into PMP for the same setting. Note that selected intermediate stack frames were removed for clarity, especially in the InnoDB mutex implementation.

(Translator’s note: the PMP output below was captured with the pt-pmp tool.)

[Figure: PMP output for the same setting, with selected intermediate stack frames removed]
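As an aside, PMP output like the one above is nothing more than a count of threads that share an identical, collapsed stack. The rough Python sketch below shows the idea, assuming a file containing the output of `gdb -ex "thread apply all bt" -batch -p <pid>`; the real pt-pmp script does considerably more normalization than this.

```python
# Rough sketch of Poor Man's Profiler (PMP) style aggregation:
# group gdb "thread apply all bt" output by identical stack and count.
# Expects the path to a saved gdb output file as the only argument.
import re
import sys
from collections import Counter

# Match backtrace frames like "#3  0x00007f... in buf_dblwr_write_single_page (...)"
# and capture only the function name (a simplification of what pt-pmp keeps).
frame_re = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?([A-Za-z_][\w:~<>]*)")

stacks = Counter()
current = []

def flush(frames, counter):
    # Record one sample for the collapsed stack of the thread just finished.
    if frames:
        counter[",".join(frames)] += 1

with open(sys.argv[1]) as f:
    for line in f:
        if line.startswith("Thread "):   # a new thread's backtrace begins
            flush(current, stacks)
            current = []
        else:
            m = frame_re.match(line.strip())
            if m:
                current.append(m.group(1))
flush(current, stacks)

# Print like pt-pmp: sample count, then the collapsed stack.
for stack, count in stacks.most_common():
    print(f"{count:7d} {stack}")
```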

The top wait in both PMP and the graph is the 660 samples of enter mutex in buf_dblwr_write_single_page, which is the doublewrite mutex. Now try to find the nearly as hot 631 samples of event wait in buf_dblwr_write_single_page in the PFS output. You won’t find it because InnoDB OS event waits are not annotated in Performance Schema. In most cases this is correct, as OS event waits tend to be used when there is no work to do. The thread waits for work to appear, or for time to pass. But in the report above, the waiting thread is blocked from proceeding with useful work (see bug 80979).

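To check what Performance Schema does record around the doublewrite buffer, you can list the wait instruments whose names mention it. In the sketch below the LIKE pattern is an assumption chosen to avoid hard-coding exact instrument names, which vary between versions; the point is that a mutex instrument shows up, while no event-wait instrument covers the wait for a free single-page flush slot.

```python
# Sketch: list what the Performance Schema records about the doublewrite
# buffer. A mutex instrument appears; the OS event wait for a free
# single-page flush slot does not (see bug 80979).
import mysql.connector

cnx = mysql.connector.connect(host="127.0.0.1", user="root", password="")
cur = cnx.cursor()

cur.execute(
    "SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT "
    "FROM performance_schema.events_waits_summary_global_by_event_name "
    "WHERE EVENT_NAME LIKE '%dblwr%' "
    "ORDER BY SUM_TIMER_WAIT DESC"
)
for event_name, count_star, sum_timer_wait in cur.fetchall():
    # SUM_TIMER_WAIT is reported in picoseconds.
    print(f"{event_name:60s} {count_star:12d} {sum_timer_wait / 1e12:10.3f}s")

cur.close()
cnx.close()
```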

Now that we’ve shown the two reasons why PFS data is not telling the whole server story, let’s take PMP data instead and consider how to proceed. Those top two PMP waits suggest 1) the server is performing a lot of single page flushes, and 2) those single page flushes have their concurrency limited by the eight doublewrite single-page flush slots available, and that the wait for a free slot to appear is significant.

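If you want to confirm how much single-page flushing and doublewrite activity a server is doing without taking stack traces, the server-side counters are a cheap first check. The sketch below reads the Innodb_dblwr% status counters plus the single-flush counters in information_schema.innodb_metrics; the counter names are assumptions based on 5.7 naming, and some innodb_metrics counters may need to be enabled first.

```python
# Sketch: gauge how much single-page flushing / doublewrite activity the
# server is doing. Counter names follow 5.7 conventions and are assumptions;
# some innodb_metrics counters are disabled by default and may need
# SET GLOBAL innodb_monitor_enable = '...' before they accumulate values.
import mysql.connector

cnx = mysql.connector.connect(host="127.0.0.1", user="root", password="")
cur = cnx.cursor()

# Doublewrite buffer activity from the global status counters.
cur.execute("SHOW GLOBAL STATUS LIKE 'Innodb_dblwr%'")
for name, value in cur.fetchall():
    print(f"{name:40s} {value}")

# Single-page (LRU) flush counters, if the corresponding metrics exist
# and are enabled on this build.
cur.execute(
    "SELECT NAME, `COUNT`, STATUS "
    "FROM information_schema.innodb_metrics "
    "WHERE NAME LIKE '%single_flush%'"
)
for name, count, status in cur.fetchall():
    print(f"{name:40s} {count:12d} ({status})")

cur.close()
cnx.close()
```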

Two options become apparent at this point: either make the single-page flush doublewrite more parallel or reduce the single-page flushing in the first place. We’re big fans of the latter option since version 5.6 performance work, where we configured Percona Server to not perform single-page flushes at all by introducing the innodb_empty_free_list_algorithm option, with the “backoff” default.

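On Percona Server you can verify which algorithm is in effect and, assuming the variable is dynamic as the Percona documentation describes, switch it at runtime. The sketch below is illustrative only; the variable does not exist on upstream MySQL.

```python
# Sketch: inspect (and optionally change) the Percona Server option that
# controls single-page flush behaviour when the buffer pool free list is
# empty. The variable exists only on Percona Server; treating it as
# dynamically settable is an assumption based on the Percona documentation.
import mysql.connector

cnx = mysql.connector.connect(host="127.0.0.1", user="root", password="")
cur = cnx.cursor()

cur.execute("SHOW GLOBAL VARIABLES LIKE 'innodb_empty_free_list_algorithm'")
row = cur.fetchone()
if row is None:
    print("not a Percona Server build - variable not present")
else:
    name, value = row
    print(f"{name} = {value}")
    if value != "backoff":
        # Switch back to the default "backoff" algorithm.
        cur.execute("SET GLOBAL innodb_empty_free_list_algorithm = 'backoff'")
        print("switched to backoff")

cur.close()
cnx.close()
```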

The next post in the series will describe how we removed single-page flushing in 5.7.

葉師傅

This article comes from “老葉茶館”, a partner of the Yunqi community. For more information, follow the “老葉茶館” WeChat official account.
