TokuDB · 捉蟲動態 · MRR 導致查詢失敗

問題背景

最近有用戶在使用 TokuDB 時，遇到了一個查詢報錯的問題，這裏給大家分享下。

具體的報錯信息是這樣的：

mysql> select * from t2 where uid > 1 limit 10;
ERROR 1030 (HY000): Got error 1 from storage engine

表結構如下：

CREATE TABLE `t2` (
  `id` bigint(20) NOT NULL,
  `uid` bigint(20) DEFAULT NULL,
  `post` text,
  `note` text,
  PRIMARY KEY (`id`),
  KEY `idx_uid` (`uid`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8

問題分析

從報錯信息來看，是引擎層返回錯誤的，難道是 TokuDB 數據出問題了麼，我們首先要確認的是用戶數據是否還能訪問。

從表結構來看，出錯的語句應該走了二級索引，那麼我們強製走 PK 是否能訪問數據呢。

select * from t2 force index(primary) where uid > 1 limit 3;
xxx
xxx
xxx
3 rows in set (0.00 sec)

上麵的測試可以說明走 PK 是沒問題呢，那麼問題可能在二級索引。

同時我們在觀察用戶的其它 SQL 時發現，二級索引也是可以訪問數據的。

比如下麵這種：

select * from t2  where uid > 1 order by uid limit 3;
xxx
xxx
xxx
3 rows in set (0.00 sec)

都是走二級索引，為什麼有的會報錯呢，這 2 條語句有啥區別呢，explain 看下：

mysql> explain select * from t2  where uid > 1 limit 3;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+-----------------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra                                         |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+-----------------------------------------------+
|  1 | SIMPLE      | t2    | range | idx_uid       | idx_uid | 9       | NULL | 523677 | Using index condition; Using where; Using MRR |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+-----------------------------------------------+
1 row in set (0.00 sec)

mysql> explain select * from t2  where uid > 1 order by uid limit 3;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra                              |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------+
|  1 | SIMPLE      | t2    | range | idx_uid       | idx_uid | 9       | NULL | 523677 | Using index condition; Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------+
1 row in set (0.00 sec)

可以看到出錯的語句，用到了 MRR（不了解 MRR 的可以看下我們之前的月報優化器 MRR & BKA），這是優化器在走二級索引時，為了減少回表的磁盤 IO 的一個優化。

把這個優化關掉呢？

set optimizer_switch='mrr=off';
mysql> explain select id from t2  where uid > 1 limit 3;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | t2    | range | idx_uid       | idx_uid | 9       | NULL | 523677 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)

select * from t2  where uid > 1 limit 3;
xxx
xxx
xxx
3 rows in set (0.00 sec)

可以看到，關掉優化器的 MRR 後，語句就返回正常了。因此基本可以判斷是 MRR 導致的。

下麵我們從源碼層麵分析下看，到底是怎麼回事。

根據報錯信息，來 gdb 跟蹤，發現導致報錯的棧是這樣的，可以看到是在 mrr 執行初始化階段：

#0  DsMrr_impl::dsmrr_init()
#1  ha_tokudb::multi_range_read_init()
#2  QUICK_RANGE_SELECT::reset()
#3  join_init_read_record()
#4  sub_select()
#5  do_select()
#6  JOIN::exec()
#7  mysql_execute_select()
#8  mysql_select()
#9  handle_select()
#10 execute_sqlcom_select()
#11 mysql_execute_command()
...

具體在 DsMrr_impl::dsmrr_init 中的邏輯是這樣的：

// Transfer ICP from h to h2
if (mrr_keyno == h->pushed_idx_cond_keyno)
{
  if (h2->idx_cond_push(mrr_keyno, h->pushed_idx_cond))
  {
    retval= 1;
    goto error;
  }
}

我們對應看下 TokuDB 裏條件下推接口實現：

// we cache the information so we can do filtering ourselves,
// but as far as MySQL knows, we are not doing any filtering,
// so if we happen to miss filtering a row that does not match
// idx_cond_arg, MySQL will catch it.
// This allows us the ability to deal with only index_next and index_prev,
// and not need to worry about other index_XXX functions
Item* ha_tokudb::idx_cond_push(uint keyno_arg, Item* idx_cond_arg) {
    toku_pushed_idx_cond_keyno = keyno_arg;
    toku_pushed_idx_cond = idx_cond_arg;
    return idx_cond_arg;
}

可以看到 ha_tokudb::idx_cond_push 是會將原條件在返回給 server 的。因此就導致了 DsMrr_impl::dsmrr_init 返回錯誤碼 1 (Got error 1 from storage engine)。

而 handler:idx_cond_push() 接口是允許引擎層返回非 NULL 值的，引擎層認為自己沒有完全過濾結果集，那麼是可以返回條件給 server 層，讓 server 層再做一次過濾的：

/**
  Push down an index condition to the handler.

  The server will use this method to push down a condition it wants
  the handler to evaluate when retrieving records using a specified
  index. The pushed index condition will only refer to fields from
  this handler that is contained in the index (but it may also refer
  to fields in other handlers). Before the handler evaluates the
  condition it must read the content of the index entry into the
  record buffer.

  The handler is free to decide if and how much of the condition it
  will take responsibility for evaluating. Based on this evaluation
  it should return the part of the condition it will not evaluate.
  If it decides to evaluate the entire condition it should return
  NULL. If it decides not to evaluate any part of the condition it
  should return a pointer to the same condition as given as argument.

  @param keyno    the index number to evaluate the condition on
  @param idx_cond the condition to be evaluated by the handler

  @return The part of the pushed condition that the handler decides
          not to evaluate
 */

virtual Item *idx_cond_push(uint keyno, Item* idx_cond) { return idx_cond; }

因此這個問題是 MRR 在實現上的一個 bug，沒有考慮引擎在ICP時返回非 NULL 的情況。

另外我們在查問題時發現，如果 mysqld 重啟或者通過 flush table 關閉表的話，查詢是不會出錯的：

mysql> flush table t2;
mysql> explain  select * from t2  where uid > 1 limit 3;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra                              |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------+
|  1 | SIMPLE      | t2    | range | idx_uid       | idx_uid | 9       | NULL | 523677 | Using index condition; Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------+

從 explain 結果看，是因為沒有用到 MRR，這又是為什麼呢？

我們看下優化器是如何選擇是否用MRR優化的，在 DsMrr_impl::choose_mrr_impl() 這個函數裏是有這樣的邏輯的：

 /*
   If @@optimizer_switch has "mrr_cost_based" on, we should avoid
   using DS-MRR for queries where it is likely that the records are
   stored in memory. Since there is currently no way to determine
   this, we use a heuristic:
   a) if the storage engine has a memory buffer, DS-MRR is only
      considered if the table size is bigger than the buffer.
   b) if the storage engine does not have a memory buffer, DS-MRR is
      only considered if the table size is bigger than 100MB.
   c) Since there is an initial setup cost of DS-MRR, so it is only
      considered if at least 50 records will be read.
 */
 if (thd->optimizer_switch_flag(OPTIMIZER_SWITCH_MRR_COST_BASED))
 {
   /*
     If the storage engine has a database buffer we use this as the
     minimum size the table should have before considering DS-MRR.
   */
   longlong min_file_size= table->file->get_memory_buffer_size();
   if (min_file_size == -1)
   {
     // No estimate for database buffer
     min_file_size= 100 * 1024 * 1024;    // 100 MB
   }

   if (table->file->stats.data_file_length <
       static_cast<ulonglong>(min_file_size) ||
       rows <= 50)
     return true;                 // Use the default implementation
 }

可以看到，MRR 選擇條件是這樣的：

如果引擎的 cache 比表大的話，是不會用 MRR 優化的；
如果引擎沒有 cache，默認用 100M，用於自己不管理 cache 引擎，如 MyISAM；
如果要查詢的行數不超過50的話，也是不會用 MRR 優化的；

這個 cache 對 InnoDB 來說，就是 innodb_buffer_pool_size；對 TokuDB 來說，就是 tokudb_cache_size。但是 TokuDB handler 層沒有實現 get_memory_buffer_size() 這個接口，導致一直用 100M 做為 cache 來判斷，這個是 TokuDB handler 實現的上的一個bug。

而 data_file_length 這個是值是內存信息，在表剛關閉重新打開的時候，是0，所以不會用MRR優化。

另外還有一個判斷條件時，如果要求排序的話，也是不會用 MRR 優化的，這也就是為什麼我們剛開始發現的，語句中用了 order by 後，explain 結果中就沒有 MRR了。

問題影響和解決

從上麵的分析來看，滿足下麵條件語句會被影響：

語句訪問的是 TokuDB 表，並且走的二級索引，有回表操作；
表大小超過 100M；

簡單的判斷方法是，explain 結果中有 Using index condition; Using where; Using MRR，並且語句報錯 Got error 1 from storage engine。

臨時的解決方法是關閉優化器的 MRR 或者 ICP：

set optimizer_switch='mrr=off';
or
set optimizer_switch='index_condition_pushdown=off';

最後更新：2017-04-21 09:30:47

TokuDB · 捉蟲動態 · MRR 導致查詢失敗

問題背景

問題分析

問題影響和解決

上一篇： HybridDB · 穩定性 · HybridDB如何優雅的處理Out Of Memery問題

下一篇： nginx負載均衡配置

相關內容

熱門內容

最新內容