Database Optimizer Principles - How to Cure Choice Paralysis
Tags
PostgreSQL , single-column index , composite index , optimizer , cost factors
Background
RBO -> CBO -> dynamic optimization
We often hear: "Query running slow? Just add an index." Unprofessional as that sounds, it captures the mindset of early RBO (rule-based) optimizers.
People unfamiliar with the business, or with the database, tend to make this kind of call on intuition.
The RBO mindset has serious shortcomings, which led to the rise of CBO (cost-based) optimizers.
Going further, even the static CBO model (generate the plan, then execute it) is starting to look dated; next comes dynamic execution planning (execute one stage, then plan the next stage based on what was just learned).
A dynamic execution plan is like the congestion-avoidance feature of navigation software: stage by stage, it keeps rerouting you onto the best path.
Postgres Professional's roadmap for dynamic execution plans:
https://postgrespro.com/roadmap/
https://postgrespro.com/roadmap/56513
1. Machine learning
Query planner selects “cheapest” query plan based on its cost estimation.
But it’s done with many rough assumptions.
This is why the estimated cost could be inadequate to real execution cost.
One possibility is to improve the cost estimate mechanism itself by adding features
like multivariate statistics.
Another possibility is to use query execution feedback:
see how estimated parameter values differ from actual parameter values.
We can apply machine learning techniques to improve the cost estimates using this feedback,
so DBMS would learn on its own mistakes.
We’ve already done this in a simple case, and further work is planned in the following directions:
1.1 Extend implemented model to cover more use cases,
1.2 Provide the infrastructure necessary to make our machine learning an extension.
2. Execution-time planning
Currently, query planning strictly precedes query execution.
Sometimes it appears to be a serious limitation.
When one part of a plan is already executed it could be possible to significantly improve
the rest of the plan on the basis of gathered statistics.
We can see two cases when this approach could be applied:
2.1 Online reordering of filter expressions.
During sequential scan of large table it’s important to do the cheapest
and the most selective checks first.
However estimated selectivity and cost of filtering are inaccurate,
and thus the order of applying filters based on estimates can be not optimal.
But filter expressions could be reordered online on the base of statistics of their previous execution.
2.2 Some queries could be divided into sequence of steps when subsequent steps could be
replanned on the base of results of previous steps.
For instance, suppose that step 1 is a scan of table A, and step 2 is a join of tables A and B.
Depending on row count and data distribution from the first step we could choose
different join algorithm on the second step.
Back to CBO. If optimization is based on cost, how is the cost computed?
After receiving a SQL request, the database goes through parsing, rewriting, path generation, choosing the optimal plan, executing the plan, and returning the result.
Path generation works like the figure below: how many methods reach the goal, how many routes lead to a destination, how many ways solve the problem.
Creates path from parser output This takes the parser query output,
and generates all possible methods of executing the request.
It examines table join order, where clause restrictions,
and optimizer table statistics to evaluate each possible execution method,
and assigns a cost to each.
Generating the optimal plan means picking the lowest-cost path among the candidates and turning it into a plan.
Optimizes path output This takes the optimizer/path output,
chooses the path with the least cost, and creates a plan for the executor.
Looking at the figure above: each node (small circle) is one operation; when it finishes, it passes its output to the node above it, and when the top is reached the computation ends and the result is returned to the user.
The cost of a path is the sum of the costs of all its nodes.
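The cheapest-path selection can be sketched in a few lines (a toy illustration with made-up node costs, not planner code):

```python
# Toy illustration, not PostgreSQL internals: each candidate path is a list
# of per-node costs, and the planner keeps the path whose nodes sum lowest.
paths = {
    "seq_scan_path":   [203093.0, 25000.0],    # scan node + aggregate node (illustrative)
    "index_scan_path": [5452642.68, 25000.0],  # illustrative numbers
}
best = min(paths, key=lambda name: sum(paths[name]))
print(best)  # seq_scan_path
```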
This brings us to today's topic: when the optimizer could use several different indexes to answer the same SQL query, which one should it pick?
Example
Test table and data
postgres=# create table multi_col(c1 int, c2 int, c3 timestamp, c4 text);
CREATE TABLE
postgres=# insert into multi_col select random()*10, random()*10,
now()+(id||'sec')::interval , md5(random()::text) from generate_series(1,10000000) t(id);
INSERT 0 10000000
Two indexes
create index idx1 on multi_col(c3);
create index idx2 on multi_col(c3,c2,c1);
The SQL statements
select * from multi_col where c3 between '2017-05-06 10:54:38.112188' and '2017-05-28 10:54:38.112188' and c2=1;
select * from multi_col where c3 between '2017-05-06 10:54:38.112188' and '2017-05-28 10:54:38.112188';
select * from multi_col where c3 between '2017-05-06 10:54:38.112188' and '2017-05-28 10:54:38.112188' and c1=1;
Which index will each of them choose?
Index sizes
public | idx1 | index | postgres | multi_col | 214 MB |
public | idx2 | index | postgres | multi_col | 301 MB |
Table size
List of relations
Schema | Name | Type | Owner | Size | Description
--------+-----------+-------+----------+--------+-------------
public | multi_col | table | postgres | 806 MB |
(1 row)
With the rest of this article you will be able to answer not just today's question; other optimizer-related questions will fall into place as well.
How the optimizer cures choice paralysis
How node costs are computed
As mentioned above, CBO optimizes on cost. When one SQL statement can use several indexes, or several access paths, how should the optimizer choose?
The optimizer faces this question constantly, especially in PostgreSQL, which supports many access methods and therefore has many options.
- For the list of node types, see src/backend/commands/explain.c
The manual contains detailed worked examples of the cost computations:
Chapter 67. How the Planner Uses Statistics
https://www.postgresql.org/docs/9.6/static/planner-stats-details.html
Broken down, a node's cost computation depends on three things:
1. The cost factors; see
https://www.postgresql.org/docs/9.6/static/runtime-config-query.html
2. The statistics (row count, page count, per-column statistics, physical correlation, most common values and their frequencies, and so on); see
the pg_stats view.
3. The algorithm. Each node type is costed by its own algorithm; see
src/backend/optimizer/path/costsize.c
* costsize.c
* Routines to compute (and set) relation sizes and path costs
*
* Path costs are measured in arbitrary units established by these basic
* parameters:
Below are some of the cost factors used in these computations.
* seq_page_cost Cost of a sequential page fetch
* random_page_cost Cost of a non-sequential page fetch
* cpu_tuple_cost Cost of typical CPU time to process a tuple
* cpu_index_tuple_cost Cost of typical CPU time to process an index tuple
* cpu_operator_cost Cost of CPU time to execute an operator or function
* parallel_tuple_cost Cost of CPU time to pass a tuple from worker to master backend
* parallel_setup_cost Cost of setting up shared memory for parallelism
How to change a node's computed cost
A node's computed cost can be changed by changing the statistics, the cost factors, or the algorithm.
1. Statistics are gathered by analyze. PostgreSQL supports a per-column statistics target (histogram bucket count); the default is 100 and the maximum is 10000.
The larger the target, the more accurate the statistics, and the longer analyze takes.
postgres=# alter table multi_col alter column c1 set statistics 10000;
ALTER TABLE
postgres=# analyze multi_col ;
Changing the statistics directly changes the computed cost of the affected nodes.
2. Changing a cost factor directly changes the computed cost of the affected nodes.
For example, for a sequential scan node, changing seq_page_cost changes the cost charged for each page it scans.
postgres=# show seq_page_cost ;
seq_page_cost
---------------
1
(1 row)
postgres=# explain select count(*) from multi_col ;
QUERY PLAN
---------------------------------------------------------------------------
Aggregate (cost=228093.00..228093.01 rows=1 width=8)
-> Seq Scan on multi_col (cost=0.00..203093.00 rows=10000000 width=0)
(2 rows)
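The numbers in this plan can be reproduced by hand. A minimal sketch, assuming the table occupies 103093 pages (a figure back-solved from the plan above, not measured):

```python
# Default cost factors (PostgreSQL 9.6 defaults)
seq_page_cost = 1.0         # cost of one sequential page fetch
cpu_tuple_cost = 0.01       # cost of processing one heap tuple
cpu_operator_cost = 0.0025  # cost of one operator/function call (count(*) here)

pages = 103093              # assumed page count, back-solved from the plan
rows = 10_000_000

seq_scan_cost = pages * seq_page_cost + rows * cpu_tuple_cost
aggregate_cost = seq_scan_cost + rows * cpu_operator_cost

print(round(seq_scan_cost, 2))   # matches the Seq Scan total cost of 203093.00
print(round(aggregate_cost, 2))  # matches the Aggregate total cost of 228093.00
```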
postgres=# set seq_page_cost =90;
SET
postgres=# explain select count(*) from multi_col ;
QUERY PLAN
----------------------------------------------------------------------------------------------
Aggregate (cost=5477642.68..5477642.69 rows=1 width=8)
-> Index Only Scan using idx2 on multi_col (cost=0.43..5452642.68 rows=10000000 width=0)
(2 rows)
3. Changing the algorithm also changes the computed cost. This means either patching costsize.c in the PostgreSQL core, or using the hooks the core exposes to override the computed cost.
How to make the optimizer choose the path you want
There are three ways to influence which path the optimizer turns into the final plan.
1 Adjust cost factors
Adjusting a cost factor changes the computed cost of the affected nodes, and thereby changes the optimizer's final choice.
seq_page_cost (floating point) -- the cost of fetching one page during a sequential (full-table) scan
random_page_cost (floating point) -- the cost of fetching one page non-sequentially (as in an index scan)
parallel_setup_cost (floating point) -- the startup cost of a parallel query (forking workers, allocating shared memory)
parallel_tuple_cost (floating point) -- the cost for a parallel worker to pass one tuple to another process
min_parallel_relation_size (integer) -- the minimum relation size; only relations above this threshold are considered for parallel scan paths
effective_cache_size (integer) -- tells the planner how much OS cache is available to the database; the larger it is, the more the planner favors index scans
cpu_tuple_cost (floating point) -- the CPU cost per row; if a node scans 10000 rows, multiplying by this factor gives the node's total CPU tuple cost
Sets the planner's estimate of the cost of processing each row during a query.
The default is 0.01.
cpu_index_tuple_cost (floating point) -- the CPU cost of processing each index entry during an index scan
Sets the planner's estimate of the cost of processing each index entry during an index scan.
The default is 0.005.
cpu_operator_cost (floating point) -- the cost charged per invocation of each operator or function in the SQL; how often they are invoked depends on the operator's or function's volatility (see the end of this article).
Sets the planner's estimate of the cost of processing each operator or function executed during a query.
The default is 0.0025.
Producing an accurate cost requires all three ingredients to be accurate: 1. the cost factors, 2. the statistics, 3. the algorithm.
For calibrating the cost factors, see
《優化器成本因子校對 - PostgreSQL explain cost constants alignment to timestamp》
By adjusting cost factors, you can correct the cost of the corresponding nodes.
Example: curing the case from the beginning of the article with cost factors
Test table
postgres=# \d multi_col
Table "public.multi_col"
Column | Type | Collation | Nullable | Default
--------+-----------------------------+-----------+----------+---------
c1 | integer | | |
c2 | integer | | |
c3 | timestamp without time zone | | |
c4 | text | | |
Indexes:
"idx1" btree (c3 NULLS FIRST)
"idx2" btree (c3, c2, c1)
Test SQL 1
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from multi_col
where c3 between '2017-05-06 10:54:38.112188' and '2017-05-28 10:54:38.112188' and c2=1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx1 on public.multi_col (cost=0.43..66634.87 rows=187605 width=49) (actual time=0.028..391.223 rows=187767 loops=1)
Output: c1, c2, c3, c4
Index Cond: ((multi_col.c3 >= '2017-05-06 10:54:38.112188'::timestamp without time zone) AND (multi_col.c3 <= '2017-05-28 10:54:38.112188'::timestamp without time zone))
Filter: (multi_col.c2 = 1)
   -- with idx1, a Filter is needed to apply the c2=1 condition; with idx2 no Filter is needed
   -- the Filter consumes cpu_tuple_cost and cpu_index_tuple_cost
   -- switching to idx2 would remove the cpu_tuple_cost incurred by the Filter
   -- so how do we make the optimizer choose idx2?
Rows Removed by Filter: 1690517
Buffers: shared hit=24498
Planning time: 0.161 ms
Execution time: 402.656 ms
(8 rows)
To make the SQL above use idx2, we just need to raise cpu_tuple_cost, because this cost is incurred by the idx1 plan but not by the idx2 plan.
Current setting
postgres=# show cpu_tuple_cost;
cpu_tuple_cost
----------------
0.01
(1 row)
Raise it
postgres=# set cpu_tuple_cost=0.1;
SET
The optimizer now chooses idx2
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from multi_col
where c3 between '2017-05-06 10:54:38.112188' and '2017-05-28 10:54:38.112188' and c2=1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx2 on public.multi_col (cost=0.43..93471.13 rows=187605 width=49) (actual time=0.021..136.080 rows=187767 loops=1)
Output: c1, c2, c3, c4
Index Cond: ((multi_col.c3 >= '2017-05-06 10:54:38.112188'::timestamp without time zone) AND (multi_col.c3 <= '2017-05-28 10:54:38.112188'::timestamp without time zone) AND (multi_col.c2 = 1))
Buffers: shared hit=26562
Planning time: 0.150 ms
Execution time: 147.418 ms
(6 rows)
Before we raised cpu_tuple_cost, why did the database pick idx1; what exactly made idx2 cost more than idx1?
idx2 is slightly larger than idx1 (more pages), so once the random page fetches are priced in, idx1's total cost came out lower than idx2's.
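The tradeoff can be modeled with a toy formula: a fixed I/O component plus cpu_tuple_cost for every heap tuple the plan touches. The I/O constants below are back-solved from the two EXPLAIN outputs above, so treat the whole sketch as illustrative, not the planner's real arithmetic:

```python
def plan_cost(io_cost, tuples_touched, cpu_tuple_cost):
    # toy model: fixed I/O cost + per-tuple CPU cost
    return io_cost + tuples_touched * cpu_tuple_cost

# idx1 touches every row in the timestamp range (matches + 1,690,517 filtered out);
# idx2's extra column lets it touch only the 187,767 matching rows.
IDX1_IO, IDX1_TUPLES = 47852.03, 1_878_284
IDX2_IO, IDX2_TUPLES = 74694.43, 187_767  # bigger index, more pages, higher I/O

for ctc in (0.01, 0.1):
    c1 = plan_cost(IDX1_IO, IDX1_TUPLES, ctc)
    c2 = plan_cost(IDX2_IO, IDX2_TUPLES, ctc)
    print(ctc, "idx1" if c1 < c2 else "idx2")
# At cpu_tuple_cost=0.01 idx1 wins; at 0.1 the per-tuple penalty of the
# Filter rows flips the choice to idx2, mirroring the experiment above.
```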
Example
postgres=# show random_page_cost ;
random_page_cost
------------------
1
(1 row)
Reset cpu_tuple_cost, and the database again chooses idx1.
postgres=# set cpu_tuple_cost=0.01;
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from multi_col where c3 between '2017-05-06 10:54:38.112188' and '2017-05-28 10:54:38.112188' and c2=1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx1 on public.multi_col (cost=0.43..66634.87 rows=187605 width=49) (actual time=0.046..331.644 rows=187767 loops=1)
Output: c1, c2, c3, c4
Index Cond: ((multi_col.c3 >= '2017-05-06 10:54:38.112188'::timestamp without time zone) AND (multi_col.c3 <= '2017-05-28 10:54:38.112188'::timestamp without time zone))
Filter: (multi_col.c2 = 1)
Rows Removed by Filter: 1690517
Buffers: shared hit=24498
Planning time: 0.220 ms
Execution time: 343.478 ms
(8 rows)
Lower random_page_cost, and the choice flips back to idx2.
postgres=# set random_page_cost =0.1;
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from multi_col where c3 between '2017-05-06 10:54:38.112188' and '2017-05-28 10:54:38.112188' and c2=1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx2 on public.multi_col (cost=0.43..31412.31 rows=187605 width=49) (actual time=0.022..136.669 rows=187767 loops=1)
Output: c1, c2, c3, c4
Index Cond: ((multi_col.c3 >= '2017-05-06 10:54:38.112188'::timestamp without time zone) AND (multi_col.c3 <= '2017-05-28 10:54:38.112188'::timestamp without time zone) AND (multi_col.c2 = 1))
Buffers: shared hit=26562
Planning time: 0.147 ms
Execution time: 148.052 ms
(6 rows)
2 Toggle the enable switches
With these switches you can stop the optimizer from choosing certain kinds of paths, so those paths never make it into the final plan.
PostgreSQL supports the following switches
https://www.postgresql.org/docs/9.6/static/runtime-config-query.html
enable_bitmapscan (boolean) -- allow bitmap scans
enable_hashagg (boolean) -- allow hash aggregation
enable_hashjoin (boolean) -- allow hash joins
enable_indexscan (boolean) -- allow index scans
enable_indexonlyscan (boolean) -- allow index-only scans
enable_material (boolean) -- allow materialization (e.g. of a nested-loop inner relation)
enable_mergejoin (boolean) -- allow merge joins
enable_nestloop (boolean) -- allow nested-loop joins
enable_seqscan (boolean) -- allow sequential (full-table) scans
enable_sort (boolean) -- allow explicit sort steps (when off, the planner prefers index order wherever an index can provide it)
enable_tidscan (boolean) -- allow TID (physical row id) scans
Controls whether subqueries are pulled up:
from_collapse_limit (integer) -- when set to 1, subqueries are not pulled up; each subquery gets its own paths.
When N is greater than 1, up to N subqueries may be pulled up.
Controls whether explicit JOINs (other than FULL JOIN) follow the join order written by the user:
join_collapse_limit (integer) -- when set to 1, explicit JOINs (other than FULL JOIN) keep the user's join order.
When greater than 1, the optimizer reorders explicit JOINs (other than FULL JOIN), up to join_collapse_limit items at a time, in search of a better plan.
Other switches
force_parallel_mode (enum) -- force parallel query
cursor_tuple_fraction (floating point) -- an optimization for cursor queries:
the smaller it is, the more the planner favors returning the first row quickly;
the larger it is, the more it favors returning all rows quickly (driving total cost down).
By toggling these switches you make the optimizer use, or avoid, certain kinds of paths, and thereby control the final plan.
For example, turning off all index scans and bitmap scans leaves you with a sequential scan.
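Under the hood the switches do not delete candidate paths outright: PostgreSQL adds a huge penalty constant (disable_cost, 1.0e10 in costsize.c) to disabled path types, so they lose every cost comparison unless nothing else can run. A toy sketch of that mechanism:

```python
# Toy sketch: a disabled path type is penalized, not removed, so it can
# still be chosen when it is the only way to execute the query.
DISABLE_COST = 1.0e10  # the penalty constant PostgreSQL uses in costsize.c

def effective_cost(path_type, cost, enabled):
    return cost if enabled.get(path_type, True) else cost + DISABLE_COST

enabled = {"indexscan": False, "bitmapscan": False, "seqscan": True}
candidates = {"indexscan": 66634.87, "bitmapscan": 70000.0, "seqscan": 203093.0}

best = min(candidates, key=lambda p: effective_cost(p, candidates[p], enabled))
print(best)  # seqscan: the cheapest non-disabled path wins
```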
3 Hints
A hint extension (implemented via the planner hooks) can force the optimizer onto the path you specify.
For example, you can tell the optimizer to use a hash join, or a particular index.
Examples of using hints:
《關鍵時刻HINT出彩 - PG優化器的參數優化、執行計劃固化CASE》
《阿裏雲 PostgreSQL pg_hint_plan插件的用法》
《PostgreSQL SQL HINT的使用(pg_hint_plan)》
Other topics
1 The genetic query optimizer
By default, when joining many tables the planner uses exhaustive search: it enumerates every join order and generates a very large number of paths. The more tables in the join, the more paths, and the longer planning takes.
If you want to avoid the planning overhead that exhaustive search brings to many-table joins,
one approach is the explicit JOINs described earlier, combined with join_collapse_limit and from_collapse_limit set to 1.
Another is the genetic algorithm: once the number of FROM items exceeds a threshold, the genetic optimizer takes over.
For the genetic algorithm, see
https://www.postgresql.org/docs/9.6/static/runtime-config-query.html
geqo_threshold (integer)
Use genetic query optimization to plan queries with at least this many FROM items involved.
(Note that a FULL OUTER JOIN construct counts as only one FROM item.) The default is 12.
For simpler queries it is usually best to use the regular, exhaustive-search planner,
but for queries with many tables the exhaustive search takes too long,
often longer than the penalty of executing a suboptimal plan.
Thus, a threshold on the size of the query is a convenient way to manage use of GEQO.
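The genetic idea can be illustrated with a small sketch that evolves a population of join orders instead of enumerating all of them (the cost function and encoding here are toy stand-ins, nothing like GEQO's real implementation):

```python
import random

rel_rows = {"a": 10, "b": 1000, "c": 100, "d": 10000, "e": 50}  # made-up table sizes

def order_cost(order):
    # Toy fitness for a left-deep join in the given order:
    # each step is charged |accumulated result| * |next relation|.
    total, acc = 0, rel_rows[order[0]]
    for rel in order[1:]:
        total += acc * rel_rows[rel]
        acc = min(acc, rel_rows[rel])  # pretend each join shrinks the result
    return total

random.seed(42)
population = [random.sample(list(rel_rows), len(rel_rows)) for _ in range(8)]
initial_best = min(order_cost(p) for p in population)

for _ in range(50):
    population.sort(key=order_cost)   # selection: keep the cheapest half
    population = population[:4]
    for parent in list(population):   # mutation: swap two relations in a copy
        child = parent[:]
        i, j = random.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
        population.append(child)

best = min(population, key=order_cost)
print(order_cost(best) <= initial_best)  # True: the best survivor never regresses
```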
2 What does 10.0 bring?
1. 10.0 reworks parts of the optimizer, for example user-defined statistics dimensions and join loop optimizations.
《PostgreSQL 10.0 preview 功能增強 - 自由定義統計信息維度》
https://www.postgresql.org/docs/devel/static/sql-createstatistics.html
Example of defining custom column statistics
postgres=# CREATE STATISTICS s1 on (c3,c2,c1) from multi_col ;
postgres=# \d multi_col
Table "public.multi_col"
Column | Type | Collation | Nullable | Default
--------+-----------------------------+-----------+----------+---------
c1 | integer | | |
c2 | integer | | |
c3 | timestamp without time zone | | |
c4 | text | | |
Indexes:
"idx1" btree (c3 NULLS FIRST)
"idx2" btree (c3, c2, c1)
Statistics:
"public.s1" WITH (ndistinct, dependencies) ON (c1, c2, c3)
postgres=# select * from pg_statistic_ext ;
-[ RECORD 1 ]---+--------------------------------------------------------------------------------------------------------------
starelid | 22037
staname | s1
stanamespace | 17307
staowner | 10
stakeys | 1 2 3
staenabled | {d,f}
standistinct | [{(b 1 2), 121.000000}, {(b 1 3), 10000000.000000}, {(b 2 3), 10000000.000000}, {(b 1 2 3), 10000000.000000}]
stadependencies | [{3 => 1 : 1.000000}, {3 => 2 : 1.000000}, {1, 3 => 2 : 1.000000}, {2, 3 => 1 : 1.000000}]
postgres=# select * from pg_stats_ext;
-[ RECORD 1 ]---------
schemaname | public
tablename | multi_col
staname | s1
attnums | 1 2 3
ndistbytes | 78
depsbytes | 72
2. 《PostgreSQL 10.0 preview 性能增強 - hash,nestloop join優化(聰明的優化器是這樣的)》
3. For more, see the 10.0 release notes
https://www.postgresql.org/docs/devel/static/release-10.html
E.1.3.1.4. Optimizer
- Add multi-column optimizer statistics to compute the correlation ratio and number of distinct values (Tomas Vondra, David Rowley, Álvaro Herrera)
New commands are CREATE, ALTER, and DROP STATISTICS. This is helpful in estimating query memory usage and when combining the statistics from individual columns.
- Improve planner matching of boolean indexes (Tom Lane)
- Improve performance of queries referencing row-level security restrictions (Tom Lane)
- The optimizer now has more flexibility in reordering executor behavior.
References
《索引順序掃描引發的堆掃描IO放大背後的統計學原理與解決辦法》
《優化器成本因子校對 - PostgreSQL explain cost constants alignment to timestamp》
《PostgreSQL 嵌套循環成本估算方法 - nestloop loop cost & cost_material run_cost》
Last updated: 2017-05-07 14:31:28