閱讀774 返回首頁    go 小米 go MIUI米柚


PostgreSQL 10.0 preview 性能增強 - hash,nestloop join優化(聰明的優化器是這樣的)

標簽

PostgreSQL , 10.0 , nestloop , hash join


背景

兩張表JOIN時,如果內表的JOIN字段確定是唯一的,那麼在嵌套循環時,如果外表有重複值,循環過程中,對於內表來說,一個VALUE隻需要掃描一次。

hash join同樣適用。

例子

postgres=# create table intbl(id int);  
CREATE TABLE  
postgres=# create unique index idx_intbl on intbl(id);  
CREATE INDEX  
postgres=# insert into intbl select generate_series(1,1000000);     
INSERT 0 1000000  
postgres=# create table out(id int);  
CREATE TABLE  
postgres=# insert into out select 1 from generate_series(1,1000);   
-- 對於外表的1000個1, 內表scan一次命中後,同一個值不需要再次scan內表  
INSERT 0 1000  
postgres=# set enable_hashjoin =off;  
SET  
postgres=# set enable_mergejoin =off;  
SET  
postgres=# set enable_material =off;  
SET  

9.6

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from out,intbl where out.id=intbl.id;  
                                                              QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------  
 Nested Loop  (cost=0.42..2736.00 rows=1000 width=8) (actual time=0.033..1.965 rows=1000 loops=1)  
   Output: "out".id, intbl.id  
   Buffers: shared hit=4005  
   ->  Seq Scan on public."out"  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.013..0.101 rows=1000 loops=1)  
         Output: "out".id  
         Buffers: shared hit=5  
   ->  Index Only Scan using idx_intbl on public.intbl  (cost=0.42..2.71 rows=1 width=4) (actual time=0.001..0.002 rows=1 loops=1000)  
         Output: intbl.id  
         Index Cond: (intbl.id = "out".id)  
         Heap Fetches: 1000  
         Buffers: shared hit=4000  
 Planning time: 0.109 ms  
 Execution time: 2.048 ms  
(13 rows)  

10.0

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from out,intbl where out.id=intbl.id;  
                                                              QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------  
 Nested Loop  (cost=0.42..2202.50 rows=1000 width=8) (actual time=0.035..1.803 rows=1000 loops=1)  
   Output: "out".id, intbl.id  
   Inner Unique: true  
   Buffers: shared hit=4005  
   ->  Seq Scan on public."out"  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.013..0.106 rows=1000 loops=1)  
         Output: "out".id  
         Buffers: shared hit=5  
   ->  Index Only Scan using idx_intbl on public.intbl  (cost=0.42..2.19 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1000)  
         Output: intbl.id  
         Index Cond: (intbl.id = "out".id)  
         Heap Fetches: 1000  
         Buffers: shared hit=4000  
 Planning time: 0.122 ms  
 Execution time: 1.887 ms  
(14 rows)  

patch如下

Optimize joins when the inner relation can be proven unique.  
  
author	Tom Lane <tgl@sss.pgh.pa.us>	  
Sat, 8 Apr 2017 10:20:03 +0800 (22:20 -0400)  
committer	Tom Lane <tgl@sss.pgh.pa.us>	  
Sat, 8 Apr 2017 10:20:13 +0800 (22:20 -0400)  
commit	9c7f5229ad68d7e0e4dd149e3f80257893e404d4  
tree	0a167d403952550f43941b01b24ed5e7526c5351	tree | snapshot  
parent	f13a9121f9822eafe05cc3178bf046155a248173	commit | diff  
Optimize joins when the inner relation can be proven unique.  
  
If there can certainly be no more than one matching inner row for a given  
outer row, then the executor can move on to the next outer row as soon as  
it's found one match; there's no need to continue scanning the inner  
relation for this outer row.  This saves useless scanning in nestloop  
and hash joins.  In merge joins, it offers the opportunity to skip  
mark/restore processing, because we know we have not advanced past the  
first possible match for the next outer row.  
  
Of course, the devil is in the details: the proof of uniqueness must  
depend only on joinquals (not otherquals), and if we want to skip  
mergejoin mark/restore then it must depend only on merge clauses.  
To avoid adding more planning overhead than absolutely necessary,  
the present patch errs in the conservative direction: there are cases  
where inner_unique or skip_mark_restore processing could be used, but  
it will not do so because it's not sure that the uniqueness proof  
depended only on "safe" clauses.  This could be improved later.  
  
David Rowley, reviewed and rather heavily editorialized on by me  
  
Discussion: https://postgr.es/m/CAApHDvqF6Sw-TK98bW48TdtFJ+3a7D2mFyZ7++=D-RyPsL76gw@mail.gmail.com  

這個patch的討論,詳見郵件組,本文末尾URL。

PostgreSQL社區的作風非常嚴謹,一個patch可能在郵件組中討論幾個月甚至幾年,根據大家的意見反複的修正,patch合並到master已經非常成熟,所以PostgreSQL的穩定性也是遠近聞名的。

參考

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9c7f5229ad68d7e0e4dd149e3f80257893e404d4

最後更新:2017-04-22 17:01:48

  上一篇:go PostgreSQL 10.0 preview 功能增強 - 串行隔離級別 預加鎖閾值可控
  下一篇:go PostgreSQL 10.0 preview 性能增強 - 支持64bit atomic