PostgreSQL 10.0 preview 性能增強 - hash,nestloop join優化(聰明的優化器是這樣的)
標簽
PostgreSQL , 10.0 , nestloop , hash join
背景
兩張表JOIN時,如果內表的JOIN字段確定是唯一的,那麼在嵌套循環時,如果外表有重複值,循環過程中,對於內表來說,一個VALUE隻需要掃描一次。
hash join同樣適用。
例子
postgres=# create table intbl(id int);
CREATE TABLE
postgres=# create unique index idx_intbl on intbl(id);
CREATE INDEX
postgres=# insert into intbl select generate_series(1,1000000);
INSERT 0 1000000
postgres=# create table out(id int);
CREATE TABLE
postgres=# insert into out select 1 from generate_series(1,1000);
-- 對於外表的1000個1, 內表scan一次命中後,同一個值不需要再次scan內表
INSERT 0 1000
postgres=# set enable_hashjoin =off;
SET
postgres=# set enable_mergejoin =off;
SET
postgres=# set enable_material =off;
SET
9.6
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from out,intbl where out.id=intbl.id;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.42..2736.00 rows=1000 width=8) (actual time=0.033..1.965 rows=1000 loops=1)
Output: "out".id, intbl.id
Buffers: shared hit=4005
-> Seq Scan on public."out" (cost=0.00..15.00 rows=1000 width=4) (actual time=0.013..0.101 rows=1000 loops=1)
Output: "out".id
Buffers: shared hit=5
-> Index Only Scan using idx_intbl on public.intbl (cost=0.42..2.71 rows=1 width=4) (actual time=0.001..0.002 rows=1 loops=1000)
Output: intbl.id
Index Cond: (intbl.id = "out".id)
Heap Fetches: 1000
Buffers: shared hit=4000
Planning time: 0.109 ms
Execution time: 2.048 ms
(13 rows)
10.0
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from out,intbl where out.id=intbl.id;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.42..2202.50 rows=1000 width=8) (actual time=0.035..1.803 rows=1000 loops=1)
Output: "out".id, intbl.id
Inner Unique: true
Buffers: shared hit=4005
-> Seq Scan on public."out" (cost=0.00..15.00 rows=1000 width=4) (actual time=0.013..0.106 rows=1000 loops=1)
Output: "out".id
Buffers: shared hit=5
-> Index Only Scan using idx_intbl on public.intbl (cost=0.42..2.19 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1000)
Output: intbl.id
Index Cond: (intbl.id = "out".id)
Heap Fetches: 1000
Buffers: shared hit=4000
Planning time: 0.122 ms
Execution time: 1.887 ms
(14 rows)
patch如下
Optimize joins when the inner relation can be proven unique.
author Tom Lane <tgl@sss.pgh.pa.us>
Sat, 8 Apr 2017 10:20:03 +0800 (22:20 -0400)
committer Tom Lane <tgl@sss.pgh.pa.us>
Sat, 8 Apr 2017 10:20:13 +0800 (22:20 -0400)
commit 9c7f5229ad68d7e0e4dd149e3f80257893e404d4
tree 0a167d403952550f43941b01b24ed5e7526c5351 tree | snapshot
parent f13a9121f9822eafe05cc3178bf046155a248173 commit | diff
Optimize joins when the inner relation can be proven unique.
If there can certainly be no more than one matching inner row for a given
outer row, then the executor can move on to the next outer row as soon as
it's found one match; there's no need to continue scanning the inner
relation for this outer row. This saves useless scanning in nestloop
and hash joins. In merge joins, it offers the opportunity to skip
mark/restore processing, because we know we have not advanced past the
first possible match for the next outer row.
Of course, the devil is in the details: the proof of uniqueness must
depend only on joinquals (not otherquals), and if we want to skip
mergejoin mark/restore then it must depend only on merge clauses.
To avoid adding more planning overhead than absolutely necessary,
the present patch errs in the conservative direction: there are cases
where inner_unique or skip_mark_restore processing could be used, but
it will not do so because it's not sure that the uniqueness proof
depended only on "safe" clauses. This could be improved later.
David Rowley, reviewed and rather heavily editorialized on by me
Discussion: https://postgr.es/m/CAApHDvqF6Sw-TK98bW48TdtFJ+3a7D2mFyZ7++=D-RyPsL76gw@mail.gmail.com
這個patch的討論,詳見郵件組,本文末尾URL。
PostgreSQL社區的作風非常嚴謹,一個patch可能在郵件組中討論幾個月甚至幾年,根據大家的意見反複的修正,patch合並到master已經非常成熟,所以PostgreSQL的穩定性也是遠近聞名的。
參考
最後更新:2017-04-22 17:01:48