442
汽車大全
Greenplum 類型一致性使用規範 - 索引條件、JOIN的類型一致性限製
標簽
PostgreSQL , Greenplum , 類型一致 , 索引 , inner轉換 , join
背景
在查詢時,有很多用戶會犯渾,發現建立了索引,但是查詢偏偏不走索引。
怎麼不走索引啊?
這裏做容易混淆的是類型的差異,例如字段類型為字符串,但是輸入的是INT類型,這樣就可能不走索引。(除非創建了自動的CAST,自動進行類型轉換)
查詢的輸入類型與索引的類型一致是非常有必要的。
例子
1、建表(使用變長、定長字符串類型),寫入測試數據
create table tbl3(id int, c1 text, c2 varchar(64), c3 char(64));
postgres=# insert into tbl3 select id, md5, md5, md5 from (select id, md5(random()::text) as md5 from generate_series(1,1000000) t(id)) t;
INSERT 0 1000000
2、創建索引
postgres=# create index idx_tbl3_1 on tbl3 (c1);
CREATE INDEX
postgres=# create index idx_tbl3_2 on tbl3 (c2);
CREATE INDEX
postgres=# create index idx_tbl3_3 on tbl3 (c3);
CREATE INDEX
3、結構如下
postgres=# \d+ tbl3
Table "public.tbl3"
Column | Type | Modifiers | Storage | Description
--------+-----------------------+-----------+----------+-------------
id | integer | | plain |
c1 | text | | extended |
c2 | character varying(64) | | extended |
c3 | character(64) | | extended |
Indexes:
"idx_tbl3_1" btree (c1)
"idx_tbl3_2" btree (c2)
"idx_tbl3_3" btree (c3)
Has OIDs: no
Distributed by: (id)
4、走索引,強製轉換為bpchar。
postgres=# explain analyze select * from tbl3 where c3='abc';
QUERY PLAN
------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..201.23 rows=1 width=135)
Rows out: 0 rows at destination with 3.081 ms to end, start offset by 0.294 ms.
-> Index Scan using idx_tbl3_3 on tbl3 (cost=0.00..201.23 rows=1 width=135)
Index Cond: c3 = 'abc'::bpchar
Rows out: 0 rows (seg0) with 0.138 ms to end, start offset by 3.211 ms.
Slice statistics:
(slice0) Executor memory: 147K bytes.
(slice1) Executor memory: 145K bytes avg x 3 workers, 145K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Total runtime: 3.484 ms
(11 rows)
5、不走索引,因為輸入的為text變長,與char定長不一致。
postgres=# explain analyze select * from tbl3 where c3='abc'::text;
QUERY PLAN
-------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..20032.60 rows=1001 width=135)
Rows out: 0 rows at destination with 305 ms to end, start offset by 0.335 ms.
-> Seq Scan on tbl3 (cost=0.00..20032.60 rows=334 width=135)
Filter: c3::text = 'abc'::text
Rows out: 0 rows (seg0) with 302 ms to end, start offset by 4.023 ms.
Slice statistics:
(slice0) Executor memory: 147K bytes.
(slice1) Executor memory: 195K bytes avg x 3 workers, 195K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Total runtime: 305.985 ms
(11 rows)
6、走索引,輸入的類型為變長類型,字段也是變長類型,與TEXT兼容。
postgres=# explain analyze select * from tbl3 where c2='abc'::text;
QUERY PLAN
------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..200.86 rows=1 width=135)
Rows out: 0 rows at destination with 2.744 ms to end, start offset by 0.275 ms.
-> Index Scan using idx_tbl3_2 on tbl3 (cost=0.00..200.86 rows=1 width=135)
Index Cond: c2::text = 'abc'::text
Rows out: 0 rows (seg0) with 0.031 ms to end, start offset by 2.868 ms.
Slice statistics:
(slice0) Executor memory: 147K bytes.
(slice1) Executor memory: 145K bytes avg x 3 workers, 145K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Total runtime: 3.120 ms
(11 rows)
7、不走索引,輸入的類型為定長類型,但是字段為變長類型。
postgres=# explain analyze select * from tbl3 where c2='abc'::bpchar;
QUERY PLAN
-----------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..17531.66 rows=1 width=135)
Rows out: 0 rows at destination with 60 ms to end, start offset by 0.276 ms.
-> Seq Scan on tbl3 (cost=0.00..17531.66 rows=1 width=135)
Filter: c2::bpchar = 'abc'::bpchar
Rows out: 0 rows (seg0) with 57 ms to end, start offset by 2.864 ms.
Slice statistics:
(slice0) Executor memory: 147K bytes.
(slice1) Executor memory: 131K bytes avg x 3 workers, 131K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Total runtime: 60.320 ms
(11 rows)
8、內表不走索引,因為JOIN字段類型不匹配。
postgres=# explain select count(*) from tbl3 t1 join tbl3 t2 on (t1.c1=t2.c3) and t1.c1='abc';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Aggregate (cost=25235.81..25235.82 rows=1 width=8)
-> Gather Motion 3:1 (slice2; segments: 3) (cost=25235.74..25235.79 rows=1 width=8)
-> Aggregate (cost=25235.74..25235.75 rows=1 width=8)
-> Nested Loop (cost=0.00..25235.66 rows=11 width=0)
-> Broadcast Motion 3:3 (slice1; segments: 3) (cost=0.00..200.88 rows=1 width=33)
-> Index Scan using idx_tbl3_1 on tbl3 t1 (cost=0.00..200.86 rows=1 width=33)
Index Cond: c1 = 'abc'::text AND 'abc'::text = c1
-> Seq Scan on tbl3 t2 (cost=0.00..25034.46 rows=11 width=65)
Filter: 'abc'::text = c3::text AND c3::text = 'abc'::text
(9 rows)
9、內表走索引,因為JOIN字段類型都是變長,匹配。
postgres=# explain select count(*) from tbl3 t1 join tbl3 t2 on (t1.c1=t2.c2) and t1.c1='abc';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=401.86..401.87 rows=1 width=8)
-> Gather Motion 3:1 (slice2; segments: 3) (cost=401.80..401.85 rows=1 width=8)
-> Aggregate (cost=401.80..401.81 rows=1 width=8)
-> Nested Loop (cost=200.88..401.79 rows=2 width=0)
-> Index Scan using idx_tbl3_1 on tbl3 t1 (cost=0.00..200.86 rows=1 width=33)
Index Cond: c1 = 'abc'::text AND 'abc'::text = c1
-> Materialize (cost=200.88..200.89 rows=1 width=33)
-> Broadcast Motion 3:3 (slice1; segments: 3) (cost=0.00..200.88 rows=1 width=33)
-> Index Scan using idx_tbl3_2 on tbl3 t2 (cost=0.00..200.86 rows=1 width=33)
Index Cond: 'abc'::text = c2::text AND c2::text = 'abc'::text
(10 rows)
分析
實際上就是要求索引的表達式與條件的表達式一致。
例子,使用表達式,強製轉換為變長類型。
postgres=# create index idx_tbl3_4 on tbl3 ((c3::text));
CREATE INDEX
查詢時,隻要表達式的類型與條件類型匹配即可,走索引,同時會使用Implicit轉換。
postgres=# explain analyze select * from tbl3 where (c3::text)='abc';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=108.60..52793.25 rows=1001 width=135)
Rows out: 0 rows at destination with 3.796 ms to end, start offset by 0.309 ms.
-> Bitmap Heap Scan on tbl3 (cost=108.60..52793.25 rows=334 width=135)
Recheck Cond: c3::text = 'abc'::text
Rows out: 0 rows (seg0) with 0.163 ms to end, start offset by 3.095 ms.
-> Bitmap Index Scan on idx_tbl3_4 (cost=0.00..108.35 rows=334 width=0)
Index Cond: c3::text = 'abc'::text
Bitmaps out: Avg 1.0 x 3 workers. Max 1 (seg0) with 0.135 ms to end, start offset by 3.119 ms.
Slice statistics:
(slice0) Executor memory: 155K bytes.
(slice1) Executor memory: 321K bytes avg x 3 workers, 321K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_seqscan=off
Total runtime: 4.219 ms
(15 rows)
postgres=# explain analyze select * from tbl3 where (c3::text) = '123'::int;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=108.60..52793.25 rows=1001 width=135)
Rows out: 0 rows at destination with 2.926 ms to end, start offset by 0.298 ms.
-> Bitmap Heap Scan on tbl3 (cost=108.60..52793.25 rows=334 width=135)
Recheck Cond: c3::text = '123'::text
Rows out: 0 rows (seg0) with 0.110 ms to end, start offset by 3.057 ms.
-> Bitmap Index Scan on idx_tbl3_4 (cost=0.00..108.35 rows=334 width=0)
Index Cond: c3::text = '123'::text
Bitmaps out: Avg 1.0 x 3 workers. Max 1 (seg0) with 0.064 ms to end, start offset by 3.095 ms.
Slice statistics:
(slice0) Executor memory: 155K bytes.
(slice1) Executor memory: 289K bytes avg x 3 workers, 289K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_seqscan=off
Total runtime: 3.327 ms
(15 rows)
postgres=# \dC
List of casts
Source type | Target type | Function | Implicit?
-----------------------------+-----------------------------+--------------------+---------------
abstime | date | date | in assignment
abstime | integer | (binary coercible) | no
abstime | timestamp without time zone | timestamp | yes
abstime | timestamp with time zone | timestamptz | yes
abstime | time without time zone | time | in assignment
bigint | bit | bit | no
bigint | character | text | in assignment
bigint | character varying | text | in assignment
bigint | double precision | float8 | yes
bigint | integer | int4 | in assignment
bigint | numeric | numeric | yes
bigint | oid | oid | yes
bigint | real | float4 | yes
...............
小結
要讓索引起作用,最起碼應該關注一下幾點。
1、注意變長字符串和定長字符串,屬於兩種數據類型,不要混淆。
2、JOIN時操作符兩邊的字段類型一致。
3、查詢時,WHERE條件與索引表達式一致。
4、查詢時,WHERE條件與索引字段的類型一致。
5、查詢時,WHERE條件的collate與索引的collate一致。(本地化相關)
6、操作符本身支持對應的索引訪問方法。例如大多數的=操作符,支持b-tree的索引訪問方法。
7、PG是CBO優化,因此索引訪問的成本更低時,才會選擇索引掃描。(設置了特定的優化器開關,或者使用了HINT除外)。
參考
最後更新:2017-10-28 23:34:34