PostgreSQL 單列多條件查詢優化 - 多個多邊形查詢4倍提升的技巧

標簽

PostgreSQL , PostGIS , 多邊形合並 , 條件合並 , 菜鳥

背景

在某些空間數據查詢需求中，有一些這樣的請求，例如查找與某些多邊形中的任意一個相交的空間對象。

比如在菜鳥、新零售的業務中，查詢某幾個商場多邊形，或者某幾個小區多邊形內覆蓋的點。

SQL寫法可能是這樣的

select geo_point,* from table where ST_Within(geo_point, polygon_1) or ST_Within(geo_point, polygon_2) or ... ST_Within(geo_point, polygon_n);

PostgreSQL支持空間索引，同時支持bitmapAnd, bitmapOr index scan。也就是說隻要geo_point字段有索引，不管多少個查詢條件，都可以走index scan。

這個查詢有什麼優化空間麼？

在講這個空間優化前，我們來看另一個例子。

單列組合條件查詢

單列組合條件查詢與前麵提到的需求類似，即一個字段，多個查詢條件。是不是類似於一個字段，多個多邊形匹配呢？

表結構如下

postgres=# \d+ a  
                                                Table "public.a"  
  Column  |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description   
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------  
 id       | integer                     |           | not null |         | plain    |              |   
 info     | text                        |           |          |         | extended |              |   
 crt_time | timestamp without time zone |           |          |         | plain    |              |   
Indexes:  
    "a_pkey" PRIMARY KEY, btree (id)

查詢需求如下，ID字段上有4個條件，任意條件滿足即返回結果。

PostgreSQL根據統計信息，基於CBO成本優化，采用了bitmapOr，重複使用了多次單列索引，合並所有條件對應的數據塊。最後進行一次recheck得到所要的結果。

bitmapAnd,bitmapOr是PostgreSQL數據庫獨有的特性，可以在多個查詢條件的組合查詢中使用多個索引進行數據塊的合並和消除掃描，非常讚。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from a where id <100 or id<200 or id<300 or id between 10 and 100 or id between 100 and 30000;  
                                                                QUERY PLAN                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.a  (cost=39.85..3118.98 rows=2610 width=44) (actual time=5.099..15.355 rows=3154 loops=1)  
   Output: id, info, crt_time  
   Recheck Cond: ((a.id < 100) OR (a.id < 200) OR (a.id < 300) OR ((a.id >= 10) AND (a.id <= 100)) OR ((a.id >= 100) AND (a.id <= 30000)))  
   Heap Blocks: exact=38  
   Buffers: shared hit=16 read=40  
   ->  BitmapOr  (cost=39.85..39.85 rows=2610 width=0) (actual time=5.089..5.089 rows=0 loops=1)  
         Buffers: shared hit=15 read=3  
         ->  Bitmap Index Scan on a_pkey  (cost=0.00..1.70 rows=8 width=0) (actual time=0.003..0.003 rows=1 loops=1)  
               Index Cond: (a.id < 100)  
               Buffers: shared hit=3  
         ->  Bitmap Index Scan on a_pkey  (cost=0.00..1.77 rows=17 width=0) (actual time=0.001..0.001 rows=1 loops=1)  
               Index Cond: (a.id < 200)  
               Buffers: shared hit=3  
         ->  Bitmap Index Scan on a_pkey  (cost=0.00..1.84 rows=26 width=0) (actual time=0.001..0.001 rows=1 loops=1)  
               Index Cond: (a.id < 300)  
               Buffers: shared hit=3  
         ->  Bitmap Index Scan on a_pkey  (cost=0.00..1.72 rows=8 width=0) (actual time=0.004..0.004 rows=0 loops=1)  
               Index Cond: ((a.id >= 10) AND (a.id <= 100))  
               Buffers: shared hit=3  
         ->  Bitmap Index Scan on a_pkey  (cost=0.00..29.55 rows=2551 width=0) (actual time=5.079..5.079 rows=3153 loops=1)  
               Index Cond: ((a.id >= 100) AND (a.id <= 30000))  
               Buffers: shared hit=3 read=3  
 Planning time: 0.251 ms  
 Execution time: 15.699 ms  
(24 rows)

而實際上這4個條件，我們可以在邏輯上合成一個條件。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from a where id <= 30000;  
                                                      QUERY PLAN                                                        
----------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.a  (cost=23.88..3006.82 rows=2560 width=44) (actual time=0.110..0.568 rows=3154 loops=1)  
   Output: id, info, crt_time  
   Recheck Cond: (a.id <= 30000)  
   Heap Blocks: exact=38  
   Buffers: shared hit=44  
   ->  Bitmap Index Scan on a_pkey  (cost=0.00..23.24 rows=2560 width=0) (actual time=0.102..0.102 rows=3154 loops=1)  
         Index Cond: (a.id <= 30000)  
         Buffers: shared hit=6  
 Planning time: 0.127 ms  
 Execution time: 0.835 ms  
(10 rows)

合成後，走精確索引掃描，性能提升幾十倍。

實際上，空間類型的數據，也是一樣的，而且合成起來更方便，通過st_union即可。

空間合成優化

select geo_point,* from table where ST_Within(geo_point, polygon_1) or ST_Within(geo_point, polygon_2) or ... ST_Within(geo_point, polygon_n);

優化為

select geo_point,* from table where ST_Within(geo_point, st_union(polygon_1,polygon_2,...polygon_n) );

性能提升也是非常明顯的。

參考

https://postgis.net/docs/manual-2.3/ST_Union.html

https://www.postgresql.org/docs/9.6/static/indexes-bitmap-scans.html

最後更新：2017-08-20 17:06:21

PostgreSQL 單列多條件查詢優化 - 多個多邊形查詢4倍提升的技巧

標簽

背景

單列組合條件查詢

空間合成優化

參考

上一篇：阿裏雲HybridDB for PG實踐 - 行存、列存，堆表、AO表的原理和選擇

下一篇：手工清理 cgroup中的page cache

相關內容

熱門內容

最新內容

PostgreSQL 單列多條件查詢優化 - 多個多邊形查詢4倍提升的技巧

標簽

背景

單列組合條件查詢

空間合成優化

參考

上一篇： 阿裏雲HybridDB for PG實踐 - 行存、列存，堆表、AO表的原理和選擇

下一篇： 手工清理 cgroup中的page cache

相關內容

熱門內容

最新內容

上一篇：阿裏雲HybridDB for PG實踐 - 行存、列存，堆表、AO表的原理和選擇

下一篇：手工清理 cgroup中的page cache