閱讀918 返回首頁    go 阿裏雲 go 技術社區[雲棲]


PostgreSQL 空間st_contains,st_within空間包含搜索優化 - 降IO和降CPU(bound box)

標簽

PostgreSQL , st_contains , st_within , 空間包含 , 空間bound box , GiST索引 , 空間索引結構 , IO放大 , BOUND BOX放大


背景

點麵判斷、按麵圈選點或其他對象,是GIS幾何應用中非常典型的需求。

在PostgreSQL中通過建立GiST索引可以加速這類判斷,然而,建立索引就夠了嗎?

很多時候建立索引是不夠的,性能沒有到達巔峰,如果要更低的延遲,更少的CPU開銷,還有什麼優化手段呢?

實際上我以前寫過一篇類似的文章,講的是BTree索引訪問的優化,當數據存放與索引順序的線性相關性很差時,引入了一個問題,訪問時IO放大:

《索引順序掃描引發的堆掃描IO放大背後的統計學原理與解決辦法 - PostgreSQL index scan enlarge heap page scans when index and column correlation small.》

原理和解決辦法上麵的文檔已經講得很清楚了。對於空間索引也有類似的問題和優化方法。但是首先你需要了解空間索引的構造:

《通過空間思想理解GiST索引的構造》

然後你可以通過空間聚集,來降低空間掃描的IO。

《PostgreSQL 黑科技 - 空間聚集存儲》

下麵以一個搜索為例,講解空間包含搜索的優化方法:

在表中有1000萬空間對象數據,查詢某個多邊形覆蓋到的空間對象。這個查詢有一個特點,這個多邊形是一個長條條的多邊形,包含這個多邊形的BOUND BOX是比較大的。

pic

構建這個多邊形的方法

postgres=# select st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)')), 4326);  
    st_setsrid             
----------------------------  
 0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000  
(1 row)  

優化手段1 - 空間聚集

1、建表

postgres=# create table e(id int8, pos geometry);  
CREATE TABLE  

2、寫入空間測試數據(1000萬個隨機點,覆蓋 +-50 的經緯度區間)

postgres=# insert into e select id, st_setsrid(st_makepoint(50-random()*100, 50-random()*100), 4326) from generate_series(1,10000000) t(id);  
INSERT 0 10000000  

3、創建空間索引

postgres=# create index idx_e on e using gist(pos);  
CREATE INDEX  

4、查詢滿足這個多邊形的BOUND BOX覆蓋的對象的BOUND BOX條件的對象。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from e where pos @ st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)')), 4326);  
   QUERY PLAN                                                                                                                                                                                  
-----------------------  
 Index Scan using idx_e on public.e  (cost=0.42..12526.72 rows=10000 width=40) (actual time=0.091..39.449 rows=35081 loops=1)  
   Output: id, pos  
   Index Cond: (e.pos @ '0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry)  
   Buffers: shared hit=35323  
 Planning time: 0.108 ms  
 Execution time: 41.222 ms  
(6 rows)  

搜索了35323個數據塊,返回了35081條記錄。

5、查詢被這個多邊形包含的對象。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from e where st_contains(st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)')), 4326), pos);    
   QUERY PLAN                                                                                                                                                                                  
-----------------------  
 Index Scan using idx_e on public.e  (cost=0.42..15026.72 rows=3333 width=40) (actual time=0.077..49.015 rows=8491 loops=1)  
   Output: id, pos  
   Index Cond: ('0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry ~ e.pos)  
   Filter: _st_contains('0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry, e.pos)  
   Rows Removed by Filter: 26590  
   Buffers: shared hit=35323  
 Planning time: 0.085 ms  
 Execution time: 49.460 ms  
(8 rows)  

搜索了35323個數據塊,搜索了35081條記錄,返回了8491條記錄,過濾了26590條不滿足條件的記錄。

5和4的查詢差異是BOUND BOX包含、實際的輪廓包含。索引的基礎是bound box。在以下文檔中我們也可以學習到這個原理。

《通過空間思想理解GiST索引的構造》

我們看到,複合條件的記錄並不多,但是搜索了很多數據塊,通過空間聚集可以減少數據塊的掃描。

6、創建另一張表,按空間聚集,調整數據存儲順序。並建立空間索引。

postgres=# create table f(like e);  
CREATE TABLE  
  
postgres=# insert into f select * from e order by st_geohash(pos,15);  
INSERT 0 10000000  
  
postgres=# create index idx_f on f using gist(pos);  
CREATE INDEX  

7、優化後:

查詢滿足這個多邊形的BOUND BOX覆蓋的對象的BOUND BOX條件的對象。從掃描35323個數據塊降低到了訪問1648個數據塊。質的飛躍。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from f where pos @ st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)')), 4326);  
   QUERY PLAN                                                                                                                                                                                  
-----------------------  
 Index Scan using idx_f on public.f  (cost=0.42..12526.72 rows=10000 width=40) (actual time=0.081..9.702 rows=35081 loops=1)  
   Output: id, pos  
   Index Cond: (f.pos @ '0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry)  
   Buffers: shared hit=1648  
 Planning time: 0.096 ms  
 Execution time: 11.404 ms  
(6 rows)  

8、優化後:

查詢被這個多邊形包含的對象。從掃描35323個數據塊降低到了訪問1648個數據塊。質的飛躍。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from f where st_contains(st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)')), 4326), pos);  
   QUERY PLAN                         
-----------------------  
 Index Scan using idx_f on public.f  (cost=0.42..15026.72 rows=3333 width=40) (actual time=1.216..32.398 rows=8491 loops=1)  
   Output: id, pos  
   Index Cond: ('0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry ~ f.pos)  
   Filter: _st_contains('0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry, f.pos)  
   Rows Removed by Filter: 26590  
   Buffers: shared hit=1648  
 Planning time: 0.101 ms  
 Execution time: 32.837 ms  
(8 rows)  

使用空間聚集,從掃描35323個數據塊降低到了訪問1648個數據塊。質的飛躍。

優化手段2 - 空間分裂查詢

空間聚集的優化手段,解決了IO放大的問題,另一個優化點和空間索引的結構有關,是BOUND BOX放大的問題。

從本文的例子中,我們也看到了,空間索引實際上是針對bound box的,所以在有效麵積占比較低時,可能圈選到多數無效數據,導致IO和CPU同時放大,我們就來解決它。

pic

下圖虛線部分包含的區間就是這個長條條的BOUND BOX。目前數據庫在使用GiST索引查詢滿足這個多邊形包含的POS的條件時,會將落在這個BOUND BOX中的對象都弄出來。

pic

優化思路:

將這個多邊形,拆成4個BOX,完全杜絕bound box放大的問題。

explain (analyze,verbose,timing,costs,buffers) select * from f where   
  st_contains(st_setsrid(st_makebox2d(st_makepoint(0,0), st_makepoint(1,3)), 4326), pos)  
  or  
  st_contains(st_setsrid(st_makebox2d(st_makepoint(1,2.5), st_makepoint(5,3)), 4326), pos)  
  or  
  st_contains(st_setsrid(st_makebox2d(st_makepoint(5,2.5), st_makepoint(6,5)), 4326), pos)  
  or  
  st_contains(st_setsrid(st_makebox2d(st_makepoint(6,4), st_makepoint(7,5)), 4326), pos);  
  
explain (analyze,verbose,timing,costs,buffers) select * from f where   
  pos @ st_setsrid(st_makebox2d(st_makepoint(0,0), st_makepoint(1,3)), 4326)  
  or  
  pos @ st_setsrid(st_makebox2d(st_makepoint(1,2.5), st_makepoint(5,3)), 4326)  
  or  
  pos @ st_setsrid(st_makebox2d(st_makepoint(5,2.5), st_makepoint(6,5)), 4326)  
  or  
  pos @ st_setsrid(st_makebox2d(st_makepoint(6,4), st_makepoint(7,5)), 4326);  

1、組合1和2的優化手段後:

查詢滿足這個多邊形的BOUND BOX覆蓋的對象的BOUND BOX條件的對象。從掃描1648個數據塊降低到了訪問243個數據塊。質的飛躍。

explain (analyze,verbose,timing,costs,buffers) select * from f where   
  pos @ st_setsrid(st_makebox2d(st_makepoint(0,0), st_makepoint(1,3)), 4326)  
  or  
  pos @ st_setsrid(st_makebox2d(st_makepoint(1,2.5), st_makepoint(5,3)), 4326)  
  or  
  pos @ st_setsrid(st_makebox2d(st_makepoint(5,2.5), st_makepoint(6,5)), 4326)  
  or  
  pos @ st_setsrid(st_makebox2d(st_makepoint(6,4), st_makepoint(7,5)), 4326);  
   QUERY PLAN                         
-----------------------  
 Bitmap Heap Scan on public.f  (cost=10000000690.01..10000037405.46 rows=39940 width=40) (actual time=1.502..2.329 rows=8491 loops=1)  
   Output: id, pos  
   Recheck Cond: ((f.pos @ '0103000020E610000001000000050000000000000000000000000000000000000000000000000000000000000000000840000000000000F03F0000000000000840000000000000F03F000000000000000000000000000000000000000000000000'::geometry) OR (f.pos @ '0103000020E61000000100000005000000000000000000F03F0000000000000440000000000000F03F00000000000008400000000000001440000000000000084000000000000014400000000000000440000000000000F03F0000000000000440'::geometry) OR (f.pos @ '0103000020E610000001000000050000000000000000001440000000000000044000000000000014400000000000001440000000000000184000000000000014400000000000001840000000000000044000000000000014400000000000000440'::geometry) OR (f.pos @ '0103000020E6100000010000000500000000000000000018400000000000001040000000000000184000000000000014400000000000001C4000000000000014400000000000001C40000000000000104000000000000018400000000000001040'::geometry))  
   Heap Blocks: exact=119  
   Buffers: shared hit=243  
   ->  BitmapOr  (cost=690.01..690.01 rows=40000 width=0) (actual time=1.483..1.483 rows=0 loops=1)  
         Buffers: shared hit=124  
         ->  Bitmap Index Scan on idx_f  (cost=0.00..162.52 rows=10000 width=0) (actual time=0.461..0.461 rows=3077 loops=1)  
               Index Cond: (f.pos @ '0103000020E610000001000000050000000000000000000000000000000000000000000000000000000000000000000840000000000000F03F0000000000000840000000000000F03F000000000000000000000000000000000000000000000000'::geometry)  
               Buffers: shared hit=37  
         ->  Bitmap Index Scan on idx_f  (cost=0.00..162.52 rows=10000 width=0) (actual time=0.423..0.423 rows=1991 loops=1)  
               Index Cond: (f.pos @ '0103000020E61000000100000005000000000000000000F03F0000000000000440000000000000F03F00000000000008400000000000001440000000000000084000000000000014400000000000000440000000000000F03F0000000000000440'::geometry)  
               Buffers: shared hit=33  
         ->  Bitmap Index Scan on idx_f  (cost=0.00..162.52 rows=10000 width=0) (actual time=0.366..0.366 rows=2435 loops=1)  
               Index Cond: (f.pos @ '0103000020E610000001000000050000000000000000001440000000000000044000000000000014400000000000001440000000000000184000000000000014400000000000001840000000000000044000000000000014400000000000000440'::geometry)  
               Buffers: shared hit=31  
         ->  Bitmap Index Scan on idx_f  (cost=0.00..162.52 rows=10000 width=0) (actual time=0.232..0.232 rows=988 loops=1)  
               Index Cond: (f.pos @ '0103000020E6100000010000000500000000000000000018400000000000001040000000000000184000000000000014400000000000001C4000000000000014400000000000001C40000000000000104000000000000018400000000000001040'::geometry)  
               Buffers: shared hit=23  
 Planning time: 0.104 ms  
 Execution time: 2.751 ms  
(21 rows)  

2、組合1和2的優化手段後:

查詢被這個多邊形包含的對象。從掃描1648個數據塊降低到了訪問243個數據塊。質的飛躍。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from f where   
  st_contains(st_setsrid(st_makebox2d(st_makepoint(0,0), st_makepoint(1,3)), 4326), pos)  
  or  
  st_contains(st_setsrid(st_makebox2d(st_makepoint(1,2.5), st_makepoint(5,3)), 4326), pos)  
  or  
  st_contains(st_setsrid(st_makebox2d(st_makepoint(5,2.5), st_makepoint(6,5)), 4326), pos)  
  or  
  st_contains(st_setsrid(st_makebox2d(st_makepoint(6,4), st_makepoint(7,5)), 4326), pos);  
  QUERY PLAN                 
--------------------------------------------  
 Bitmap Heap Scan on public.f  (cost=663.40..77378.85 rows=13327 width=40) (actual time=1.496..11.038 rows=8491 loops=1)  
   Output: id, pos  
   Recheck Cond: (('0103000020E610000001000000050000000000000000000000000000000000000000000000000000000000000000000840000000000000F03F0000000000000840000000000000F03F000000000000000000000000000000000000000000000000'::geometry ~ f.pos) OR  
 ('0103000020E61000000100000005000000000000000000F03F0000000000000440000000000000F03F00000000000008400000000000001440000000000000084000000000000014400000000000000440000000000000F03F0000000000000440'::geometry ~ f.pos) OR ('0103000020E610000001000000050000000000000000001440000000000000044000000000000014400000000000001440000000000000184000000000000014400000000000001840000000000000044000000000000014400000000000000440'::geometry ~ f.pos) OR ('0103000020E6100000010000000500000000000000000018400000000000001040000000000000184000000000000014400000000000001C4000000000000014400000000000001C40000000000000104000000000000018400000000000001040'::geometry ~ f.pos))  
   Filter: ((('0103000020E610000001000000050000000000000000000000000000000000000000000000000000000000000000000840000000000000F03F0000000000000840000000000000F03F000000000000000000000000000000000000000000000000'::geometry ~ f.pos) AND _st_contains('0103000020E610000001000000050000000000000000000000000000000000000000000000000000000000000000000840000000000000F03F0000000000000840000000000000F03F000000000000000000000000000000000000000000000000'::geometry, f.pos)) OR (('0103000020E61000000100000005000000000000000000F03F0000000000000440000000000000F03F00000000000008400000000000001440000000000000084000000000000014400000000000000440000000000000F03F0000000000000440'::geometry ~ f.pos) AND _st_contains('0103000020E61000000100000005000000000000000000F03F0000000000000440000000000000F03F00000000000008400000000000001440000000000000084000000000000014400000000000000440000000000000F03F0000000000000440'::geometry, f.pos)) OR (('0103000020E610000001000000050000000000000000001440000000000000044000000000000014400000000000001440000000000000184000000000000014400000000000001840000000000000044000000000000014400000000000000440'::geometry ~ f.pos) AND _st_contains('0103000020E610000001000000050000000000000000001440000000000000044000000000000014400000000000001440000000000000184000000000000014400000000000001840000000000000044000000000000014400000000000000440'::geometry, f.pos)) OR (('0103000020E6100000010000000500000000000000000018400000000000001040000000000000184000000000000014400000000000001C4000000000000014400000000000001C40000000000000104000000000000018400000000000001040'::geometry ~ f.pos) AND _st_contains('0103000020E6100000010000000500000000000000000018400000000000001040000000000000184000000000000014400000000000001C4000000000000014400000000000001C40000000000000104000000000000018400000000000001040'::geometry, f.pos)))  
   Heap Blocks: exact=119  
   Buffers: shared hit=243  
   ->  BitmapOr  (cost=663.40..663.40 rows=40000 width=0) (actual time=1.472..1.472 rows=0 loops=1)  
         Buffers: shared hit=124  
         ->  Bitmap Index Scan on idx_f  (cost=0.00..162.52 rows=10000 width=0) (actual time=0.436..0.436 rows=3077 loops=1)  
               Index Cond: ('0103000020E610000001000000050000000000000000000000000000000000000000000000000000000000000000000840000000000000F03F0000000000000840000000000000F03F000000000000000000000000000000000000000000000000'::geometry ~ f.pos)  
               Buffers: shared hit=37  
         ->  Bitmap Index Scan on idx_f  (cost=0.00..162.52 rows=10000 width=0) (actual time=0.438..0.438 rows=1991 loops=1)  
               Index Cond: ('0103000020E61000000100000005000000000000000000F03F0000000000000440000000000000F03F00000000000008400000000000001440000000000000084000000000000014400000000000000440000000000000F03F0000000000000440'::geometry ~ f.pos)  
               Buffers: shared hit=33  
         ->  Bitmap Index Scan on idx_f  (cost=0.00..162.52 rows=10000 width=0) (actual time=0.365..0.365 rows=2435 loops=1)  
               Index Cond: ('0103000020E610000001000000050000000000000000001440000000000000044000000000000014400000000000001440000000000000184000000000000014400000000000001840000000000000044000000000000014400000000000000440'::geometry ~ f.pos)  
               Buffers: shared hit=31  
         ->  Bitmap Index Scan on idx_f  (cost=0.00..162.52 rows=10000 width=0) (actual time=0.234..0.234 rows=988 loops=1)  
               Index Cond: ('0103000020E6100000010000000500000000000000000018400000000000001040000000000000184000000000000014400000000000001C4000000000000014400000000000001C40000000000000104000000000000018400000000000001040'::geometry ~ f.pos)  
               Buffers: shared hit=23  
 Planning time: 0.163 ms  
 Execution time: 11.497 ms  
(22 rows)  

優化手段2,將長條條的polygon拆分成多個小的box,將大的bound box消除,搜索的BLOCK再次降低到243。質的飛躍。

將兩個手段合並起來用,起到了雙劍合璧的效果。

pic

st_split 切分對象

PostGIS提供了切分對象的方法。

https://postgis.net/docs/manual-2.4/ST_Split.html

pic

-- this creates a geometry collection consisting of the 2 halves of the polygon  
-- this is similar to the example we demonstrated in ST_BuildArea  
SELECT ST_Split(circle, line)  
FROM (SELECT  
    ST_MakeLine(ST_MakePoint(10, 10),ST_MakePoint(190, 190)) As line,  
    ST_Buffer(ST_GeomFromText('POINT(100 90)'), 50) As circle) As foo;  
  
-- result --  
 GEOMETRYCOLLECTION(POLYGON((150 90,149.039264020162 80.2454838991936,146.193976625564 70.8658283817455,...), POLYGON(...)))  
  
-- To convert to individual polygons, you can use ST_Dump or ST_GeometryN  
SELECT ST_AsText((ST_Dump(ST_Split(circle, line))).geom) As wkt  
FROM (SELECT  
    ST_MakeLine(ST_MakePoint(10, 10),ST_MakePoint(190, 190)) As line,  
    ST_Buffer(ST_GeomFromText('POINT(100 90)'), 50) As circle) As foo;  
  
-- result --  
wkt  
---------------  
POLYGON((150 90,149.039264020162 80.2454838991936,...))  
POLYGON((60.1371179574584 60.1371179574584,58.4265193848728 62.2214883490198,53.8060233744357 ...))  

pic

SELECT ST_AsText(ST_Split(mline, pt)) As wktcut  
        FROM (SELECT  
    ST_GeomFromText('MULTILINESTRING((10 10, 190 190), (15 15, 30 30, 100 90))') As mline,  
    ST_Point(30,30) As pt) As foo;  
  
wktcut  
------  
GEOMETRYCOLLECTION(  
    LINESTRING(10 10,30 30),  
    LINESTRING(30 30,190 190),  
    LINESTRING(15 15,30 30),  
    LINESTRING(30 30,100 90)  
)  

我後麵寫了一篇文檔來簡化SPLIT:

《PostgreSQL 空間切割(st_split)功能擴展 - 空間對象網格化》

st_snap

https://postgis.net/docs/manual-2.4/ST_Snap.html

@, ~ 與 ST_Contains, ST_Within的區別

@, ~ 與 ST_Contains, ST_Within都是對象包含的操作符或函數,他們有什麼區別呢?

@

A @ B  

Returns TRUE if A's bounding box is contained by B's.

~

與 @ 含義相反。

A ~ B  

Returns TRUE if A's bounding box contains B's.

ST_Contains

ST_Contains(A, B)  

Returns true if and only if no points of B lie in the exterior of A, and at least one point of the interior of B lies in the interior of A.

ST_Within

與 ST_Contains 含義相反。

ST_Within(A, B)  

Returns true if the geometry A is completely inside geometry B

區別

@ 和 ~的操作並不是直接針對幾何對象,而是針對A和B的bound box的,也就是說包含對象的左下和右上的點組成的BOX。  
  
ST_Within和ST_Contains是針對幾何對象的,但是從GiST索引搜索角度來看,是需要先用BOUND BOX去搜索,再通過CPU進行計算來判斷的。  

例子

pic

A @ Polygon,返回真  
  
B @ Polygon,返回真  
  
C @ Polygon,返回真  
  
  
ST_Contains(Polygon, A),返回假  
  
ST_Contains(Polygon, B),返回真  
  
ST_Contains(Polygon, C),返回假  

小結

空間搜索的兩個可以優化的點,原理如下:

1、空間數據在存儲時亂序存放,導致搜索一批數據時掃描的數據塊很多。(點查感覺不到這個問題。)

2、PostGIS的GiST空間索引,采用了BOUND BOX作為KEY,搜索時也是使用對象的BOUND BOX進行搜索,因此當對象是長條條時,可能造成大量的BOUND BOX空洞,放大了掃描範圍(對st_contains, st_within來說),增加了CPU過濾的開銷。

優化手段1:空間聚集,解決IO放大問題。

優化手段2:對輸入條件(長條條的多邊形)進行SPLIT,降低BOUND BOX放大引入的掃描範圍(對st_contains, st_within來說)放大的問題。

數據量:1000萬。

點麵判斷(長條形多邊形,或者離散多個多邊形對象覆蓋的空間對象)。

優化前 優化1(空間聚集) 優化1,2(SPLIT多邊形)
訪問35323塊 訪問1648塊 訪問243塊
過濾26590條 過濾26590條 過濾0條

參考

《通過空間思想理解GiST索引的構造》

《PostgreSQL 黑科技 - 空間聚集存儲》

《Greenplum 空間(GIS)數據檢索 b-tree & GiST 索引實踐 - 阿裏雲HybridDB for PostgreSQL最佳實踐》

《PostGIS空間索引(GiST、BRIN、R-Tree)選擇、優化 - 阿裏雲RDS PostgreSQL最佳實踐》

《PostGIS 空間數據學習建議》

《PostgreSQL 空間切割(st_split)功能擴展 - 空間對象網格化》

https://postgis.net/docs/manual-2.4/ST_Within.html

https://postgis.net/docs/manual-2.4/ST_Contains.html

https://postgis.net/docs/manual-2.4/ST_Geometry_Contained.html

https://postgis.net/docs/manual-2.4/ST_Geometry_Contain.html

https://postgis.net/docs/manual-2.4/ST_Split.html

https://postgis.net/docs/manual-2.4/ST_Snap.html

最後更新:2017-10-28 23:04:19

  上一篇:go  PostgreSQL 空間切割(st_split)功能擴展 - 空間對象網格化
  下一篇:go  PostgreSQL 空間獨立事件相關性分析 二 - 人車擬合