閱讀324 返回首頁    go 阿裏雲 go 技術社區[雲棲]


PostgreSQL 空間切割(st_split)功能擴展 - 空間對象網格化

標簽

PostgreSQL , PostGIS , st_split , ST_GeometryN , ST_NumGeometries , ST_XMax , ST_XMin , ST_YMax , ST_YMin , UDF , st_area , ST_MakeBox2D, 麵積占比


背景

前麵介紹了空間包含(st_contains, st_within)搜索降CPU的優化方法,將長條形(相對於BOUND BOX空間占比很小)的對象切分成多個空間對象,提升相對於bound box的空間占比,從而減少掃描範圍,提升命中率。

《PostgreSQL 空間st_contains,st_within空間包含搜索優化 - 降IO和降CPU(bound box)》

這種優化方法的關鍵是SPLIT,PostGIS提供了一種split函數,但是隻能支持一次切兩片,本文提供一種方法,可以根據用戶的需求進行自由切割。(輸入被切割的目標對象,橫向切割多少刀,縱向切割多少刀,麵積占比高於多少時不切割。)

八星八箭有木有:

pic

空間split,目的是降低無效麵積,看看這幅無效麵積有多大吧,嚇不嚇人?

pic

空間切割示例

切割邏輯

1、輸入被切割的目標對象,橫向切割多少刀,縱向切割多少刀,麵積占比高於多少時不切割。

2、計算目標對象麵積。

2、獲取目標對象的bound box邊界。計算BOUND BOX麵積。

3、判斷是否需要切割。

求橫向切割線

切割

4、求切割後的geometry coll有幾個對象。

5、轉換為geo數組輸出。

切割代碼

create or replace function split_geo(  
  i_geo geometry,   -- 被切割的目標對象  
  i_srid int,      -- SRID  
  i_x int2,         -- X方向切多少刀  
  i_y int2,         -- Y方向切多少刀  
  i_aratio float4   -- 麵積占比閾值,高於它則不切割  
)   
returns geometry[] as $$  
declare  
  res geometry[];         -- 結果  
  tmp_geo geometry;       -- 切割後的臨時對象(geometry collection)  
  split_geos geometry[];  -- 切割線數組  
  split_line geometry;    -- 線段  
  v_area_obj float8;      -- 目標對象麵積  
  v_area_box float8;      -- 目標對象的bound box的麵積  
  v_xmin float8 := ST_XMin(i_geo);          -- 目標對象BOUND BOX,XMIN  
  v_ymin float8 := ST_YMin(i_geo);          -- 目標對象BOUND BOX,YMIN  
  v_xmax float8 := ST_XMax(i_geo);          -- 目標對象BOUND BOX,XMAX  
  v_ymax float8 := ST_YMax(i_geo);          -- 目標對象BOUND BOX,YMAX  
  v_box geometry;         -- 目標對象的BOUND BOX  
  x_geo geometry;         -- 分解geometry collection臨時對象  
begin  
  -- 求邊界  
  v_box := st_setsrid(ST_MakeBox2D(st_makepoint(v_xmin,v_ymin), st_makepoint(v_xmax,v_ymax)),i_srid);  
    
  -- 求麵積  
  v_area_obj := st_area(i_geo);  
  v_area_box := st_area(v_box);  
  
  -- split 前的空間占比  
  raise notice '%', (v_area_obj/v_area_box);  
    
  -- 計算麵積占比,判斷是否需要切割  
  if (v_area_obj/v_area_box) > i_aratio then    
    -- 大於麵積比,不切割  
    return array[i_geo];  
  else  
    -- 計算切割線段X位點  
    for i in 1..i_x  
    loop  
      split_geos := coalesce  
        (   
        array_append(split_geos, st_setsrid(st_makeline(st_makepoint(v_xmin+i*((v_xmax-v_xmin)/(i_x+1)), v_ymin), st_makepoint(v_xmin+i*((v_xmax-v_xmin)/(i_x+1)), v_ymax)), i_srid))  
        ,   
        array[st_setsrid(st_makeline(st_makepoint(v_xmin+i*((v_xmax-v_xmin)/(i_x+1)), v_ymin), st_makepoint(v_xmin+i*((v_xmax-v_xmin)/(i_x+1)), v_ymax)), i_srid)]   
        );   
    end loop;  
  
    -- 計算切割線段Y位點  
    for i in 1..i_y  
    loop  
      split_geos := coalesce  
        (   
        array_append(split_geos, st_setsrid(st_makeline(st_makepoint(v_xmin, v_ymin+i*((v_ymax-v_ymin)/(i_y+1))), st_makepoint(v_xmax, v_ymin+i*((v_ymax-v_ymin)/(i_y+1)))), i_srid))  
        ,   
        array[st_setsrid(st_makeline(st_makepoint(v_xmin, v_ymin+i*((v_ymax-v_ymin)/(i_y+1))), st_makepoint(v_xmax, v_ymin+i*((v_ymax-v_ymin)/(i_y+1)))), i_srid)]   
        );   
    end loop;  
  
    -- 切割  
    foreach split_line in array split_geos  
    loop  
      tmp_geo := coalesce  
        (  
	st_split(tmp_geo, split_line)  
	,  
        st_split(i_geo, split_line)  
	);  
    end loop;  
  end if;  
  
  -- 將geometry collection轉換為geometry數組  
  for i in 1..ST_NumGeometries(tmp_geo)  
  loop  
    res := coalesce(array_append(res, ST_GeometryN(tmp_geo, i)), array[ST_GeometryN(tmp_geo, i)]);  
  end loop;  
  
  -- split 後的空間占比  
  v_area_obj := 0;  
  v_area_box := 0;  
  
  foreach x_geo in array res  
  loop  
    v_area_obj := v_area_obj + st_area(x_geo);  
    v_area_box := v_area_box + st_area(st_setsrid(ST_MakeBox2D(st_makepoint(st_xmin(x_geo),st_ymin(x_geo)), st_makepoint(st_xmax(x_geo),st_ymax(x_geo))),i_srid));  
  end loop;  
  
  -- split 後的空間占比  
  raise notice '%', (v_area_obj/v_area_box);  
  
  return res;  
end;  
$$ language plpgsql strict immutable;  

驗證切割

1、橫向縱向各切2刀,最多得到9個對象。(當刀下去後沒有切到有效部位時不返回,因此可能少於9個對象。)

select st_astext(  
unnest(  
split_geo(  
  st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)')), 4326),  
  4326::int4,  
  2::int2,  
  2::int2,  
  0.9::float4  
)));  
  
  
NOTICE:  0.242857142857143  
NOTICE:  0.818181818181818  
                                                        st_astext                                                          
-------------------------------------------------------------------------------------------------------------------------  
 POLYGON((2.33333333333333 2.5,1 2.5,1 1.66666666666667,0 1.66666666666667,0 3,2.33333333333333 3,2.33333333333333 2.5))  
 POLYGON((1 1.66666666666667,1 0,0 0,0 1.66666666666667,1 1.66666666666667))  
 POLYGON((2.33333333333333 3,4.66666666666667 3,4.66666666666667 2.5,2.33333333333333 2.5,2.33333333333333 3))  
 POLYGON((4.66666666666667 3,5 3,5 3.33333333333333,6 3.33333333333333,6 2.5,4.66666666666667 2.5,4.66666666666667 3))  
 POLYGON((5 3.33333333333333,5 5,7 5,7 4,6 4,6 3.33333333333333,5 3.33333333333333))  
(5 rows)  

切割前後的麵積占比分別是24%和82%,一下子提升了好多,如果使用這幾個拚接,可以少掃描若幹個數據塊。

2、橫向縱向各切2刀,最多得到9個對象。(當刀下去後沒有切到有效部位時不返回,因此可能少於9個對象。)

切割前後的麵積占比分別是24%和98%,一下子提升了好多,如果使用這幾個拚接,可以少掃描若幹個數據塊。

postgres=# select st_astext(  
unnest(  
split_geo(  
  st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)')), 4326),  
  4326::int4,  
  4::int2,  
  4::int2,  
  0.9::float4  
)));  
  
  
NOTICE:  0.242857142857143  
NOTICE:  0.977011494252874  
                     st_astext                        
----------------------------------------------------  
 POLYGON((1.4 2.5,1 2.5,1 2,0 2,0 3,1.4 3,1.4 2.5))  
 POLYGON((1 2,1 1,0 1,0 2,1 2))  
 POLYGON((1 1,1 0,0 0,0 1,1 1))  
 POLYGON((1.4 3,2.8 3,2.8 2.5,1.4 2.5,1.4 3))  
 POLYGON((2.8 3,4.2 3,4.2 2.5,2.8 2.5,2.8 3))  
 POLYGON((4.2 3,5 3,5.6 3,5.6 2.5,4.2 2.5,4.2 3))  
 POLYGON((5 3,5 4,5.6 4,5.6 3,5 3))  
 POLYGON((5 4,5 5,5.6 5,5.6 4,5 4))  
 POLYGON((5.6 5,7 5,7 4,6 4,5.6 4,5.6 5))  
 POLYGON((6 4,6 3,5.6 3,5.6 4,6 4))  
 POLYGON((6 3,6 2.5,5.6 2.5,5.6 3,6 3))  
(11 rows)  

切割前後優化對比

不切割,掃描1648個數據塊,過濾26590條無效數據。

explain (analyze,verbose,timing,costs,buffers)  
select * from f where st_within(pos, st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)')), 4326));  
  
  
 Index Scan using idx_f on public.f  (cost=0.42..15026.72 rows=3333 width=40) (actual time=1.519..35.773 rows=8491 loops=1)  
   Output: id, pos  
   Index Cond: ('0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry ~ f.pos)  
   Filter: _st_contains('0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry, f.pos)  
   Rows Removed by Filter: 26590  
   Buffers: shared hit=1648  
 Planning time: 0.274 ms  
 Execution time: 36.212 ms  

切割,掃描610個數據塊,過濾1932條無效數據。

explain (analyze,verbose,timing,costs,buffers)  
select * from f where st_within(pos, st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(2.33333333333333 2.5,1 2.5,1 1.66666666666667,0 1.66666666666667,0 3,2.33333333333333 3,2.33333333333333 2.5)')), 4326))  
union all  
select * from f where st_within(pos, st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(1 1.66666666666667,1 0,0 0,0 1.66666666666667,1 1.66666666666667)')), 4326))  
union all  
select * from f where st_within(pos, st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(2.33333333333333 3,4.66666666666667 3,4.66666666666667 2.5,2.33333333333333 2.5,2.33333333333333 3)')), 4326))  
union all  
select * from f where st_within(pos, st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(4.66666666666667 3,5 3,5 3.33333333333333,6 3.33333333333333,6 2.5,4.66666666666667 2.5,4.66666666666667 3)')), 4326))  
union all  
select * from f where st_within(pos, st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(5 3.33333333333333,5 5,7 5,7 4,6 4,6 3.33333333333333,5 3.33333333333333)')), 4326))  
;  
  
  
 Append  (cost=0.42..75300.24 rows=16665 width=40) (actual time=0.113..11.690 rows=8491 loops=1)  
   Buffers: shared hit=610  
   ->  Index Scan using idx_f on public.f  (cost=0.42..15026.72 rows=3333 width=40) (actual time=0.113..3.365 rows=2053 loops=1)  
         Output: f.id, f.pos  
         Index Cond: ('0103000020E61000000100000007000000A3AAAAAAAAAA02400000000000000440000000000000F03F0000000000000440000000000000F03FBAAAAAAAAAAAFA3F0000000000000000BAAAAAAAAAAAFA3F00000000000000000000000000000840A3AAAAAAAAAA02400000000000000840A3AAAAAAAAAA02400000000000000440'::geometry ~ f.pos)  
         Filter: _st_contains('0103000020E61000000100000007000000A3AAAAAAAAAA02400000000000000440000000000000F03F0000000000000440000000000000F03FBAAAAAAAAAAAFA3F0000000000000000BAAAAAAAAAAAFA3F00000000000000000000000000000840A3AAAAAAAAAA02400000000000000840A3AAAAAAAAAA02400000000000000440'::geometry, f.pos)  
         Rows Removed by Filter: 1142  
         Buffers: shared hit=189  
   ->  Index Scan using idx_f on public.f f_1  (cost=0.42..15026.72 rows=3333 width=40) (actual time=0.084..1.734 rows=1699 loops=1)  
         Output: f_1.id, f_1.pos  
         Index Cond: ('0103000020E61000000100000005000000000000000000F03FBAAAAAAAAAAAFA3F000000000000F03F0000000000000000000000000000000000000000000000000000000000000000BAAAAAAAAAAAFA3F000000000000F03FBAAAAAAAAAAAFA3F'::geometry ~ f_1.pos)  
         Filter: _st_contains('0103000020E61000000100000005000000000000000000F03FBAAAAAAAAAAAFA3F000000000000F03F0000000000000000000000000000000000000000000000000000000000000000BAAAAAAAAAAAFA3F000000000000F03FBAAAAAAAAAAAFA3F'::geometry, f_1.pos)  
         Buffers: shared hit=92  
   ->  Index Scan using idx_f on public.f f_2  (cost=0.42..15026.72 rows=3333 width=40) (actual time=0.075..1.283 rows=1158 loops=1)  
         Output: f_2.id, f_2.pos  
         Index Cond: ('0103000020E61000000100000005000000A3AAAAAAAAAA02400000000000000840AEAAAAAAAAAA12400000000000000840AEAAAAAAAAAA12400000000000000440A3AAAAAAAAAA02400000000000000440A3AAAAAAAAAA02400000000000000840'::geometry ~ f_2.pos)  
         Filter: _st_contains('0103000020E61000000100000005000000A3AAAAAAAAAA02400000000000000840AEAAAAAAAAAA12400000000000000840AEAAAAAAAAAA12400000000000000440A3AAAAAAAAAA02400000000000000440A3AAAAAAAAAA02400000000000000840'::geometry, f_2.pos)  
         Buffers: shared hit=74  
   ->  Index Scan using idx_f on public.f f_3  (cost=0.42..15026.72 rows=3333 width=40) (actual time=0.095..1.214 rows=986 loops=1)  
         Output: f_3.id, f_3.pos  
         Index Cond: ('0103000020E61000000100000007000000AEAAAAAAAAAA12400000000000000840000000000000144000000000000008400000000000001440A3AAAAAAAAAA0A400000000000001840A3AAAAAAAAAA0A4000000000000018400000000000000440AEAAAAAAAAAA12400000000000000440AEAAAAAAAAAA12400000000000000840'::geometry ~ f_3.pos)  
         Filter: _st_contains('0103000020E61000000100000007000000AEAAAAAAAAAA12400000000000000840000000000000144000000000000008400000000000001440A3AAAAAAAAAA0A400000000000001840A3AAAAAAAAAA0A4000000000000018400000000000000440AEAAAAAAAAAA12400000000000000440AEAAAAAAAAAA12400000000000000840'::geometry, f_3.pos)  
         Rows Removed by Filter: 127  
         Buffers: shared hit=68  
   ->  Index Scan using idx_f on public.f f_4  (cost=0.42..15026.72 rows=3333 width=40) (actual time=0.105..3.331 rows=2595 loops=1)  
         Output: f_4.id, f_4.pos  
         Index Cond: ('0103000020E610000001000000070000000000000000001440A3AAAAAAAAAA0A40000000000000144000000000000014400000000000001C4000000000000014400000000000001C400000000000001040000000000000184000000000000010400000000000001840A3AAAAAAAAAA0A400000000000001440A3AAAAAAAAAA0A40'::geometry ~ f_4.pos)  
         Filter: _st_contains('0103000020E610000001000000070000000000000000001440A3AAAAAAAAAA0A40000000000000144000000000000014400000000000001C4000000000000014400000000000001C400000000000001040000000000000184000000000000010400000000000001840A3AAAAAAAAAA0A400000000000001440A3AAAAAAAAAA0A40'::geometry, f_4.pos)  
         Rows Removed by Filter: 663  
         Buffers: shared hit=187  
 Planning time: 0.397 ms  
 Execution time: 12.150 ms  
(32 rows)  

PostgreSQL支持 op any\some\all(array)的操作,因此我們可以將SQL簡化成這樣,9宮格切割,掃描的數據塊降低到278,過濾記錄降到了1932:

explain (analyze,verbose,timing,costs,buffers) select * from f where pos @ any(split_geo(  
  st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)')), 4326),  
  4326::int4,  
  2::int2,  
  2::int2,  
  0.9::float4  
))
and
st_within(pos, st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)')), 4326));

NOTICE:  0.242857142857143
NOTICE:  0.818181818181818


 Bitmap Heap Scan on public.f  (cost=9.09..87.16 rows=17 width=40) (actual time=2.270..10.578 rows=8491 loops=1)
   Output: id, pos
   Recheck Cond: ((f.pos @ ANY ('{0103000020E61000000100000007000000ABAAAAAAAAAA02400000000000000440000000000000F03F0000000000000440000000000000F03FABAAAAAAAAAAFA3F0000000000000000ABAAAAAAAAAAFA3F00000000000000000000000000000840ABAAAAAAAAAA02400000000000000840ABAAAAAAAAAA02400000000000000440:0103000020E61000000100000005000000000000000000F03FABAAAAAAAAAAFA3F000000000000F03F0000000000000000000000000000000000000000000000000000000000000000ABAAAAAAAAAAFA3F000000000000F03FABAAAAAAAAAAFA3F:0103000020E61000000100000005000000ABAAAAAAAAAA02400000000000000840ABAAAAAAAAAA12400000000000000840ABAAAAAAAAAA12400000000000000440ABAAAAAAAAAA02400000000000000440ABAAAAAAAAAA02400000000000000840:0103000020E61000000100000007000000ABAAAAAAAAAA12400000000000000840000000000000144000000000000008400000000000001440ABAAAAAAAAAA0A400000000000001840ABAAAAAAAAAA0A4000000000000018400000000000000440ABAAAAAAAAAA12400000000000000440ABAAAAAAAAAA12400000000000000840:0103000020E610000001000000070000000000000000001440ABAAAAAAAAAA0A40000000000000144000000000000014400000000000001C4000000000000014400000000000001C400000000000001040000000000000184000000000000010400000000000001840ABAAAAAAAAAA0A400000000000001440ABAAAAAAAAAA0A40}'::geometry[])) AND ('0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry ~ f.pos))
   Filter: _st_contains('0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry, f.pos)
   Rows Removed by Filter: 1932
   Heap Blocks: exact=133
   Buffers: shared hit=278
   ->  Bitmap Index Scan on idx_f  (cost=0.00..9.09 rows=50 width=0) (actual time=2.244..2.244 rows=10423 loops=1)
         Index Cond: ((f.pos @ ANY ('{0103000020E61000000100000007000000ABAAAAAAAAAA02400000000000000440000000000000F03F0000000000000440000000000000F03FABAAAAAAAAAAFA3F0000000000000000ABAAAAAAAAAAFA3F00000000000000000000000000000840ABAAAAAAAAAA02400000000000000840ABAAAAAAAAAA02400000000000000440:0103000020E61000000100000005000000000000000000F03FABAAAAAAAAAAFA3F000000000000F03F0000000000000000000000000000000000000000000000000000000000000000ABAAAAAAAAAAFA3F000000000000F03FABAAAAAAAAAAFA3F:0103000020E61000000100000005000000ABAAAAAAAAAA02400000000000000840ABAAAAAAAAAA12400000000000000840ABAAAAAAAAAA12400000000000000440ABAAAAAAAAAA02400000000000000440ABAAAAAAAAAA02400000000000000840:0103000020E61000000100000007000000ABAAAAAAAAAA12400000000000000840000000000000144000000000000008400000000000001440ABAAAAAAAAAA0A400000000000001840ABAAAAAAAAAA0A4000000000000018400000000000000440ABAAAAAAAAAA12400000000000000440ABAAAAAAAAAA12400000000000000840:0103000020E610000001000000070000000000000000001440ABAAAAAAAAAA0A40000000000000144000000000000014400000000000001C4000000000000014400000000000001C400000000000001040000000000000184000000000000010400000000000001840ABAAAAAAAAAA0A400000000000001440ABAAAAAAAAAA0A40}'::geometry[])) AND ('0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry ~ f.pos))
         Buffers: shared hit=145
 Planning time: 2.828 ms
 Execution time: 11.024 ms
(12 rows)

小結

1、split後,有效麵積占比提升,從而降低了無效數據掃描帶來的IO和CPU開銷,提升了性能。

2、split過多,不同分片的GiST索引的非葉子節點可能被重複掃描,不過基本上都能在CACHE命中的話,可以不CARE這部分開銷。

參考

https://postgis.net/docs/manual-2.4/ST_Split.html

https://postgis.net/docs/manual-2.4/ST_XMin.html

https://postgis.net/docs/manual-2.4/ST_XMax.html

https://postgis.net/docs/manual-2.4/ST_YMin.html

https://postgis.net/docs/manual-2.4/ST_XMax.html

https://postgis.net/docs/manual-2.4/ST_GeometryN.html

https://postgis.net/docs/manual-2.4/ST_NumGeometries.html

https://postgis.net/docs/manual-2.4/ST_Area.html

https://postgis.net/docs/manual-2.4/ST_MakeBox2D.html

https://postgis.net/docs/manual-2.4/ST_SetSRID.html

《PostgreSQL 空間st_contains,st_within空間包含搜索優化 - 降IO和降CPU(bound box)》

本文提到的切割方法還是比較粗糙的,更好的切割算法我們可以繼續探討。

最後更新:2017-10-28 23:33:57

  上一篇:go  PostgreSQL 實踐 - 內容社區(如論壇)圖式搜索應用
  下一篇:go  PostgreSQL 空間st_contains,st_within空間包含搜索優化 - 降IO和降CPU(bound box)