閱讀916 返回首頁    go 阿裏雲 go 技術社區[雲棲]


GIS業務,附近查找性能優化

標簽

PostgreSQL , PostGIS , KNN , order by 距離 sort 優化


背景

空間數據中對臨近點的檢索使用非常常見, 例如以經緯度為坐標點, 檢索離這個點1公裏範圍內的其他點的信息.

最近有網友問到這樣的問題,如何優化呢.

ps 現在的版本可以直接支持,不需要使用子查詢來支持了。

《PostgreSQL 百億地理位置數據 近鄰查詢性能》

正文

本文將以Postgis為例, 舉一個簡單的例子, 利用gist 索引加速檢索.

測試表 :

create table cust_jw        
(        
 dz varchar(300),        
 jwd geometry        
);        

測試數據, 取自經緯度信息網站.

insert into cust_jw values ('杭州', ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163));          
insert into cust_jw values ('北京', ST_Transform(ST_GeomFromText('POINT(116.46 39.92)', 4326), 2163));          
insert into cust_jw values ('南京', ST_Transform(ST_GeomFromText('POINT(118.78 32.04)', 4326), 2163));          
insert into cust_jw values ('南寧', ST_Transform(ST_GeomFromText('POINT(108.33 22.84)', 4326), 2163));          
insert into cust_jw values ('貴陽', ST_Transform(ST_GeomFromText('POINT(106.71 26.57)', 4326), 2163));          
insert into cust_jw values ('南昌', ST_Transform(ST_GeomFromText('POINT(115.89 28.68)', 4326), 2163));          
insert into cust_jw values ('餘杭', ST_Transform(ST_GeomFromText('POINT(120.3 30.43)', 4326), 2163));          

創建gist索引 :

create index idx_cust_jw_1 on cust_jw using gist(jwd);          

這個索引方法支持包含<->兩個幾何類型的距離排序和&&兩個幾何類型相交.

詳見pg_amop , pg_am, pg_operator, pg_opfamily等係統表.

以下SQL查出北京到杭州的直線距離, 單位米 :

SELECT ST_Distance(          
ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163),          
ST_Transform(ST_GeomFromText('POINT(116.46 39.92)', 4326), 2163)          
);          
   st_distance              
------------------          
 1256521.71432098          
(1 row)          

以下SQL 查出表中距離ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)這個點20公裏的坐標.

函數使用方法參考postgis手冊.

digoal=# select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) from cust_jw where jwd && ST_Buffer(ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163), 20000, 10);          
  dz  |                        jwd                         |   st_distance              
------+----------------------------------------------------+------------------          
 杭州 | 0101000020730800004C94087D5D4F54C173AA7759E8FB5D41 |                0          
 餘杭 | 0101000020730800000E6E5A20494854C121FC688DA9EF5D41 | 14483.9823187612          
(2 rows)          
Time: 1.335 ms          

前麵已經說了, 這個索引訪問方法支持&&操作符, <->操作符.

digoal=# explain select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) from cust_jw where jwd && ST_Buffer(ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163), 20000, 10);                                                                                                                   
                          QUERY PLAN                                                                                                          
                                                                                                                                              
                                                                                                                                              
                                                                                                                                              
                                                                                                                                              
                                                                         
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
---------------------------------------------------------------          
 Index Scan using idx_cust_jw_1 on cust_jw  (cost=0.14..3.41 rows=1 width=548)          
   Index Cond: (jwd && '01030000207308000001000000290000004C94087DD53B54C173AA7759E8FB5D411122F50B133C54C154295A2DDAF85D41D751B134CA          
3C54C1F4F2B643DFF55D41B6BBAE74F63D54C10FB6A0650AF35D41CDDC4767903F54C1D331586C6DF05D4124855AF48D4154C14B9BC9D018EE5D41AC1BE98FE24354          
C1F4F2B6431BEC5D41E89F31897F4654C1DDD11D5181EA5D41CDDC4767544954C1FE67201155E95D412D13EB504F4C54C1383864E89DE85D414C94087D5D4F54C173          
AA775960E85D416B1526A96B5254C1383864E89DE85D41CB4BC992665554C1FE67201155E95D41B088DF703B5854C1DDD11D5181EA5D41EC0C286AD85A54C1F4F2B6          
431BEC5D4174A3B6052D5D54C14B9BC9D018EE5D41CB4BC9922A5F54C1D331586C6DF05D41E26C6285C46054C10FB6A0650AF35D41C1D65FC5F06154C1F4F2B643DF          
F55D4187061CEEA76254C154295A2DDAF85D414C94087DE56254C173AA7759E8FB5D4187061CEEA76254C1922B9585F6FE5D41C1D65FC5F06154C1F261386FF1015E          
41E26C6285C46054C1D79E4E4DC6045E41CB4BC9922A5F54C11323974663075E4174A3B6052D5D54C19BB925E2B7095E41EC0C286AD85A54C1F261386FB50B5E41B0          
88DF703B5854C10983D1614F0D5E41CB4BC992665554C1E8ECCEA17B0E5E416B1526A96B5254C1AE1C8BCA320F5E414C94087D5D4F54C173AA7759700F5E412D13EB          
504F4C54C1AE1C8BCA320F5E41CDDC4767544954C1E8ECCEA17B0E5E41E89F31897F4654C10983D1614F0D5E41AC1BE98FE24354C1F261386FB50B5E4124855AF48D          
4154C19BB925E2B7095E41CDDC4767903F54C11323974663075E41B6BBAE74F63D54C1D79E4E4DC6045E41D751B134CA3C54C1F261386FF1015E411122F50B133C54          
C1922B9585F6FE5D414C94087DD53B54C173AA7759E8FB5D41'::geometry)          
(2 rows)          
Time: 1.218 ms          

以下SQL 按距離排序.

digoal=# select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) from cust_jw order by jwd <-> ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163);          
  dz  |                        jwd                         |   st_distance              
------+----------------------------------------------------+------------------          
 杭州 | 0101000020730800004C94087D5D4F54C173AA7759E8FB5D41 |                0          
 餘杭 | 0101000020730800000E6E5A20494854C121FC688DA9EF5D41 | 14483.9823187612          
 南京 | 0101000020730800000FFE5AD1D62653C16F4F972A10635E41 | 321491.591341196          
 南昌 | 010100002073080000B2744BA1FE5253C10975D1494AA25F41 | 503843.306221247          
 北京 | 0101000020730800006EBBB0F1AB0E4FC17207C71D44525E41 | 1256521.71432098          
 南寧 | 01010000207308000030806B3882F451C18E3F38DCBB686141 |  1409624.7420143          
 貴陽 | 01010000207308000082EA89026EE14FC1D6A3AD6E9E786141 | 1732521.31784296          
(7 rows)          
Time: 0.598 ms          

通過以下方法強製排序走索引 :

digoal=# set enable_seqscan=off;          
SET          
Time: 0.109 ms          
digoal=# explain select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) from cust_jw order by jwd <-> ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163);          
                                      QUERY PLAN                                                
--------------------------------------------------------------------------------------          
 Index Scan using idx_cust_jw_1 on cust_jw  (cost=0.14..54.44 rows=140 width=548)          
   Order By: (jwd <-> '0101000020730800004C94087D5D4F54C173AA7759E8FB5D41'::geometry)          
(2 rows)          

以下為進一步的優化, 如果點比較密集的話, 這種方法比較好.

digoal=# select * from (select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) AS dist from cust_jw order by jwd <-> ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163) limit 1000) t where dist<15000;          
  dz  |                        jwd                         |       dist                 
------+----------------------------------------------------+------------------          
 杭州 | 0101000020730800004C94087D5D4F54C173AA7759E8FB5D41 |                0          
 餘杭 | 0101000020730800000E6E5A20494854C121FC688DA9EF5D41 | 14483.9823187612          
(2 rows)          
Time: 0.634 ms          

進一步優化, 使用遊標, 可以將數據掃描降到極限. (前提是for循環中的sql order by使用了索引)

digoal=# do language plpgsql $$          
declare          
  v_rec record;          
  v_limit int := 1000;          
begin          
  set enable_seqscan=off;  -- 強製索引, 因為掃描行數夠就退出.          
  for v_rec in select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) AS dist from cust_jw order by jwd <-> ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163) loop          
    if v_limit <=0 then           
      raise notice '已經取足數據';          
      return;          
    end if;          
    if v_rec.dist > 20000 then           
      raise notice '滿足條件的點已輸出完畢';          
      return;          
    else          
      raise notice 'do someting, v_rec:%', v_rec;          
    end if;          
    v_limit := v_limit -1;          
  end loop;          
end;          
$$;          
NOTICE:  do someting, v_rec:(杭州,0101000020730800004C94087D5D4F54C173AA7759E8FB5D41,0)          
NOTICE:  do someting, v_rec:(餘杭,0101000020730800000E6E5A20494854C121FC688DA9EF5D41,14483.9823187612)          
NOTICE:  滿足條件的點已輸出完畢          
DO          

使用這種方法最多掃描比需求結果多1行.

參考

1. https://www.ximizi.com/jingweidu.php

2. https://postgis.net/docs/manual-2.0/ST_Distance_Sphere.html

3. https://postgis.net/docs/manual-2.0/ST_Buffer.html

4. https://postgis.net/docs/manual-2.0/ST_Transform.html

5. https://postgis.net/docs/manual-2.0/ST_GeomFromText.html

6. https://postgis.net/docs/manual-2.0/geometry_distance_centroid.html

最後更新:2017-05-08 13:03:04

  上一篇:go 【xin站大全】優秀.xin域名終端企業網站推薦
  下一篇:go 《能源地圖》:阿裏雲數加MaxCompute輕鬆處理8000億條氣象數據