菜鳥末端軌跡(解密支撐每天251億個包裹的數據庫) - 阿裏雲RDS PostgreSQL最佳實踐
標簽
PostgreSQL , PostGIS , 多邊形 , 麵 , 點 , 麵點判斷 , 菜鳥
背景
菜鳥末端軌跡項目中涉及的一個關鍵需求,麵麵判斷。
在數據庫中存儲了一些多邊形記錄,約幾百萬到千萬條記錄,例如一個小區,在地圖上是一個多邊形。
不同的快遞公司,會有各自不同的多邊形劃分方法(每個網點負責的片區(多邊形),每個快遞員負責的片區(多邊形))。
用戶在寄件時,根據用戶的位置,查找對應快遞公司負責這個片區的網點、或者負責該片區的快遞員。
一、需求
1、在數據庫中存儲了一些靜態的麵信息,代表小區、園區、寫字樓等等。所有的麵不相交。
2、為了支持不同的業務類型,對一個地圖,可能劃分為不同的多邊形組成。
例如不同的快遞公司,會有各自不同的多邊形劃分方法(網點負責的片區(多邊形),某個快遞員負責的片區(多邊形))。
因此在一張地圖上,有多個圖層,每個圖層的多邊形劃分方法可能不一樣。
3、快速的根據快遞公司、客戶的位置,求包含這個點的多邊形(即得到對應快遞公司負責這個片區的網點、或者負責該片區的快遞員)。
二、架構設計
用到阿裏雲的RDS PostgreSQL,以及PG提供的PostGIS插件。
我們需要用到PostGIS的函數有兩個
https://postgis.net/docs/manual-2.3/ST_Within.html
1、ST_within
ST_Within — Returns true if the geometry A is completely inside geometry B
boolean ST_Within(geometry A, geometry B);
Returns TRUE if geometry A is completely inside geometry B. For this function to make sense, the source geometries must both be of the same coordinate projection, having the same SRID. It is a given that if ST_Within(A,B) is true and ST_Within(B,A) is true, then the two geometries are considered spatially equal.
This function call will automatically include a bounding box comparison that will make use of any indexes that are available on the geometries. To avoid index use, use the function _ST_Within.
-- a circle within a circle
SELECT ST_Within(smallc,smallc) As smallinsmall,
ST_Within(smallc, bigc) As smallinbig,
ST_Within(bigc,smallc) As biginsmall,
ST_Within(ST_Union(smallc, bigc), bigc) as unioninbig,
ST_Within(bigc, ST_Union(smallc, bigc)) as biginunion,
ST_Equals(bigc, ST_Union(smallc, bigc)) as bigisunion
FROM
(
SELECT ST_Buffer(ST_GeomFromText('POINT(50 50)'), 20) As smallc,
ST_Buffer(ST_GeomFromText('POINT(50 50)'), 40) As bigc) As foo;
-- Result
smallinsmall | smallinbig | biginsmall | unioninbig | biginunion | bigisunion
--------------+------------+------------+------------+------------+------------
t | t | f | t | t | t
(1 row)
2、ST_Contains
ST_Contains — Returns true if and only if no points of B lie in the exterior of A, and at least one point of the interior of B lies in the interior of A.
boolean ST_Contains(geometry geomA, geometry geomB);
Returns TRUE if geometry B is completely inside geometry A. For this function to make sense, the source geometries must both be of the same coordinate projection, having the same SRID. ST_Contains is the inverse of ST_Within. So ST_Contains(A,B) implies ST_Within(B,A) except in the case of invalid geometries where the result is always false regardless or not defined.
This function call will automatically include a bounding box comparison that will make use of any indexes that are available on the geometries. To avoid index use, use the function _ST_Contains.
-- A circle within a circle
SELECT ST_Contains(smallc, bigc) As smallcontainsbig,
ST_Contains(bigc,smallc) As bigcontainssmall,
ST_Contains(bigc, ST_Union(smallc, bigc)) as bigcontainsunion,
ST_Equals(bigc, ST_Union(smallc, bigc)) as bigisunion,
ST_Covers(bigc, ST_ExteriorRing(bigc)) As bigcoversexterior,
ST_Contains(bigc, ST_ExteriorRing(bigc)) As bigcontainsexterior
FROM (SELECT ST_Buffer(ST_GeomFromText('POINT(1 2)'), 10) As smallc,
ST_Buffer(ST_GeomFromText('POINT(1 2)'), 20) As bigc) As foo;
-- Result
smallcontainsbig | bigcontainssmall | bigcontainsunion | bigisunion | bigcoversexterior | bigcontainsexterior
------------------+------------------+------------------+------------+-------------------+---------------------
f | t | t | t | t | f
-- Example demonstrating difference between contains and contains properly
SELECT ST_GeometryType(geomA) As geomtype, ST_Contains(geomA,geomA) AS acontainsa, ST_ContainsProperly(geomA, geomA) AS acontainspropa,
ST_Contains(geomA, ST_Boundary(geomA)) As acontainsba, ST_ContainsProperly(geomA, ST_Boundary(geomA)) As acontainspropba
FROM (VALUES ( ST_Buffer(ST_Point(1,1), 5,1) ),
( ST_MakeLine(ST_Point(1,1), ST_Point(-1,-1) ) ),
( ST_Point(1,1) )
) As foo(geomA);
geomtype | acontainsa | acontainspropa | acontainsba | acontainspropba
--------------+------------+----------------+-------------+-----------------
ST_Polygon | t | f | f | f
ST_LineString | t | f | f | f
ST_Point | t | t | f | f
三、DEMO與性能
1 PG內置幾何類型 麵點搜索 壓測
為了簡化測試,采樣PG內置的幾何類型進行測試,用法與PostGIS是類似的。
1、創建測試表
postgres=# create table po(id int, typid int, po polygon);
CREATE TABLE
2、創建分區表或分區索引
create extension btree_gist;
create index idx_po_1 on po using gist(typid, po);
3、創建空間排他約束,可選
如果要求單個typid內的po不重疊,可以創建空間排他約束
create table tbl_po(id int, typid int, po polygon)
PARTITION BY LIST (typid);
CREATE TABLE tbl_po_1
PARTITION OF tbl_po (
EXCLUDE USING gist (po WITH &&)
) FOR VALUES IN (1);
...
CREATE TABLE tbl_po_20
PARTITION OF tbl_po (
EXCLUDE USING gist (po WITH &&)
) FOR VALUES IN (20);
查看某分區表的空間排他約束如下
postgres=# \d tbl_po_1
Table "postgres.tbl_po_1"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
typid | integer | | |
po | polygon | | |
Partition of: tbl_po FOR VALUES IN (1)
Indexes:
"tbl_po_1_po_excl" EXCLUDE USING gist (po WITH &&)
4、寫入1000萬多邊形測試數據
insert into po select id, random()*20, polygon('(('||x1||','||y1||'),('||x2||','||y2||'),('||x3||','||y3||'))') from (select id, 180-random()*180 x1, 180-random()*180 x2, 180-random()*180 x3, 90-random()*90 y1, 90-random()*90 y2, 90-random()*90 y3 from generate_series(1,10000000) t(id)) t;
5、測試麵點判斷性能
查詢包含point(1,1)的多邊形,響應時間0.57毫秒。
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from po where typid=1 and po @> polygon('((1,1),(1,1),(1,1))') limit 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..1.76 rows=1 width=93) (actual time=0.551..0.551 rows=1 loops=1)
Output: id, typid, po
Buffers: shared hit=74
-> Index Scan using idx_po_1 on postgres.po (cost=0.42..673.48 rows=503 width=93) (actual time=0.550..0.550 rows=1 loops=1)
Output: id, typid, po
Index Cond: ((po.typid = 1) AND (po.po @> '((1,1),(1,1),(1,1))'::polygon))
Rows Removed by Index Recheck: 17
Buffers: shared hit=74
Planning time: 0.090 ms
Execution time: 0.572 ms
(10 rows)
6、壓測
vi test.sql
\set x random(-180,180)
\set y random(-90,90)
\set typid random(1,20)
select * from po where typid=:typid and po @> polygon('((:x,:y),(:x,:y),(:x,:y))') limit 1;
pgbench -M simple -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 100
transaction type: ./test.sql
scaling factor: 1
query mode: simple
number of clients: 64
number of threads: 64
duration: 100 s
number of transactions actually processed: 29150531
latency average = 0.220 ms
latency stddev = 0.140 ms
tps = 291487.813205 (including connections establishing)
tps = 291528.228634 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.002 \set x random(-180,180)
0.001 \set y random(-90,90)
0.000 \set typid random(1,20)
0.223 select * from po where typid=:typid and po @> polygon('((:x,:y),(:x,:y),(:x,:y))') limit 1;
驚不驚喜、意不意外
TPS:29萬 ,平均響應時間:0.2毫秒
2 PostGIS空間數據庫 麵點搜索 壓測
阿裏雲 RDS PostgreSQL,HybridDB for PostgreSQL 已經內置了PostGIS空間數據庫插件,使用前創建插件即可。
create extension postgis;
1、建表
postgres=# create table po(id int, typid int, po geometry);
CREATE TABLE
2、創建空間索引
postgres=# create extension btree_gist;
postgres=# create index idx_po_1 on po using gist(typid, po);
3、寫入1000萬多邊形測試數據
postgres=# insert into po
select
id, random()*20,
ST_PolygonFromText('POLYGON(('||x1||' '||y1||','||x2||' '||y2||','||x3||' '||y3||','||x1||' '||y1||'))')
from
(
select id, 180-random()*180 x1, 180-random()*180 x2, 180-random()*180 x3, 90-random()*90 y1, 90-random()*90 y2, 90-random()*90 y3 from generate_series(1,10000000) t(id)
) t;
4、測試麵點判斷性能
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from po where typid=1 and st_within(ST_PointFromText('POINT(1 1)'), po) limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..4.21 rows=1 width=40) (actual time=0.365..0.366 rows=1 loops=1)
Output: id, typid, po
Buffers: shared hit=14
-> Index Scan using idx_po_1 on public.po (cost=0.42..64.92 rows=17 width=40) (actual time=0.364..0.364 rows=1 loops=1)
Output: id, typid, po
Index Cond: ((po.typid = 1) AND (po.po ~ '0101000000000000000000F03F000000000000F03F'::geometry))
Filter: _st_contains(po.po, '0101000000000000000000F03F000000000000F03F'::geometry)
Rows Removed by Filter: 1
Buffers: shared hit=14
Planning time: 0.201 ms
Execution time: 0.389 ms
(11 rows)
postgres=# select id,typid,st_astext(po) from po where typid=1 and st_within(ST_PointFromText('POINT(1 1)'), po) limit 5;
id | typid | st_astext
---------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------
9781228 | 1 | POLYGON((0.295946141704917 0.155529817566276,16.4715472329408 56.1022255802527,172.374844718724 15.4784881789237,0.295946141704917 0.155529817566276))
704428 | 1 | POLYGON((173.849076312035 77.8871315997094,167.085936572403 23.9897218951955,0.514283403754234 0.844541620463133,173.849076312035 77.8871315997094))
5881120 | 1 | POLYGON((104.326644698158 44.4173073163256,3.76680867746472 76.8664212757722,0.798425730317831 0.138536808080971,104.326644698158 44.4173073163256))
1940693 | 1 | POLYGON((0.774057107046247 0.253543308936059,126.49553722702 22.7823389600962,8.62134614959359 56.176855028607,0.774057107046247 0.253543308936059))
3026739 | 1 | POLYGON((0.266327261924744 0.406031627207994,101.713274326175 38.6256391229108,2.88589236326516 15.3229149011895,0.266327261924744 0.406031627207994))
(5 rows)
5、壓測
vi test.sql
\setrandom x -180 180
\setrandom y -90 90
\setrandom typid 1 20
select * from po where typid=:typid and st_within(ST_PointFromText('POINT(:x :y)'), po) limit 1;
pgbench -M simple -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 120
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 64
number of threads: 64
duration: 120 s
number of transactions actually processed: 23779817
latency average: 0.321 ms
latency stddev: 0.255 ms
tps = 198145.452614 (including connections establishing)
tps = 198160.891580 (excluding connections establishing)
statement latencies in milliseconds:
0.002615 \setrandom x -180 180
0.000802 \setrandom y -90 90
0.000649 \setrandom typid 1 20
0.316816 select * from po where typid=:typid and st_within(ST_PointFromText('POINT(:x :y)'), po) limit 1;
驚不驚喜、意不意外
TPS:19.8萬 ,平均響應時間:0.32毫秒
四、技術點
1、空間排他約束
這個約束可以用於強製記錄中的多邊形不相交。例如地圖這類嚴謹數據,絕對不可能出現兩個多邊形相交的,否則就有領土紛爭了。
PostgreSQL就是這麼嚴謹,意不意外。
2、分區表
本例中不同的快遞公司,對應不同的圖層,每個快遞公司根據網點、快遞員負責的片區(多邊形)劃分為多個多邊形。
使用LIST分區,每個分區對應一家快遞公司。
3、空間索引
GiST空間索引,支持KNN、包含、相交、上下左右等空間搜索。
效率極高。
4、空間分區索引
《分區索引的應用和實踐 - 阿裏雲RDS PostgreSQL最佳實踐》
5、麵麵、點判斷
麵麵判斷或麵點判斷是本例的主要需求,用戶在寄包裹時,根據用戶位置在數據庫的一千萬多邊形中找出覆蓋這個點的多邊形。
五、雲端產品
六、類似場景、案例
《PostgreSQL 物流軌跡係統數據庫需求分析與設計 - 包裹俠實時跟蹤與召回》
七、小結
菜鳥末端軌跡項目中涉及的一個關鍵需求,麵麵判斷。
在數據庫中存儲了一些多邊形記錄,約幾百萬到千萬條記錄,例如一個小區,在地圖上是一個多邊形。
不同的快遞公司,會有各自不同的多邊形劃分方法(網點負責的片區(多邊形),某個快遞員負責的片區(多邊形))。
用戶在寄件時,根據用戶的位置,查找對應快遞公司負責這個片區的網點、或者負責該片區的快遞員。
使用阿裏雲RDS PostgreSQL,用戶存放約1千萬的多邊形數據,單庫實現了每秒29萬的處理請求,單次請求平均響應時間約0.2毫秒。
驚不驚喜、意不意外。
八、參考
https://postgis.net/docs/manual-2.3/ST_Within.html
《分區索引的應用和實踐 - 阿裏雲RDS PostgreSQL最佳實踐》
最後更新:2017-08-13 22:37:53