Trajectory System: Requirements Analysis and Database Design
Tags
PostgreSQL, PostGIS, express delivery, courier (包裹俠), geolocation, distance sorting, KNN
Background
The logistics industry has strong requirements for processing geolocation data, for example:
1. Track the real-time positions of couriers and delivery trucks. This places high demands on database write performance.
2. For same-day parcels, nearby couriers must be recalled in real time based on the pickup location.
3. Real-time position data is very large. Because it must be retained for analytics, cheap storage such as object storage is needed, and that storage must also interoperate with the database or with an analytical database product.
Alibaba Cloud's PostgreSQL + HybridDB for PostgreSQL + OSS object storage meets these requirements well. The detailed solution follows.
Business Description
Taking a logistics delivery scenario as an example, this article walks through the Alibaba Cloud solution.
Data Volume
Couriers: on the order of one million.
Trajectory sampling interval per courier: 5 seconds.
Each courier works from 07:00 to 19:00 (12 hours) per day.
Each courier therefore generates 12 × 3600 / 5 = 8,640 records per day.
Across the whole network, all couriers generate about 8.64 billion records per day.
Business Requirements
1. Draw courier trajectories (real time).
2. Recall couriers (real time).
This covers the same-day delivery requirement.
Table Structure Design
I. Trajectory Table Design
Master tables
Hash by courier ID into 128 tables.
(Without splitting, a single table holding 8.64 billion records is not a big problem either; it just means exporting to OSS object storage may take a long time, and if OSS fails, re-exporting would again take a long time.)
Another benefit is that it makes future scale-out easier.
create table cainiao (
  uid int,          -- courier ID
  pos point,        -- courier position
  crt_date date,    -- date
  crt_time time(0)  -- time
);
do language plpgsql $$
declare
  sql text;
begin
  for i in 0..127 loop
    sql := format('create table %I (like cainiao)', 'cainiao_'||i);
    execute sql;
  end loop;
end;
$$;
Sub-tables
One sub-table per day, used in rotation over a week (easy to maintain: after exporting to OSS, simply truncate). 7 sub-tables per master table in total.
do language plpgsql $$
declare
  sql text;
begin
  for i in 0..127 loop
    for x in 0..6 loop
      sql := format('create table %I (like cainiao)', 'cainiao_'||i||'_'||x);
      execute sql;
    end loop;
  end loop;
end;
$$;
Historical Trajectory Storage
OSS object storage.
Alibaba Cloud PostgreSQL provides the oss_ext extension, which can write data into OSS object storage and also read it back from OSS (as foreign tables), transparently to the user.
See
https://help.aliyun.com/document_detail/44461.html
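Once a foreign table backed by OSS has been created with oss_ext (the exact server and table options are described in the documentation linked above), historical data can be queried with ordinary SQL. A minimal sketch, in which the foreign table name oss_cainiao_hist is purely illustrative:

-- oss_cainiao_hist is a hypothetical foreign table created via oss_ext,
-- with the same columns as cainiao; it reads directly from OSS
select pos, crt_time
from oss_cainiao_hist
where crt_date = date '2017-04-01'
  and uid = 340054;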
Example With 10.0 Partitioned Tables (Optional)
PostgreSQL 10.0 has built-in partitioned tables, so the partitions above can be read and written directly through the parent table.
《PostgreSQL 10.0 preview feature enhancement - built-in partitioned tables》
For version 9.5 and later, the pg_pathman extension is recommended; it achieves the same partitioning goal.
《PostgreSQL 9.5+ efficient partitioning - pg_pathman》
Partitioned table example
create table cainiao (
uid int,
pos point,
crt_date date,
crt_time time(0)
)
PARTITION BY RANGE(crt_time);
do language plpgsql $$
declare
  sql text;
begin
  for i in 0..23 loop
    if i <> 23 then
      sql := format('create table %I PARTITION OF cainiao FOR VALUES FROM (%L) TO (%L)', 'cainiao_'||lpad(i::text, 2, '0'), (lpad(i::text, 2, '0')||':00:00'), (lpad((i+1)::text, 2, '0')||':00:00'));
    else
      sql := format('create table %I PARTITION OF cainiao FOR VALUES FROM (%L) TO (unbounded)', 'cainiao_'||lpad(i::text, 2, '0'), (lpad(i::text, 2, '0')||':00:00'));
    end if;
    execute sql;
  end loop;
end;
$$;
postgres=# \d+ cainiao
Table "public.cainiao"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+---------------------------+-----------+----------+---------+---------+--------------+-------------
uid | integer | | | | plain | |
pos | point | | | | plain | |
crt_date | date | | | | plain | |
crt_time | time(0) without time zone | | not null | | plain | |
Partition key: RANGE (crt_time)
Partitions: cainiao_00 FOR VALUES FROM ('00:00:00') TO ('01:00:00'),
cainiao_01 FOR VALUES FROM ('01:00:00') TO ('02:00:00'),
cainiao_02 FOR VALUES FROM ('02:00:00') TO ('03:00:00'),
cainiao_03 FOR VALUES FROM ('03:00:00') TO ('04:00:00'),
cainiao_04 FOR VALUES FROM ('04:00:00') TO ('05:00:00'),
cainiao_05 FOR VALUES FROM ('05:00:00') TO ('06:00:00'),
cainiao_06 FOR VALUES FROM ('06:00:00') TO ('07:00:00'),
cainiao_07 FOR VALUES FROM ('07:00:00') TO ('08:00:00'),
cainiao_08 FOR VALUES FROM ('08:00:00') TO ('09:00:00'),
cainiao_09 FOR VALUES FROM ('09:00:00') TO ('10:00:00'),
cainiao_10 FOR VALUES FROM ('10:00:00') TO ('11:00:00'),
cainiao_11 FOR VALUES FROM ('11:00:00') TO ('12:00:00'),
cainiao_12 FOR VALUES FROM ('12:00:00') TO ('13:00:00'),
cainiao_13 FOR VALUES FROM ('13:00:00') TO ('14:00:00'),
cainiao_14 FOR VALUES FROM ('14:00:00') TO ('15:00:00'),
cainiao_15 FOR VALUES FROM ('15:00:00') TO ('16:00:00'),
cainiao_16 FOR VALUES FROM ('16:00:00') TO ('17:00:00'),
cainiao_17 FOR VALUES FROM ('17:00:00') TO ('18:00:00'),
cainiao_18 FOR VALUES FROM ('18:00:00') TO ('19:00:00'),
cainiao_19 FOR VALUES FROM ('19:00:00') TO ('20:00:00'),
cainiao_20 FOR VALUES FROM ('20:00:00') TO ('21:00:00'),
cainiao_21 FOR VALUES FROM ('21:00:00') TO ('22:00:00'),
cainiao_22 FOR VALUES FROM ('22:00:00') TO ('23:00:00'),
cainiao_23 FOR VALUES FROM ('23:00:00') TO (UNBOUNDED)
II. Real-Time Position Table
The real-time position table records each courier's current position (the position from the most recent record).
Since this data does not need to be durable, an unlogged table can be used.
create unlogged table cainiao_trace_realtime (
  uid int primary key,  -- courier ID
  pos point,            -- courier position
  crt_date date,        -- date
  crt_time time(0)      -- time
);
Create a GiST spatial index on the position column.
create index idx_cainiao_trace_realtime_pos on cainiao_trace_realtime using gist (pos);
Real-Time Position Update Logic
To keep each courier's position current, a trigger automatically updates the latest position whenever the courier uploads a new trajectory point.
create or replace function ins_cainiao() returns trigger as $$
declare
begin
insert into cainiao_trace_realtime(uid,pos,crt_date,crt_time)
values (NEW.uid, NEW.pos, NEW.crt_date, NEW.crt_time)
on conflict (uid) do update set pos=excluded.pos,crt_date=excluded.crt_date,crt_time=excluded.crt_time;
return null;
end;
$$ language plpgsql strict;
Attach the trigger to the base tables
do language plpgsql $$
declare
  sql text;
begin
  for i in 0..127 loop
    for x in 0..6 loop
      sql := format('create trigger tg after insert on %I for each row execute procedure ins_cainiao()', 'cainiao_'||i||'_'||x);
      execute sql;
    end loop;
  end loop;
end;
$$;
The resulting trigger looks like this
postgres=# \d+ cainiao_0_0
Table "public.cainiao_0_0"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+---------------------------+-----------+----------+---------+---------+--------------+-------------
uid | integer | | | | plain | |
pos | point | | | | plain | |
crt_date | date | | | | plain | |
crt_time | time(0) without time zone | | | | plain | |
Triggers:
tg AFTER INSERT ON cainiao_0_0 FOR EACH ROW EXECUTE PROCEDURE ins_cainiao()
Performance Tests
Notes
1. This article assumes the application derives the base table name from the courier UID and the time field (see the routing sketch after this list).
Otherwise, PostgreSQL's partitioned-table feature would be needed (partitioned tables perform somewhat worse than operating directly on the base tables).
2. This article uses point instead of longitude/latitude, because point values are easier to generate, which makes testing convenient.
In practice, both point and longitude/latitude are geolocation types and support similar scenarios, so the performance figures remain a useful reference.
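A minimal routing sketch, assuming the application maps a courier to a master table by uid modulo 128 and to a sub-table slot by day of week (0-6); the helper name trace_table_name is purely illustrative:

create or replace function trace_table_name(p_uid int, p_date date) returns text as $$
  -- hash bucket: uid mod 128; weekly slot: 0 (Sunday) .. 6 (Saturday)
  select format('cainiao_%s_%s', p_uid % 128, extract(dow from p_date)::int);
$$ language sql;

-- e.g. courier 340054 on 2017-04-18 (a Tuesday) maps to cainiao_86_2
select trace_table_name(340054, date '2017-04-18');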
1 Real-Time Trajectory Test
Simulate couriers uploading their trajectories in real time while keeping each courier's latest position up to date.
The pgbench test script is as follows
vi test.sql
\set uid random(1,1000000)
\set x random(-500000,500000)
\set y random(-500000,500000)
insert into cainiao_0_2 values (:uid, point(:x,:y), now()::date, now()::time);
Start the test and run it for 300 seconds.
numactl --physcpubind=0-31 pgbench -M prepared -n -r -P 1 -f ./test.sql -c 32 -j 32 -T 300
Test results
About 174,000 writes per second, with an average latency of 0.18 ms per request.
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 32
number of threads: 32
duration: 300 s
number of transactions actually processed: 52270642
latency average = 0.184 ms
latency stddev = 2.732 ms
tps = 174234.709260 (including connections establishing)
tps = 174236.529998 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.001 \set uid random(1,1000000)
0.000 \set x random(-500000,500000)
0.000 \set y random(-500000,500000)
0.182 insert into cainiao_0_2 values (:uid, point(:x,:y), now()::date, now()::time);
2 Courier Recall Test
For example, when same-day parcels reach a certain volume, or a certain time of day arrives, nearby couriers need to be recalled for pickup.
Likewise, when a user ships a same-day parcel, nearby couriers need to be recalled for pickup.
Stress test case
Pick a random point and recall the 100 couriers nearest to it within a radius of 20000.
Sample SQL
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from cainiao_trace_realtime where circle '((0,0),20000)' @> pos order by pos <-> point '(0,0)' limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.41..112.45 rows=100 width=40) (actual time=0.096..0.342 rows=100 loops=1)
Output: uid, pos, crt_date, crt_time, ((pos <-> '(0,0)'::point))
Buffers: shared hit=126
-> Index Scan using idx_cainiao_trace_realtime_pos on public.cainiao_trace_realtime (cost=0.41..1167.86 rows=1042 width=40) (actual time=0.094..0.330 rows=100 loops=1)
Output: uid, pos, crt_date, crt_time, (pos <-> '(0,0)'::point)
Index Cond: ('<(0,0),20000>'::circle @> cainiao_trace_realtime.pos)
Order By: (cainiao_trace_realtime.pos <-> '(0,0)'::point)
Buffers: shared hit=126
Planning time: 0.098 ms
Execution time: 0.377 ms
(10 rows)
The pgbench test script is as follows
vi test1.sql
\set x random(-500000,500000)
\set y random(-500000,500000)
select * from cainiao_trace_realtime where circle(point(:x,:y),20000) @> pos order by pos <-> point(:x,:y) limit 100;
Start the test and run it for 300 seconds.
numactl --physcpubind=32-63 pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 32 -j 32 -T 300
Test results
About 60,000 recall requests per second, with an average latency of 0.53 ms per request.
transaction type: ./test1.sql
scaling factor: 1
query mode: prepared
number of clients: 32
number of threads: 32
duration: 300 s
number of transactions actually processed: 18087765
latency average = 0.531 ms
latency stddev = 0.103 ms
tps = 60292.169523 (including connections establishing)
tps = 60292.786291 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.001 \set x random(-500000,500000)
0.000 \set y random(-500000,500000)
0.529 select * from cainiao_trace_realtime where circle(point(:x,:y),20000) @> pos order by pos <-> point(:x,:y) limit 100;
Note: if only a single courier needs to be recalled, throughput reaches about 280,000 tps.
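The single-courier case is the same KNN query with LIMIT 1. A minimal sketch (the 280,000 tps figure presumably comes from a query of this shape, run with random points as in test1.sql):

-- recall only the single nearest courier within the 20000 radius
select * from cainiao_trace_realtime where circle '((0,0),20000)' @> pos order by pos <-> point '(0,0)' limit 1;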
3 Mixed Test
Stress test trajectory inserts and random courier recalls at the same time.
Stress test results
Insert TPS: about 125,000, response time 0.25 ms
Query TPS: about 21,700, response time 1.47 ms
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 32
number of threads: 32
duration: 100 s
number of transactions actually processed: 12508112
latency average = 0.256 ms
latency stddev = 1.266 ms
tps = 125072.868788 (including connections establishing)
tps = 125080.518685 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.002 \set uid random(1,1000000)
0.001 \set x random(-500000,500000)
0.000 \set y random(-500000,500000)
0.253 insert into cainiao_16 values (:uid, point(:x,:y), now()::date, now()::time);
transaction type: ./test1.sql
scaling factor: 1
query mode: prepared
number of clients: 32
number of threads: 32
duration: 100 s
number of transactions actually processed: 2174422
latency average = 1.472 ms
latency stddev = 0.455 ms
tps = 21743.641754 (including connections establishing)
tps = 21744.366018 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.002 \set x random(-500000,500000)
0.000 \set y random(-500000,500000)
1.469 select * from cainiao_trace_realtime where circle(point(:x,:y),20000) @> pos order by pos <-> point(:x,:y) limit 100;
Historical Trajectories Go Into OSS Object Storage
The real-time trajectory data is split into per-weekday sub-tables precisely so that there is a window in which to write it out to OSS, which simplifies maintenance.
Each day, the data that is 6 days old can be written to OSS object storage.
As noted earlier, the oss_ext extension writes data into OSS and reads it back through foreign tables, transparently to the user; see
https://help.aliyun.com/document_detail/44461.html
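A minimal sketch of the daily archive step for one hash bucket, assuming a writable OSS-backed foreign table (hypothetically named oss_cainiao_hist) has been created with oss_ext as described in the documentation linked above; the same step would be repeated for each of the 128 buckets:

-- archive the weekday slot that is now 6 days old, then reclaim it for reuse
insert into oss_cainiao_hist      -- hypothetical OSS-backed foreign table
  select * from cainiao_0_2;      -- the sub-table whose slot is being rotated out
truncate cainiao_0_2;             -- the slot is now empty for next week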
Other Requirement: Trajectory Merge Design
A single courier generates 8,640 trajectory points per day.
PostgreSQL supports JSON, HSTORE (key-value), array, and composite-array types. Aggregating each courier's daily trajectory into a single record greatly speeds up querying trajectories by courier.
A similar scenario is covered in:
《performance tuning about multi-rows query aggregated to single-row query》
Aggregation example
create type trace as (pos point, crt_time time);
create table cainiao_trace_agg (crt_date date, uid int, trace_arr trace[], primary key(crt_date,uid));
insert into cainiao_trace_agg (crt_date , uid , trace_arr )
select crt_date, uid, array_agg( (pos,crt_time)::trace ) from cainiao_0_2 group by crt_date, uid;
Performance comparison: querying one courier's trajectory for one day
Before aggregation (b-tree index): about 8 ms
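The plan below uses a b-tree index named idx on the uid column, which is not shown in the DDL above; presumably it was created along these lines:

-- assumed: a plain b-tree index on the courier ID of the trajectory sub-table
create index idx on cainiao_0_2 (uid);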
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from cainiao_0_2 where uid=340054;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx on public.cainiao_0_2 (cost=0.57..193.61 rows=194 width=32) (actual time=0.033..7.711 rows=7904 loops=1)
Output: uid, pos, crt_date, crt_time
Index Cond: (cainiao_0_2.uid = 340054)
Buffers: shared hit=7720
Planning time: 0.090 ms
Execution time: 8.198 ms
(6 rows)
After aggregation: about 0.033 ms
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from cainiao_trace_agg where crt_date='2017-04-18' and uid=340054;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using cainiao_trace_agg_pkey on public.cainiao_trace_agg (cost=0.42..2.44 rows=1 width=978) (actual time=0.016..0.017 rows=1 loops=1)
Output: crt_date, uid, trace_arr
Index Cond: ((cainiao_trace_agg.crt_date = '2017-04-18'::date) AND (cainiao_trace_agg.uid = 340054))
Buffers: shared hit=4
Planning time: 0.098 ms
Execution time: 0.033 ms
(6 rows)
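To drill back down to individual points after aggregation, the composite array can be expanded with unnest(). A minimal usage sketch (the date and uid are the same values used above):

-- expand one courier's aggregated trajectory back into (pos, crt_time) rows
select u.pos, u.crt_time
from cainiao_trace_agg,
     unnest(trace_arr) as u
where crt_date = '2017-04-18'
  and uid = 340054
order by u.crt_time;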
Summary
1. Using a logistics trajectory system as the background, this article designed and benchmarked the database for two common requirements: tracking courier trajectories in real time and recalling nearby couriers in real time.
2. PostgreSQL combined with OSS separates hot and cold data: historical trajectories are written to OSS for retention, and OSS can then share them with HybridDB for PostgreSQL for real-time data mining and analysis.
3. A single instance sustains about 180,000 trajectory writes per second; recalling the nearest couriers by distance reaches 60,000 requests per second for 100 couriers and 280,000 requests per second for a single courier.
References
《PostGIS long lat geometry distance search tuning using gist knn function》
《ApsaraDB playing both hands (PgSQL + HybridDB + OSS) - meeting mixed OLTP + OLAP requirements》
Last updated: 2017-04-18 21:00:40