
Block-level (ctid) scans in IoT scenarios where extreme write rates and consumer reads coexist

Tags

PostgreSQL, block scan, row-number scan, ctid, tid scan, IoT, Internet of Things, extreme write throughput, real-time consumption, real-time reads, heap table, heap, time series

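Background

IoT systems have two very common data requirements. The first is writing data. The second is consuming it, that is, reading it back in time order, often together with stream processing.

For stream processing, see:

"(Streaming, lambda, triggers) A real-time processing showdown - best practices for IoT, finance, and time-series processing"

"Stream computing makes a comeback - PostgreSQL and PipelineDB back IoT"

"'IoT' stream processing - real-time processing with PostgreSQL (trillions of records per day)"

This article looks at extreme write throughput and consumption.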

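Writes

In terms of storage layout, PostgreSQL's heap storage is append-oriented, which makes it well suited to high-speed writes. The following article verified this write throughput:

"How PostgreSQL gracefully handles hundreds of terabytes of new data per day"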

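Block (time-series) indexes

A BRIN (block range) index, also called a block index, indexes summary metadata about ranges of data blocks. For example, when the values of an auto-incrementing column correlate linearly with the physical storage order, the value range covered by each block is largely disjoint from that of the other blocks. A BRIN index is very small, and its impact on write performance is negligible.

BRIN suits columns whose values correlate well with the physical storage order, such as time-series columns.

It also suits a column after the table has been physically reordered on that column, for example with CLUSTER or by rewriting the table with ORDER BY.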
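As an illustration of the BRIN setup described above, here is a minimal sketch. The table sensor_log, its columns, and the index name are hypothetical and do not come from the original article; the timestamps are arbitrary sample values.

```sql
-- Hypothetical append-only IoT table; all names here are illustrative.
CREATE TABLE sensor_log (
    ts      timestamptz NOT NULL DEFAULT now(),  -- arrival time, grows with insert order
    device  int,
    payload jsonb
);

-- BRIN stores a small summary per range of blocks,
-- so it stays tiny and adds little overhead to inserts.
CREATE INDEX sensor_log_ts_brin ON sensor_log USING brin (ts);

-- A consumer reads one time window; BRIN lets the scan skip block ranges
-- whose summary does not overlap the window.
SELECT *
FROM sensor_log
WHERE ts >= '2017-06-08 11:00:00+08'
  AND ts <  '2017-06-08 11:05:00+08';
```

Because ts follows insertion order, each block range covers a narrow time window, which is the correlation BRIN depends on.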

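Consumption

Consumption means reading and processing data asynchronously. In IoT scenarios, for example, writes must have very low latency, so the system needs very high write throughput.

Processing is therefore decoupled from writing and handled through a consumption mechanism.

So how should the data be consumed?

Typically it is consumed through an index. The BRIN index mentioned above has little impact on write throughput and supports both equality (=) and range searches. If the table has a time-series column, BRIN is a very good choice.

However, not every write workload has a time-series column (although users can add a timestamp column to address that). Without one, what is the most efficient way to consume?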

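Block scan

A block scan is a good choice here. As noted above, the data is stored in a heap and written in append fashion.

PostgreSQL provides a scan method called a TID scan, in which you tell the database exactly which record in which data block to fetch.

select * from tbl where ctid='(100,99)';

This SQL fetches the row at line pointer (offset) 99 of block 100. Block numbers start at 0 and offset numbers start at 1, so this is the 99th item slot in the block.

This kind of scan is very efficient and, combined with heap storage, can be used when consuming (reading) records.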

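Estimating the number of records in a block

As of PostgreSQL 10, the version whose documentation this article links to, there is no interface that returns all records of a data block; a TID scan can only fetch a specific record of a specific block. To read a whole block we therefore have to enumerate all of its candidate row positions.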
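On PostgreSQL 14 and later, which added TID range scans, a whole block can be read with a range predicate on ctid instead of enumerating positions. The following is a minimal sketch, assuming a server of that version and the same placeholder table tbl used earlier:

```sql
-- Assumes PostgreSQL 14 or later, where the planner can use a TID Range Scan.
-- Returns every visible row stored in block 100 of tbl:
-- all TIDs from (100,0) up to, but not including, (101,0).
SELECT ctid, *
FROM tbl
WHERE ctid >= '(100,0)'::tid
  AND ctid <  '(101,0)'::tid;
```

On earlier versions, the enumeration approach described here still applies.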

How do we estimate how many records a block holds, or the maximum it can hold?

PAGE layout

https://www.postgresql.org/docs/10/static/storage-page-layout.html

HeapTupleHeaderData Layout

Field	Type	Length	Description
t_xmin	TransactionId	4 bytes	insert XID stamp
t_xmax	TransactionId	4 bytes	delete XID stamp
t_cid	CommandId	4 bytes	insert and/or delete CID stamp (overlays with t_xvac)
t_xvac	TransactionId	4 bytes	XID for VACUUM operation moving a row version
t_ctid	ItemPointerData	6 bytes	current TID of this or newer row version
t_infomask2	uint16	2 bytes	number of attributes, plus various flag bits
t_infomask	uint16	2 bytes	various flag bits
t_hoff	uint8	1 byte	offset to user data

Overall Page Layout

Item	Description
PageHeaderData	24 bytes long. Contains general information about the page, including free space pointers.
ItemIdData	Array of (offset,length) pairs pointing to the actual items. 4 bytes per item.
Free space	The unallocated space. New item pointers are allocated from the start of this area, new items from the end.
Items	The actual items themselves.
Special space	Index access method specific data. Different methods store different data. Empty in ordinary tables.

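Estimating the maximum number of records per page

Max records = block_size / (line pointer + tuple header) = block_size / (4 + 27);

Here the 4 is the per-row line pointer (ItemIdData) and the 27 is the sum of the header field lengths in the table above. This is only a rough estimate: it ignores the page header, alignment padding, and the row data itself, and because t_cid and t_xvac share the same 4 bytes, the on-disk header is actually 23 bytes.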

postgres=# select current_setting('block_size');  
 current_setting   
-----------------  
 32768  
(1 row)  
  
postgres=# select current_setting('block_size')::int/31;  
 ?column?   
----------  
     1057  
(1 row)

For a more precise estimate, also account for the fixed-length columns and the header of each variable-length column (1 or 4 bytes).
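A somewhat tighter estimate can be sketched as follows. It assumes a typical 64-bit build where the 23-byte tuple header is padded to 24 bytes and each tuple is aligned to 8 bytes, and it subtracts the 24-byte page header; the column aliases are illustrative only.

```sql
SELECT
    current_setting('block_size')::int AS block_size,
    -- Upper bound on line pointers per page:
    -- (block - page header) / (padded tuple header + line pointer)
    (current_setting('block_size')::int - 24) / (24 + 4)     AS max_line_pointers,
    -- Table with a single int4 column: 24-byte header + 4 data bytes,
    -- padded to 32 bytes, plus the 4-byte line pointer
    (current_setting('block_size')::int - 24) / (24 + 8 + 4) AS max_rows_one_int4;
```

With the 32768-byte block size of the session above, these expressions work out to 1169 and 909; the second figure matches the 909 rows per block seen in the example output. For enumeration, a bound that is too high only adds TIDs that match nothing, while a bound that is too low can miss rows.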

Example

A function that generates the TIDs of a given block

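create or replace function gen_tids(blkid int) returns tid[] as $$
-- Build every candidate TID '(blkid,i)' for one block, up to the estimated maximum rows per block.
-- Offset numbers start at 1, so '(blkid,0)' never matches a row; it is harmless but redundant.
-- block_size is fixed for a given cluster, so the result is constant per blkid there.
select array(
  SELECT ('('||blkid||',' || s.i || ')')::tid
    FROM generate_series(0,current_setting('block_size')::int/31) AS s(i)
)  ;
$$ language sql strict immutable;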

Reading the records of a data block

postgres=# create table test(id int);  
CREATE TABLE  
postgres=# insert into test select generate_series(1,10000);  
INSERT 0 10000  
  
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test where ctid = any  
(  
  array  
  (  
    SELECT ('(0,' || s.i || ')')::tid  
      FROM generate_series(0, current_setting('block_size')::int/31) AS s(i)  
  )  
);  
                                                                QUERY PLAN                                                                  
------------------------------------------------------------------------------------------------------------------------------------------  
 Tid Scan on postgres.test  (cost=25.03..40.12 rows=10 width=4) (actual time=0.592..0.795 rows=909 loops=1)  
   Output: test.id  
   TID Cond: (test.ctid = ANY ($0))  
   Buffers: shared hit=1057  
   InitPlan 1 (returns $0)  
     ->  Function Scan on pg_catalog.generate_series s  (cost=0.01..25.01 rows=1000 width=6) (actual time=0.087..0.429 rows=1058 loops=1)  
           Output: ((('(0,'::text || (s.i)::text) || ')'::text))::tid  
           Function Call: generate_series(0, ((current_setting('block_size'::text))::integer / 31))  
 Planning time: 0.106 ms  
 Execution time: 0.881 ms  
(10 rows)

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test where ctid = any(gen_tids(1));  
  
 Tid Scan on postgres.test  (cost=1.32..1598.90 rows=1058 width=4) (actual time=0.026..0.235 rows=909 loops=1)  
   Output: id  
   TID Cond: (test.ctid = ANY ('{"(1,0)","(1,1)","(1,2)","(1,3)","(1,4)","(1,5)","(1,6)","(1,7)","(1,8)","(1,9)","(1,10)","(1,11)","(1,12)","(1,13)","(1,14)","(1,15)","(1,16)","(1,17)","(1,18)","(1,19)","(1,20)","(1,21)","(1,22)","(1,23)  
","(1,24)","(1,25)"  
....  
   Buffers: shared hit=1057  
 Planning time: 1.084 ms  
 Execution time: 0.294 ms  
(6 rows)

postgres=# select ctid,* from test where ctid = any(gen_tids(11));
  ctid  |  id   
--------+-------
 (11,1) | 10000
(1 row)

postgres=# select ctid,* from test where ctid = any(gen_tids(9));
  ctid   |  id  
---------+------
 (9,1)   | 8182
 (9,2)   | 8183
 (9,3)   | 8184
 (9,4)   | 8185
 (9,5)   | 8186
 (9,6)   | 8187
 ...
 (9,904) | 9085
 (9,905) | 9086
 (9,906) | 9087
 (9,907) | 9088
 (9,908) | 9089
 (9,909) | 9090
(909 rows)
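Putting the pieces together, a consumer could walk the table block by block. The sketch below reuses the test table and gen_tids function from above. It assumes the client keeps track of the last block it has fully processed; that bookkeeping is not part of the original article.

```sql
-- 1. How many blocks does the table currently occupy?
--    Valid block numbers run from 0 to nblocks - 1.
SELECT pg_relation_size('test') / current_setting('block_size')::int AS nblocks;

-- 2. Fetch the next unprocessed block (block 10 here, as an example).
--    The client advances the block number after each successful read.
SELECT ctid, *
FROM test
WHERE ctid = ANY (gen_tids(10));
```

The highest-numbered block may still be receiving inserts, and rows whose transactions had not yet committed are invisible at read time, so a consumer would probably want to stay at least one block behind the writers or re-read recent blocks.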

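Extended scenario

If the data is never updated or deleted, a ctid can also serve as a row locator for external indexes. For example, a full-text search engine (ES) can store the ctid in its index to point back to the record in the database, so no separate primary key is needed, which can also improve write performance considerably. This relies on the ctid staying stable, so the table must also not be rewritten (for example by VACUUM FULL or CLUSTER).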

References

https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/

https://www.postgresql.org/message-id/flat/be64327d326568a3be7fde1891ed34ff.squirrel%40sq.gransy.com

Last updated: 2017-06-08 11:31:57
