Partitioned Indexes in Practice - Alibaba Cloud RDS PostgreSQL Best Practices

Tags

PostgreSQL , partial index , partition index

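Background

When a table grows very large, partitioning usually comes to mind. A user table, for example, can be hash- or range-partitioned on user ID and split into many tables.

Likewise, a behavior-log table can be partitioned by time and split into many tables.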

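Benefits of splitting a table:

1. The pieces can be placed in different tablespaces, and tablespaces map to block devices. Historical data that is large but rarely accessed can live in a tablespace on spinning disks, while active data goes to a tablespace on SSD.

2. Maintenance becomes easier. To delete historical data you simply DROP TABLE, which generates almost no REDO (WAL) compared with deleting the same rows with DELETE.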
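
The second benefit can be sketched as follows. This is only an illustration: it assumes PostgreSQL 10 or later (declarative partitioning), and the table name behavior_log and its partitions are hypothetical names, not taken from the article.

```sql
-- Hypothetical behavior-log table, range-partitioned by time (PostgreSQL 10+).
create table behavior_log (uid int, crt_time timestamp, info text)
  partition by range (crt_time);

-- One partition per year. A "tablespace <name>" clause could be appended
-- to either statement to place cold data on slower storage.
create table behavior_log_2016 partition of behavior_log
  for values from ('2016-01-01') to ('2017-01-01');
create table behavior_log_2017 partition of behavior_log
  for values from ('2017-01-01') to ('2018-01-01');

-- Retire a whole year by dropping its partition instead of running DELETE.
drop table behavior_log_2016;
```

Dropping the partition removes its files in one step, whereas a DELETE would have to log and later vacuum every removed row.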

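Indexes can be partitioned too, for example by hash of USER ID or by time.

The benefits of partitioned indexes are similar to those of partitioned tables, with a few extra ones:

1. Data that never needs to be searched does not have to be indexed.

Take a user table where we only ever look up activated users and never search the inactive ones: the index can cover activated users only.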
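
A minimal sketch of the activated-users case. The users table and its columns are hypothetical, chosen only for illustration.

```sql
-- Hypothetical user table; column names are illustrative only.
create table users (
  uid       int primary key,
  nick      text,
  activated boolean not null default false
);

-- Only activated users are indexed; inactive rows add nothing to the index.
create index idx_users_nick_active on users (nick) where activated;

-- The WHERE clause repeats the index predicate, so the planner may use the partial index.
explain select * from users where nick = 'alice' and activated;
```

A query that omits the activated condition cannot use this index, because the index does not contain the inactive rows.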

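2. Differently shaped data can use different index access methods.

Suppose the data in a table is skewed: some values account for a large share of the rows, others for very few. The high-frequency values can be indexed with GIN (or with a bitmap index in products that offer one; core PostgreSQL does not), while the low-frequency values use btree.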
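
One way to check whether a column is skewed is to look at the planner statistics in the pg_stats view. The sketch below builds a small hypothetical table, skew_demo, in which roughly 90% of the rows share one value, and then inspects its most common values.

```sql
-- Hypothetical demo table: about 90% of the rows get value 1, the rest are unique.
create table skew_demo (val int);
insert into skew_demo
select case when i % 10 < 9 then 1 else i end
from generate_series(1, 100000) as i;

-- Collect statistics, then list the most common values and their frequencies.
analyze skew_demo;
select most_common_vals, most_common_freqs, n_distinct
from pg_stats
where tablename = 'skew_demo' and attname = 'val';
```

Values that dominate most_common_freqs are candidates for the high-frequency index, and the remaining range for the btree one.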

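So how are partitioned indexes implemented in PostgreSQL? They are built from partial indexes, that is, indexes created with a WHERE clause.

Global index

First, the global index, which is simply the ordinary index we usually create.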

create table test(id int, crt_time timestamp, info text);  
  
create index idx_test_id on test(id);

Single-level partitioned index

create table test(id int, crt_time timestamp, info text);  
  
The partitioned indexes are as follows (half-open ranges, so a row exactly on a month boundary belongs to one index only):  
  
create index idx_test_id_1 on test(id) where crt_time >= '2017-01-01' and crt_time < '2017-02-01';  
create index idx_test_id_2 on test(id) where crt_time >= '2017-02-01' and crt_time < '2017-03-01';  
...  
create index idx_test_id_12 on test(id) where crt_time >= '2017-12-01' and crt_time < '2018-01-01';
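
To see which queries can use these indexes, compare the plans below. This is a sketch that assumes the test table and the monthly indexes above exist; whether the planner actually picks an index also depends on table size and statistics.

```sql
-- The time range lies inside January 2017, so it implies the predicate of
-- idx_test_id_1 and that partial index becomes a candidate.
explain select * from test
where id = 1
  and crt_time >= '2017-01-10' and crt_time < '2017-01-20';

-- No condition on crt_time: the planner cannot prove that any monthly
-- predicate holds, so none of the partial indexes is applicable.
explain select * from test where id = 1;
```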

Multi-level partitioned index

create table test(id int, crt_time timestamp, province_code int, info text);  
  
The partitioned indexes are as follows:  
  
create index idx_test_id_1_1 on test(id) where crt_time >= '2017-01-01' and crt_time < '2017-02-01' and province_code=1;  
create index idx_test_id_1_2 on test(id) where crt_time >= '2017-02-01' and crt_time < '2017-03-01' and province_code=1;  
...  
create index idx_test_id_1_12 on test(id) where crt_time >= '2017-12-01' and crt_time < '2018-01-01' and province_code=1;  
  
....  
  
create index idx_test_id_2_1 on test(id) where crt_time >= '2017-01-01' and crt_time < '2017-02-01' and province_code=2;  
create index idx_test_id_2_2 on test(id) where crt_time >= '2017-02-01' and crt_time < '2017-03-01' and province_code=2;  
...  
create index idx_test_id_2_12 on test(id) where crt_time >= '2017-12-01' and crt_time < '2018-01-01' and province_code=2;
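
Writing dozens of such statements by hand is error-prone, so they could be generated instead. The following is a sketch, not the author's method: it assumes PostgreSQL 9.5 or later, that the test table above exists, and that province codes 1 and 2 are the ones in use. It follows the idx_test_id_<province>_<month> naming used above and bounds each month with a half-open range.

```sql
do $$
declare
  month_start date;
begin
  for prov in 1..2 loop              -- assumed province codes; extend as needed
    for m in 1..12 loop              -- months of 2017
      month_start := make_date(2017, m, 1);
      -- Build and run one CREATE INDEX statement per (province, month) pair.
      execute format(
        'create index if not exists idx_test_id_%s_%s on test(id) '
        || 'where crt_time >= %L and crt_time < %L and province_code = %s',
        prov, m,
        month_start, (month_start + interval '1 month')::date,
        prov);
    end loop;
  end loop;
end
$$;
```

The %L placeholder quotes each date as a SQL literal, and if not exists makes the block safe to rerun.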

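Example: partitioning an index for skewed data

create extension if not exists btree_gin;  -- GIN has no built-in operator class for int; btree_gin provides one  
create table test(uid int, crt_time timestamp, province_code int, info text);  
  
create index idx_test_1 on test using gin(uid) where uid<1000;     -- this range holds many duplicate (high-frequency) values; use a GIN index  
create index idx_test_2 on test using btree(uid) where uid>=1000;  -- this range holds low-frequency values; use a btree index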
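
A sketch of how lookups map onto the two indexes. It assumes both partial indexes on uid exist, which in turn requires the btree_gin extension, because GIN has no built-in operator class for an int column.

```sql
-- uid = 5 implies uid < 1000, so the GIN partial index is a candidate.
explain select * from test where uid = 5;

-- uid = 123456 implies uid >= 1000, so the btree partial index is a candidate.
explain select * from test where uid = 123456;
```

The planner proves these implications from the constants in the query, so no extra uid range condition is needed here.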

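Summary

1. When searching, include both the index's partition condition and the indexed column in the query, and use an operator that the index supports. The planner can then pick the matching partitioned index.

2. Partitioned indexes are typically used for multi-condition searches in which the partition condition is one of the search conditions. They can also be used for searches on a single column.

3. Besides partitioned indexes (partial index), PostgreSQL also supports expression indexes and functional indexes.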
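
For the third point, a minimal sketch of an expression index on the test table, plus a variant that combines it with a partial-index predicate. The index names are hypothetical.

```sql
-- Expression index: index the lower-cased text so case-insensitive lookups can use it.
create index idx_test_info_lower on test (lower(info));
explain select * from test where lower(info) = 'abc';

-- The same expression restricted to 2017 data by a partial-index predicate.
create index idx_test_info_lower_2017 on test (lower(info))
  where crt_time >= '2017-01-01' and crt_time < '2018-01-01';
```

The query has to use the same expression, lower(info), for the planner to match it against the index.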

You are welcome to try Alibaba Cloud RDS PostgreSQL.

Last updated: 2017-07-23 21:32:37
