What is Deepgreen DB?


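Deepgreen DB, in full Vitesse Deepgreen DB, is a scalable massively parallel processing (MPP) data warehouse solution derived from the open-source data warehouse project Greenplum DB (commonly abbreviated GP or GPDB). Anyone already familiar with GP can therefore switch to Deepgreen seamlessly.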

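It offers almost all of GP's features. While keeping all of GP's strengths, Deepgreen optimizes the original query processing engine. The new-generation engine adds: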

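Superior join and aggregation algorithms
A new spill-handling subsystem
JIT-based query optimization, vectorized scans, and data-path optimization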

Below is a brief introduction to Deepgreen's main features, mostly in comparison with Greenplum:

1. 100% GPDB

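Deepgreen is almost 100% identical to Greenplum. I say "almost" because Deepgreen drops a few Greenplum features of marginal value, such as MapReduce support, so what remains is the essential part. From SQL syntax and stored procedure syntax, to the data storage format, to components such as gpstart and gpfdist, Deepgreen keeps the impact of migration to a minimum for users coming from Greenplum, particularly in the following respects: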

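Apart from data compressed with quicklz, which needs to be modified, no data has to be reloaded
DML and DDL statements are unchanged
UDF (user-defined function) syntax is unchanged
Stored procedure syntax is unchanged
Connection and authorization protocols such as JDBC/ODBC are unchanged
Operational scripts (for example backup scripts) are unchanged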

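So where does Deepgreen differ from Greenplum? In one word: fast, fast, fast (important things get said three times). Because most OLAP workloads are CPU-bound, Deepgreen, with its CPU-oriented optimizations, can run 3 to 5 times faster than the original Greenplum in performance tests.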

2. Faster Decimal types

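Deepgreen provides two more precise Decimal types, Decimal64 and Decimal128, which are more efficient than Greenplum's original decimal type (Numeric). Because they are more precise than the float/double types, they are a better fit for banking and other workloads that demand exact figures.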
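To make the precision argument concrete, here is a minimal sketch in Python. It uses Python's standard `decimal` module as a stand-in for a decimal type; it is not Deepgreen's Decimal64/Decimal128 implementation, and the helper names and amounts are made up for illustration.

```python
from decimal import Decimal


def total_float(amount: float, count: int) -> float:
    """Add `amount` to a running total `count` times using binary floats."""
    total = 0.0
    for _ in range(count):
        total += amount
    return total


def total_decimal(amount: str, count: int) -> Decimal:
    """Add `amount` to a running total `count` times using decimal arithmetic."""
    total = Decimal("0")
    value = Decimal(amount)
    for _ in range(count):
        total += value
    return total


if __name__ == "__main__":
    # Ten deposits of 0.1 should add up to exactly 1.
    print("float total equals 1:  ", total_float(0.1, 10) == 1.0)
    print("decimal total equals 1:", total_decimal("0.1", 10) == Decimal("1"))
```

The float total drifts because 0.1 has no exact binary representation, while the decimal total stays exact. That is the property that matters for account balances.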

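Installation:

After the database has been initialized, these two data types must be loaded into each database that needs them with the following commands: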
dgadmin@flash:~$ source deepgreendb/greenplum_path.sh
dgadmin@flash:~$ cd $GPHOME/share/postgresql/contrib/
dgadmin@flash:~/deepgreendb/share/postgresql/contrib$ psql postgres -f pg_decimal.sql

A quick test:

Query: select avg(x), sum(2*x) from table
Data volume: 1 million rows
dgadmin@flash:~$ psql -d postgres
psql (8.2.15)
Type "help" for help.

postgres=# drop table if exists tt;
NOTICE:  table "tt" does not exist, skipping
DROP TABLE
postgres=# create table tt(
postgres(# ii bigint,
postgres(#  f64 double precision,
postgres(# d64 decimal64,
postgres(# d128 decimal128,
postgres(# n numeric(15, 3))
postgres-# distributed randomly;
CREATE TABLE
postgres=# insert into tt
postgres-# select i,
postgres-#     i + 0.123,
postgres-#     (i + 0.123)::decimal64,
postgres-#     (i + 0.123)::decimal128,
postgres-#     i + 0.123
postgres-# from generate_series(1, 1000000) i;
INSERT 0 1000000
postgres=# \timing on
Timing is on.
postgres=# select count(*) from tt;
  count
---------
 1000000
(1 row)

Time: 161.500 ms
postgres=# set vitesse.enable=1;
SET
Time: 1.695 ms
postgres=# select avg(f64),sum(2*f64) from tt;
       avg        |       sum
------------------+------------------
 500000.622996815 | 1000001245993.63
(1 row)

Time: 45.368 ms
postgres=# select avg(d64),sum(2*d64) from tt;
    avg     |        sum
------------+-------------------
 500000.623 | 1000001246000.000
(1 row)

Time: 135.693 ms
postgres=# select avg(d128),sum(2*d128) from tt;
    avg     |        sum
------------+-------------------
 500000.623 | 1000001246000.000
(1 row)

Time: 148.286 ms
postgres=# set vitesse.enable=1;
SET
Time: 11.691 ms
postgres=# select avg(n),sum(2*n) from tt;
         avg         |        sum
---------------------+-------------------
 500000.623000000000 | 1000001246000.000
(1 row)

Time: 154.189 ms
postgres=# set vitesse.enable=0;
SET
Time: 1.426 ms
postgres=# select avg(n),sum(2*n) from tt;
         avg         |        sum
---------------------+-------------------
 500000.623000000000 | 1000001246000.000
(1 row)

Time: 296.291 ms

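Results:

45 ms - 64-bit float
136 ms - decimal64
148 ms - decimal128
154 ms - Deepgreen numeric
296 ms - Greenplum numeric

In the test above, decimal64 (136 ms) is faster than Deepgreen numeric (154 ms) and roughly twice as fast as Greenplum numeric (296 ms). In production environments it can be more than 5 times faster.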
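The ratios behind that summary can be checked with a few lines of Python. This is only arithmetic over the timings printed in the session above. The dictionary keys are labels chosen for this sketch, and "greenplum numeric" stands for the numeric query run with vitesse.enable=0.

```python
# Timings in milliseconds, copied from the psql session above.
timings_ms = {
    "float64": 45.368,
    "decimal64": 135.693,
    "decimal128": 148.286,
    "deepgreen numeric": 154.189,
    "greenplum numeric": 296.291,
}


def speedup(baseline: str, candidate: str) -> float:
    """How many times faster `candidate` ran compared with `baseline`."""
    return timings_ms[baseline] / timings_ms[candidate]


if __name__ == "__main__":
    for name in timings_ms:
        ratio = speedup("greenplum numeric", name)
        print(f"{name:>18}: {ratio:.2f}x vs greenplum numeric")
```

On this single run, decimal64 comes out a little over 2x faster than the numeric query with vitesse.enable=0, and about 1.1x faster than numeric with vitesse.enable=1.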

3. JSON support

Deepgreen supports the JSON type, but not completely. The unsupported functions are: json_each, json_each_text, json_extract_path, json_extract_path_text, json_object_keys, json_populate_record, json_populate_recordset, json_array_elements, and json_agg.

Installation:

Run the following command to enable JSON support:

dgadmin@flash:~$ psql postgres -f $GPHOME/share/postgresql/contrib/json.sql

A quick test:

dgadmin@flash:~$ psql postgres
psql (8.2.15)
Type "help" for help.

postgres=# select '[1,2,3]'::json->2;
 ?column?
----------
 3
(1 row)

postgres=# create temp table mytab(i int, j json) distributed by (i);
CREATE TABLE
postgres=# insert into mytab values (1, null), (2, '[2,3,4]'), (3, '[3000,4000,5000]');
INSERT 0 3
postgres=#
postgres=# insert into mytab values (1, null), (2, '[2,3,4]'), (3, '[3000,4000,5000]');
INSERT 0 3
postgres=# select i, j->2 from mytab;
 i | ?column?
---+----------
 2 | 4
 2 | 4
 1 |
 3 | 5000
 1 |
 3 | 5000
(6 rows)
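For readers unfamiliar with the `->` operator used above, the following Python sketch mimics what the session shows: the index is zero-based, so `->2` picks the third array element, and a NULL input yields NULL. The function name `element_at` is hypothetical, and returning None for an out-of-range index or a non-array value is an assumption of this sketch, not something the session demonstrates.

```python
import json
from typing import Any, Optional


def element_at(doc: Optional[str], index: int) -> Any:
    """Return the zero-based array element of a JSON document, or None."""
    if doc is None:
        # Mirrors the NULL row in the session: NULL in, NULL out.
        return None
    value = json.loads(doc)
    # Assumption: non-arrays and out-of-range indexes map to None.
    if not isinstance(value, list) or not 0 <= index < len(value):
        return None
    return value[index]


# Same rows as inserted into mytab in the session above.
rows = [(1, None), (2, "[2,3,4]"), (3, "[3000,4000,5000]")]

if __name__ == "__main__":
    for i, j in rows:
        print(i, element_at(j, 2))
```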

4. Efficient compression algorithms

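Deepgreen retains Greenplum's zlib compression algorithm for storage compression. In addition, Deepgreen provides two compression formats better suited to database workloads: zstd and lz4.

If you need a better compression ratio for column-oriented or append-only heap tables, choose zstd. Compared with zlib, zstd achieves a better compression ratio and uses the CPU more efficiently.

If your workload is read-heavy, choose lz4, because its decompression speed is remarkable. Its compression ratio is not as good as that of zlib or zstd, but that sacrifice is worthwhile for a heavy read load.

For details on these two algorithms, see their home pages:

zstd home page: https://facebook.github.io/zstd/
lz4 home page: https://lz4.github.io/lz4/

A quick test:

Here I run a simple test of just four options: no compression, zlib, zstd, and lz4. My machine is not very powerful, so all results are for reference only: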

postgres=# create temp table ttnone (
postgres(#     i int,
postgres(#     t text,
postgres(#     default column encoding (compresstype=none))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE
postgres=# \timing on
Timing is on.
postgres=# create temp table ttzlib(
postgres(#     i int,
postgres(#     t text,
postgres(#     default column encoding (compresstype=zlib, compresslevel=1))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE
Time: 762.596 ms
postgres=# create temp table ttzstd (
postgres(#     i int,
postgres(#     t text,
postgres(#     default column encoding (compresstype=zstd, compresslevel=1))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE
Time: 827.033 ms
postgres=# create temp table ttlz4 (
postgres(#     i int,
postgres(#     t text,
postgres(#     default column encoding (compresstype=lz4))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE
Time: 845.728 ms
postgres=# insert into ttnone select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 104641.369 ms
postgres=# insert into ttzlib select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 99557.505 ms
postgres=# insert into ttzstd select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 98800.567 ms
postgres=# insert into ttlz4 select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 96886.107 ms
postgres=# select pg_size_pretty(pg_relation_size('ttnone'));
 pg_size_pretty
----------------
 1708 MB
(1 row)

Time: 83.411 ms
postgres=# select pg_size_pretty(pg_relation_size('ttzlib'));
 pg_size_pretty
----------------
 374 MB
(1 row)

Time: 4.641 ms
postgres=# select pg_size_pretty(pg_relation_size('ttzstd'));
 pg_size_pretty
----------------
 325 MB
(1 row)

Time: 5.015 ms
postgres=# select pg_size_pretty(pg_relation_size('ttlz4'));
 pg_size_pretty
----------------
 785 MB
(1 row)

Time: 4.483 ms
postgres=# select sum(length(t)) from ttnone;
    sum
------------
 1288888898
(1 row)

Time: 4414.965 ms
postgres=# select sum(length(t)) from ttzlib;
    sum
------------
 1288888898
(1 row)

Time: 4500.671 ms
postgres=# select sum(length(t)) from ttzstd;
    sum
------------
 1288888898
(1 row)

Time: 3849.648 ms
postgres=# select sum(length(t)) from ttlz4;
    sum
------------
 1288888898
(1 row)

Time: 3160.477 ms
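To read the numbers above side by side, the following Python sketch computes each table's compression ratio relative to the uncompressed table and pairs it with the scan time. All figures are copied from the session output. The dictionary layout and function name are just for this illustration.

```python
# (on-disk size in MB, scan time in ms for sum(length(t))), from the session above.
results = {
    "none": (1708, 4414.965),
    "zlib": (374, 4500.671),
    "zstd": (325, 3849.648),
    "lz4": (785, 3160.477),
}


def compression_ratio(name: str) -> float:
    """Uncompressed size divided by the size under the given compresstype."""
    uncompressed_mb = results["none"][0]
    return uncompressed_mb / results[name][0]


if __name__ == "__main__":
    for name, (size_mb, scan_ms) in results.items():
        print(f"{name:>4}: {size_mb:>5} MB, "
              f"ratio {compression_ratio(name):.2f}x, scan {scan_ms:.0f} ms")
```

On this run zstd gives the smallest table and lz4 the fastest scan, which matches the guidance above. A single run on a modest machine is only indicative.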

5. Data sampling

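Starting with Deepgreen 16.16, true data sampling through SQL is built in. You can sample either by specifying a number of rows or by specifying a sampling percentage: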

SELECT {select-clauses} LIMIT SAMPLE {n} ROWS;
SELECT {select-clauses} LIMIT SAMPLE {n} PERCENT;

A quick test:

postgres=# select count(*) from ttlz4;
   count
-----------
 100000000
(1 row)

Time: 903.661 ms
postgres=# select * from ttlz4 limit sample 0.00001 percent;
    i     |       t
----------+---------------
  3442917 | user 3442917
  9182620 | user 9182620
  9665879 | user 9665879
 13791056 | user 13791056
 15669131 | user 15669131
 16234351 | user 16234351
 19592531 | user 19592531
 39097955 | user 39097955
 48822058 | user 48822058
 83021724 | user 83021724
  1342299 | user 1342299
 20309120 | user 20309120
 34448511 | user 34448511
 38060122 | user 38060122
 69084858 | user 69084858
 73307236 | user 73307236
 95421406 | user 95421406
(17 rows)

Time: 4208.847 ms
postgres=# select * from ttlz4 limit sample 10 rows;
    i     |       t
----------+---------------
 78259144 | user 78259144
 85551752 | user 85551752
 90848887 | user 90848887
 53923527 | user 53923527
 46524603 | user 46524603
 31635115 | user 31635115
 19030885 | user 19030885
 97877732 | user 97877732
 33238448 | user 33238448
 20916240 | user 20916240
(10 rows)

Time: 3578.031 ms
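Note that 0.00001 percent of 100,000,000 rows is 10 rows, yet the PERCENT query above returned 17, while the ROWS query returned exactly 10. The source does not say how Deepgreen implements either form. The Python sketch below only illustrates one common pair of techniques that would behave this way: independent per-row (Bernoulli) sampling, whose result size varies, and reservoir sampling, whose result size is fixed. All function names here are hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def expected_rows(total: int, percent: float) -> float:
    """Expected sample size when each row is kept with probability percent/100."""
    return total * percent / 100.0


def sample_percent(rows: Iterable[T], percent: float, rng: random.Random) -> List[T]:
    """Bernoulli sampling: keep each row independently; the count varies."""
    p = percent / 100.0
    return [row for row in rows if rng.random() < p]


def sample_rows(rows: Iterable[T], n: int, rng: random.Random) -> List[T]:
    """Reservoir sampling (Algorithm R): a uniform sample of at most n rows."""
    reservoir: List[T] = []
    for seen, row in enumerate(rows, start=1):
        if len(reservoir) < n:
            reservoir.append(row)
        else:
            # Replace a random slot with probability n / seen.
            slot = rng.randrange(seen)
            if slot < n:
                reservoir[slot] = row
    return reservoir


if __name__ == "__main__":
    rng = random.Random()
    table = range(1, 100_001)  # a small stand-in for the 100M-row table
    print("expected for the article's query:", expected_rows(100_000_000, 0.00001))
    print("percent sample size:", len(sample_percent(table, 0.01, rng)))
    print("fixed-size sample:", sample_rows(table, 10, rng))
```

Running the percent sample several times gives a different count each time, centred on the expected value, whereas the fixed-size sample always has 10 elements.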

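6. TPC-H performance

For a performance comparison between Deepgreen and Greenplum, see two of my other posts:

"Deepgreen vs. Greenplum TPC-H performance test comparison (using 德哥's scripts)"

"Deepgreen vs. Greenplum TPC-H performance test comparison (using VitesseData's scripts)"

I will also cover Xdrive, the high-performance component that ships with Deepgreen, in a later post.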

End~

Last updated: 2017-06-16 02:06:34
