行為、審計日誌 (實時索引/實時搜索) - 最佳實踐
標簽
PostgreSQL , ES , 搜索引擎 , 全文檢索 , 日誌分析 , 倒排索引 , 優化 , 分區 , 分片 , 審計日誌 , 行為日誌
背景
在很多係統中會記錄用戶的行為日誌,行為日誌包括瀏覽行為、社交行為、操作行為等。
典型的應用例如:數據庫的SQL審計、企業內部的堡壘機(行為審計)等。
行為、審計日誌的量與業務量或者操作量有關,為了滿足企業實時查詢的需求,通常需要構建搜索引擎,比如使用ES或者使用PostgreSQL的全文檢索功能來實現。
如果使用PostgreSQL來構建,有幾個優勢,可以滿足多個需求:
1. 明細存儲的需求,除了需要建立索引的字段,明細字段也可以存儲在PostgreSQL中。
2. 索引的需求,即建立日誌行為字段的全文索引。
3. 多維度索引的需求,除了日誌行為字段的索引,還可以建立其他字段的索引,例如時間維度,屬性維度的索引。這些索引可以組合使用,滿足多個維度的搜索需求。
4. 不需要同步到搜索引擎,滿足了實時搜索的需求。
例子
測試機
磁盤,使用空間大、廉價的SATA盤,使用一塊SSD作為BCACHE寫緩存。
12 * N TB
目錄規劃,每塊盤一個目錄
/disk[1-12]
操作係統配置
略
編譯PostgreSQL
wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2
tar -jxvf postgresql-snapshot.tar.bz2
cd postgresql-10beta1
export USE_NAMED_POSIX_SEMAPHORES=1
LIBS=-lpthread CFLAGS="-O3" ./configure --prefix=/home/digoal/pgsql10_8k --with-blocksize=8 --with-wal-segsize=1024
LIBS=-lpthread CFLAGS="-O3" make world -j 128
LIBS=-lpthread CFLAGS="-O3" make install-world
環境變量
vi env_pg10.sh
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=$(($1+1920))
export PGDATA=/disk$1/digoal/pgdata/pg_root$(($1+1920))
export LANG=en_US.utf8
export PGHOME=/home/digoal/pgsql10_8k
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export LD_RUN_PATH=$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=127.0.0.1
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
初始化集群
創建12個數據庫集群,對應到每一塊磁盤。可以充分利用磁盤的IO。
vi init.sh
for ((i=1;i<13;i++))
do
. ~/env_pg10.sh $i
initdb -D $PGDATA -E SQL_ASCII --locale=C -U postgres
echo "local all all trust" > $PGDATA/pg_hba.conf
echo "host all all 127.0.0.1/32 trust" >> $PGDATA/pg_hba.conf
echo "host all all ::1/128 trust" >> $PGDATA/pg_hba.conf
echo "host all all 0.0.0.0/0 trust" >> $PGDATA/pg_hba.conf
done
. ./init.sh
配置參數
postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
listen_addresses = '0.0.0.0'
max_connections = 400
unix_socket_directories = '.'
shared_buffers = 32GB
maintenance_work_mem = 2GB
dynamic_shared_memory_type = posix
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 10.0
bgwriter_flush_after = 0
max_parallel_workers_per_gather = 0
synchronous_commit = off
full_page_writes = off
wal_buffers = 2047MB
wal_writer_delay = 10ms
wal_writer_flush_after = 0
checkpoint_timeout = 45min
max_wal_size = 96GB
min_wal_size = 32GB
checkpoint_completion_target = 0.9
checkpoint_flush_after = 2MB
max_wal_senders = 0
random_page_cost = 1.3
effective_cache_size = 32GB
log_destination = 'csvlog'
logging_collector = on
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_timezone = 'PRC'
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_freeze_max_age = 1500000000
autovacuum_multixact_freeze_max_age = 1600000000
vacuum_freeze_table_age = 1400000000
vacuum_multixact_freeze_table_age = 1400000000
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
autovacuum_work_mem = 2GB
autovacuum_max_workers = 12
autovacuum_naptime = 30s
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.2
autovacuum_vacuum_cost_delay = 0
autovacuum_vacuum_cost_limit = 0
zero_damaged_pages = on
cp postgresql.auto.conf /disk1/digoal/pgdata/pg_root1921/
...
...
cp postgresql.auto.conf /disk12/digoal/pgdata/pg_root1932/
啟動集群
將數據庫實例綁定到不同的CPU核
vi start.sh
for ((i=1;i<13;i++))
do
. /home/digoal/env_pg10.sh $i
if [ $i -eq 12 ]; then
numactl --physcpubind=56-63 pg_ctl start -o "-c port=$PGPORT"
else
numactl --physcpubind="$((($i-1)*(5)+(1)))"-"$(($i*5))" pg_ctl start -o "-c port=$PGPORT"
fi
done
. ./start.sh
停止集群
vi stop.sh
for ((i=1;i<13;i++))
do
. /home/digoal/env_pg10.sh $i
pg_ctl stop -m fast
done
. ./stop.sh
建模
表結構
4個字段,分別存儲PK(對應原始明細數據的PK),時間,用戶ID,用戶行為(tsvector字段)。
檢索需求
檢索時可能按照時間區間,用戶ID,以及分詞條件進行檢索。
保留時長
日誌保留一段時間(例如1個月)後清除。
分區
每個集群中,創建若幹個分區表,例如本例使用了12個分區表。
如果條件允許,建議每個小時一個分區表,這樣的話可以不建時間索引,查詢時間區間的數據使用分區即可。
如果單個用戶的數據量很龐大,那麼建議按UID再建立哈希或LIST分區,這樣的話,按照UID查詢,不需要使用索引(可以省去在UID建立索引,甚至省去存儲UID這個字段)。
索引
行為字段,全文索引。
用戶ID,B-TREE索引。
時間字段,brin塊級索引。
模擬數據
時間,時序產生。
用戶ID,在一個範圍內隨機產生。
用戶行為數據,長約512字符的字符串,拆分成若幹個TOKEN,例如本例為40個長度不等的TOKEN。
初始化數據表
vi init.sql
drop table bptest cascade;
do language plpgsql $$
declare
begin
for i in 1..12 loop
execute 'drop table bptest'||i||' cascade';
end loop;
end;
$$;
create table bptest(pk serial8, uid int, ts int8, content tsvector);
create index idx_bptest_content on bptest using gin (content) with (fastupdate=on,gin_pending_list_limit=2048);
create index idx_bptest_uid on bptest(uid);
create index idx_bptest_ts on bptest using brin(ts);
do language plpgsql $$
declare
begin
for i in 1..12 loop
execute 'create table bptest'||i||'(like bptest including all) inherits(bptest)';
end loop;
end;
$$;
-- 產生若幹字符長度的隨機字符串
CREATE OR REPLACE FUNCTION public.gen_rand_str(integer)
RETURNS text
LANGUAGE sql
STRICT
AS $function$
select string_agg(a[random()*6+1],'') from generate_series(1,$1), (select array['a','b','c','d','e','f',' ']) t(a);
$function$;
psql -f ./init.sql -p 1921
...
...
psql -f ./init.sql -p 1932
測試
灌入測試數據,例如每張表插入2億,一個數據庫插入24億(約6TB),總共插入288億(約72TB)。
每10條一批灌入。
vi test.sh
for ((i=1;i<13;i++))
do
echo "insert into bptest${i} (uid,ts,content) select random()*100000 , extract(epoch from now()), to_tsvector('english',gen_rand_str(512)) from generate_series(1,10);" > ~/test${i}.sql
done
for ((i=1;i<13;i++))
do
. /home/digoal/env_pg10.sh ${i}
for ((x=1;x<13;x++))
do
nohup pgbench -M prepared -n -r -P 3 -f ./test${x}.sql -c 1 -j 1 -t 200000000 >/tmp/bptest_${i}_${x}.log 2>&1 &
done
done
chmod 500 test.sh
查詢測試數據如下,數據非常隨機,每條記錄的content約40個元素,長度限定在512字符。
select * from bptest1 limit 1;
pk | 1
uid | 849185
ts | 1494928859
content | 'aaaefba':14 'acddcfd':39 'acdeeaadbffdbbecceb':50 'aceeedfbaefbdfcbd':59 'adbbeddbecfdcffaeedcedaeeddaeaaeebfbdcdcecfbbebfcebabceffbfdbfbfa':60
'adcdf':61 'aead':47 'afddf':70 'ba':8 'bae':37 'bbaacffbcafeffafefdf':38 'bbe':55 'bbecfdf':32 'bcbfd':27 'bdce':45 'bdeccbcdeaabefbeeebcdbfddd':19
'bed':17 'beedeadccbbbecbfcbf':44 'bfccaeddaddbc':2 'cafdfcf':5 'cbcacefaff':3 'cbcfc':52 'cbfef':63 'ccdcbedb':33 'ccdcd':20 'cd':6 'cfecfeeccabf':42
'cffb':15 'dabdfddeeabfdcefb':16 'dacdeecfbcefebfabeedfabbaccec':57 'daee':1 'daffcdffadddbaffd':68 'dbcddacefcd':9
'dbdbcbfadfffbdddaaabdcbcecdbecbbdecffbfcfecbbfebfebcadefecfceadaeffd':11 'dcdf':23 'dd':53 'ddec':31 'debdcdebfffebdbfdeefffbcfbccbececdbeaffffedfbefdcccbbccadedecfbeccccbbb':48
'deefaeeaabdbbdafcfcbeecc':71 'df':26 'dfcbbcd':46 'e':7,51,56 'eafddcaac':43 'ecbaffa':21 'ecdeeceddbdcbfcabdc':10 'ecedcec':41 'ed':66
'edcbaecfcdfbcbcdedeebdbfceeeececfac':35 'eeca':25 'eeebafeacfebfdbdbddaacabebabbfbfdefeddefccfbeaefdbf':29 'eefdbfcadebcbbfffaefcaecafbddbdbfcf':13
'ef':58 'efbdc':67 'efccdddaebfbdaffcdfcbfdcbdeb':54 'efccebdddededdeda':64 'effcbfdfdeebfbbcfaabfd':12 'f':24,28,65 'fbbfccfcbcba':30 'fc':4
'fcbbdbbaefcefefdf':34 'fd':18 'fdffcbe':69 'fea':62 'feeabdcd':36 'feeadcedecedebaedccffbfddadcfececbefddcbeaedbebfadefedcbd':22 'feffceceaeec':49 'ffaffde':40
用戶全文檢索請求,輸入4個查詢條件,流式返回PK。
建議使用流式返回接口,因為結果集可能非常大。
select pk from bptest1 where uid=$1 and ts between $2 and $3 and content @@ to_tsquery('english', $4);
壓測
./test.sh
資源使用
dstat
CPU基本耗盡,磁盤的寫入也非常的充分
cpu大部分為user的開銷,後麵使用perf看一下
dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
92 7 1 0 0 1|1075M 2435M|4048B 2297B| 0 0 | 142k 167k
92 7 0 0 0 1|1137M 2075M|2391B 1945B| 0 0 | 135k 161k
91 8 0 0 0 0|1182M 2125M|3483B 2845B| 0 0 | 140k 166k
91 8 0 0 0 1|1193M 1971M|3788B 1633B| 0 0 | 135k 159k
91 8 0 0 0 0|1089M 2305M|2232B 1478B| 0 0 | 139k 159k
92 7 0 0 0 1| 986M 2795M|2176B 1568B| 0 0 | 127k 142k
92 7 1 0 0 0| 760M 2864M|6028B 1408B| 0 0 | 116k 118k
90 8 0 0 0 0|1029M 3057M|1565B 2116B| 0 0 | 132k 150k
90 9 1 0 0 1|1000M 3237M|2336B 4850B| 0 0 | 133k 154k
90 8 1 1 0 1| 659M 4399M|2872B 7992B| 0 0 | 115k 119k
91 7 0 0 0 1| 925M 2996M|1293B 1059B| 0 0 | 122k 127k
90 8 1 1 0 1| 996M 3350M| 664B 574B| 0 0 | 133k 148k
91 7 1 0 0 1| 948M 2927M|3525B 2500B| 0 0 | 132k 146k
90 8 0 1 0 0|1114M 2869M|1751B 2645B| 0 0 | 132k 150k
90 8 0 1 0 1|1267M 2408M|3003B 2244B| 0 0 | 137k 167k
91 8 0 1 0 1|1086M 2539M| 900B 347B| 0 0 | 133k 154k
91 8 0 0 0 1| 998M 2614M|1975B 1757B| 0 0 | 130k 151k
91 8 0 0 0 0|1120M 2150M|1466B 4911B| 0 0 | 130k 154k
92 7 0 0 0 0|1163M 2387M|1356B 498B| 0 0 | 136k 163k
90 8 1 1 0 1| 864M 2656M|2601B 3373B| 0 0 | 130k 143k
91 8 0 0 0 1| 987M 2651M|2052B 898B| 0 0 | 135k 154k
91 8 0 0 0 0|1073M 2205M|2479B 2319B| 0 0 | 130k 144k
90 8 1 1 0 1| 951M 2941M|1390B 1001B| 0 0 | 130k 148k
磁盤使用率
iostat -x
avg-cpu: %user %nice %system %iowait %steal %idle
85.29 0.49 9.89 1.90 0.00 2.43
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdb 0.00 92.40 99.40 389.00 2822.40 136435.20 285.13 2.25 4.60 0.65 31.64
sdc 0.00 33.60 154.20 211.60 4838.40 85700.80 247.51 1.14 3.10 0.63 22.96
sdd 0.00 63.00 232.40 238.40 7316.80 109648.00 248.44 2.17 4.61 0.99 46.54
sde 0.00 78.80 269.60 340.80 7980.80 102419.20 180.87 2.53 4.14 0.94 57.62
sdf 0.00 58.40 283.00 234.20 8204.80 99129.60 207.53 2.30 4.45 0.93 48.20
sdg 0.00 50.80 207.60 236.60 6652.80 94337.60 227.35 1.42 3.19 0.68 30.34
sdh 0.00 102.20 109.40 475.20 3489.60 131211.20 230.42 2.60 4.45 0.52 30.40
sdi 0.00 70.20 107.00 337.00 3228.80 79603.20 186.56 1.35 3.04 0.53 23.38
sdj 0.00 31.00 70.60 158.80 2534.40 85124.80 382.12 0.82 3.59 0.86 19.72
sdk 0.20 58.40 190.60 295.80 5587.20 123539.20 265.47 1.74 3.57 0.68 33.28
sdl 0.00 91.00 162.80 396.40 4441.60 119507.20 221.65 1.98 3.54 0.59 33.26
sdm 0.00 274.80 103.20 359.20 2296.00 158908.80 348.63 3.81 8.23 1.08 50.06
perf
大部分的開銷是postgres進程消耗的,建議使用以下開關重新編譯一下.
《PostgreSQL 源碼性能診斷(perf profiling)指南》
top -ag
PerfTop: 9171 irqs/sec kernel:63.7% exact: 0.0% [1000Hz cycles], (all, 64 CPUs)
----------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ _______________________________________
23044.00 4.5% tsCompareString /home/digoal/pgsql10_8k/bin/postgres
19821.00 3.9% ExecInterpExpr /home/digoal/pgsql10_8k/bin/postgres
12258.00 2.4% gintuple_get_key /home/digoal/pgsql10_8k/bin/postgres
12208.00 2.4% pg_detoast_datum_packed /home/digoal/pgsql10_8k/bin/postgres
11111.00 2.2% hash_search_with_hash_value /home/digoal/pgsql10_8k/bin/postgres
10318.00 2.0% memcpy /lib64/libc-2.12.so
9078.00 1.8% AllocSetAlloc /home/digoal/pgsql10_8k/bin/postgres
8944.00 1.7% advance_aggregates /home/digoal/pgsql10_8k/bin/postgres
8547.00 1.7% cmpEntryAccumulator /home/digoal/pgsql10_8k/bin/postgres
7311.00 1.4% array_seek /home/digoal/pgsql10_8k/bin/postgres
6744.00 1.3% gin_cmp_tslexeme /home/digoal/pgsql10_8k/bin/postgres
6650.00 1.3% __closure_wake_up [bcache]
6550.00 1.3% appendBinaryStringInfo /home/digoal/pgsql10_8k/bin/postgres
6475.00 1.3% TParserGet /home/digoal/pgsql10_8k/bin/postgres
5578.00 1.1% ginFindLeafPage /home/digoal/pgsql10_8k/bin/postgres
5543.00 1.1% PyParser_AddToken /lib64/libpython2.7.so.1.0
5412.00 1.1% array_get_element /home/digoal/pgsql10_8k/bin/postgres
5355.00 1.0% heap_fill_tuple /home/digoal/pgsql10_8k/bin/postgres
4936.00 1.0% entryLocateLeafEntry /home/digoal/pgsql10_8k/bin/postgres
4732.00 0.9% heap_form_minimal_tuple /home/digoal/pgsql10_8k/bin/postgres
4512.00 0.9% rb_insert /home/digoal/pgsql10_8k/bin/postgres
top
top -c -u digoal
top - 19:20:47 up 179 days, 5:38, 8 users, load average: 183.79, 189.01, 166.41
Tasks: 2939 total, 159 running, 2780 sleeping, 0 stopped, 0 zombie
Cpu(s): 87.7%us, 8.9%sy, 0.9%ni, 0.9%id, 1.2%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 529321828k total, 512395020k used, 16926808k free, 299780k buffers
Swap: 0k total, 0k used, 0k free, 482162560k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11258 digoal 20 0 20.7g 2.9g 2.9g R 78.7 0.6 5:11.98 postgres: postgres postgres 127.0.0.1(52848) INSERT
11253 digoal 20 0 20.7g 3.0g 2.9g R 70.7 0.6 5:14.48 postgres: postgres postgres 127.0.0.1(52843) INSERT
11264 digoal 20 0 20.7g 3.0g 2.9g R 68.1 0.6 5:14.63 postgres: postgres postgres 127.0.0.1(52854) INSERT
11263 digoal 20 0 20.7g 3.0g 2.9g R 65.2 0.6 5:14.89 postgres: postgres postgres 127.0.0.1(52853) INSERT
11250 digoal 20 0 20.7g 3.0g 3.0g R 59.4 0.6 5:16.66 postgres: postgres postgres 127.0.0.1(52840) idle
11268 digoal 20 0 20.7g 2.9g 2.9g R 53.3 0.6 5:11.36 postgres: postgres postgres 127.0.0.1(52858) INSERT
11266 digoal 20 0 20.7g 3.0g 3.0g R 52.3 0.6 5:09.00 postgres: postgres postgres 127.0.0.1(52856) INSERT
11068 digoal 20 0 20.7g 2.6g 2.6g R 51.4 0.5 4:44.47 postgres: postgres postgres 127.0.0.1(45823) INSERT
11097 digoal 20 0 20.7g 2.6g 2.5g R 49.5 0.5 4:47.85 postgres: postgres postgres 127.0.0.1(45839) INSERT
11161 digoal 20 0 20.7g 2.6g 2.6g R 49.5 0.5 4:47.87 postgres: postgres postgres 127.0.0.1(44881) INSERT
11256 digoal 20 0 20.7g 3.0g 3.0g R 49.5 0.6 5:14.69 postgres: postgres postgres 127.0.0.1(52846) INSERT
10819 digoal 20 0 20.7g 2.7g 2.7g R 48.8 0.5 4:58.17 postgres: postgres postgres 127.0.0.1(47342) INSERT
11107 digoal 20 0 20.7g 2.7g 2.7g R 48.8 0.5 5:02.00 postgres: postgres postgres 127.0.0.1(59612) INSERT
11255 digoal 20 0 20.7g 3.0g 3.0g R 48.2 0.6 5:15.68 postgres: postgres postgres 127.0.0.1(52845) INSERT
11267 digoal 20 0 20.7g 3.0g 3.0g R 47.9 0.6 5:18.82 postgres: postgres postgres 127.0.0.1(52857) INSERT
11066 digoal 20 0 20.7g 2.6g 2.6g R 46.9 0.5 4:44.97 postgres: postgres postgres 127.0.0.1(45821) INSERT
11222 digoal 20 0 20.7g 2.6g 2.6g R 45.9 0.5 5:00.43 postgres: postgres postgres 127.0.0.1(40379) idle
11207 digoal 20 0 20.7g 2.6g 2.6g R 45.6 0.5 5:04.59 postgres: postgres postgres 127.0.0.1(46160) INSERT
11224 digoal 20 0 20.7g 2.6g 2.6g R 45.3 0.5 5:02.60 postgres: postgres postgres 127.0.0.1(40381) INSERT
11249 digoal 20 0 20.7g 2.6g 2.6g R 45.3 0.5 4:59.58 postgres: postgres postgres 127.0.0.1(46187) INSERT
11045 digoal 20 0 20.7g 2.6g 2.6g R 44.6 0.5 4:39.75 postgres: postgres postgres 127.0.0.1(64424) idle
11064 digoal 20 0 20.7g 2.6g 2.6g R 44.6 0.5 4:44.69 postgres: postgres postgres 127.0.0.1(45819) INSERT
11145 digoal 20 0 20.7g 2.6g 2.6g S 44.3 0.5 4:46.18 postgres: postgres postgres 127.0.0.1(44876) INSERT
10865 digoal 20 0 20.7g 2.6g 2.6g R 44.0 0.5 4:59.89 postgres: postgres postgres 127.0.0.1(49769) INSERT
11080 digoal 20 0 20.7g 2.6g 2.5g R 44.0 0.5 4:43.70 postgres: postgres postgres 127.0.0.1(45825) INSERT
11247 digoal 20 0 20.7g 2.6g 2.6g R 43.4 0.5 5:01.91 postgres: postgres postgres 127.0.0.1(40391) idle
11163 digoal 20 0 20.7g 2.6g 2.6g R 42.7 0.5 4:48.34 postgres: postgres postgres 127.0.0.1(44882) idle
11164 digoal 20 0 20.7g 2.6g 2.6g R 42.4 0.5 4:53.21 postgres: postgres postgres 127.0.0.1(44883) INSERT
10882 digoal 20 0 20.7g 2.6g 2.6g R 41.8 0.5 5:04.78 postgres: postgres postgres 127.0.0.1(49772) INSERT
10868 digoal 20 0 20.7g 2.6g 2.6g R 41.4 0.5 5:00.30 postgres: postgres postgres 127.0.0.1(49770) INSERT
寫入tps
換算成單機的寫入,約6.5萬行/s。
progress: 729.0 s, 55.0 tps, lat 11.610 ms stddev 4.836
progress: 732.0 s, 59.7 tps, lat 20.071 ms stddev 107.984
progress: 735.0 s, 57.0 tps, lat 20.492 ms stddev 125.445
progress: 738.0 s, 38.7 tps, lat 25.891 ms stddev 154.607
progress: 741.0 s, 41.0 tps, lat 24.405 ms stddev 140.247
progress: 744.0 s, 43.0 tps, lat 13.550 ms stddev 10.448
progress: 747.0 s, 60.0 tps, lat 20.691 ms stddev 131.640
progress: 750.0 s, 60.0 tps, lat 17.394 ms stddev 83.385
progress: 753.0 s, 44.3 tps, lat 25.510 ms stddev 146.719
progress: 756.0 s, 25.0 tps, lat 39.819 ms stddev 213.642
progress: 759.0 s, 50.0 tps, lat 11.439 ms stddev 5.319
progress: 762.0 s, 60.0 tps, lat 20.979 ms stddev 106.782
progress: 765.0 s, 60.0 tps, lat 18.778 ms stddev 167.714
progress: 768.0 s, 58.0 tps, lat 18.017 ms stddev 99.949
progress: 771.0 s, 51.0 tps, lat 19.636 ms stddev 124.429
寫入性能基本上取決於tsvector字段的元素個數,散列程度,本例每條記錄約40個元素。如果元素個數下降一半,性能將提升一倍左右。
postgres=# select array_length(tsvector_to_array(content),1) from bptest1 limit 10;
array_length
--------------
40
37
40
45
35
42
38
46
30
40
(10 rows)
評估每秒構建了多少個索引條目
1. 全文檢索索引條目
每條記錄約40個元素,當插入的tps=6.5萬時,構建的全文檢索條目數約 260萬/s。
65000*40 = 2600000
2. uid索引條目,較小,忽略不計。
3. ts索引條目,使用BRIN塊級索引,忽略不計。
性能影響最大,資源消耗最多的就是全文檢索索引條目的構建。
查詢性能
舉例
postgres=# \dt+ bptest1
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------+-------+----------+---------+-------------
public | bptest1 | table | postgres | 1689 MB |
(1 row)
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from bptest1 where ts between 1494999617 and 1495999617 and content @@ to_tsquery ('english','abc');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.bptest1 (cost=175.95..23691.41 rows=20015 width=811) (actual time=7.017..23.376 rows=19755 loops=1)
Output: uid, ts, content
Recheck Cond: (bptest1.content @@ '''abc'''::tsquery)
Filter: ((bptest1.ts >= 1494999617) AND (bptest1.ts <= 1495999617))
Heap Blocks: exact=18933
Buffers: shared hit=18948
-> Bitmap Index Scan on bptest1_content_idx (cost=0.00..170.94 rows=20019 width=0) (actual time=3.811..3.811 rows=19755 loops=1)
Index Cond: (bptest1.content @@ '''abc'''::tsquery)
Buffers: shared hit=15
Planning time: 0.097 ms
Execution time: 24.517 ms
(11 rows)
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from bptest1 where ts between 1494999617 and 1495999617 and content @@ to_tsquery ('english','abc & bc');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.bptest1 (cost=36.27..2598.42 rows=1996 width=811) (actual time=4.577..6.711 rows=2125 loops=1)
Output: uid, ts, content
Recheck Cond: (bptest1.content @@ '''abc'' & ''bc'''::tsquery)
Filter: ((bptest1.ts >= 1494999617) AND (bptest1.ts <= 1495999617))
Heap Blocks: exact=2116
Buffers: shared hit=2239
-> Bitmap Index Scan on bptest1_content_idx (cost=0.00..35.77 rows=1997 width=0) (actual time=4.291..4.291 rows=2125 loops=1)
Index Cond: (bptest1.content @@ '''abc'' & ''bc'''::tsquery)
Buffers: shared hit=123
Planning time: 0.125 ms
Execution time: 6.849 ms
(11 rows)
純SSD fsync=on 寫入性能
1. 寫入TPS
7萬/s ,構建的全文檢索條目數約 280萬/s。
性能比較平穩。
純SATA+SSD bcache fsync=off 寫入性能
1. 寫入TPS
7.5萬/s ,構建的全文檢索條目數約 300萬/s。
性能比較平穩。
小結
1. 查詢聚合
由於日誌數據打散分布在多個集群,多個表內,建議使用plproxy進行查詢的聚合。
參考
《A Smart PostgreSQL extension plproxy 2.2 practices》
《阿裏雲ApsaraDB RDS for PostgreSQL 最佳實踐 - 4 水平分庫 之 節點擴展》
《阿裏雲ApsaraDB RDS for PostgreSQL 最佳實踐 - 3 水平分庫 vs 單機 性能》
《阿裏雲ApsaraDB RDS for PostgreSQL 最佳實踐 - 2 教你RDS PG的水平分庫》
2. 寫入分片
寫入分片,可以在業務層完成,隨機打散寫入。
實際應用時,可以根據需要,切分成更多的分區。
3. 主要的開銷是postgres的開銷,如果需要詳細的分析,建議重新編譯postgres
4. gin索引的優化
https://www.postgresql.org/docs/9.6/static/sql-createindex.html
GIN indexes accept different parameters:
1. fastupdate
This setting controls usage of the fast update technique described in Section 63.4.1.
It is a Boolean parameter: ON enables fast update, OFF disables it.
(Alternative spellings of ON and OFF are allowed as described in Section 19.1.) The default is ON.
Note: Turning fastupdate off via ALTER INDEX prevents future insertions from going into the list of pending index entries,
but does not in itself flush previous entries.
You might want to VACUUM the table or call gin_clean_pending_list function afterward to ensure the pending list is emptied.
2. gin_pending_list_limit
Custom gin_pending_list_limit parameter.
This value is specified in kilobytes.
gin_pending_list_limit的目的是延遲合並,因為一條記錄中可能涉及較多的GIN KEY,如果實時更新,GIN索引的寫入量會非常大,性能受到影響。
本例gin_pending_list_limit設置為2MB,tps比較平緩,如果設置過大,當CPU資源不足時,抖動會比較嚴重。
用戶可以根據實際測試,設置合理的gin_pending_list_limit值。
5. 如果把PostgreSQL完全當成索引庫使用,並且允許數據丟失,那麼可以使用fsync=off的開關,(檢查點fsync對IO的影響比較大,本例使用的是SATA盤,將會導致較大的性能抖動)。
postgresql.auto.conf
fsync = off
zero_damaged_pages = on
如果有ha的話,丟失的風險又會更小。(但是服務器CRASH後,需要重建備庫,這麼大的量,還是挺恐怖的。)
建議用更多的數據庫實例,每個實例的大小可控(例如 < 2TB),重建的時間也相對可控。
6. 為了達到更好的響應速度(RT),建議明細和索引分開存放,明細要求寫入RT低,索引可以存在一定的延遲。 並且索引與明細數據的可靠性要求也不一樣。
最後更新:2017-05-20 01:01:29
上一篇:
520:一入侯門“深”似海,深度學習深幾許(入門係列之一)
下一篇:
DNA遺傳哲學? - 數據庫裏schema應該屬於誰?
miui 係統組件 功能提示
MSSQL 2000 823錯誤原因分析及數據恢複方案
智能馬桶蓋在天貓賣爆,鬆下“蓋神”掌舵後做了一個決定......
《Microsoft.NET企業級應用架構設計(第2版)》——第1章 今天的架構師和架構 1.1軟件架構到底是什麼
ASP.NET MVC以ValueProvider為核心的值提供係統: NameValueCollectionValueProvider
1小時,這62個品牌天貓雙11成交過億!
jsp連接SQL Server 2000數據庫代碼
推薦係統分析 - 推薦算法, RecDB推薦數據庫介紹
Tomcat的設置1——設置根目錄
2013 年 KDE 大災難:損毀文件完美鏡像