New Features in Redis 4.0 (Part 1): The MEMORY Command
Redis 4.0 adds a number of attractive new features that are very useful for fine-grained operation and management of Redis (one suspects this has a lot to do with antirez joining Redis Labs). This short series introduces the usage and effects of the following features:
- Redis MEMORY command: detailed analysis of memory usage, memory diagnostics, and memory fragmentation reclamation;
- PSYNC2: fixes the cases where a failover or a replica restart could not use partial resynchronization; PSYNC3 is already on the way;
- LazyFree: no more fear of a big-key deletion triggering a cluster failover;
- LFU: support for an approximate LFU eviction algorithm;
- Active Memory Defragmentation: memory defragmentation with quite good results (still experimental);
- Modules: Redis becomes many more possibilities (it feels like the stage when mongo/mysql introduced pluggable engines);

Since there is no detailed official documentation yet, and my spare time is limited, please be gentle with any criticism. :)
This article covers the first feature: the MEMORY command.
Overview of the MEMORY Command
Redis 4.0 introduces the new MEMORY command, which has five subcommands and lets us look much deeper into how Redis uses memory internally.
MEMORY HELP lists the four subcommands other than MEMORY DOCTOR.
The five subcommands are:
- MEMORY USAGE [SAMPLES] - "Estimate memory usage of key"
- MEMORY STATS - "Show memory usage details"
- MEMORY PURGE - "Ask the allocator to release memory"
- MEMORY DOCTOR - "A better observability on the Redis memory usage."
- MEMORY MALLOC-STATS - "Show allocator internal stats"
This article briefly describes the purpose and part of the implementation of each subcommand.
1 memory usage
Before Redis 4.0, the only way to estimate a key's memory usage was the DEBUG OBJECT command (the serializedlength field), but it differs so much from the real footprint that it has little reference value.
Note: you can also analyze an RDB file with an rdb tool to get the actual memory used by a key.
In the following example, the serialized length of k1 is 7.
127.0.0.1:6379> set k1 value1
OK
127.0.0.1:6379> DEBUG OBJECT k1
xx refcount:1 encoding:embstr serializedlength:7 lru:7723189 lru_seconds_idle:160
Basic usage of memory usage
The usage subcommand is very simple: just run memory usage followed by the key name. If the key exists, it returns an estimate of the memory actually used by the key's value; if the key does not exist, it returns nil.
Example:
127.0.0.1:6379> set k1 value1
OK
127.0.0.1:6379> memory usage k1 // the value of k1 occupies 57 bytes
(integer) 57
127.0.0.1:6379> memory usage aaa // key aaa does not exist, so nil is returned
(nil)
A closer look at memory usage
memory usage does not include the memory occupied by the key string itself
127.0.0.1:6379> set k1 a // key length: 2 characters
OK
127.0.0.1:6379> memory usage k1
(integer) 52
127.0.0.1:6379> set k111111111111 a // key length: 13 characters
OK
127.0.0.1:6379> memory usage k111111111111 // same value, different key lengths: usage reports the same footprint
(integer) 52
- memory usage does not include the memory occupied by the key's expire
127.0.0.1:6379> memory usage k1
(integer) 52
127.0.0.1:6379> expire k1 10000 // set a TTL on k1
(integer) 1
127.0.0.1:6379> memory usage k1 // usage does not include the TTL's memory footprint
(integer) 52
- For aggregate data types (everything except string), the usage subcommand uses a sampling approach similar to LRU SAMPLES: by default it samples 5 elements, takes their average size, and multiplies by the element count to derive the total footprint (detailed in the next section). The result is therefore approximate; of course, you can specify the number of elements to sample via SAMPLES. Example: create a hash key hkey with 1 million fields, each field's value being a random 1-1024 bytes long.
127.0.0.1:6379> hlen hkey // hkey has 1M fields, each value between 1 and 1024 bytes
(integer) 1000000
127.0.0.1:6379> MEMORY usage hkey // with the default SAMPLES of 5, hkey is estimated at 521588753 bytes
(integer) 521588753
127.0.0.1:6379> MEMORY usage hkey SAMPLES 1000 // with SAMPLES 1000, hkey is estimated at 617977753 bytes
(integer) 617977753
127.0.0.1:6379> MEMORY usage hkey SAMPLES 10000 // with SAMPLES 10000, hkey is estimated at 624950853 bytes
(integer) 624950853
Because this is a sample-and-average algorithm, specifying a larger SAMPLES count gives a more accurate estimate. But bigger is not always better: the larger SAMPLES is, the more CPU time memory usage consumes.
- The time complexity of memory usage is related to the specified SAMPLES count. In the example below, SAMPLES 1000 takes 0.176 ms while SAMPLES 100000 takes 14.65 ms:
127.0.0.1:6379> SLOWLOG get
1) 1) (integer) 3
3) (integer) 14651
4) 1) "MEMORY"
2) "usage"
3) "hkey"
4) "SAMPLES"
5) "100000"
2) 1) (integer) 1
3) (integer) 176
4) 1) "MEMORY"
2) "usage"
3) "hkey"
4) "SAMPLES"
5) "1000"
Note: the instance-wide memory used by expires is covered below under the overhead.hashtable.expires field of memory stats.
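The sample-and-extrapolate idea behind these estimates can be sketched in a few lines. This is a hypothetical illustration, not Redis source: the function name and element sizes are made up; the point is that a small default sample can miss outliers, which is why the larger SAMPLES values above produced larger (more accurate) estimates.

```python
# Hypothetical sketch of sample-and-extrapolate estimation for aggregate
# types: average the size of the first sample_size elements, then multiply
# by the total element count.
def estimate_memory(element_sizes, sample_size=5):
    samples = element_sizes[:sample_size]
    avg = sum(samples) / len(samples)
    return round(avg * len(element_sizes))

# One large outlier that the default 5-element sample never sees:
sizes = [100, 200, 300, 400, 500, 10_000]
print(estimate_memory(sizes))                 # 1800: the outlier is missed
print(estimate_memory(sizes, sample_size=6))  # 11500: full sample, exact total
```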
How memory usage is implemented
- The entry point of the MEMORY command is memoryCommand() (in object.c):
/* The memory command will eventually be a complete interface for the
 * memory introspection capabilities of Redis.
 *
 * Usage: MEMORY usage <key> */
void memoryCommand(client *c) {
The core of the memory usage computation is objectComputeSize() (in object.c).
For brevity, only a small part of the code is shown here:
// As the function comment explains: value-size computation with sampling
/* Returns the size in bytes consumed by the key's value in RAM.
 * Note that the returned value is just an approximation, especially in the
 * case of aggregated data types where only "sample_size" elements
 * are checked and averaged to estimate the total size. */
#define OBJ_COMPUTE_SIZE_DEF_SAMPLES 5 /* Default sample size. */ // aggregate types sample 5 elements by default
size_t objectComputeSize(robj *o, size_t sample_size) {
    if (o->type == OBJ_STRING) {        // String type
        /* ... omitted ... */
    } else if (o->type == OBJ_LIST) {   // List type
        if (o->encoding == OBJ_ENCODING_QUICKLIST) {
            quicklist *ql = o->ptr;
            quicklistNode *node = ql->head;
            asize = sizeof(*o)+sizeof(quicklist);
            // sum the sizes of the first sample_size nodes of the list
            do {
                elesize += sizeof(quicklistNode)+ziplistBlobLen(node->zl);
                samples++;
            } while ((node = node->next) && samples < sample_size);
            asize += (double)elesize/samples*listTypeLength(o); // average x element count = estimated total
        }
        /* ... omitted ... */
2 memory stats
Before Redis 4.0, we could only get a rough picture of an instance's memory via INFO memory; memory details such as the cost of expires, client output buffers, and query buffers were hard to see directly. The memory stats command exists precisely to expose these internal memory details.
Basic usage of memory stats
Running memory stats returns the memory usage details of the current instance. The command itself has little overhead, so it can be used for monitoring collection. In the example below it returns 33 lines of data covering 16 items; the next section analyzes what each item means.
127.0.0.1:6379> memory stats
1) "peak.allocated"
2) (integer) 3211205544
3) "total.allocated"
4) (integer) 875852320
5) "startup.allocated"
6) (integer) 765608
7) "replication.backlog"
8) (integer) 117440512
9) "clients.slaves"
10) (integer) 16858
11) "clients.normal"
12) (integer) 49630
13) "aof.buffer"
14) (integer) 0
15) "db.0"
16) 1) "overhead.hashtable.main"
2) (integer) 48388888
3) "overhead.hashtable.expires"
4) (integer) 104
17) "overhead.total"
18) (integer) 166661600
19) "keys.count"
20) (integer) 1000007
21) "keys.bytes-per-key"
22) (integer) 875
23) "dataset.bytes"
24) (integer) 709190720
25) "dataset.percentage"
26) "81.042335510253906"
27) "peak.percentage"
28) "27.274873733520508"
29) "fragmentation"
30) "0.90553224086761475"
Field-by-field analysis of memory stats
- peak.allocated: the peak number of bytes allocated by the allocator since Redis started; same as used_memory_peak in INFO memory.
- total.allocated: the number of bytes currently allocated by the allocator; same as used_memory in INFO memory.
- startup.allocated: the number of bytes Redis had consumed by the time startup completed; stored in initial_memory_usage /* Bytes used after initialization. */
- replication.backlog: bytes used by the replication backlog; set via the repl-backlog-size parameter (default 1MB; in the example above it is set to 100MB). Each instance has only one backlog.
Notes: 1. Once replication is enabled, replication.backlog always equals repl-backlog-size, whether or not the backlog has been filled. I feel repl_backlog_histlen would have been a more appropriate value here; I don't quite understand the author's intent.
2. A slave also allocates a backlog, so that after being promoted to master it can still serve PSYNC (this is also the foundation of PSYNC 2.0 in Redis 4.0).
- clients.slaves: on the master side, the bytes consumed by all slave clients (a very important metric).
Each slave maintains exactly one client connection to the master, shown with flag S in CLIENT LIST. The memory counted here is each slave client's query buffer, client output buffer, and the client structure itself.
With this metric we can effectively monitor and analyze the output buffers consumed by slave clients, and tune "client-output-buffer-limit" more precisely.
In the example below, with a very large slave client limit, the clients' output buffers grow huge and clients.slaves reaches 3GB. Previously this could only be analyzed via the omem field of CLIENT LIST.
127.0.0.1:6379> memory stats
1) "peak.allocated"
2) (integer) 38697041192
9) "clients.slaves" // slave clients are holding large client output buffers
10) (integer) 3312505550
11) "clients.normal"
12) (integer) 2531130
- clients.normal: bytes consumed by all regular Redis clients (very important), i.e. all clients with flag N: query buffer + client output buffer + the client structure itself. Computed the same way as clients.slaves. This field is very useful for spotting clients doing abnormal writes or reads.

```js
127.0.0.1:6379> memory stats   // other fields omitted
 9) "clients.slaves"    // mostly client output buffer; see connection id=10520 below
10) (integer) 10256918
11) "clients.normal"    // regular clients' memory, typically query buffer and client output buffer
12) (integer) 102618310
127.0.0.1:6379> client list    // only a few test clients shown; check omem and qbuf
id=10520 addr=xx:60592 fd=8 flags=S qbuf=0 qbuf-free=0 obl=0 oll=2 omem=10240060 events=rw cmd=replconf
id=10591 addr=xx:56055 fd=10 flags=N qbuf=5799889 qbuf-free=4440113 obl=0 oll=0 omem=0 events=r cmd=set
id=10592 addr=xx:56056 fd=11 flags=N qbuf=10121401 qbuf-free=118601 obl=0 oll=0 omem=0 events=r cmd=set
id=10593 addr=xx:56057 fd=12 flags=N qbuf=0 qbuf-free=10240002 obl=0 oll=0 omem=0 events=r cmd=set
```
- aof.buffer: bytes used by the AOF buffer; normally small, it only grows large during an AOF rewrite. Workloads that enable AOF or periodically run BGREWRITEAOF may see significant usage here and should watch this field.
- overhead.hashtable.main: per-database overhead of the main dictionary that holds all keys (reported under each db.N entry).
- overhead.hashtable.expires: per-database overhead of the dictionary that tracks keys with a TTL (also reported under each db.N entry).
- overhead.total: total bytes of extra overhead in Redis, i.e. the total allocated memory (total.allocated) minus the memory actually used to store data. overhead.total is the sum of seven parts:
```js
overhead.total = startup.allocated + replication.backlog + clients.slaves
               + clients.normal + aof.buffer + overhead.hashtable.main
               + overhead.hashtable.expires
```
Worked example (see the run in the "Basic usage of memory stats" section above), verified by computing the sum:
765608 + 117440512 + 16858 + 49630 + 0 + 48388888 + 104 = 166661600
which matches 17) "overhead.total" = 166661600 in that output.
In principle you want to keep the extra overhead (overhead.total) as small as possible; with this detailed breakdown now available, analysis becomes much easier.
- keys.count: the number of keys in the whole instance; same as the value returned by DBSIZE.
- keys.bytes-per-key: the average bytes per key, with overhead amortized over every key. It cannot be read as the actual average key size of your workload.
Formula:
keys.bytes-per-key = (total.allocated - startup.allocated) / keys.count
Worked example:
(875852320 - 765608) / 1000007 = 875.08 // matches the 875 above
- dataset.bytes: the memory occupied by the Redis data itself, i.e. total allocated memory minus total overhead.
Formula:
dataset.bytes = total.allocated - overhead.total
Worked example:
875852320 - 166661600 = 709190720 // same as "dataset.bytes" in the sample output above
- dataset.percentage: the percentage of allocated memory occupied by data (important).
Formula:
dataset.percentage = dataset.bytes / (total.allocated - startup.allocated) * 100%
Worked example:
709190720 / (875852320 - 765608) * 100% ≈ 81.04% // same as dataset.percentage in the sample output
This can be read as the memory efficiency of the data stored in Redis.
- peak.percentage: current memory usage as a percentage of the peak.
Formula:
peak.percentage = total.allocated / peak.allocated * 100%
Worked example:
875852320 / 3211205544 * 100% ≈ 27.27% // same as peak.percentage in the sample output
- fragmentation: the memory fragmentation ratio of Redis (very important); none of the fields above accounts for fragmentation. /* Fragmentation = RSS / allocated-bytes */ Same as mem_fragmentation_ratio in INFO memory.
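The formulas above can all be checked against the sample memory stats output in one pass. The numbers below are copied from that output; the dictionary keys mirror the command's reply fields.

```python
# Recompute the derived memory stats fields from the sample output above.
stats = {
    "peak.allocated": 3211205544,
    "total.allocated": 875852320,
    "startup.allocated": 765608,
    "replication.backlog": 117440512,
    "clients.slaves": 16858,
    "clients.normal": 49630,
    "aof.buffer": 0,
    "overhead.hashtable.main": 48388888,
    "overhead.hashtable.expires": 104,
    "keys.count": 1000007,
}

# overhead.total: sum of the seven overhead components
overhead_total = (stats["startup.allocated"] + stats["replication.backlog"]
                  + stats["clients.slaves"] + stats["clients.normal"]
                  + stats["aof.buffer"] + stats["overhead.hashtable.main"]
                  + stats["overhead.hashtable.expires"])
dataset_bytes = stats["total.allocated"] - overhead_total
bytes_per_key = (stats["total.allocated"] - stats["startup.allocated"]) // stats["keys.count"]
dataset_pct = dataset_bytes / (stats["total.allocated"] - stats["startup.allocated"]) * 100
peak_pct = stats["total.allocated"] / stats["peak.allocated"] * 100

print(overhead_total)         # 166661600, matching "overhead.total"
print(dataset_bytes)          # 709190720, matching "dataset.bytes"
print(bytes_per_key)          # 875, matching "keys.bytes-per-key"
print(round(dataset_pct, 2))  # 81.04
print(round(peak_pct, 2))     # 27.27
```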
3 memory doctor
The memory doctor command analyzes how the instance uses memory and, based on a series of simple checks, gives diagnostic advice of some reference value.
Basic usage of memory doctor
Run memory doctor in redis-cli: if memory usage is clearly unreasonable, the command reports the problem and suggests how to handle it.
Example:
127.0.0.1:6379> memory doctor
"Sam, I detected a few issues in this Redis instance memory implants:\n\n
Peak memory: In the past this instance used more than 150% the memory that is currently using.
The allocator is normally not able to release memory after a peak,
so you can expect to see a big fragmentation ratio,
however this is actually harmless and is only due to the memory peak, and if the Redis instance Resident Set Size (RSS) is currently bigger than expected,
the memory will be used as soon as you fill the Redis instance with more data.
If the memory peak was only occasional and you want to try to reclaim memory,
please try the MEMORY PURGE command, otherwise the only other option is to
shutdown and restart the instance.\n\n
I'm here to keep you safe, Sam. I want to help you.\n"
A closer look at memory doctor
memory doctor runs through a list of condition checks; for each condition that holds, it emits a finding and advice.
The main checks are as follows; any one of them triggers a diagnosis:
- used_memory is below 5MB: doctor considers memory usage too small to diagnose further
- peak allocated memory is more than 1.5x the current total.allocated, which may mean the RSS is far larger than used_memory
- the memory fragmentation ratio is above 1.4
- the average memory per normal client is above 200KB
- the average memory per slave client is above 10MB
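These checks can be combined into a small sketch. The function name and call shape here are invented for illustration; the thresholds are the ones listed above, mirroring the logic of getMemoryDoctorReport.

```python
# Hedged sketch of memory doctor's condition checks (not Redis source).
def doctor_flags(total_allocated, peak_allocated, fragmentation,
                 clients_normal, num_normal, clients_slaves, num_slaves):
    if total_allocated < 5 * 1024 * 1024:
        return ["empty"]  # below 5MB: too small to diagnose further
    flags = []
    if peak_allocated > total_allocated * 1.5:
        flags.append("big_peak")        # RSS may be far larger than used_memory
    if fragmentation > 1.4:
        flags.append("high_frag")
    if num_normal > 0 and clients_normal / num_normal > 200 * 1024:
        flags.append("big_client_buf")  # >200KB per normal client on average
    if num_slaves > 0 and clients_slaves / num_slaves > 10 * 1024 * 1024:
        flags.append("big_slave_buf")   # >10MB per slave client on average
    return flags

print(doctor_flags(total_allocated=800_000_000, peak_allocated=3_200_000_000,
                   fragmentation=0.9, clients_normal=50_000, num_normal=10,
                   clients_slaves=0, num_slaves=0))  # ['big_peak']
```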
How memory doctor is implemented
The doctor command is implemented by getMemoryDoctorReport(void) in object.c.
The core code block is as follows:
sds getMemoryDoctorReport(void) {
    int empty = 0;          /* Instance is empty or almost empty. */
    int big_peak = 0;       /* Memory peak is much larger than used mem. */
    int high_frag = 0;      /* High fragmentation. */
    int big_slave_buf = 0;  /* Slave buffers are too big. */
    int big_client_buf = 0; /* Client buffers are too big. */
    int num_reports = 0;
    struct redisMemOverhead *mh = getMemoryOverheadData(); // collect the memory metrics used by the checks below
    if (mh->total_allocated < (1024*1024*5)) { // below 5MB: treat as a nearly-empty instance, skip other checks
        empty = 1;
        num_reports++;
    } else {
        /* ... omitted ... */
        /* Fragmentation is higher than 1.4? */
        if (mh->fragmentation > 1.4) {
            high_frag = 1;
            num_reports++;
        }
        /* Slaves using more than 10 MB each? */
        if (numslaves > 0 && mh->clients_slaves / numslaves > (1024*1024*10)) {
            big_slave_buf = 1;
            num_reports++;
        }
    }
sds s;
if (num_reports == 0) {
s = sdsnew(
"Hi Sam, I can't find any memory issue in your instance. "
"I can only account for what occurs on this base.\n");
} else if (empty == 1) {
s = sdsnew(
"Hi Sam, this instance is empty or is using very little memory, "
"my issues detector can't be used in these conditions. "
"Please, leave for your mission on Earth and fill it with some data. "
"The new Sam and I will be back to our programming as soon as I "
"finished rebooting.\n");
} else {
if (high_frag) {
s = sdscatprintf(s," * High fragmentation: This instance has a memory fragmentation greater than 1.4 (this means that the Resident Set Size of the Redis process is much larger than the sum of the logical allocations Redis performed). This problem is usually due either to a large peak memory (check if there is a peak memory entry above in the report) or may result from a workload that causes the allocator to fragment memory a lot. If the problem is a large peak memory, then there is no issue. Otherwise, make sure you are using the Jemalloc allocator and not the default libc malloc. Note: The currently used allocator is \"%s\".\n\n", ZMALLOC_LIB);
}
if (big_slave_buf) {
s = sdscat(s," * Big slave buffers: The slave output buffers in this instance are greater than 10MB for each slave (on average). This likely means that there is some slave instance that is struggling receiving data, either because it is too slow or because of networking issues. As a result, data piles on the master output buffers. Please try to identify what slave is not receiving data correctly and why. You can use the INFO output in order to check the slaves delays and the CLIENT LIST command to check the output buffers of each slave.\n\n");
}
if (big_client_buf) {
}
4 memory purge
The memory purge command calls into jemalloc internals to release memory: it tries to return memory that the Redis process holds but is not effectively using, i.e. memory fragmentation, back to the operating system.
memory purge only works on instances that use jemalloc as the allocator.
Redis memory fragmentation is a headache for DBAs. For example, after a business deletes a large number of keys, Redis does not promptly return the "freed" memory to the operating system, although Redis itself can reuse it.
Redis 4.0 offers two mechanisms for the fragmentation problem: one is the memory purge command; the other is Active memory defragmentation, which is still experimental but reclaims memory very effectively. This section covers only memory purge.
Basic usage of memory purge
memory purge is simple to use and has no obvious performance impact. In my tests, however, its reclamation efficiency is not high: with mem_fragmentation_ratio at 2, purge reclaimed essentially nothing.
In the example below, mem_fragmentation_ratio is 8.20; after memory purge it drops to 7.31, reclaiming 0.28GB. As of 4.0, the reclamation efficiency is not ideal.
127.0.0.1:6379> info memory
# Memory
used_memory:344944360
used_memory_human:328.96M
used_memory_rss:2828042240
used_memory_rss_human:2.63G
mem_fragmentation_ratio:8.20
mem_allocator:jemalloc-4.0.3
active_defrag_running:0
lazyfree_pending_objects:0
127.0.0.1:6379> memory purge
OK
127.0.0.1:6379> info memory
# Memory
used_memory:344942912
used_memory_human:328.96M
used_memory_rss:2522521600
used_memory_rss_human:2.35G
used_memory_dataset_perc:86.02%
mem_fragmentation_ratio:7.31
mem_allocator:jemalloc-4.0.3
active_defrag_running:0
lazyfree_pending_objects:0
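The ratio shown in the INFO output is simply used_memory_rss / used_memory; recomputing it from the numbers above confirms the reported values and the roughly 0.28GB reclaimed by the purge.

```python
# Recompute mem_fragmentation_ratio = used_memory_rss / used_memory from
# the INFO memory outputs above, before and after MEMORY PURGE.
before = 2828042240 / 344944360   # used_memory_rss / used_memory before purge
after = 2522521600 / 344942912    # same ratio after purge
print(round(before, 2))           # 8.2
print(round(after, 2))            # 7.31

reclaimed_gb = (2828042240 - 2522521600) / 2**30
print(f"{reclaimed_gb:.2f} GB reclaimed")  # 0.28 GB reclaimed
```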
A closer look at memory purge
memory purge is effective only with the jemalloc allocator.
The actual release happens deep inside jemalloc, which I have not fully understood; interested readers can study the relevant logic in memoryCommand() in object.c:
else if (!strcasecmp(c->argv[1]->ptr,"purge") && c->argc == 2) {
#if defined(USE_JEMALLOC) // only when Redis is built with jemalloc as the allocator
    char tmp[32];
    unsigned narenas = 0;
    size_t sz = sizeof(unsigned);
    if (!je_mallctl("arenas.narenas", &narenas, &sz, NULL, 0)) { // hand the purge off to jemalloc
        sprintf(tmp, "arena.%d.purge", narenas);
        if (!je_mallctl(tmp, NULL, 0, NULL, 0)) {
            addReply(c, shared.ok);
            return;
        }
    }
    addReplyError(c, "Error purging dirty pages");
#else
    addReply(c, shared.ok);
    /* Nothing to do for other allocators. */
#endif
5 memory malloc-stats
This command prints the allocator's internal statistics; currently only jemalloc is supported. It should be useful mainly to people working on the source code. A short example:
127.0.0.1:6379> memory malloc-stats
Begin jemalloc statistics
Version: 4.0.3-0-ge9192eacf8935e29fc62fddc2701f7942b1cc02c
Assertions disabled
Run-time option settings:
opt.abort: false
opt.lg_chunk: 21
opt.dss: "secondary"
opt.narenas: 96
opt.lg_dirty_mult: 3 (arenas.lg_dirty_mult: 3)
opt.stats_print: false
opt.junk: "false"
opt.quarantine: 0
opt.redzone: false
opt.zero: false
opt.tcache: true
opt.lg_tcache_max: 15
CPUs: 24
Arenas: 96
Pointer size: 8
Quantum size: 8
Page size: 4096
Min active:dirty page ratio per arena: 8:1
Maximum thread-cached size class: 32768
Chunk size: 2097152 (2^21)
Allocated: 345935320, active: 350318592, metadata: 65191296, resident: 455610368, mapped: 2501902336
Current active ceiling: 352321536
arenas[0]:
assigned threads: 1
dss allocation precedence: secondary
min active:dirty page ratio: 8:1
dirty pages: 85527:10020 active:dirty, 82084 sweeps, 112369 madvises, 665894 purged
allocated nmalloc ndalloc nrequests
small: 311397848 12825077 11674091 32248603
large: 983040 1850 1842 1854
huge: 33554432 8 6 8
total: 345935320 12826935 11675939 32250465
-- (output truncated) --
Summary
The MEMORY command lets us view the memory distribution inside Redis directly, helping us understand memory usage and optimize it for our workloads, especially via the purge, stats, and usage subcommands. I believe the MEMORY command will become even more capable in future versions. :)
Originally published: 2017-11-06
Author: RogerZhuo
This article comes from "老葉茶館" (Lao Ye Tea House), a partner of the Yunqi community; follow the "老葉茶館" WeChat official account for more.
Last updated: 2017-11-07 13:03:36