Off-heap Memory in Apache Flink and the curious JIT compiler

Running data-intensive code in the JVM and making it well-behaved is tricky. Systems that naively put billions of data objects onto the JVM heap face unpredictable OutOfMemoryErrors and garbage collection stalls. Of course, you still want to keep your data in memory as much as possible, for speed and responsiveness of the processing applications. In that context, “off-heap” has become almost something like a magic word to solve these problems.

 

In this blog post, we will look at how Flink exploits off-heap memory. The feature is part of the upcoming release, but you can try it out with the latest nightly builds. We will also give a few interesting insights into the behavior of Java’s JIT compiler for highly optimized methods and loops.

 

Why actually bother with off-heap memory?

Given that Flink has sophisticated on-heap memory management, why do we even bother with off-heap memory? It is true that “out of memory” has been much less of a problem for Flink because of its heap memory management techniques. Nonetheless, there are a few good reasons to offer the possibility to move Flink’s managed memory out of the JVM heap:

  • Very large JVMs (100s of GBytes heap memory) tend to be tricky. They take a long time to start (allocating and initializing the heap) and garbage collection stalls can be huge (minutes). While newer incremental garbage collectors (like G1) mitigate this problem to some extent, an even better solution is to just make the heap much smaller and allocate Flink’s managed memory chunks outside the heap.

  • I/O and network efficiency: In many cases, we write MemorySegments to disk (spilling) or to the network (data transfer). Off-heap memory can be written/transferred with zero copies, while heap memory always incurs an additional memory copy.

  • Off-heap memory can actually be owned by other processes. That way, cached data survives process crashes (due to user code exceptions) and can be used for recovery. Flink does not exploit that yet, but it is interesting future work.

Flink’s traditional on-heap memory management already solves many of Java’s “out of memory” and GC problems, so why use off-heap techniques at all?

1. Very large JVMs take a long time to start, and garbage collection on them can be very expensive.
2. Writing heap memory to disk or to the network requires at least one copy, while off-heap memory allows zero-copy transfers.
3. Off-heap memory can be shared between processes, so data is not lost when the JVM process crashes.

 

The opposite question is also valid. Why should Flink ever not use off-heap memory?

  • On-heap is easier and interplays better with tools. Some container environments and monitoring tools get confused when the monitored heap size does not remotely reflect the amount of memory used by the process.

  • Short-lived memory segments are cheaper on the heap. Flink sometimes needs to allocate short-lived buffers, which is cheaper on the heap than off-heap.

  • Some operations are actually a bit faster on heap memory (or the JIT compiler understands them better).

Why doesn’t Flink simply use off-heap memory everywhere?

More powerful mechanisms generally come with more complications, so in the common case, on-heap memory is good enough.

 

The off-heap Memory Implementation

Given that all memory intensive internal algorithms are already implemented against the MemorySegment, our implementation to switch to off-heap memory is actually trivial. You can compare it to replacing all ByteBuffer.allocate(numBytes) calls with ByteBuffer.allocateDirect(numBytes). In Flink’s case it meant that we made the MemorySegment abstract and added the HeapMemorySegment and OffHeapMemorySegment subclasses. The OffHeapMemorySegment takes the off-heap memory pointer from a java.nio.DirectByteBuffer and implements its specialized access methods using sun.misc.Unsafe. We also made a few adjustments to the startup scripts and the deployment code to make sure that the JVM is permitted enough off-heap memory (direct memory, -XX:MaxDirectMemorySize).

Memory management with off-heap memory is not very different from the on-heap mechanism.

In NIO terms, ByteBuffer.allocate(numBytes) allocates heap memory, while ByteBuffer.allocateDirect(numBytes) allocates off-heap memory.

Flink gives MemorySegment two subclasses, HeapMemorySegment and OffHeapMemorySegment.

OffHeapMemorySegment obtains its off-heap memory through a java.nio.DirectByteBuffer and operates on that memory through the sun.misc.Unsafe interface.
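To make this concrete, here is a minimal sketch of the hierarchy described above. The class names follow the post, but the constructors and method bodies are simplified assumptions, not Flink’s actual code (which has many more accessors, bounds checks, and bulk operations):

```java
import java.lang.reflect.Field;
import java.nio.Buffer;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

// Base class: Flink's memory-intensive algorithms are written against this.
abstract class MemorySegment {
    public abstract int getInt(int index);
    public abstract void putInt(int index, int value);
}

// Heap variant: backed by a plain byte array.
final class HeapMemorySegment extends MemorySegment {
    private final byte[] memory;

    HeapMemorySegment(int numBytes) {
        this.memory = new byte[numBytes];
    }

    @Override
    public int getInt(int index) {
        return ((memory[index] & 0xff) << 24)
                | ((memory[index + 1] & 0xff) << 16)
                | ((memory[index + 2] & 0xff) << 8)
                | (memory[index + 3] & 0xff);
    }

    @Override
    public void putInt(int index, int value) {
        memory[index] = (byte) (value >>> 24);
        memory[index + 1] = (byte) (value >>> 16);
        memory[index + 2] = (byte) (value >>> 8);
        memory[index + 3] = (byte) value;
    }
}

// Off-heap variant: takes the raw pointer out of a DirectByteBuffer and
// accesses the memory through sun.misc.Unsafe.
final class OffHeapMemorySegment extends MemorySegment {
    private static final Unsafe UNSAFE;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (Exception e) {
            throw new Error("cannot access sun.misc.Unsafe", e);
        }
    }

    private final ByteBuffer buffer; // held so the off-heap memory is not freed
    private final long address;      // raw pointer into the off-heap region

    OffHeapMemorySegment(int numBytes) throws ReflectiveOperationException {
        this.buffer = ByteBuffer.allocateDirect(numBytes);
        // a DirectByteBuffer keeps its pointer in the private field 'address'
        Field addr = Buffer.class.getDeclaredField("address");
        addr.setAccessible(true);
        this.address = addr.getLong(buffer);
    }

    @Override
    public int getInt(int index) {
        return UNSAFE.getInt(address + index);
    }

    @Override
    public void putInt(int index, int value) {
        UNSAFE.putInt(address + index, value);
    }
}
```

Note that ByteBuffer.allocateDirect allocations count against the JVM’s direct-memory limit, which is why the startup scripts set -XX:MaxDirectMemorySize.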

 

Understanding the JIT and tuning the implementation

Before our change, the MemorySegment was a standalone, final class (it had no subclasses). Via Class Hierarchy Analysis (CHA), the JIT compiler was able to determine that all of the accessor method calls go to one specific implementation. That way, all method calls can be perfectly de-virtualized and inlined, which is essential to performance, and the basis for all further optimizations (like vectorization of the calling loop).

With two different memory segments loaded at the same time, the JIT compiler cannot perform the same level of optimization any more, which results in a noticeable difference in performance: a slowdown of about 2.7x in the following example:

[Figure: benchmark code and timings comparing the single-subclass and two-subclass cases, showing the ~2.7x slowdown]

 

This section is about performance optimization.

The point is: if MemorySegment is a standalone class with no subclasses, the code is more efficient, because every method call resolves to a single known implementation that the JIT can optimize ahead of time; with two subclasses, the actual target is only known at run time, so those optimizations can no longer be applied in advance.

In practice, the measured performance gap is about 2.7x.
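The actual benchmark is the one in the figure above; the hypothetical loop below (the names are ours, not from the benchmark) illustrates the kind of code that is affected:

```java
// A hot loop over a MemorySegment, using the sketch classes from above.
// If only one MemorySegment subclass is ever loaded, Class Hierarchy Analysis
// lets the JIT de-virtualize and inline getInt(), and then optimize the loop
// further. Once both HeapMemorySegment and OffHeapMemorySegment are loaded,
// the call site can no longer be resolved to a single target, which is what
// produces the roughly 2.7x gap.
static long sumInts(MemorySegment segment, int numInts) {
    long sum = 0;
    for (int i = 0; i < numInts; i++) {
        sum += segment.getInt(i * 4);
    }
    return sum;
}
```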

Solutions:

Approach 1: Make sure that only one memory segment implementation is ever loaded.

We re-structured the code a bit to make sure that all places that produce long-lived and short-lived memory segments instantiate the same MemorySegment subclass (Heap- or Off-Heap segment). Using factories rather than directly instantiating the memory segment classes, this was straightforward.

If the code only ever instantiates one of the subclasses and the other is never instantiated, the JIT recognizes this and optimizes accordingly; using factories to instantiate the segments makes this easy to guarantee, as sketched below.
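A hedged sketch of the factory idea, reusing the sketch classes from above (the factory names are ours):

```java
// Code asks a factory for segments instead of calling 'new' on a concrete
// class; a given deployment wires in exactly one factory, so only one
// MemorySegment subclass is ever instantiated (and loaded), and CHA-based
// de-virtualization still applies.
interface MemorySegmentFactory {
    MemorySegment allocate(int numBytes);
}

final class HeapMemorySegmentFactory implements MemorySegmentFactory {
    @Override
    public MemorySegment allocate(int numBytes) {
        return new HeapMemorySegment(numBytes);
    }
}

final class OffHeapMemorySegmentFactory implements MemorySegmentFactory {
    @Override
    public MemorySegment allocate(int numBytes) {
        try {
            return new OffHeapMemorySegment(numBytes);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("could not allocate off-heap segment", e);
        }
    }
}
```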

Approach 2: Write one segment that handles both heap and off-heap memory

We created a class HybridMemorySegment which handles transparently both heap- and off-heap memory. It can be initialized either with a byte array (heap memory), or with a pointer to a memory region outside the heap (off-heap memory).

The second approach is the HybridMemorySegment, which handles both heap and off-heap memory in one class, so no subclasses are needed.

There is a tricky way to handle both kinds of memory transparently; see the original post for details.
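The trick, in a simplified sketch of our own, is that sun.misc.Unsafe’s accessors take an (object, offset) pair: either a heap array plus a relative offset, or a null reference plus an absolute address. One code path can therefore serve both kinds of memory:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Sketch of the hybrid idea: one final class, one code path for both kinds
// of memory (Flink's real HybridMemorySegment is richer than this).
final class HybridMemorySegment {
    private static final Unsafe UNSAFE;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (Exception e) {
            throw new Error("cannot access sun.misc.Unsafe", e);
        }
    }

    private final byte[] heapMemory; // non-null for heap segments, null for off-heap
    private final long address;      // array base offset, or absolute off-heap pointer

    // heap-backed segment
    HybridMemorySegment(byte[] memory) {
        this.heapMemory = memory;
        this.address = Unsafe.ARRAY_BYTE_BASE_OFFSET;
    }

    // off-heap segment; the pointer is obtained from a DirectByteBuffer as before
    HybridMemorySegment(long offHeapAddress) {
        this.heapMemory = null;
        this.address = offHeapAddress;
    }

    int getInt(int index) {
        // Unsafe interprets (null, absoluteAddress) as off-heap memory and
        // (array, baseOffset + index) as an element inside the array, so the
        // same statement transparently handles both cases.
        return UNSAFE.getInt(heapMemory, address + index);
    }
}
```

Since HybridMemorySegment needs no subclasses, the de-virtualization problem from the previous section does not arise in the first place.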
