A Note on the "Double-Checked Locking Is Broken" Declaration

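Double-checked locking (DCL below) is widely cited and used as an efficient way to implement lazy initialization in a multithreaded environment.

Unfortunately, in Java it does not work reliably without additional synchronization. In other languages, such as C++, whether DCL works depends on the processor's memory model, the reorderings performed by the compiler, and the interaction between the compiler and the synchronization library. Because C++ did not specify any of these when this article was written, it is hard to say whether DCL works there. Explicit memory barriers can be used to make DCL work in C++, but Java did not offer such barriers.

Consider the following code: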

```java
// Single threaded version
class Foo {
    private Helper helper = null;

    public Helper getHelper() {
        if (helper == null)
            helper = new Helper();
        return helper;
    }
    // other functions and members...
}
```

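If this code is used in a multithreaded environment, several things can go wrong. Most obviously, two or more Helper objects could be created. (Other problems are discussed later.) Making the getHelper() method synchronized fixes this: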

```java
// Correct multithreaded version
class Foo {
    private Helper helper = null;

    public synchronized Helper getHelper() {
        if (helper == null)
            helper = new Helper();
        return helper;
    }
    // other functions and members...
}
```
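As a rough check of the claim that the synchronized version creates only one object, the following sketch races several threads through a synchronized getter and counts constructor calls. The names `CountingHelper`, `SyncFoo`, and `SyncFooDemo` are hypothetical and exist only for this illustration; they are not part of the original code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Helper that counts how often its constructor runs.
class CountingHelper {
    static final AtomicInteger created = new AtomicInteger();

    CountingHelper() {
        created.incrementAndGet();
    }
}

// Same shape as the synchronized Foo above, using the counting helper.
class SyncFoo {
    private CountingHelper helper = null;

    public synchronized CountingHelper getHelper() {
        if (helper == null)
            helper = new CountingHelper();
        return helper;
    }
}

class SyncFooDemo {
    // Races `threadCount` threads through getHelper() on one SyncFoo and
    // returns how many CountingHelper objects were constructed in the process.
    static int run(int threadCount) {
        int before = CountingHelper.created.get();
        SyncFoo foo = new SyncFoo();
        CountDownLatch startSignal = new CountDownLatch(1);
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    startSignal.await(); // line all threads up before racing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                foo.getHelper();
            });
            workers[i].start();
        }
        startSignal.countDown();
        try {
            for (Thread worker : workers)
                worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return CountingHelper.created.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("helpers created: " + run(8));
    }
}
```

Because every call takes the lock, the count is one regardless of how the threads interleave. The price is a lock acquisition on every call, which is what DCL tries to avoid.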

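The code above performs synchronization on every call to getHelper(). The double-checked locking idiom tries to avoid synchronization once the helper object has been created: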

```java
// Broken multithreaded version
// "Double-Checked Locking" idiom
class Foo {
    private Helper helper = null;

    public Helper getHelper() {
        if (helper == null)
            synchronized (this) {
                if (helper == null)
                    helper = new Helper();
            }
        return helper;
    }
    // other functions and members...
}
```

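Unfortunately, this code does not work in the presence of either optimizing compilers or shared-memory multiprocessors.

It Doesn't Work

There are many reasons why the code above does not work. We start with a couple of the more obvious ones. Once you understand those, you may be tempted to devise a way to "fix" the DCL idiom. Your fix will not work either, for more subtle reasons. Once you understand those reasons, you may come up with a better fix, and it still will not work, because there are even more subtle reasons.

Many very smart people have spent a lot of time on this. There is no way to make it work short of having every thread that accesses the helper object perform synchronization.

The First Reason It Doesn't Work

The most obvious reason is that the writes that initialize the Helper object and the write to the helper field can be performed or perceived out of order. A thread that calls getHelper() could then see the helper field pointing to a Helper object, yet see default values in that object's fields rather than the values set in the Helper constructor.

If the compiler inlines the call to the constructor, then the writes that initialize the object and the write to the helper field can be freely reordered, provided the compiler can prove that the constructor cannot throw an exception or perform synchronization.

Even if the compiler does not reorder these writes, on a multiprocessor a processor or the memory system may reorder them, and a thread running on another processor may observe the result of that reordering.

Doug Lea has written a more detailed description of compiler-based reorderings.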
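The reordering described above cannot be forced on demand, but it can be modeled in a single thread by writing the steps out by hand in the problematic order. The sketch below is only such a model: `ModelHelper`, `ReorderingModel`, and the value 42 are hypothetical, and the "constructor body" is split into a separate method so that it can run after the reference has been stored.

```java
// Hypothetical helper whose "constructor body" is a separate method, so the
// model can run it after the reference has already been published.
class ModelHelper {
    int value; // holds the default 0 until initialize() runs

    void initialize() {
        value = 42;
    }
}

class ReorderingModel {
    static ModelHelper helper; // plays the role of Foo's helper field

    // Executes "helper = new Helper()" as three steps in the reordered sequence
    // and returns what a racing reader could observe between the two writes.
    static int observeBetweenWrites() {
        ModelHelper h = new ModelHelper(); // 1. allocate; fields hold defaults
        helper = h;                        // 2. publish the reference early
        int seenByReader = helper.value;   //    a racing reader looks here
        h.initialize();                    // 3. constructor writes land late
        return seenByReader;
    }

    public static void main(String[] args) {
        System.out.println("reader saw value = " + observeBetweenWrites());
        System.out.println("final value = " + helper.value);
    }
}
```

In this model the reader sees 0 even though the field ends up as 42. A thread that passes the unsynchronized `helper == null` check in the broken idiom can land in exactly that window.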

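A Test Case Showing That It Doesn't Work

Paul Jakubik found an example of a use of DCL that did not work correctly. A slightly cleaned-up version of that code follows: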

```java
public class DoubleCheckTest
{
    // static data to aid in creating N singletons
    static final Object dummyObject = new Object(); // for reference init
    static final int A_VALUE = 256;  // value to initialize 'a' to
    static final int B_VALUE = 512;  // value to initialize 'b' to
    static final int C_VALUE = 1024; // value to initialize 'c' to
    static ObjectHolder[] singletons; // array of static references
    static Thread[] threads;          // array of racing threads
    static int threadCount;           // number of threads to create
    static int singletonCount;        // number of singletons to create

    static volatile int recentSingleton;

    // I am going to set a couple of threads racing,
    // trying to create N singletons. Basically the
    // race is to initialize a single array of
    // singleton references. The threads will use
    // double checked locking to control who
    // initializes what. Any thread that does not
    // initialize a particular singleton will check
    // to see if it sees a partially initialized view.
    // To keep from getting accidental synchronization,
    // each singleton is stored in an ObjectHolder
    // and the ObjectHolder is used for
    // synchronization. In the end the structure
    // is not exactly a singleton, but should be a
    // close enough approximation.

    // This class contains data and simulates a
    // singleton. The static reference is stored in
    // a static array in DoubleCheckTest.
    static class Singleton
    {
        public int a;
        public int b;
        public int c;
        public Object dummy;

        public Singleton()
        {
            a = A_VALUE;
            b = B_VALUE;
            c = C_VALUE;
            dummy = dummyObject;
        }
    }

    static void checkSingleton(Singleton s, int index)
    {
        int s_a = s.a;
        int s_b = s.b;
        int s_c = s.c;
        Object s_d = s.dummy;
        if (s_a != A_VALUE)
            System.out.println("[" + index
                + "] Singleton.a not initialized " + s_a);
        if (s_b != B_VALUE)
            System.out.println("[" + index
                + "] Singleton.b not initialized " + s_b);
        if (s_c != C_VALUE)
            System.out.println("[" + index
                + "] Singleton.c not initialized " + s_c);
        if (s_d != dummyObject) {
            if (s_d == null)
                System.out.println("[" + index
                    + "] Singleton.dummy not initialized,"
                    + " value is null");
            else
                System.out.println("[" + index
                    + "] Singleton.dummy not initialized,"
                    + " value is garbage");
        }
    }

    // Holder used for synchronization of
    // singleton initialization.
    static class ObjectHolder
    {
        public Singleton reference;
    }

    static class TestThread implements Runnable
    {
        public void run()
        {
            for (int i = 0; i < singletonCount; ++i)
            {
                ObjectHolder o = singletons[i];
                if (o.reference == null)
                {
                    synchronized (o)
                    {
                        if (o.reference == null) {
                            o.reference = new Singleton();
                            recentSingleton = i;
                        }
                        // shouldn't have to check singleton here
                        // mutex should provide consistent view
                    }
                }
                else {
                    checkSingleton(o.reference, i);
                    int j = recentSingleton - 1;
                    if (j > i) i = j;
                }
            }
        }
    }

    public static void main(String[] args)
    {
        if (args.length != 2)
        {
            System.err.println("usage: java DoubleCheckTest"
                + " <numThreads> <numSingletons>");
            return; // without both arguments the parsing below would fail
        }
        // read values from args
        threadCount = Integer.parseInt(args[0]);
        singletonCount = Integer.parseInt(args[1]);

        // create arrays
        threads = new Thread[threadCount];
        singletons = new ObjectHolder[singletonCount];

        // fill singleton array
        for (int i = 0; i < singletonCount; ++i)
            singletons[i] = new ObjectHolder();

        // fill thread array
        for (int i = 0; i < threadCount; ++i)
            threads[i] = new Thread(new TestThread());

        // start threads
        for (int i = 0; i < threadCount; ++i)
            threads[i].start();

        // wait for threads to finish
        for (int i = 0; i < threadCount; ++i)
        {
            try
            {
                System.out.println("waiting to join " + i);
                threads[i].join();
            }
            catch (InterruptedException ex)
            {
                System.out.println("interrupted");
            }
        }
        System.out.println("done");
    }
}
```

When this code is run on a system using the Symantec JIT, it does not work. In particular, the Symantec JIT compiles

```java
singletons[i].reference = new Singleton();
```

to the following (note that the Symantec JIT uses a handle-based object allocation system):

```
0206106A   mov         eax,0F97E78h
0206106F   call        01F6B210                  ; allocate space for
                                                 ; Singleton, return result in eax
02061074   mov         dword ptr [ebp],eax       ; EBP is &singletons[i].reference
                                                 ; store the unconstructed object here.
02061077   mov         ecx,dword ptr [eax]       ; dereference the handle to
                                                 ; get the raw pointer
02061079   mov         dword ptr [ecx],100h      ; Next 4 lines are
0206107F   mov         dword ptr [ecx+4],200h    ; Singleton's inlined constructor
02061086   mov         dword ptr [ecx+8],400h
0206108D   mov         dword ptr [ecx+0Ch],0F84030h
```

As you can see, the assignment to singletons[i].reference is performed before the Singleton constructor runs. This is completely legal under the existing Java memory model, and it is also legal in C and C++, since neither of them has a memory model. (Translator's note: this article was written some time ago; C++11 has since added a memory model.)

A Fix That Doesn't Work

Given the explanation above, a number of people have suggested the following code:

```java
// (Still) Broken multithreaded version
// "Double-Checked Locking" idiom
class Foo {
    private Helper helper = null;

    public Helper getHelper() {
        if (helper == null) {
            Helper h;
            synchronized (this) {
                h = helper;
                if (h == null)
                    synchronized (this) {
                        h = new Helper();
                    } // release inner synchronization lock
                helper = h;
            }
        }
        return helper;
    }
    // other functions and members...
}
```

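This code puts the construction of the Helper object inside an inner synchronized block. The intuition is that there should be a memory barrier at the point where the block is exited, and that this barrier should prevent the initialization of the Helper object from being reordered with the assignment to the helper field.

Unfortunately, that intuition is completely wrong. The rules for synchronization do not work that way. The rule for a monitorexit (that is, leaving a synchronized block) is that actions before the monitorexit must be performed before the monitor is released. However, no rule says that actions after the monitorexit may not be performed before the monitor is released. It is perfectly reasonable and legal for the compiler to move the assignment helper = h; inside the synchronized block, which puts us back where we started. Many processors offer instructions that perform exactly this kind of one-way memory barrier. Changing the semantics so that releasing a lock requires a full two-way memory barrier would carry a performance penalty.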
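To make that transformation concrete, here is a sketch of what the "fixed" method may effectively become once the assignment has been moved into the inner block. The class names `MovedHelper` and `MovedAssignmentFoo` are hypothetical, and the code is hand-written to illustrate a permitted rewrite, not the output of any particular compiler.

```java
// Hypothetical empty stand-in for the article's Helper class.
class MovedHelper { }

// What the "fixed" getHelper() may legally be turned into.
class MovedAssignmentFoo {
    private MovedHelper helper = null;

    public MovedHelper getHelper() {
        if (helper == null) {
            MovedHelper h;
            synchronized (this) {
                h = helper;
                if (h == null)
                    synchronized (this) {
                        h = new MovedHelper();
                        // Originally placed after the inner block. Moving it
                        // in is allowed, and once inside it may again be
                        // reordered with the constructor's writes.
                        helper = h;
                    } // release inner synchronization lock
            }
        }
        return helper;
    }
}
```

With the assignment inside the block, nothing separates it from the constructor's writes any more, so the method has the same weakness as the original double-checked version.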

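Is It Worth the Trouble?

For most applications, the cost of simply making getHelper() synchronized is not high. You should only consider this kind of detailed optimization if you know that it is causing substantial overhead for an application.

Very often, higher-level cleverness, such as using the built-in merge sort rather than an exchange sort (see the SPECJVM DB benchmark), has a much bigger impact.

Making It Work for Static Singletons

If the object you are creating is a static singleton (that is, only one Helper object will ever be created), there is a simple and elegant solution.

Just define the singleton as a static field in a separate class. The semantics of Java guarantee that the field is not initialized until it is first referenced, and that any thread that accesses the field sees all of the writes that result from initializing it.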

```java
class HelperSingleton {
    static Helper singleton = new Helper();
}
```
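The laziness of this approach can be observed with a small sketch. It uses a private nested holder class behind an accessor method, which is a common variation on the snippet above rather than the article's exact code; `LazyHelper`, `LazyHelperSingleton`, `Holder`, and `HolderDemo` are hypothetical names introduced for the illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Helper that counts constructor calls.
class LazyHelper {
    static final AtomicInteger created = new AtomicInteger();

    LazyHelper() {
        created.incrementAndGet();
    }
}

class LazyHelperSingleton {
    // The nested class is initialized only when getInstance() first reads
    // Holder.INSTANCE; class initialization itself is thread-safe.
    private static class Holder {
        static final LazyHelper INSTANCE = new LazyHelper();
    }

    static LazyHelper getInstance() {
        return Holder.INSTANCE;
    }
}

class HolderDemo {
    public static void main(String[] args) {
        System.out.println("created before first use: " + LazyHelper.created.get());
        LazyHelper a = LazyHelperSingleton.getInstance();
        LazyHelper b = LazyHelperSingleton.getInstance();
        System.out.println("created after first use: " + LazyHelper.created.get());
        System.out.println("same instance: " + (a == b));
    }
}
```

The constructor count stays at zero until the first call to `getInstance()` and is one afterwards, without any explicit locking in the accessor.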

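DCL Does Work for 32-Bit Primitive Values

Although the DCL idiom cannot be used for object references, it can be used for 32-bit primitive values such as int or float. Note that it does not work for long or double values, because unsynchronized reads and writes of 64-bit primitives are not guaranteed to be atomic.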

```java
// Correct Double-Checked Locking for 32-bit primitives
class Foo {
    private int cachedHashCode = 0;

    public int hashCode() {
        int h = cachedHashCode;
        if (h == 0)
            synchronized (this) {
                if (cachedHashCode != 0) return cachedHashCode;
                h = computeHashCode();
                cachedHashCode = h;
            }
        return h;
    }
    // other functions and members...
}
```

In fact, if computeHashCode() always returns the same result and has no side effects (that is, it is idempotent), you can even get rid of all of the synchronization:

```java
// Lazy initialization 32-bit primitives
// Thread-safe if computeHashCode is idempotent
class Foo {
    private int cachedHashCode = 0;

    public int hashCode() {
        int h = cachedHashCode;
        if (h == 0) {
            h = computeHashCode();
            cachedHashCode = h;
        }
        return h;
    }
    // other functions and members...
}
```
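A concrete instance of this pattern might look like the sketch below. The class `CachedName` and its hash formula are hypothetical; the point is only that `computeHashCode()` depends solely on final fields, so any thread that recomputes it arrives at the same int.

```java
// Hypothetical immutable value class that caches its hash code lazily.
final class CachedName {
    private final String first;
    private final String last;
    private int cachedHashCode = 0; // 0 means "not computed yet"

    CachedName(String first, String last) {
        this.first = first;
        this.last = last;
    }

    // Depends only on final fields and has no side effects, so every thread
    // that runs it computes the same int.
    private int computeHashCode() {
        return 31 * first.hashCode() + last.hashCode();
    }

    @Override
    public int hashCode() {
        int h = cachedHashCode; // single unsynchronized read
        if (h == 0) {
            h = computeHashCode();
            cachedHashCode = h; // racy write of an identical value is harmless
        }
        return h;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof CachedName))
            return false;
        CachedName that = (CachedName) other;
        return first.equals(that.first) && last.equals(that.last);
    }
}
```

One consequence of using 0 as the "not computed" marker is that an object whose real hash happens to be 0 is recomputed on every call. The result is still correct, just not cached.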

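Making DCL Work with Explicit Memory Barriers

DCL can be made to work if explicit memory barrier instructions are available. For example, if you are programming in C++, you can use the code from the book by Doug Schmidt et al.: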

```cpp
// C++ implementation with explicit memory barriers
// Should work on any platform, including DEC Alphas
// From "Patterns for Concurrent and Distributed Objects",
// by Doug Schmidt
template <class TYPE, class LOCK> TYPE *
Singleton<TYPE, LOCK>::instance (void) {
    // First check
    TYPE* tmp = instance_;
    // Insert the CPU-specific memory barrier instruction
    // to synchronize the cache lines on multi-processor.
    // ("memoryBarrier" is a placeholder, not a real instruction.)
    asm ("memoryBarrier");
    if (tmp == 0) {
        // Ensure serialization (guard
        // constructor acquires lock_).
        Guard<LOCK> guard (lock_);
        // Double check.
        tmp = instance_;
        if (tmp == 0) {
            tmp = new TYPE;
            // Insert the CPU-specific memory barrier instruction
            // to synchronize the cache lines on multi-processor.
            asm ("memoryBarrier");
            instance_ = tmp;
        }
    }
    return tmp;
}
```
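The original article predates any comparable facility in Java. Since Java 9, `java.lang.invoke.VarHandle` provides acquire and release access modes, which allow a rough Java counterpart of the C++ code above to be sketched. This is an illustration under that assumption and not something the original article proposes; `FencedHelper` and `FencedFoo` are hypothetical names.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical helper with one field set during construction.
class FencedHelper {
    final int value = 42;
}

class FencedFoo {
    private FencedHelper helper; // deliberately not volatile

    // Handle that gives acquire/release access to the helper field (Java 9+).
    private static final VarHandle HELPER;
    static {
        try {
            HELPER = MethodHandles.lookup()
                    .findVarHandle(FencedFoo.class, "helper", FencedHelper.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public FencedHelper getHelper() {
        // Acquire read: later reads of the helper's fields cannot move above it.
        FencedHelper h = (FencedHelper) HELPER.getAcquire(this);
        if (h == null) {
            synchronized (this) {
                h = (FencedHelper) HELPER.getAcquire(this);
                if (h == null) {
                    h = new FencedHelper();
                    // Release write: the constructor's writes cannot move below it.
                    HELPER.setRelease(this, h);
                }
            }
        }
        return h;
    }
}
```

The acquire read plays the role of the first barrier in the C++ version and the release write plays the role of the second. For most code, declaring the field volatile is the simpler way to get at least these guarantees.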

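Fixing DCL Using Thread-Local Storage

Alexander Terekhov (TEREKHOV@de.ibm.com) came up with a clever way to implement DCL using thread-local storage. Each thread keeps its own flag that records whether that thread has already performed the required synchronization.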

```java
class Foo {
    /** If perThreadInstance.get() returns a non-null value, this thread
        has done synchronization needed to see initialization
        of helper */
    private final ThreadLocal perThreadInstance = new ThreadLocal();
    private Helper helper = null;

    public Helper getHelper() {
        if (perThreadInstance.get() == null) createHelper();
        return helper;
    }

    private final void createHelper() {
        synchronized (this) {
            if (helper == null)
                helper = new Helper();
        }
        // Any non-null value would do as the argument here
        perThreadInstance.set(perThreadInstance);
    }
}
```

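The performance of this technique depends heavily on the JDK implementation in use. In Sun's 1.2 implementation, ThreadLocal was very slow. It became significantly faster in 1.3 and was expected to be faster still in 1.4. Doug Lea has analyzed the performance of several techniques for implementing lazy initialization.

Under the New Java Memory Model

JDK 5 introduced a new Java memory model and thread specification.

Fixing DCL Using volatile

JDK 5 and later extend the semantics of volatile so that a volatile write may no longer be reordered with any read or write that precedes it, and a volatile read may no longer be reordered with any read or write that follows it. See Jeremy Manson's blog for more details.

With this change, DCL can be made to work by declaring the helper field volatile.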

```java
// Works with acquire/release semantics for volatile (JDK 5 and later)
// Broken under the older, pre-JDK 5 semantics for volatile
class Foo {
    private volatile Helper helper = null;

    public Helper getHelper() {
        if (helper == null) {
            synchronized (this) {
                if (helper == null)
                    helper = new Helper();
            }
        }
        return helper;
    }
}
```
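A small harness can exercise the volatile version in the spirit of the earlier test case. The sketch below also copies the field into a local variable so that the fast path performs a single volatile read, a common refinement that is not part of the article's code. `CheckedHelper`, `VolatileFoo`, and `VolatileFooDemo` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper with one field set in the constructor.
class CheckedHelper {
    static final int A_VALUE = 256;
    int a;

    CheckedHelper() {
        a = A_VALUE;
    }
}

class VolatileFoo {
    private volatile CheckedHelper helper = null;

    public CheckedHelper getHelper() {
        CheckedHelper h = helper; // the only volatile read on the fast path
        if (h == null) {
            synchronized (this) {
                h = helper;
                if (h == null) {
                    h = new CheckedHelper();
                    helper = h; // volatile write publishes the constructed object
                }
            }
        }
        return h;
    }
}

class VolatileFooDemo {
    // Races `threadCount` threads through getHelper() and returns how many of
    // them saw a helper whose field was not the constructor's value.
    static int countBadViews(int threadCount) {
        VolatileFoo foo = new VolatileFoo();
        AtomicInteger badViews = new AtomicInteger();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                if (foo.getHelper().a != CheckedHelper.A_VALUE)
                    badViews.incrementAndGet();
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers)
                worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return badViews.get();
    }

    public static void main(String[] args) {
        System.out.println("bad views: " + countBadViews(8));
    }
}
```

Under the JDK 5 memory model, the volatile write happens-before any volatile read that sees it, so no thread should observe a partially initialized helper. A run that reports zero bad views is consistent with that guarantee but does not prove it.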

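DCL for Immutable Objects

If Helper is an immutable object, such that all of its fields are final, then DCL works without having to declare the field volatile. The idea is that a reference to an immutable object should behave in much the same way as an int or a float: reads and writes of references to immutable objects are atomic.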
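A minimal sketch of this case, assuming a hypothetical `ImmutableHelper` whose fields are all final and whose constructor does not leak `this`, could look as follows. The field in `ImmutableFoo` is intentionally not volatile, and it is read once into a local variable as a precaution, since two separate unsynchronized reads of a plain field are not guaranteed to return the same value.

```java
// Hypothetical immutable helper: every field is final and the constructor
// does not let `this` escape.
final class ImmutableHelper {
    private final int a;
    private final int b;

    ImmutableHelper(int a, int b) {
        this.a = a;
        this.b = b;
    }

    int a() { return a; }

    int b() { return b; }
}

class ImmutableFoo {
    private ImmutableHelper helper; // not volatile; relies on final-field semantics

    public ImmutableHelper getHelper() {
        ImmutableHelper h = helper; // single unsynchronized read
        if (h == null) {
            synchronized (this) {
                h = helper;
                if (h == null) {
                    h = new ImmutableHelper(256, 512);
                    helper = h;
                }
            }
        }
        return h;
    }
}
```

A thread that sees a non-null reference here is guaranteed to see the values assigned to the final fields in the constructor. The guarantee does not extend to non-final fields, so adding a mutable field to the helper would bring back the original problem.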

Reposted from ifeve.com (並發編程網).

Last updated: 2017-05-22 17:31:48
