338
阿裏雲
技術社區[雲棲]
JDK的sql設計不合理導致的驅動類初始化死鎖問題
問題描述
當我們一個係統既需要mysql驅動,也需要oracle驅動的時候,在並發加載初始化這些驅動類的過程中產生死鎖的可能性非常大,下麵是一個模擬的例子,對於Thread2的實現其實是jdk裏java.sql.DriverService的邏輯,也是我們第一次調用java.sql.DriverManager.registerDriver注冊一個驅動實例要走的邏輯(jdk1.6下),不過這篇文章是使用我們生產環境的一個係統的線程dump和內存dump為基礎進行分析展開的。
01 |
import java.util.Iterator;
|
03 |
import sun.misc.Service;
|
06 |
public static void main(String[] args) throws ClassNotFoundException {
|
07 |
Thread1 thread1 = new Thread1();
|
08 |
Thread2 thread2 = new Thread2();
|
14 |
class Thread1 extends Thread {
|
17 |
Class clazz = Class.forName( "com.mysql.jdbc.Driver" , true , Thread.currentThread()
|
18 |
.getContextClassLoader());
|
19 |
System.out.println(clazz);
|
20 |
} catch (ClassNotFoundException e) {
|
25 |
class Thread2 extends Thread {
|
27 |
Iterator ps = Service.providers(java.sql.Driver. class );
|
29 |
while (ps.hasNext()) {
|
30 |
System.out.println(ps.next());
|
32 |
} catch (Throwable t) {
|
如果以上代碼運行過程中發現有線程一直卡死在Class.forName的調用裏,那麼說明問題已經重現了。
先上兩張圖


下麵是問題進程線程dump的分析”thread dump”,還有這個進程的內存dump分析”heap dump”,這個後麵展開分析的基礎
存疑點
仔細看看上麵的線程dump分析和內存dump分析裏的線程分析模塊,您可能會有如下兩個疑惑:
【為什麼線程”Thread-0″一直卡在Class.forName的位置】:這有點出乎意料,做一個類加載要麼找不到拋出ClassNotFoundException,要麼找到直接返回,為什麼會一直卡在這個位置呢?
【明明”Thread-0″注冊的是mysql驅動為什麼會去加載Odbc的驅動類】:通過”Thread-0″在棧上看倒數第二幀展開看到傳入Class.forName的參數是com.mysql.jdbc.Driver,然後展開棧上順序第二幀,看到傳入的參數是sun.jdbc.odbc.JdbcOdbcDriver,這意味著在對mysql驅動類做加載初始化的過程中又觸發了JdbcOdbc驅動類的加載
疑惑點解釋
疑惑二:
第一個疑惑我們先留著,先解釋下第二個疑惑,大家可以對照堆棧通過反編譯rt.jar還有ojdbc6-11.2.0.3.0.jar看具體的代碼
驅動類加載過程簡要介紹:
當要注冊某個sql驅動的時候是通過調用java.sql.DriverManager.registerDriver來實現的(注意這個方法加了synchronized關鍵字,後麵解釋第一個疑惑的時候是關鍵),而這個方法在第一次執行過程中,會在當前線程classloader的classpath下尋找所有/META-INF/services/java.sql.Driver文件,這個文件在mysql和oracle驅動jar裏都有,裏麵寫的是對應的驅動實現類名,這種機製是jdk提供的spi實現,找到這些文件之後,依次使用Class.forName(driverClassName, true, this.loader)來對這些驅動類進行加載,其中第二個參數是true,意味著不僅僅做一次loadClass的動作,還會初始化該類,即調用包含靜態塊的< clinit >方法,執行完之後才會返回,這樣就解釋了第二個疑惑,在mysql驅動注冊過程中還會對odbc驅動類進行加載並初始化
感想:
其實我覺得這種設計有點傻,為什麼要幹和自己不相關的事情呢,畫蛇添足的設計,首先類初始化的開銷是否放到一起做並沒有多大區別,其次正由於這種設計導致了今天這個死鎖的發生
疑惑一:
現在來說第一個疑惑,為什麼會一直卡在Class.forName呢,到底卡在哪裏,於是再通過jstack -m 命令將jvm裏的堆棧也打印出來,如下所示
01 |
----------------- 5738 -----------------
|
02 |
0x003f67a2 _dl_sysinfo_int80 + 0x2
|
03 |
0xb79a71ae _ZN2os13PlatformEvent4parkEv + 0xee
|
04 |
0xb7997acb _ZN13ObjectMonitor4waitExbP6Thread + 0x5fb
|
05 |
0xb7a73c53 _ZN18ObjectSynchronizer19waitUninterruptiblyE6HandlexP6Thread + 0x53
|
06 |
0xb777eb34 _ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread + 0x74
|
07 |
0xb777e288 _ZN13instanceKlass10initializeEP6Thread + 0x58
|
08 |
0xb7821ad9 _Z28find_class_from_class_loaderP7JNIEnv_12symbolHandleh6HandleS2_hP6Thread + 0xb9
|
09 |
0xb7807d99 JVM_FindClassFromClassLoader + 0x269
|
10 |
0xb734c236 Java_java_lang_Class_forName0 + 0x116
|
11 |
0xb433064a * java.lang.Class.forName0(java.lang.String, boolean , java.lang.ClassLoader) bci: 0 (Interpreted frame)
|
12 |
0xb4328fa7 * java.lang.Class.forName(java.lang.String, boolean , java.lang.ClassLoader) bci: 32 line: 247 (Interpreted frame)
|
13 |
0xb4328fa7 * sun.misc.Service$LazyIterator.next() bci: 31 line: 271 (Interpreted frame)
|
14 |
0xb4329483 * java.sql.DriverService.run() bci: 26 line: 664 (Interpreted frame)
|
16 |
0xb77a4e31 _ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgumentsP6Thread + 0x1c1
|
17 |
0xb79a6418 _ZN2os20os_exception_wrapperEPFvP9JavaValueP12methodHandleP17JavaCallArgumentsP6ThreadES1_S3_S5_S7_ + 0x18
|
18 |
0xb77a4c5f _ZN9JavaCalls4callEP9JavaValue12methodHandleP17JavaCallArgumentsP6Thread + 0x2f
|
19 |
0xb780aace JVM_DoPrivileged + 0x40e
|
20 |
0xb734b95d Java_java_security_AccessController_doPrivileged__Ljava_security_PrivilegedAction_2 + 0x3d
|
21 |
0xb433064a * java.security.AccessController.doPrivileged(java.security.PrivilegedAction) bci: 0 (Interpreted frame)
|
22 |
0xb4328fa7 * java.sql.DriverManager.loadInitialDrivers() bci: 31 line: 506 (Interpreted frame)
|
23 |
0xb432910d * java.sql.DriverManager.initialize() bci: 11 line: 612 (Interpreted frame)
|
24 |
0xb432910d * java.sql.DriverManager.registerDriver(java.sql.Driver) bci: 6 line: 281 (Interpreted frame)
|
25 |
0xb432910d * com.mysql.jdbc.Driver.() bci: 7 line: 65 (Interpreted frame)
|
27 |
0xb77a4e31 _ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgumentsP6Thread + 0x1c1
|
28 |
0xb79a6418 _ZN2os20os_exception_wrapperEPFvP9JavaValueP12methodHandleP17JavaCallArgumentsP6ThreadES1_S3_S5_S7_ + 0x18
|
29 |
0xb77a4c5f _ZN9JavaCalls4callEP9JavaValue12methodHandleP17JavaCallArgumentsP6Thread + 0x2f
|
30 |
0xb77800c1 _ZN13instanceKlass27call_class_initializer_implE19instanceKlassHandleP6Thread + 0xa1
|
31 |
0xb777ed8e _ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread + 0x2ce
|
32 |
0xb777e288 _ZN13instanceKlass10initializeEP6Thread + 0x58
|
33 |
0xb7821ad9 _Z28find_class_from_class_loaderP7JNIEnv_12symbolHandleh6HandleS2_hP6Thread + 0xb9
|
34 |
0xb7807d99 JVM_FindClassFromClassLoader + 0x269
|
35 |
0xb734c236 Java_java_lang_Class_forName0 + 0x116
|
36 |
0xb433064a * java.lang.Class.forName0(java.lang.String, boolean , java.lang.ClassLoader) bci: 0 (Interpreted frame)
|
37 |
0xb4328fa7 * java.lang.Class.forName(java.lang.String, boolean , java.lang.ClassLoader) bci: 32 line: 247 (Interpreted frame)
|
38 |
0xb4328fa7 * Thread1.run() bci: 9 line: 17 (Interpreted frame)
|
我們看到其實正在做類的初始化動作,並且線程正在調用ObjectSynchronizer::waitUninterruptibly一直沒返回,在看這方法的調用者instanceKlass1::initialize_impl,我們找到源碼位置如下:
01 |
void instanceKlass::initialize_impl(instanceKlassHandle this_oop, TRAPS) {
|
02 |
// Make sure klass is linked (verified) before initialization
|
03 |
// A class could already be verified, since it has been reflected upon.
|
04 |
this_oop->link_class(CHECK);
|
06 |
DTRACE_CLASSINIT_PROBE(required, instanceKlass::cast(this_oop()), - 1 );
|
10 |
// refer to the JVM book page 47 for description of steps
|
12 |
{ ObjectLocker ol(this_oop, THREAD);
|
14 |
Thread *self = THREAD; // it's passed the current thread
|
17 |
// If we were to use wait() instead of waitInterruptibly() then
|
18 |
// we might end up throwing IE from link/symbol resolution sites
|
19 |
// that aren't expected to throw. This would wreak havoc. See 6320309.
|
20 |
while (this_oop->is_being_initialized() && !this_oop->is_reentrant_initialization(self)) {
|
22 |
ol.waitUninterruptibly(CHECK);
|
26 |
if (this_oop->is_being_initialized() && this_oop->is_reentrant_initialization(self)) {
|
27 |
DTRACE_CLASSINIT_PROBE_WAIT(recursive, instanceKlass::cast(this_oop()), - 1 ,wait);
|
32 |
if (this_oop->is_initialized()) {
|
33 |
DTRACE_CLASSINIT_PROBE_WAIT(concurrent, instanceKlass::cast(this_oop()), - 1 ,wait);
|
38 |
if (this_oop->is_in_error_state()) {
|
39 |
DTRACE_CLASSINIT_PROBE_WAIT(erroneous, instanceKlass::cast(this_oop()), - 1 ,wait);
|
40 |
ResourceMark rm(THREAD);
|
41 |
const char * desc = "Could not initialize class " ;
|
42 |
const char * className = this_oop->external_name();
|
43 |
size_t msglen = strlen(desc) + strlen(className) + 1 ;
|
44 |
char * message = NEW_RESOURCE_ARRAY( char , msglen);
|
45 |
if (NULL == message) {
|
46 |
// Out of memory: can't create detailed error message
|
47 |
THROW_MSG(vmSymbols::java_lang_NoClassDefFoundError(), className);
|
49 |
jio_snprintf(message, msglen, "%s%s" , desc, className);
|
50 |
THROW_MSG(vmSymbols::java_lang_NoClassDefFoundError(), message);
|
55 |
this_oop->set_init_state(being_initialized);
|
56 |
this_oop->set_init_thread(self);
|
類的初始化過程:
當某個線程獲得機會對某個類進行初始化的時候(請看上麵的Step 6),會設置這個類的init_state屬性為being_initialized(如果初始化好了會設置為fully_initialized,異常的話會設置為initialization_error),還會設置init_thread屬性為當前線程,在這個設置過程中是有針對這個類提供了一把互斥鎖的,因此當有別的線程進來的時候會被攔截在外麵,如果設置完了,這把互斥鎖也釋放了,但是因為這個類的狀態被設置了,因此並發問題也得到了解決,當另外一個線程也嚐試初始化這個類的時候會判斷這個類的狀態是不是being_initialized,並且其init_thread不是當前線程,那麼就會一直卡在那裏,也就是此次線程dump的線程所處的狀態,正在初始化類的線程會調用< clinit >方法,如果正常結束了,那麼就設置其狀態為fully_initialized,並且通知之前卡在那裏等待初始化完成的線程,然他們繼續往下走(下一個動作就是再判斷下狀態,發現完成了就直接return了)
猜想:
在了解了上麵的過程之後,於是我們猜測兩種可能
第一,這個類的狀態還是being_intialized,還在while循環裏沒有跳出來
第二,事件通知機製出現了問題,也就是pthread_cond_wait和pthread_cond_signal之間的通信過程出現了問題。
不過第二種可能性非常小,比較linux久經考驗了,那接下來我們驗證其實是第一個猜想
驗證:
我們通過GDB attach的方式連到了問題機器上(好在機器沒有掛),首先我們要找到具體的問題線程,我們通過上麵的jstack -m命令看到了線程ID是5738,然後通過info threads找到對應的線程,並得到它的序號14
02 |
17 process 5724 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
03 |
16 process 6878 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
04 |
15 process 5739 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
05 |
14 process 5738 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
06 |
13 process 5737 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
07 |
12 process 5736 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
08 |
11 process 5735 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
09 |
10 process 5734 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
10 |
9 process 5733 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
11 |
8 process 5732 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
12 |
7 process 5731 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
13 |
6 process 5730 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
14 |
5 process 5729 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
15 |
4 process 5728 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
16 |
3 process 5727 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
17 |
2 process 5726 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
18 |
1 process 5725 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
然後通過thread 14切換到對應的線程,並通過bt看到了如下的堆棧,正如我們想象的那樣,正在做類的初始化,一直卡在那裏
02 |
[Switching to thread 14 (process 5738 )]# 0 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
04 |
# 0 0x003f67a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so. 2
|
05 |
# 1 0x005e0d76 in pthread_cond_wait@ @GLIBC_2 . 3.2 () from /lib/tls/i686/nosegneg/libpthread.so. 0
|
06 |
# 2 0x005e13ee in pthread_cond_wait @GLIBC_2 . 0 () from /lib/tls/i686/nosegneg/libpthread.so. 0
|
07 |
# 3 0xb79a71ae in os::PlatformEvent::park () from /home/opt/taobao/install/jdk1. 6 .0_33/jre/lib/i386/server/libjvm.so
|
08 |
# 4 0xb7997acb in ObjectMonitor::wait () from /home/opt/taobao/install/jdk1. 6 .0_33/jre/lib/i386/server/libjvm.so
|
09 |
# 5 0xb7a73c53 in ObjectSynchronizer::waitUninterruptibly () from /home/opt/taobao/install/jdk1. 6 .0_33/jre/lib/i386/server/libjvm.so
|
10 |
# 6 0xb777eb34 in instanceKlass::initialize_impl () from /home/opt/taobao/install/jdk1. 6 .0_33/jre/lib/i386/server/libjvm.so
|
11 |
# 7 0xb777e288 in instanceKlass::initialize () from /home/opt/taobao/install/jdk1. 6 .0_33/jre/lib/i386/server/libjvm.so
|
12 |
# 8 0xb7821ad9 in find_class_from_class_loader () from /home/opt/taobao/install/jdk1. 6 .0_33/jre/lib/i386/server/libjvm.so
|
13 |
# 9 0xb7807d99 in JVM_FindClassFromClassLoader () from /home/opt/taobao/install/jdk1. 6 .0_33/jre/lib/i386/server/libjvm.so
|
14 |
# 10 0xb734c236 in Java_java_lang_Class_forName0 () from /home/opt/taobao/install/jdk1. 6 .0_33/jre/lib/i386/libjava.so
|
15 |
# 11 0xb433064a in ?? ()
|
16 |
# 12 0x0813b120 in ?? ()
|
17 |
# 13 0x70aaa690 in ?? ()
|
18 |
# 14 0x70aaa6a0 in ?? ()
|
19 |
# 15 0x00000001 in ?? ()
|
20 |
# 16 0x70aaa698 in ?? ()
|
21 |
# 17 0x00000000 in ?? ()
|
我們通過f 6選擇第7幀,在通過disassemble反匯編該幀,也就是對instanceKlass::initialize_impl ()這個方法反匯編
01 |
0xb777eaed <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 45 >: lea 0xfffffff4 (%ebp),%esp //將%ebp偏移0xfffffff4位置的值存到%esp棧頂,然後下麵的pop操作存到%ebx
|
02 |
0xb777eaf0 <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 48 >: pop %ebx
|
03 |
0xb777eaf1 <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 49 >: pop %esi
|
04 |
0xb777eaf2 <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 50 >: pop %edi
|
05 |
0xb777eaf3 <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 51 >: pop %ebp
|
06 |
0xb777eaf4 <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 52 >: ret
|
07 |
0xb777eaf5 <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 53 >: push $ 0x1
|
08 |
0xb777eaf7 <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 55 >: lea 0xffffffd8 (%ebp),%edx
|
09 |
0xb777eafa <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 58 >: push %esi
|
10 |
0xb777eafb <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 59 >: push %ebx
|
11 |
0xb777eafc <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 60 >: push %edx
|
12 |
0xb777eafd <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 61 >: call 0xb7a73a80 <_ZN12ObjectLockerC1E6HandleP6Threadb>
|
13 |
0xb777eb02 <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 66 >: add $ 0x10 ,%esp
|
14 |
0xb777eb05 <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 69 >: xor %eax,%eax
|
15 |
0xb777eb07 <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 71 >: test %ebx,%ebx
|
16 |
0xb777eb09 <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 73 >: je 0xb777eb0d <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 77 >
|
17 |
0xb777eb0b <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 75 >: mov (%ebx),%eax //將%ebx的值移到%eax
|
18 |
0xb777eb0d <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 77 >: cmpl $ 0x4 , 0xe0 (%eax) //對比%eax偏移0xe0位置的值和0x4(這個值其實就是上麵提到的being_initialized狀態,這就說明了%eax偏移0xe0位置其實存的就是初始化類的初始化狀態)
|
19 |
0xb777eb14 <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 84 >: jne 0xb777eb4f <_ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread+ 143 >
|
從上麵的注釋我們其實得出了,我們要看當前類的初始化狀態,那就是看eax寄存器偏移0xe0的位置的值,而eax其實就是ebp寄存器偏移0xfffffff4位置的值,於是我們通過如下地址內存查到得到是4
1 |
(gdb) x $ebp + 0xfffffff4
|
3 |
(gdb) x/3w 0x71af2180 + 0xe0
|
4 |
0x71af2260 : 0x00000004 0x0813c800 0x0000001a
|
而4其實代表的就是being_initialized這個狀態,代碼如下
從這於是我們驗證了第一個猜想,其實是狀態一直沒有變更,因此一直卡在那裏,為了更進一步確認這個問題,要是我們能找到該類的init_thread線程id就更清楚了,拿到這個ID我們就能看到這個線程棧,就知道它在幹什麼了,但是很遺憾,這個很難獲取到,至少我一直沒有找到辦法,因為線程ID在線程對象裏一直沒有存,都是調用的os函數來獲取的,得換個思路。
突然發現instanceKlass.hpp代碼中得知兩個屬性原來是相鄰的(init_state和init_thread),於是斷定下一個地址的值就代表是這個線程對象了,但是其屬性何其多,找到想要的太不易了,最主要的是還擔心自己看的代碼和服務器上的jvm代碼不一致,這樣更蛋疼了,於是繼續查看Thread.hpp中的JavaThread類,找到個關鍵字0xDEAD-2=0xDEAB,這個有可能是volatile TerminatedTypes _terminated屬性的值,於是把線程對象打印出來,果然查到了關鍵字0xDEAB
01 |
(gdb) x/100w 0x0813c800
|
02 |
0x813c800 : 0xb7bc06e8 0x00000000 0x00000000 0x00000000
|
03 |
0x813c810 : 0x0813c488 0x0813d2c8 0x00000000 0x00000000
|
04 |
0x813c820 : 0x080f9bf8 0x080f8b50 0x70a59b60 0x00000000
|
05 |
0x813c830 : 0x00000000 0x00000000 0x00000000 0x00000000
|
06 |
0x813c840 : 0x00014148 0x00000505 0x00000000 0x00000000
|
07 |
0x813c850 : 0x00000000 0x00000000 0x00000000 0x3f800021
|
08 |
0x813c860 : 0x00000001 0x00000023 0x3f800021 0x0001b530
|
09 |
0x813c870 : 0x00000000 0x00000000 0x00000000 0x080ffdc0
|
10 |
0x813c880 : 0x00000001 0x00000000 0x080ffe24 0x00000014
|
11 |
0x813c890 : 0x00000031 0x00000000 0x00000000 0x0813dab0
|
12 |
0x813c8a0 : 0x0813c428 0x0813ce98 0x70a5b000 0x00051000
|
13 |
0x813c8b0 : 0x00000000 0xffffffff 0x00000000 0x080ffdc0
|
14 |
0x813c8c0 : 0x00002bad 0x0813d400 0x0813d500 0x0813d700
|
15 |
0x813c8d0 : 0x0813d800 0x00000000 0x00000000 0x104aa1ad
|
16 |
0x813c8e0 : 0x544a5ab2 0x32378fc7 0x00008767 0x00000000
|
17 |
0x813c8f0 : 0x00000000 0x00000000 0x0ee9547d 0x00000000
|
18 |
0x813c900 : 0x00000000 0x00000000 0x0813b000 0x75878760
|
19 |
0x813c910 : 0x70a59a94 0x00000000 0x70a59abc 0xb7829020
|
20 |
0x813c920 : 0xb7bb7100 0x00000000 0x00000000 0x00000000
|
21 |
0x813c930 : 0x00000000 0x00000000 0x00000000 0x00000000
|
22 |
0x813c940 : 0x00000000 0x00000000 0x00000000 0x00000000
|
23 |
0x813c950 : 0x00000000 0x00000000 0x00000000 0x0000000a
|
24 |
0x813c960 : 0x0813da98 0x00000000 0x0000deab 0x00000001
|
25 |
0x813c970 : 0x00000000 0x00000000 0x00000002 0x00000000
|
26 |
0x813c980 : 0x00000000 0x00000000 0x00000000 0x00000000
|
因此順著這個屬性繼續往上找,找到了_thread_state表示線程狀態的值(向上偏移三個字),0x0000000a,即10,然後查看代碼知道原來線程是出於block狀態
02 |
volatile JavaThreadState _thread_state;
|
04 |
ThreadSafepointState *_safepoint_state; |
05 |
address _saved_exception_pc; |
06 |
volatile TerminatedTypes _terminated;
|
07 |
enum JavaThreadState {
|
08 |
_thread_uninitialized = 0 , // should never happen (missing initialization)
|
09 |
_thread_new = 2 , // just starting up, i.e., in process of being initialized
|
10 |
_thread_new_trans = 3 , // corresponding transition state (not used, included for completness)
|
11 |
_thread_in_native = 4 , // running in native code
|
12 |
_thread_in_native_trans = 5 , // corresponding transition state
|
13 |
_thread_in_vm = 6 , // running in VM
|
14 |
_thread_in_vm_trans = 7 , // corresponding transition state
|
15 |
_thread_in_Java = 8 , // running in Java or in stub code
|
16 |
_thread_in_Java_trans = 9 , // corresponding transition state (not used, included for completness)
|
17 |
_thread_blocked = 10 , // blocked in vm
|
18 |
_thread_blocked_trans = 11 , // corresponding transition state
|
19 |
_thread_max_state = 12 // maximum thread state+1 - used for statistics allocation
|
這樣一來查看下線程dump,發現”Thread-1″正好處於BLOCKED狀態,也就是說Thread-1就是那個正在對mysql驅動類做初始化的線程,這說明Thread-0和Thread-1成功互鎖了
於是我們展開【Thread-1】,看到- waiting to lock (a java.lang.Class for java.sql.DriverManager),該線程正在等待java.sql.DriverManager類型鎖,而blocked在那裏,而這個類型鎖是被【Thread-0】線程持有的,從【Thread-1】這個線程堆棧來看它其實也是在做Class.forName動作,並且通過”Thread-1″,展開第四幀我們可以看到其正在對加載sun.jdbc.odbc.JdbcOdbcDriver
問題現場遐想:
於是我們大膽設想一個場景,【Thread-1】先獲取到初始化sun.jdbc.odbc.JdbcOdbcDriver的機會,然後在執行sun.jdbc.odbc.JdbcOdbcDriver這個類的靜態塊的時候調用DriverManager.registerDriver(new Driver());,而該方法之前已經提到了是會加同步鎖的,再想象一下,在這個這個靜態塊之前,並且設置了sun.jdbc.odbc.JdbcOdbcDriver類的初始化狀態為being_initialized之後,【Thread-0】這個線程執行到了卡在的那個位置,並且我們從其堆棧可以看出它已經持有了java.sql.DriverManager這個類型的鎖,因此這兩個線程陷入了互鎖狀態
解決方案
解決方案目前想到的是將驅動類的加載過程變成單線程加載,不存在並發情況就沒問題了。
最後更新:2017-05-23 18:02:54