线程转储文件分析模式 – RSI

描述

当应用程序出现性能瓶颈时,大部分线程将开始在存在问题的瓶颈位置上堆积。这些线程将具有相同的栈追踪信息。因此,每当有大量线程表现出相同/重复的栈追踪信息时,您就应该对其进行进一步研究。这些现象可能表示存在一些性能问题。

下面是一些相关情景:

  1. 假设您的 SOR 或外部服务出现响应缓慢的情况,则会有大量线程进入等待响应的状态。在此类情形中,这些线程将显示出相同的栈追踪信息。
  2. 假设一个线程被锁定且自此之后未被释放时,那么处于同一执行路径的其他几个线程将进入阻塞状态,同时表现出相同的栈追踪信息。
  3. 如果一个循环(for loop、while loop、do..while loop)条件不会终止,那么执行该循环的多个线程都将显示相同的栈追踪信息。

上述任何一种情况发生时,应用程序的性能和可用性都会受到影响。

示例

下方是一款大型 B2B 应用程序的线程转储文件摘录。此应用程序的运行状态良好,但其会突然间失去响应。这里我们捕获了其线程转储文件。其中显示,400 个线程中有 225 个出现了相同的栈追踪信息。如下:

"ajp-bio-192.168.100.128-9022-exec-79" daemon prio=10 tid=0x00007f4d2001c000 nid=0x1d1c waiting for monitor entry [0x00007f4ce91fa000]
java.lang.Thread.State: BLOCKED (on object monitor)
	at com.xxxxxxxxx.xx.xxx.xxxx.ReferenceNumberGeneratorDaoImpl.getNextItineryReferenceNumber(ReferenceNumberGeneratorDaoImpl.java:55)
	- waiting to lock 0x00000006afaa5a60 (a com.xxxxxxxxx.sp.dao.impl.ReferenceNumberGeneratorDaoImpl)
	at sun.reflect.GeneratedMethodAccessor3112.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
	at org.springframework.orm.hibernate3.HibernateInterceptor.invoke(HibernateInterceptor.java:111)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
	at com.sun.proxy.$Proxy36.getNextItineryReferenceNumber(Unknown Source)
	at com.xxxxxxxxx.xx.xxxxxxxx.xxx.ReferenceNumberGeneratorServiceImpl.getNextItineryReferenceNumber(ReferenceNumberGeneratorServiceImpl.java:15)
	:
	:

从栈追踪信息中可以推断,线程处于阻塞状态并在等待对象 0x00000006afaa5a60 上的锁定。225 个这样的线程都在等待获得对于同一对象的锁定。毫无疑问,这不是一个好现象。这是线程饥饿死锁(thread starvation)的迹象

我们可以很明显地看到,这一锁定由“ajp-bio-192.168.100.128-9022-exec-84”占据。下方是该线程的栈追踪信息。您可以注意到,这个线程获取了对象 0x00000006afaa5a60 的锁定,但在获取锁定后,它在等待数据库响应时卡住。显然这里没有设置应用程序数据库超时。由于该线程的数据库调用没有得到返回值,所以另外的 225 个线程也被卡住了。应用程序也就进入了无响应的状态。

设置适当的数据库超时值后,此问题得以解决。

“ajp-bio-192.168.100.128-9022-exec-84″ daemon prio=10 tid=0x00007f4d2000a800 nid=0x1d26 runnable [0x00007f4ce6ce1000]
java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:150)
	at java.net.SocketInputStream.read(SocketInputStream.java:121)
	at java.io.DataInputStream.readFully(DataInputStream.java:195)
	at java.io.DataInputStream.readFully(DataInputStream.java:169)
	at net.sourceforge.jtds.jdbc.SharedSocket.readPacket(SharedSocket.java:850)
	at net.sourceforge.jtds.jdbc.SharedSocket.getNetPacket(SharedSocket.java:731)
	- locked  0x0000000b8dcc8c81 (a java.util.concurrent.ConcurrentHashMap)
	at net.sourceforge.jtds.jdbc.RexxonseStream.getPacket(RexxonseStream.java:477)
	at net.sourceforge.jtds.jdbc.RexxonseStream.read(RexxonseStream.java:114)
	at net.sourceforge.jtds.jdbc.RexxonseStream.peek(RexxonseStream.java:99)
	at net.sourceforge.jtds.jdbc.TdsCore.wait(TdsCore.java:4127)
	at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1086)
	- locked 0x0000000d1cdd7b17 (a net.sourceforge.jtds.jdbc.TdsCore)
	at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:493)
	at net.sourceforge.jtds.jdbc.JtdxxreparedStatement.executeQuery(JtdxxreparedStatement.java:1032)
	at com.jolbox.bonecp.PreparedStatementHandle.executeQuery(PreparedStatementHandle.java:174)
	at com.xxxxxxxx.xx.xxx.xxx.ReferenceNumberGeneratorxxxxxx.getNextItineryReferenceNumber(ReferenceNumberGeneratorxxxxxx.java:65)
	- locked  0x00000006afaa5a60(a com.xxxxxxxx.xx.xxx.xxx.ReferenceNumberGeneratorxxxxxx)
	at sun.reflect.GeneratedMethodAccessor3112.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorxxx.invoke(DelegatingMethodAccessorxxx.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
	at org.springframework.orm.hibernate3.HibernateInterceptor.invoke(HibernateInterceptor.java:111)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
	at com.sun.proxy.$Proxy36.getNextItineryReferenceNumber(Unknown Source)
	at com.xxxxxxxx.xx.service.xxx.ReferenceNumberGeneratorServicexxx.getNextItineryReferenceNumber(ReferenceNumberGeneratorServicexxx.java:15)
	at sun.reflect.GeneratedMethodAccessor3031.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorxxx.invoke(DelegatingMethodAccessorxxx.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
:
:

所以,记得去找找有没有重复的栈追踪信息

为什么要将其命名为 RSI

当您以错误的姿势反复使用身体的某个部位(手、手指、手腕、颈部..)时,就会出现重复性劳损(RSI)。同样,在出现性能瓶颈时,多个线程将开始显示重复的栈追踪信息。您需要对此类信息进行深入研究。

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at WordPress.com.

Up ↑

%d bloggers like this: