20210519 使用jstack命令排查线程死锁问题

Posted 陈如水

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了20210519 使用jstack命令排查线程死锁问题相关的知识,希望对你有一定的参考价值。

问题:  针对线上多线程死锁、阻塞,跑着跑着就卡住了;查看线上线程池的状态;

jstack主要用于生成java虚拟机当前时刻的线程快照。线程快照是当前java虚拟机内每一条线程正在执行的方法堆栈的集合,生成线程快照的主要目的是定位线程出现长时间停顿的原因,如线程间死锁、死循环、请求外部资源导致的长时间等待等。 线程出现停顿的时候通过jstack来查看各个线程的调用堆栈,就可以知道没有响应的线程到底在后台做什么事情,或者等待什么资源。

jstack: java堆栈跟踪工具(线程快照和内存快照)

  • 执行jstack命令,将得到进程的堆栈信息。我一般使用jstack -l pid来得到长列表,显示其详细信息。

  • 有时线程挂起的时候,需要执行jstack -F pid来获取。

  • 在实际运行中,往往一次 dump的信息,还不足以确认问题。建议产生三次 dump信息,如果每次 dump都指向同一个问题,我们才确定问题的典型性。

  • 堆栈信息只是一种参考,一些正常RUNNING的线程,由于复杂网络环境和IO的影响,也是有问题的,用jstack就无法定位,需要结合对业务代码的理解。

常用命令

jps

top -H -p <pid>

jstack -l <pid>

top -H -p <pid>

1)在当前用户下,列出pid包含的所有线程。查询某个进程中所有的线程信息,cpu使用和内存。

2)线程id(十进制)和线程快照中nid相同(十六进制,操作系统级别的线程id)

3) 关注某个进程中,哪些线程的cpu和内存使用率最高,找到对应的线程快照,并且重点关注线程状态和业务代码。并且关注线程中,是否存在死锁。

 

 

1,死锁代码如下

/**
 * 测试死锁问题
 */
public class DeadLockDemo {

    public static void main(String[] args) {
        testDeadLock();
    }

    private static void testDeadLock() {
        ReentrantLock lock1 = new ReentrantLock();
        ReentrantLock lock2 = new ReentrantLock();

        Thread thread1 = new Thread(new Runnable() {
            @Override
            public void run() {
                lock1.lock();
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                lock2.lock();
            }
        });

        Thread thread2 = new Thread(new Runnable() {
            @Override
            public void run() {
                lock2.lock();
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                lock1.lock();
            }
        });

        thread1.setName("thread1");
        thread2.setName("thread2");

        System.out.println("启动线程");
        thread1.start();
        thread2.start();
        
    }
}

2,线程快照如下:

命令:jps获取进程编号,jstack -l 82595 >> thread.txt

使用jstack查看服务进程的线程情况

2021-05-19 15:29:10
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.291-b10 mixed mode):

"Attach Listener" #14 daemon prio=9 os_prio=31 tid=0x00007f8b2c45c800 nid=0x9c03 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"DestroyJavaVM" #13 prio=5 os_prio=31 tid=0x00007f8b2c015000 nid=0x2403 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"thread2" #12 prio=5 os_prio=31 tid=0x00007f8b1c016800 nid=0x9e03 waiting on condition [0x000000030b82d000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000076b014f98> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at com.anjuke.aifang.aifang.building.operate.scf.utils.DeadLockDemo$2.run(DeadLockDemo.java:42)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- <0x000000076b014fc8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"thread1" #11 prio=5 os_prio=31 tid=0x00007f8b2d55b000 nid=0x6493 waiting on condition [0x000000030b72a000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000076b014fc8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at com.anjuke.aifang.aifang.building.operate.scf.utils.DeadLockDemo$1.run(DeadLockDemo.java:29)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- <0x000000076b014f98> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"Service Thread" #10 daemon prio=9 os_prio=31 tid=0x00007f8afc023000 nid=0x5e0f runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"C1 CompilerThread3" #9 daemon prio=9 os_prio=31 tid=0x00007f8b2c401000 nid=0xa503 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"C2 CompilerThread2" #8 daemon prio=9 os_prio=31 tid=0x00007f8b2d8a0000 nid=0x5b03 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"C2 CompilerThread1" #7 daemon prio=9 os_prio=31 tid=0x00007f8b2d812000 nid=0x5903 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"C2 CompilerThread0" #6 daemon prio=9 os_prio=31 tid=0x00007f8b2c3e0800 nid=0x5803 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"Monitor Ctrl-Break" #5 daemon prio=5 os_prio=31 tid=0x00007f8b2c3dd800 nid=0x570b runnable [0x000000030b015000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
	- locked <0x000000076b06aab8> (a java.io.InputStreamReader)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.BufferedReader.fill(BufferedReader.java:161)
	at java.io.BufferedReader.readLine(BufferedReader.java:324)
	- locked <0x000000076b06aab8> (a java.io.InputStreamReader)
	at java.io.BufferedReader.readLine(BufferedReader.java:389)
	at com.intellij.rt.execution.application.AppMainV2$1.run(AppMainV2.java:48)

   Locked ownable synchronizers:
	- None

"Signal Dispatcher" #4 daemon prio=9 os_prio=31 tid=0x00007f8b2c808800 nid=0xa903 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"Finalizer" #3 daemon prio=8 os_prio=31 tid=0x00007f8b2c06f000 nid=0x351f in Object.wait() [0x000000030ad00000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x000000076ab08ee0> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
	- locked <0x000000076ab08ee0> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)

   Locked ownable synchronizers:
	- None

"Reference Handler" #2 daemon prio=10 os_prio=31 tid=0x00007f8b2c06c800 nid=0x4723 in Object.wait() [0x000000030abfd000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x000000076ab06c00> (a java.lang.ref.Reference$Lock)
	at java.lang.Object.wait(Object.java:502)
	at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
	- locked <0x000000076ab06c00> (a java.lang.ref.Reference$Lock)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

   Locked ownable synchronizers:
	- None

"VM Thread" os_prio=31 tid=0x00007f8b2c065800 nid=0x4903 runnable 

"GC task thread#0 (ParallelGC)" os_prio=31 tid=0x00007f8b2d817800 nid=0x2d4f runnable 

"GC task thread#1 (ParallelGC)" os_prio=31 tid=0x00007f8b2d818000 nid=0x5303 runnable 

"GC task thread#2 (ParallelGC)" os_prio=31 tid=0x00007f8b2d818800 nid=0x2f03 runnable 

"GC task thread#3 (ParallelGC)" os_prio=31 tid=0x00007f8b2c026000 nid=0x5003 runnable 

"GC task thread#4 (ParallelGC)" os_prio=31 tid=0x00007f8b2c026800 nid=0x4f03 runnable 

"GC task thread#5 (ParallelGC)" os_prio=31 tid=0x00007f8b2c027000 nid=0x3103 runnable 

"GC task thread#6 (ParallelGC)" os_prio=31 tid=0x00007f8b2c027800 nid=0x4d03 runnable 

"GC task thread#7 (ParallelGC)" os_prio=31 tid=0x00007f8b2c028800 nid=0x4b03 runnable 

"VM Periodic Task Thread" os_prio=31 tid=0x00007f8b2d55a000 nid=0x6163 waiting on condition 

JNI global references: 15


Found one Java-level deadlock:
=============================
"thread2":
  waiting for ownable synchronizer 0x000000076b014f98, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "thread1"
"thread1":
  waiting for ownable synchronizer 0x000000076b014fc8, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "thread2"

Java stack information for the threads listed above:
===================================================
"thread2":
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000076b014f98> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at com.anjuke.aifang.aifang.building.operate.scf.utils.DeadLockDemo$2.run(DeadLockDemo.java:42)
	at java.lang.Thread.run(Thread.java:748)
"thread1":
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000076b014fc8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at com.anjuke.aifang.aifang.building.operate.scf.utils.DeadLockDemo$1.run(DeadLockDemo.java:29)
	at java.lang.Thread.run(Thread.java:748)

Found 1 deadlock.

线程快照查看:

1)taskExecutor- 为线程词的名称;

2)java.lang.Thread.State 线程的状态;

3)具体为线程的执行到什么动作,代码行数;

4)tid: java内的线程id;

5)nid: 操作系统级别线程的线程id;

6)prio: java内定义的线程的优先级;

7)os_prio:操作系统级别的优先级;

 

一个Thread对象可以有多个状态,在java.lang.Thread.State中,总共定义六种状态:

1、NEW

线程刚刚被创建,也就是已经new过了,但是还没有调用start()方法,jstack命令不会列出处于此状态的线程信息。

2、RUNNABLE    #java.lang.Thread.State: RUNNABLE

RUNNABLE这个名字很具有欺骗性,很容易让人误以为处于这个状态的线程正在运行。事实上,这个状态只是表示,线程是可运行的。我们已经无数次提到过,一个单核CPU在同一时刻,只能运行一个线程。

3、BLOCKED     # java.lang.Thread.State: BLOCKED (on object monitor)

线程处于阻塞状态,正在等待一个monitor lock。通常情况下,是因为本线程与其他线程公用了一个锁。其他在线程正在使用这个锁进入某个synchronized同步方法块或者方法,而本线程进入这个同步代码块也需要这个锁,最终导致本线程处于阻塞状态。

4、WAITING

等待状态,调用以下方法可能会导致一个线程处于等待状态:

Object.wait 不指定超时时间 # java.lang.Thread.State: WAITING (on object monitor)

Thread.join with no timeout

LockSupport.park #java.lang.Thread.State: WAITING (parking)

例如:对于wait()方法,一个线程处于等待状态,通常是在等待其他线程完成某个操作。本线程调用某个对象的wait()方法,其他线程处于完成之后,调用同一个对象的notify或者notifyAll()方法。Object.wait()方法只能够在同步代码块中调用。调用了wait()方法后,会释放锁。

5、TIMED_WAITING

线程等待指定的时间,对于以下方法的调用,可能会导致线程处于这个状态:

Thread.sleep #java.lang.Thread.State: TIMED_WAITING (sleeping)

Object.wait 指定超时时间   #java.lang.Thread.State: TIMED_WAITING (on object monitor)

Thread.join with timeout

LockSupport.parkNanos   #java.lang.Thread.State: TIMED_WAITING (parking)

LockSupport.parkUntil   #java.lang.Thread.State: TIMED_WAITING (parking)

6、TERMINATED  线程终止。

7、parking指线程处于挂起中。

Wait on condition 等待资源,或等待某个条件的发生。具体原因需结合 stacktrace来分析。

常见情况是该线程在 sleep,等待 sleep的时间到了时候,将被唤醒。

关键字:TIMED_WAITING,sleeping,parking。TIMED_WAITING可能是调用了有超时参数的wait所引起的。

parking指线程处于挂起中。

 

如果发现有大量的线程都在处在 Wait on condition,  可能出现了某中瓶颈。

如果堆栈信息明确是应用代码,则证明该线程正在等待资源。一般是大量读取某资源,且该资源采用了资源锁的情况下,线程进入等待状态,等待资源的读取。

 

Blocked

线程阻塞,是指当前线程执行过程中,所需要的资源长时间等待却一直未能获取到,被容器的线程管理器标识为阻塞状态,可以理解为等待资源超时的线程。如果线程处于Blocked状态,但是原因不清楚。可以使用jstack -m pid得到线程的mixed信息。

 

在dump 文件里,哪些需要重点关注:

  • 死锁,Deadlock(重点关注)
  • 执行中,Runnable
  • 等待资源,Waiting on condition(重点关注)
  • 等待获取监视器,Waiting on monitor entry(重点关注)
  • 对象等待中,Object.wait() 或 TIMED_WAITING
  • 暂停,Suspended
  • 阻塞,Blocked(重点关注)
  • 停止,Parked

3,分析

Waiting on monitor entry 意味着线程在等待进入一个临界区;Monitor是Java中用以实现线程之间的互斥与协作的主要手段,它可以看成是对象或者 Class的锁。每一个对象都有,也仅有一个 monitor。

 

thread2正在等待锁0x000000076b014f98,它被线程1所持有;

thread1正在等待锁0x000000076b014fc8,它被线程2所持有;这样就造成了死锁问题。

并且堆栈也记录了发生死锁的代码行数。

 

4,在线分析线程快照

jstack输出到文本中,然后使用分析工具查看(分析线程状态)。

https://fastthread.io/   线程快照在线分析工具,上传线程快照后,查询是否有死锁问题。

 

以上是关于20210519 使用jstack命令排查线程死锁问题的主要内容,如果未能解决你的问题,请参考以下文章

jstack排查锁竞争

使用jstack命令dump线程信息

死锁排查的小窍门 --使用jdk自带管理工具jstack

CPU飚高问题排查基本步骤

JVM故障问题排查心得「线程诊断系列」通过jstack线程状态分析虚拟机线程问题指南

你们项目如何排查JVM问题