细读源码之Log4j打印异常堆栈导致线程Block问题分析

Posted 马士兵

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了细读源码之Log4j打印异常堆栈导致线程Block问题分析相关的知识,希望对你有一定的参考价值。

细读源码是《马士兵教育》旗下赏析源码的栏目。我们赏析源码的目的不是为了炫技,而是为了去理解作者的设计思想,并取其精华,去其糟粕,从而写出更加优秀的代码。

另一方面,也可以给面试加分,代码好坏的评价,不可避免地会代入个人的主观色彩,大家和而不同。

【背景】

最近线上环境出了一次严重的事故,在用户访问高峰期,出现了服务不可用的情况,同时监控系统报出了大量工作线程Block的告警,通过对Block线程的堆栈的分析,最后发现是由于打印异常堆栈日志导致的,还是非常出乎意料的。

本文将从源码的角度分析一下问题发生的原因,主要从以下几个方法进行讲解:

1.拟线上问题

2.Log4j打印异常堆栈机制

3.Classload加载class锁机制

4.反射调用优化

5.问题总结和解决方案


一.模拟线上问题
由于线上的代码逻辑比较复杂,为了更方便精准定位问题,下面编写了测试代码,去模拟线上问题的发生,过程如下:

1.新建myfirstmvn的maven工程

pom.xml文件如下:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId> <artifactId>myfirstmvn</artifactId> <version>1.0-SNAPSHOT</version> <packaging>jar</packaging>
<name>myfirstmvn</name> <url>http://maven.apache.org</url>
<properties> <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> </properties></project>

2.在myfirstmvn的增加一个MathUtil类

package org.example.myfirstmvn;
public class MathUtil { public static int div(int a, int b) { return a / b; }}

3.新建log4jtest的maven工程

pom.xml文件如下:

<?xml version="1.0" encoding="UTF-8"?><project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId> <artifactId>log4jtest</artifactId> <version>1.0-SNAPSHOT</version>
<dependencies> <dependency> <groupId>org.example</groupId> <artifactId>myfirstmvn</artifactId> <version>1.0-SNAPSHOT</version> </dependency> <dependency> <groupId>org.apache.logging.log4j</groupId> <artifactId>log4j-api</artifactId> <version>2.14.1</version> </dependency> <dependency> <groupId>org.apache.logging.log4j</groupId> <artifactId>log4j-core</artifactId> <version>2.14.1</version> </dependency> </dependencies></project>

4.在log4jtest中增加LogTest类

LogTest类用来模拟,多个线程同时打印日志情况,代码如下:

package org.example.sourcecode.threadblock;
import org.apache.logging.log4j.LogManager;import org.apache.logging.log4j.Logger;import org.example.myfirstmvn.MathUtil;
import java.lang.reflect.Method;
public class LogTest { private static Logger logger = LogManager.getLogger(LogManager.ROOT_LOGGER_NAME);
public static void main(String[] args) { for (int i = 0; i < 100; i++) { Thread thread = new Thread(new Runnable() { @Override public void run() { while (true) { LogTest.run(); } } }); thread.start(); } }
private static void run() { try { callDiv(1, 0); } catch (Exception e) { logger.error("div error", e); } }
private static int callDiv(int a, int b) throws Exception { Method method = MathUtil.class.getMethod("div", new Class[]{int.class, int.class}); return (int) method.invoke(null, a, b); }}

运行上面的代码,并使用jvisualvm工具来查看线程的运行状态,如下图所示:

从图上可以看出,打印日志的过程中,绝大多数线程处于BLOCKED (on object monitor)的状态,通过线程Dump,查看处于Block状态的线程堆栈,如下所示:

"Thread-100" #113 prio=5 os_prio=31 tid=0x00007f8d92021000 nid=0x11a03 waiting for monitor entry [0x000070000eed3000] java.lang.Thread.State: BLOCKED (on object monitor) at java.lang.ClassLoader.loadClass(ClassLoader.java:398) - waiting to lock <0x00000006c000ff28> (a java.lang.Object) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:264) at org.apache.logging.log4j.util.LoaderUtil.loadClass(LoaderUtil.java:171) at org.apache.logging.log4j.core.impl.ThrowableProxyHelper.loadClass(ThrowableProxyHelper.java:216) at org.apache.logging.log4j.core.impl.ThrowableProxyHelper.toExtendedStackTrace(ThrowableProxyHelper.java:114) at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:110) at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:94) at org.apache.logging.log4j.core.impl.MutableLogEvent.getThrownProxy(MutableLogEvent.java:347) at org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter.format(ExtendedThrowablePatternConverter.java:63) at org.apache.logging.log4j.core.pattern.PatternFormatter.format(PatternFormatter.java:38) at org.apache.logging.log4j.core.layout.PatternLayout$PatternSerializer.toSerializable(PatternLayout.java:344) at org.apache.logging.log4j.core.layout.PatternLayout.toText(PatternLayout.java:244) at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:229) at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:59) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181) at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156) at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129) at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120) at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84) at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:540) at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:498) at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:481) at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:456) at org.apache.logging.log4j.core.config.DefaultReliabilityStrategy.log(DefaultReliabilityStrategy.java:63) at org.apache.logging.log4j.core.Logger.log(Logger.java:161) at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205) at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159) at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142) at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2017) at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1983) at org.apache.logging.log4j.spi.AbstractLogger.error(AbstractLogger.java:750) at org.example.sourcecode.threadblock.LogTest.run(LogTest.java:34) at org.example.sourcecode.threadblock.LogTest.access$000(LogTest.java:13) at org.example.sourcecode.threadblock.LogTest$1.run(LogTest.java:22) at java.lang.Thread.run(Thread.java:748)

从上面的堆栈可以看出,是调用ClassLoader.loadClass方法,导致线程Block。

但是为什么Log4j打印异常堆栈,会触发ClassLoader.loadClass的调用呢,ClassLoader.loadClass的锁又是怎么回事呢,下面的内容会一一进行讲解。


二.Log4j打印异常堆栈机制

首先我们看一下先看一下Log4j的Logger.error和e.printStackTrace输出内容的差异。

1.Logger.error

package org.example.sourcecode.threadblock;
import org.apache.logging.log4j.LogManager;import org.apache.logging.log4j.Logger;import org.example.myfirstmvn.MathUtil;
import java.lang.reflect.Method;
public class Log2Test {
private static Logger logger = LogManager.getLogger(LogManager.ROOT_LOGGER_NAME);
public static void main(String[] args) { try { callDiv(1, 0); } catch (Exception e) { logger.error("div error", e); } }
private static int callDiv(int a, int b) throws Exception { Method method = MathUtil.class.getMethod("div", new Class[]{int.class, int.class}); return (int) method.invoke(null, a, b); }}

上面代码执行结果:

java.lang.reflect.InvocationTargetException: null at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_261] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_261] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261] at org.example.sourcecode.threadblock.Log2Test.callDiv(Log2Test.java:23) ~[classes/:?] at org.example.sourcecode.threadblock.Log2Test.main(Log2Test.java:15) [classes/:?]Caused by: java.lang.ArithmeticException: / by zero at org.example.myfirstmvn.MathUtil.div(MathUtil.java:9) ~[myfirstmvn-1.0-SNAPSHOT.jar:?] ... 6 more

2. e.printStackTrace

package org.example.sourcecode.threadblock;
import org.example.myfirstmvn.MathUtil;import java.lang.reflect.Method;
public class Log3Test { public static void main(String[] args) { try { callDiv(1, 0); } catch (Exception e) { e.printStackTrace(); } }
private static int callDiv(int a, int b) throws Exception { Method method = MathUtil.class.getMethod("div", new Class[]{int.class, int.class}); return (int) method.invoke(null, a, b); }}

上面代码执行结果:

java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.example.sourcecode.threadblock.Log3Test.callDiv(Log3Test.java:18) at org.example.sourcecode.threadblock.Log3Test.main(Log3Test.java:10)Caused by: java.lang.ArithmeticException: / by zero at org.example.myfirstmvn.MathUtil.div(MathUtil.java:9) ... 6 more

对比上面两部分代码的执行结果,发现Logger.error打印的堆栈信息更加完善。JDK包中的类打印了JDK版本[?:1.8.0_261],class path下的类打印了[classes/:?],其他jar文件中的类打印了类所在的jar文件名[myfirstmvn-1.0-SNAPSHOT.jar:?]。

而这些信息是通过log4j-core.jar包中的

ThrowableProxyHelper.toCacheEntry生成的,不同的Log4j版本会有差异。

低版本的Log4j没有ThrowableProxyHelper类,toCacheEntry方法在ThrowableProxy类中。

参考toCacheEntry方法的实现,写了一个展示类位置信息的测试程序,代码如下:

package org.example.sourcecode.threadblock;
import org.example.myfirstmvn.MathUtil;import java.net.URL;import java.security.CodeSource;
public class ClassLocationTest {
public static void main(String[] args) { System.out.println(displayClassLocation(String.class)); System.out.println(displayClassLocation(ClassLocationTest.class)); System.out.println(displayClassLocation(MathUtil.class)); }
private static String displayClassLocation(final Class<?> callerClass) { String location = "?"; String version = "?"; if (callerClass != null) { try { final CodeSource source = callerClass.getProtectionDomain().getCodeSource(); if (source != null) { final URL locationURL = source.getLocation(); if (locationURL != null) { final String str = locationURL.toString().replace('\\', '/'); int index = str.lastIndexOf("/"); if (index >= 0 && index == str.length() - 1) { index = str.lastIndexOf("/", index - 1); location = str.substring(index + 1); } else { location = str.substring(index + 1); } } } } catch (final Exception ex) { // Ignore the exception. } final Package pkg = callerClass.getPackage(); if (pkg != null) { final String ver = pkg.getImplementationVersion(); if (ver != null) { version = ver; } } } return "[" + location + " " + version + "]"; }}

上面代码执行结果:

[? 1.8.0_261][classes/ ?][myfirstmvn-1.0-SNAPSHOT.jar ?]

想要获得类所在位置的辅助信息,必须拿到Class对象,而堆栈对象StackTraceElement定义的declaringClass是String类型,想要得到相应的Class对象,必须使用Classloader进行类加载,从而触发了Classloader.loadClass方法的执行。


三.Classload加载class锁机制

1.Classloader.loadClass锁分析

Classload调用loadClass进行类加载的过程是线程安全的,loadClass的代码如下:

protected Class<?> loadClass(String name, boolean resolve)throws ClassNotFoundException {synchronized (getClassLoadingLock(name)) {// First, check if the class has already been loaded Class<?> c = findLoadedClass(name);if (c == null) {long t0 = System.nanoTime();try {if (parent != null) { c = parent.loadClass(name, false); } else { c = findBootstrapClassOrNull(name); } } catch (ClassNotFoundException e) {// ClassNotFoundException thrown if class not found// from the non-null parent class loader }
if (c == null) {// If still not found, then invoke findClass in order// to find the class.long t1 = System.nanoTime(); c = findClass(name);
// this is the defining class loader; record the stats sun.misc.PerfCounter.getParentDelegationTime().addTime(t1 - t0); sun.misc.PerfCounter.getFindClassTime().addElapsedTimeFrom(t1); sun.misc.PerfCounter.getFindClasses().increment(); } }if (resolve) { resolveClass(c); }return c; }}

进入loadClass方法,就是synchronized代码块,synchronized的对象是通过getClassLoadingLock返回的,其代码如下:

protected Object getClassLoadingLock(String className) { Object lock = this; if (parallelLockMap != null) { Object newLock = new Object(); lock = parallelLockMap.putIfAbsent(className, newLock); if (lock == null) { lock = newLock; } } return lock;}

通过分析getClassLoadingLock可以得出,loadClass的同步工作模式有两种:

1. 串行,synchronized对象是Classloader的this对象,在串行模式下执行,当classloader在加载一个类的时候,其他线程要进行类加载的时候,都必须等待;

2.并行,synchronized对象是从parallelLockMap中根据className查询得到的Object对象,在并行模式下执行,类名完全相同的类还是串行加载,类名不同的类之间是并行加载,加载过程互不影响,这样可以提供加载效率。

2.指定Classloader的加载的串并行模式

可以通过ClassLoader.registerAsParallelCapable()方法将调用次方法的Classloader注册为并行加载模式,如果不指定,默认是串行加载模式。最常用的AppClassLoader就是并行加载模式,其静态代码块如下所示:

static { ClassLoader.registerAsParallelCapable();}

3.Classloader加载无效类效率

由于Classloader的加载机制,当Classloader去加载一个他无法加载的类时,loadClass的效率就会变得非常低下,举例说明:

package org.example.sourcecode.threadblock;
import java.io.ByteArrayOutputStream;import java.io.FileInputStream;import java.io.IOException;
public class TryLoadClass { private static int COUNTER = 100000;
public static void main(String[] args) throws IOException, ClassNotFoundException { ClassLoader loader = ClassLoader.getSystemClassLoader(); for (int i = 0; i < 1000; i++) { long time1 = tryLoadClass(loader, "org.example.sourcecode.threadblock.TryLoadClass"); long time2 = tryLoadClass(loader, "org.example.sourcecode.threadblock.TryLoadClass2"); System.out.println(time1 + " " + time2); } }
public static long tryLoadClass(ClassLoader classLoader, String classFullName) { long start = System.currentTimeMillis(); for (int i = 0; i < COUNTER; i++) { try { classLoader.loadClass(classFullName); } catch (Exception e) {
} } return System.currentTimeMillis() - start; }}

上面代码执行结果:

71 277249 268748 261749 276948 262348 2611

org.example.sourcecode.threadblock.TryLoadClass2这个类不存在,加载效率比加载存在的类慢了几十倍。

加载过程执行的越慢,高并发的条件下,就越可能发生线程Block情况。


四.反射调用优化

在《细读源码之JAVA反射方法调用优化》详细讲解了反射优化过程,最后动态生成的二进制Class文件,交给了ClassDefiner进行处理,ClassDefiner代码如下:

class ClassDefiner { static final Unsafe unsafe = Unsafe.getUnsafe();
ClassDefiner() { }
static Class<?> defineClass(String var0, byte[] var1, int var2, int var3, final ClassLoader var4) { ClassLoader var5 = (ClassLoader)AccessController.doPrivileged(new PrivilegedAction<ClassLoader>() { public ClassLoader run() { return new DelegatingClassLoader(var4); } }); return unsafe.defineClass(var0, var1, var2, var3, var5, (ProtectionDomain)null); }}

最后执行动态二进制文件加载的Classloader是DelegatingClassLoader,其代码如下:

class DelegatingClassLoader extends ClassLoader { DelegatingClassLoader(ClassLoader var1) { super(var1); }}

分析上面的代码,我们可以得出下面的结论:反射优化的动态生成的Class,是不能被其他Classloader(如:AppClassloader)加载的,这个如果无法理解,请参加马老师的《JVM从入门到精通》课程,里面更详细的关于Classloader的内容。


五.问题总结和解决方案


1.问题总结

1.使用Log4j的Logger.error去打印异常堆栈的时候,为了打印出堆栈中类的位置信息,需要使用Classloader进行类加载;

2.Classloader加载是线程安全的,虽然并行加载可以提高加载不同类的效率,但是多线程加载相同的类时,还是需要互相同步等待,尤其当不同的线程打印的异常堆栈完全相同时,就会增加线程Block的风险,而Classloader去加载一个无法加载的类时,效率会急剧下降,使线程Block的情况进一步恶化;

3.因为反射调用效率问题,JDK对反射调用进行了优化,动态生成Java类进行方法调用,替换原来的native调用,而生成的动态类是由DelegatingClassLoader进行加载的,不能被其他的Classloader加载,异常堆栈中有反射优化的动态类,在高并发的条件下,就非常容易产生线程Block的情况。


2.解决方案

A.去掉不必要的异常堆栈打印

对于可以确定问题原因的异常,就没有必要打印异常堆栈信息,示例代码如下:

package org.example.sourcecode.threadblock;
import org.apache.logging.log4j.LogManager;import org.apache.logging.log4j.Logger;
import java.util.Scanner;
public class DisplayStackTrace {private static Logger logger = LogManager.getLogger(LogManager.ROOT_LOGGER_NAME);
public static void main(String[] args) { Scanner cin = new Scanner(System.in);String number = cin.nextLine();try { System.out.println(Integer.parseInt(number) + 100); } catch (Exception e) { logger.error("parse int error : " + number, e); } }}

如果Integer.parseInt发生异常,导致异常原因肯定是出入的number不合法,在这种情况下,打印异常堆栈完全没有必要,可以去掉堆栈的打印。

B.将堆栈信息转换为字符串再打印

package org.example.sourcecode.threadblock;
import org.apache.logging.log4j.LogManager;import org.apache.logging.log4j.Logger;import org.example.myfirstmvn.MathUtil;
import java.io.PrintWriter;import java.io.StringWriter;import java.lang.reflect.Method;
public class Log2Test {private static Logger logger = LogManager.getLogger(LogManager.ROOT_LOGGER_NAME);
public static void main(String[] args) {try { callDiv(1, 0); } catch (Exception e) { logger.error(convertStackTraceToString(e)); } }
private static String convertStackTraceToString(Throwable throwable) { StringWriter stringWriter = new StringWriter(); throwable.printStackTrace(new PrintWriter(stringWriter));return stringWriter.toString(); }
private static int callDiv(int a, int b) throws Exception { Method method = MathUtil.class.getMethod("div", new Class[]{int.class, int.class});return (int) method.invoke(null, a, b); }}

调用convertStackTraceToString将异常堆栈转换为字符串,但是字符串中就没有了类所在位置的辅助信息,这是一个取舍的过程,没有一个标准的答案,需要根据业务系统的实际情况做最后的决策。

C.禁用反射优化

使用Log4j打印堆栈信息,如果堆栈中有反射优化生成的动态代理类,这个代理类不能被其他的Classloader加载,这个时候打印堆栈,会严重影响执行效率。但是禁用反射优化也会有副作用,导致反射执行的效率降低。

并不是所有问题都有那种完美的解决方案,都是各方博弈得到的相对平衡的结果。对打印日志而言,就是执行效率,磁盘空间和排查问题难度几方博弈的结果,最终选择适合自己业务场景的方案。

——————
加微信进行业交流群
细读源码之Log4j打印异常堆栈导致线程Block问题分析

以上是关于细读源码之Log4j打印异常堆栈导致线程Block问题分析的主要内容,如果未能解决你的问题,请参考以下文章

Log4j打印错误异常的详细堆栈信息

JVM反调调用优化,导致发生大量异常时log4j2线程阻塞

log4j 不打印异常堆栈

log4j 不打印异常的堆栈跟踪

手机调试Android程序出异常时不打印堆栈信息

老哥,你遇到过log4j2线程阻塞的场景吗?