细读源码之Log4j打印异常堆栈导致线程Block问题分析
Posted 马士兵
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了细读源码之Log4j打印异常堆栈导致线程Block问题分析相关的知识,希望对你有一定的参考价值。
细读源码是《马士兵教育》旗下赏析源码的栏目。我们赏析源码的目的不是为了炫技,而是为了去理解作者的设计思想,并取其精华,去其糟粕,从而写出更加优秀的代码。
另一方面,也可以给面试加分,代码好坏的评价,不可避免地会代入个人的主观色彩,大家和而不同。
【背景】
最近线上环境出了一次严重的事故,在用户访问高峰期,出现了服务不可用的情况,同时监控系统报出了大量工作线程Block的告警,通过对Block线程的堆栈的分析,最后发现是由于打印异常堆栈日志导致的,还是非常出乎意料的。
本文将从源码的角度分析一下问题发生的原因,主要从以下几个方法进行讲解:
1.模拟线上问题
2.Log4j打印异常堆栈机制
3.Classload加载class锁机制
4.反射调用优化
5.问题总结和解决方案
1.新建myfirstmvn的maven工程
pom.xml文件如下:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>myfirstmvn</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<name>myfirstmvn</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
</project>
2.在myfirstmvn的增加一个MathUtil类
package org.example.myfirstmvn;
public class MathUtil {
public static int div(int a, int b) {
return a / b;
}
}
3.新建log4jtest的maven工程
pom.xml文件如下:
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>log4jtest</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.example</groupId>
<artifactId>myfirstmvn</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>2.14.1</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.14.1</version>
</dependency>
</dependencies>
</project>
4.在log4jtest中增加LogTest类
LogTest类用来模拟,多个线程同时打印日志情况,代码如下:
package org.example.sourcecode.threadblock;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.example.myfirstmvn.MathUtil;
import java.lang.reflect.Method;
public class LogTest {
private static Logger logger = LogManager.getLogger(LogManager.ROOT_LOGGER_NAME);
public static void main(String[] args) {
for (int i = 0; i < 100; i++) {
Thread thread = new Thread(new Runnable() {
public void run() {
while (true) {
LogTest.run();
}
}
});
thread.start();
}
}
private static void run() {
try {
callDiv(1, 0);
} catch (Exception e) {
logger.error("div error", e);
}
}
private static int callDiv(int a, int b) throws Exception {
Method method = MathUtil.class.getMethod("div", new Class[]{int.class, int.class});
return (int) method.invoke(null, a, b);
}
}
运行上面的代码,并使用jvisualvm工具来查看线程的运行状态,如下图所示:
从图上可以看出,打印日志的过程中,绝大多数线程处于BLOCKED (on object monitor)的状态,通过线程Dump,查看处于Block状态的线程堆栈,如下所示:
"Thread-100" #113 prio=5 os_prio=31 tid=0x00007f8d92021000 nid=0x11a03 waiting for monitor entry [0x000070000eed3000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.ClassLoader.loadClass(ClassLoader.java:398)
- waiting to lock <0x00000006c000ff28> (a java.lang.Object)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.logging.log4j.util.LoaderUtil.loadClass(LoaderUtil.java:171)
at org.apache.logging.log4j.core.impl.ThrowableProxyHelper.loadClass(ThrowableProxyHelper.java:216)
at org.apache.logging.log4j.core.impl.ThrowableProxyHelper.toExtendedStackTrace(ThrowableProxyHelper.java:114)
at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:110)
at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:94)
at org.apache.logging.log4j.core.impl.MutableLogEvent.getThrownProxy(MutableLogEvent.java:347)
at org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter.format(ExtendedThrowablePatternConverter.java:63)
at org.apache.logging.log4j.core.pattern.PatternFormatter.format(PatternFormatter.java:38)
at org.apache.logging.log4j.core.layout.PatternLayout$PatternSerializer.toSerializable(PatternLayout.java:344)
at org.apache.logging.log4j.core.layout.PatternLayout.toText(PatternLayout.java:244)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:229)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:59)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181)
at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129)
at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:540)
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:498)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:481)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:456)
at org.apache.logging.log4j.core.config.DefaultReliabilityStrategy.log(DefaultReliabilityStrategy.java:63)
at org.apache.logging.log4j.core.Logger.log(Logger.java:161)
at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142)
at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2017)
at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1983)
at org.apache.logging.log4j.spi.AbstractLogger.error(AbstractLogger.java:750)
at org.example.sourcecode.threadblock.LogTest.run(LogTest.java:34)
at org.example.sourcecode.threadblock.LogTest.access$000(LogTest.java:13)
at org.example.sourcecode.threadblock.LogTest$1.run(LogTest.java:22)
at java.lang.Thread.run(Thread.java:748)
从上面的堆栈可以看出,是调用ClassLoader.loadClass方法,导致线程Block。
但是为什么Log4j打印异常堆栈,会触发ClassLoader.loadClass的调用呢,ClassLoader.loadClass的锁又是怎么回事呢,下面的内容会一一进行讲解。
首先我们看一下先看一下Log4j的Logger.error和e.printStackTrace输出内容的差异。
1.Logger.error
package org.example.sourcecode.threadblock;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.example.myfirstmvn.MathUtil;
import java.lang.reflect.Method;
public class Log2Test {
private static Logger logger = LogManager.getLogger(LogManager.ROOT_LOGGER_NAME);
public static void main(String[] args) {
try {
callDiv(1, 0);
} catch (Exception e) {
logger.error("div error", e);
}
}
private static int callDiv(int a, int b) throws Exception {
Method method = MathUtil.class.getMethod("div", new Class[]{int.class, int.class});
return (int) method.invoke(null, a, b);
}
}
上面代码执行结果:
null :
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_261]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_261]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261]
at org.example.sourcecode.threadblock.Log2Test.callDiv(Log2Test.java:23) ~[classes/:?]
at org.example.sourcecode.threadblock.Log2Test.main(Log2Test.java:15) [classes/:?]
Caused by: java.lang.ArithmeticException: / by zero
at org.example.myfirstmvn.MathUtil.div(MathUtil.java:9) ~[myfirstmvn-1.0-SNAPSHOT.jar:?]
6 more
2. e.printStackTrace
package org.example.sourcecode.threadblock;
import org.example.myfirstmvn.MathUtil;
import java.lang.reflect.Method;
public class Log3Test {
public static void main(String[] args) {
try {
callDiv(1, 0);
} catch (Exception e) {
e.printStackTrace();
}
}
private static int callDiv(int a, int b) throws Exception {
Method method = MathUtil.class.getMethod("div", new Class[]{int.class, int.class});
return (int) method.invoke(null, a, b);
}
}
上面代码执行结果:
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.example.sourcecode.threadblock.Log3Test.callDiv(Log3Test.java:18)
at org.example.sourcecode.threadblock.Log3Test.main(Log3Test.java:10)
Caused by: java.lang.ArithmeticException: / by zero
at org.example.myfirstmvn.MathUtil.div(MathUtil.java:9)
6 more
对比上面两部分代码的执行结果,发现Logger.error打印的堆栈信息更加完善。JDK包中的类打印了JDK版本[?:1.8.0_261],class path下的类打印了[classes/:?],其他jar文件中的类打印了类所在的jar文件名[myfirstmvn-1.0-SNAPSHOT.jar:?]。
而这些信息是通过log4j-core.jar包中的
ThrowableProxyHelper.toCacheEntry生成的,不同的Log4j版本会有差异。
低版本的Log4j没有ThrowableProxyHelper类,toCacheEntry方法在ThrowableProxy类中。
参考toCacheEntry方法的实现,写了一个展示类位置信息的测试程序,代码如下:
package org.example.sourcecode.threadblock;
import org.example.myfirstmvn.MathUtil;
import java.net.URL;
import java.security.CodeSource;
public class ClassLocationTest {
public static void main(String[] args) {
System.out.println(displayClassLocation(String.class));
System.out.println(displayClassLocation(ClassLocationTest.class));
System.out.println(displayClassLocation(MathUtil.class));
}
private static String displayClassLocation(final Class<?> callerClass) {
String location = "?";
String version = "?";
if (callerClass != null) {
try {
final CodeSource source = callerClass.getProtectionDomain().getCodeSource();
if (source != null) {
final URL locationURL = source.getLocation();
if (locationURL != null) {
final String str = locationURL.toString().replace('\\', '/');
int index = str.lastIndexOf("/");
if (index >= 0 && index == str.length() - 1) {
index = str.lastIndexOf("/", index - 1);
location = str.substring(index + 1);
} else {
location = str.substring(index + 1);
}
}
}
} catch (final Exception ex) {
// Ignore the exception.
}
final Package pkg = callerClass.getPackage();
if (pkg != null) {
final String ver = pkg.getImplementationVersion();
if (ver != null) {
version = ver;
}
}
}
return "[" + location + " " + version + "]";
}
}
上面代码执行结果:
[? 1.8.0_261]
[classes/ ?]
[myfirstmvn-1.0-SNAPSHOT.jar ?]
想要获得类所在位置的辅助信息,必须拿到Class对象,而堆栈对象StackTraceElement定义的declaringClass是String类型,想要得到相应的Class对象,必须使用Classloader进行类加载,从而触发了Classloader.loadClass方法的执行。
1.Classloader.loadClass锁分析
Classload调用loadClass进行类加载的过程是线程安全的,loadClass的代码如下:
protected Class<?> loadClass(String name, boolean resolve)throws ClassNotFoundException {
synchronized (getClassLoadingLock(name)) {
// First, check if the class has already been loaded
Class<?> c = findLoadedClass(name);
if (c == null) {
long t0 = System.nanoTime();
try {if (parent != null) {
c = parent.loadClass(name, false);
} else {
c = findBootstrapClassOrNull(name);
}
} catch (ClassNotFoundException e) {
// ClassNotFoundException thrown if class not found
// from the non-null parent class loader
}
if (c == null) {
// If still not found, then invoke findClass in order
// to find the class.
long t1 = System.nanoTime();
c = findClass(name);
// this is the defining class loader; record the stats
sun.misc.PerfCounter.getParentDelegationTime().addTime(t1 - t0);
sun.misc.PerfCounter.getFindClassTime().addElapsedTimeFrom(t1);
sun.misc.PerfCounter.getFindClasses().increment();
}
}
if (resolve) {
resolveClass(c);
}
return c;
}
}
进入loadClass方法,就是synchronized代码块,synchronized的对象是通过getClassLoadingLock返回的,其代码如下:
protected Object getClassLoadingLock(String className) {
Object lock = this;
if (parallelLockMap != null) {
Object newLock = new Object();
lock = parallelLockMap.putIfAbsent(className, newLock);
if (lock == null) {
lock = newLock;
}
}
return lock;
}
通过分析getClassLoadingLock可以得出,loadClass的同步工作模式有两种:
1. 串行,synchronized对象是Classloader的this对象,在串行模式下执行,当classloader在加载一个类的时候,其他线程要进行类加载的时候,都必须等待;
2.并行,synchronized对象是从parallelLockMap中根据className查询得到的Object对象,在并行模式下执行,类名完全相同的类还是串行加载,类名不同的类之间是并行加载,加载过程互不影响,这样可以提供加载效率。
2.指定Classloader的加载的串并行模式
可以通过ClassLoader.registerAsParallelCapable()方法将调用次方法的Classloader注册为并行加载模式,如果不指定,默认是串行加载模式。最常用的AppClassLoader就是并行加载模式,其静态代码块如下所示:
static {
ClassLoader.registerAsParallelCapable();
}
3.Classloader加载无效类效率
由于Classloader的加载机制,当Classloader去加载一个他无法加载的类时,loadClass的效率就会变得非常低下,举例说明:
package org.example.sourcecode.threadblock;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
public class TryLoadClass {
private static int COUNTER = 100000;
public static void main(String[] args) throws IOException, ClassNotFoundException {
ClassLoader loader = ClassLoader.getSystemClassLoader();
for (int i = 0; i < 1000; i++) {
long time1 = tryLoadClass(loader, "org.example.sourcecode.threadblock.TryLoadClass");
long time2 = tryLoadClass(loader, "org.example.sourcecode.threadblock.TryLoadClass2");
System.out.println(time1 + " " + time2);
}
}
public static long tryLoadClass(ClassLoader classLoader, String classFullName) {
long start = System.currentTimeMillis();
for (int i = 0; i < COUNTER; i++) {
try {
classLoader.loadClass(classFullName);
} catch (Exception e) {
}
}
return System.currentTimeMillis() - start;
}
}
上面代码执行结果:
71 2772
49 2687
48 2617
49 2769
48 2623
48 2611
org.example.sourcecode.threadblock.TryLoadClass2这个类不存在,加载效率比加载存在的类慢了几十倍。
加载过程执行的越慢,高并发的条件下,就越可能发生线程Block情况。
四.反射调用优化
在《细读源码之JAVA反射方法调用优化》详细讲解了反射优化过程,最后动态生成的二进制Class文件,交给了ClassDefiner进行处理,ClassDefiner代码如下:
class ClassDefiner {
static final Unsafe unsafe = Unsafe.getUnsafe();
ClassDefiner() {
}
static Class<?> defineClass(String var0, byte[] var1, int var2, int var3, final ClassLoader var4) {
ClassLoader var5 = (ClassLoader)AccessController.doPrivileged(new PrivilegedAction<ClassLoader>() {
public ClassLoader run() {
return new DelegatingClassLoader(var4);
}
});
return unsafe.defineClass(var0, var1, var2, var3, var5, (ProtectionDomain)null);
}
}
最后执行动态二进制文件加载的Classloader是DelegatingClassLoader,其代码如下:
class DelegatingClassLoader extends ClassLoader {
DelegatingClassLoader(ClassLoader var1) {
super(var1);
}
}
分析上面的代码,我们可以得出下面的结论:反射优化的动态生成的Class,是不能被其他Classloader(如:AppClassloader)加载的,这个如果无法理解,请参加马老师的《JVM从入门到精通》课程,里面更详细的关于Classloader的内容。
五.问题总结和解决方案
1.使用Log4j的Logger.error去打印异常堆栈的时候,为了打印出堆栈中类的位置信息,需要使用Classloader进行类加载;
2.Classloader加载是线程安全的,虽然并行加载可以提高加载不同类的效率,但是多线程加载相同的类时,还是需要互相同步等待,尤其当不同的线程打印的异常堆栈完全相同时,就会增加线程Block的风险,而Classloader去加载一个无法加载的类时,效率会急剧下降,使线程Block的情况进一步恶化;
3.因为反射调用效率问题,JDK对反射调用进行了优化,动态生成Java类进行方法调用,替换原来的native调用,而生成的动态类是由DelegatingClassLoader进行加载的,不能被其他的Classloader加载,异常堆栈中有反射优化的动态类,在高并发的条件下,就非常容易产生线程Block的情况。
2.解决方案
A.去掉不必要的异常堆栈打印
对于可以确定问题原因的异常,就没有必要打印异常堆栈信息,示例代码如下:
package org.example.sourcecode.threadblock;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import java.util.Scanner;
public class DisplayStackTrace {
private static Logger logger = LogManager.getLogger(LogManager.ROOT_LOGGER_NAME);
public static void main(String[] args) {
Scanner cin = new Scanner(System.in);
String number = cin.nextLine();
try {
System.out.println(Integer.parseInt(number) + 100);
} catch (Exception e) {
logger.error("parse int error : " + number, e);
}
}
}
如果Integer.parseInt发生异常,导致异常原因肯定是出入的number不合法,在这种情况下,打印异常堆栈完全没有必要,可以去掉堆栈的打印。
B.将堆栈信息转换为字符串再打印
package org.example.sourcecode.threadblock;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.example.myfirstmvn.MathUtil;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.lang.reflect.Method;
public class Log2Test {
private static Logger logger = LogManager.getLogger(LogManager.ROOT_LOGGER_NAME);
public static void main(String[] args) {
try {
callDiv(1, 0);
} catch (Exception e) {
logger.error(convertStackTraceToString(e));
}
}
private static String convertStackTraceToString(Throwable throwable) {
StringWriter stringWriter = new StringWriter();
throwable.printStackTrace(new PrintWriter(stringWriter));
return stringWriter.toString();
}
private static int callDiv(int a, int b) throws Exception {
Method method = MathUtil.class.getMethod("div", new Class[]{int.class, int.class});
return (int) method.invoke(null, a, b);
}
}
调用convertStackTraceToString将异常堆栈转换为字符串,但是字符串中就没有了类所在位置的辅助信息,这是一个取舍的过程,没有一个标准的答案,需要根据业务系统的实际情况做最后的决策。
C.禁用反射优化
使用Log4j打印堆栈信息,如果堆栈中有反射优化生成的动态代理类,这个代理类不能被其他的Classloader加载,这个时候打印堆栈,会严重影响执行效率。但是禁用反射优化也会有副作用,导致反射执行的效率降低。
并不是所有问题都有那种完美的解决方案,都是各方博弈得到的相对平衡的结果。对打印日志而言,就是执行效率,磁盘空间和排查问题难度几方博弈的结果,最终选择适合自己业务场景的方案。
以上是关于细读源码之Log4j打印异常堆栈导致线程Block问题分析的主要内容,如果未能解决你的问题,请参考以下文章