Android KOOM: Handling an App's OOM Issues (Java / Native / Thread Leaks)
Posted by 新根
Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers how to use KOOM to handle an Android app's OOM issues (Java, native, and thread leaks), and we hope it is a useful reference.
This document describes how to use KOOM from the Kuaishou team, based on the 2.2.0 tag.
Preparation
VERSION_NAME=2.2.0
// Pull in KOOM's static-library artifacts; the version here is 2.2.0
implementation "com.kuaishou.koom:koom-native-leak-static:$VERSION_NAME"
implementation "com.kuaishou.koom:koom-java-leak-static:$VERSION_NAME"
implementation "com.kuaishou.koom:koom-thread-leak-static:$VERSION_NAME"
implementation "com.kuaishou.koom:xhook-static:$VERSION_NAME"
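Use the static libraries that Kuaishou publishes: building KOOM from source can run into problems that stop it from compiling.
For more information, read Kuaishou's detailed KOOM documentation.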
1. JavaLeak
1.1 The JSON that KOOM outputs for a Java leak
The JSON contains:
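- runningInfo: information about the app's current process, including key data such as the thread count and fd count
- gcPaths: the reference chain from a GC root to each leaked object
- leakObjects: the leaked objects
- classInfos: class information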
First, let's look at leakObjects:
[
  {
    "className":"android.graphics.Bitmap",
    "extDetail":"1920 x 1080",
    "objectId":"327801464",
    "size":"2073600"
  },
  {
    "className":"int[]",
    "objectId":"1972002816",
    "size":"455869"
  },
  {
    "className":"byte[]",
    "objectId":"1973350400",
    "size":"524301"
  },
  {
    "className":"char[]",
    "objectId":"1974407184",
    "size":"1048589"
  }
]
This shows that a Bitmap and several arrays are leaking, but it gives no further detail about why.
Next, here is part of gcPaths:
{
  "gcRoot":"Local variable in native code",
  "instanceCount":1,
  "leakReason":"Bitmap Size Over Threshold, 1920x1080",
  "path":[
    {
      "declaredClass":"java.lang.ClassLoader",
      "reference":"dalvik.system.PathClassLoader.runtimeInternalObjects",
      "referenceType":"INSTANCE_FIELD"
    },
    {
      "declaredClass":"java.lang.Object[]",
      "reference":"java.lang.Object[]",
      "referenceType":"ARRAY_ENTRY"
    },
    {
      "declaredClass":"com.kwai.koom.demo.javaleak.test.LeakMaker",
      "reference":"com.kwai.koom.demo.javaleak.test.LeakMaker.leakMakerList",
      "referenceType":"STATIC_FIELD"
    },
    {
      "declaredClass":"java.util.ArrayList",
      "reference":"java.util.ArrayList.elementData",
      "referenceType":"INSTANCE_FIELD"
    },
    {
      "declaredClass":"java.lang.Object[]",
      "reference":"java.lang.Object[]",
      "referenceType":"ARRAY_ENTRY"
    },
    {
      "declaredClass":"com.kwai.koom.demo.javaleak.test.LeakMaker",
      "reference":"com.kwai.koom.demo.javaleak.test.BitmapLeakMaker.uselessObjectList",
      "referenceType":"INSTANCE_FIELD"
    },
    {
      "declaredClass":"java.util.ArrayList",
      "reference":"java.util.ArrayList.elementData",
      "referenceType":"INSTANCE_FIELD"
    },
    {
      "declaredClass":"java.lang.Object[]",
      "reference":"java.lang.Object[]",
      "referenceType":"ARRAY_ENTRY"
    },
    {
      "reference":"android.graphics.Bitmap",
      "referenceType":"instance"
    }
  ],
  "signature":"38ba5ba71b7599737372f965417abcf2765dbb2a"
}
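Reading the GC path from the root down: the Bitmap is an element of the uselessObjectList ArrayList of a BitmapLeakMaker instance (the field is declared in LeakMaker); that BitmapLeakMaker instance is itself stored in LeakMaker's static list leakMakerList, which is reachable through the app's PathClassLoader. Because a static field lives as long as its class stays loaded, the Bitmap can never be freed.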
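gcPaths is essentially the shortest chain of strong references from a GC root to the leaked object. Heap analyzers commonly find such a chain with a breadth-first search over the object graph: BFS visits objects in order of distance, so the first time it reaches the target it has a shortest path. The C++ sketch below illustrates the idea on a hand-built graph that mirrors the path above, with the ArrayList.elementData and Object[] hops collapsed. The graph, its node names, and the ShortestGcPath and DemoGraph functions are hypothetical; this is not KOOM's code, which works on the parsed hprof.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Toy object graph: each object maps to the objects it strongly references.
using Graph = std::map<std::string, std::vector<std::string>>;

// Breadth-first search from the GC roots. Returns the chain
// root -> ... -> target, or an empty vector if target is unreachable.
std::vector<std::string> ShortestGcPath(const Graph &graph,
                                        const std::vector<std::string> &roots,
                                        const std::string &target) {
  std::map<std::string, std::string> parent;  // object -> object that reached it first
  std::queue<std::string> frontier;
  for (const auto &root : roots) {
    if (parent.emplace(root, "").second) frontier.push(root);  // roots have no parent
  }
  while (!frontier.empty()) {
    const std::string current = frontier.front();
    frontier.pop();
    if (current == target) {
      std::vector<std::string> path;
      for (std::string node = target; !node.empty(); node = parent[node]) {
        path.insert(path.begin(), node);  // walk parent links back to the root
      }
      return path;
    }
    const auto edges = graph.find(current);
    if (edges == graph.end()) continue;
    for (const auto &next : edges->second) {
      if (parent.emplace(next, current).second) frontier.push(next);  // first visit only
    }
  }
  return {};
}

// Hypothetical graph shaped like the gcPaths entry above.
Graph DemoGraph() {
  return {
      {"PathClassLoader", {"Object[] runtimeInternalObjects"}},
      {"Object[] runtimeInternalObjects", {"LeakMaker.class"}},
      {"LeakMaker.class", {"ArrayList leakMakerList"}},
      {"ArrayList leakMakerList", {"BitmapLeakMaker"}},
      {"BitmapLeakMaker", {"ArrayList uselessObjectList"}},
      {"ArrayList uselessObjectList", {"Bitmap 1920x1080"}},
  };
}
```

Calling ShortestGcPath(DemoGraph(), {"PathClassLoader"}, "Bitmap 1920x1080") returns the seven-step chain from the class loader to the Bitmap, which tells the same story as the JSON: the Bitmap stays reachable for as long as the static leakMakerList does.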
Next, here is part of runningInfo:
{
  "buildModel":"PCLM50",
  "currentPage":"javaleak.JavaLeakTestActivity",
  "deviceMemAvaliable":"3643.6367",
  "deviceMemTotal":"7398.6797",
  "dumpReason":"reason_thread_oom",
  "fdCount":"138",
  "filterInstanceTime":"1.837",
  "findGCPathTime":"16.967",
  "jvmMax":"384.0",
  "jvmUsed":"6.4137344",
  "manufacture":"OPPO",
  "nowTime":"2022-08-17_15-29-50_432",
  "pss":"125.66699mb",
  "rss":"161.82812mb",
  "sdkInt":"31",
  "threadCount":"725"
}
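This gives key facts about the process: it has 725 threads and 138 open fds, and the current page is JavaLeakTestActivity. The dumpReason is reason_thread_oom, which is consistent with the very high thread count.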
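threadCount and fdCount are per-process counters. On Linux, and therefore on Android, one common way to read them is through procfs: the "Threads:" line of /proc/self/status and the entries of /proc/self/fd. The sketch below shows that approach; it is an assumption that KOOM collects the values exactly this way, and ParseThreadCount, CurrentThreadCount, and CurrentFdCount are hypothetical helpers.

```cpp
#include <cassert>
#include <dirent.h>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Parses the "Threads:" line of a /proc/<pid>/status stream; returns -1 if absent.
int ParseThreadCount(std::istream &status) {
  std::string line;
  while (std::getline(status, line)) {
    if (line.rfind("Threads:", 0) == 0) {  // line starts with "Threads:"
      return std::stoi(line.substr(8));    // stoi skips the leading tab
    }
  }
  return -1;
}

// Thread count of the current process, read from /proc/self/status.
int CurrentThreadCount() {
  std::ifstream status("/proc/self/status");
  return status ? ParseThreadCount(status) : -1;
}

// Number of open file descriptors: entries in /proc/self/fd minus "." and "..".
// The descriptor that opendir() itself holds is included in the count.
int CurrentFdCount() {
  DIR *dir = opendir("/proc/self/fd");
  if (dir == nullptr) return -1;
  int count = 0;
  while (dirent *entry = readdir(dir)) {
    const std::string name = entry->d_name;
    if (name != "." && name != "..") ++count;
  }
  closedir(dir);
  return count;
}
```

Keeping the parsing in a function that takes a std::istream makes it easy to check against a captured status text, such as one containing "Threads:\t725".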
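1.2 Analyzing the hprof file in Android Studio
Next, open the leak hprof file that KOOM generates in Android Studio (it is in the sdcard/android/data/<package name>/files/performance/oom/memory/hrof-aly directory).
First, look at the UI (fragment/activity) leaks:
Next, look at the Bitmap leak reported in the JSON file:
For more on reading hprof files, search online.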
2. NativeLeak
2.1 Check the native leak log printed to logcat:
2022-08-09 11:21:21.987 15584-15696/com.kwai.koom.demo I/NativeLeakTestActivity: Activity: com.kwai.koom.demo.nativeleak.NativeLeakTestActivity@36fb614
//.......
LeakSize: 24 Byte
LeakThread: .kwai.koom.demo
Backtrace:
#0 pc 0x1d9c libnative-leak-test.so
#1 pc 0x190c libnative-leak-test.so
#2 pc 0xda278 libc.so
#3 pc 0x7a448 libc.so
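The pc values in this backtrace (0x1d9c, 0x190c, ...) are offsets from the start of the named library rather than absolute addresses, which is why they can be passed straight to addr2line together with the .so. The sketch below shows that conversion with dladdr(), which reports the base address of the loaded module that contains a given address. ToModuleOffset is a hypothetical helper; in the common case where the library's first loadable segment starts at virtual address 0 (typical for shared libraries), the result matches what addr2line expects. On glibc older than 2.34, link with -ldl.

```cpp
#include <cassert>
#include <cstdint>
#include <dlfcn.h>

// Converts an absolute code address into an offset relative to the start of
// the module (executable or .so) that contains it. Returns false if the
// address does not belong to any loaded module.
bool ToModuleOffset(const void *pc, uintptr_t *offset, const char **module) {
  Dl_info info;
  if (dladdr(pc, &info) == 0 || info.dli_fbase == nullptr) return false;
  *offset = reinterpret_cast<uintptr_t>(pc) -
            reinterpret_cast<uintptr_t>(info.dli_fbase);
  *module = info.dli_fname;  // path of the containing module
  return true;
}
```

Printed together with the module name, an offset computed this way has the same "pc 0x... libfoo.so" shape as the log lines above.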
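2.2 Locate the source line with Android NDK tools (ndk-stack or addr2line)
Run addr2line against libnative-leak-test.so (a build that still has its symbols) with the pc offsets from the backtrace, such as 0x1d9c and 0x190c:
The output points to line 93 of native-leak-test.cpp: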
static NOINLINE void TestContainerLeak() {
  std::vector<std::string *> str_vector(NR_TEST_CASE);
  for (int i = 0; i < NR_TEST_CASE; i++) {
    // Allocated with new and never deleted: the intentional leak in this demo
    str_vector[i] = new std::string("test_leak_container");
  }
}
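C++ is very different from Java: it has no garbage collector, so memory allocated with new must be released manually with delete. In the code above, each std::string is created with new and never deleted. When TestContainerLeak() returns, str_vector is destroyed, but that only frees the vector's array of pointers, not the strings they point to, so the strings leak on the native heap.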
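The usual fixes are to let an owning type release each object, for example std::unique_ptr, or to store the objects by value so there is no separate allocation at all. The sketch below checks both against the leaking shape. It uses a hypothetical Tracked type that counts live instances (standing in for std::string) and a hypothetical kNrTestCase in place of the demo's NR_TEST_CASE.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Counts live instances so we can observe whether anything leaked.
struct Tracked {
  static int live;
  Tracked() { ++live; }
  Tracked(const Tracked &) { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;

// Hypothetical stand-in for the demo's NR_TEST_CASE.
constexpr int kNrTestCase = 10;

// Same shape as TestContainerLeak(): the vector owns only the pointers, so
// destroying it at return frees the pointer slots but not the pointees.
void ContainerLeak() {
  std::vector<Tracked *> items(kNrTestCase);
  for (int i = 0; i < kNrTestCase; i++) items[i] = new Tracked();
}

// Fix 1: std::unique_ptr owns each object; ~vector() destroys all of them.
void ContainerNoLeakUniquePtr() {
  std::vector<std::unique_ptr<Tracked>> items(kNrTestCase);
  for (int i = 0; i < kNrTestCase; i++) items[i] = std::make_unique<Tracked>();
}

// Fix 2: store the objects by value; nothing separate to release.
void ContainerNoLeakByValue() {
  std::vector<Tracked> items(kNrTestCase);
}
```

After calling the two fixed versions the live count is back to zero, while ContainerLeak() leaves kNrTestCase objects alive, exactly the pattern KOOM flags in the log above.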
3. Using ThreadLeakMonitor
3.1 KOOM's thread leak example
First, let's walk through the code of KOOM's thread leak example:
static NOINLINE void TestThreadLeak(int64_t delay) {
  // Uses std::thread; the thread body is a lambda (std::thread accepts any callable)
  std::thread test_thread([](int64_t delay) {
    // Name this thread "test_thread"
    pthread_setname_np(pthread_self(), "test_thread");
    LOGI("test_thread run");
    // Declare pointers to std::thread objects
    std::thread *test_thread_1;
    std::thread *test_thread_2;
    test_thread_1 = new std::thread([]() {
      pthread_setname_np(pthread_self(), "test_thread_1");
      LOGI("test_thread_1 run");
    });
    test_thread_2 = new std::thread([]() {
      pthread_setname_np(pthread_self(), "test_thread_2");
      LOGI("test_thread_2 run");
    });
    // Sleep for `delay` milliseconds, then detach() test_thread_1 and join() test_thread_2
    std::this_thread::sleep_for(std::chrono::milliseconds(delay));
    test_thread_1->detach();
    LOGI("test_thread_1 detach");
    test_thread_2->join();
    LOGI("test_thread_2 join");
  }, delay);
  test_thread.detach();
}
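In short: the code creates a thread named test_thread, which starts two more threads, test_thread_1 and test_thread_2, sleeps for the given time, and then calls detach() on test_thread_1 and join() on test_thread_2.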
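Why does this count as a leak? A std::thread stays joinable until join() or detach() is called, and at the pthread level a joinable thread that has already exited keeps its resources (such as its stack) until someone joins it. In the demo both worker threads finish almost immediately but stay in this exited-but-unjoined state for the whole delay; on top of that, the two std::thread objects are created with new and never deleted. Simply letting a joinable std::thread go out of scope is not an option either, because its destructor calls std::terminate. A common way to avoid all of this is to own the std::thread by value and join it automatically, as in the sketch below. ScopedThread and RunWorkerWithGuard are hypothetical names; C++20's std::jthread offers the same join-on-destruction behavior.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Minimal RAII wrapper: the owned thread is joined when the wrapper goes out
// of scope, so it can never be left exited-but-never-joined.
class ScopedThread {
 public:
  explicit ScopedThread(std::thread t) : thread_(std::move(t)) {}
  ScopedThread(const ScopedThread &) = delete;
  ScopedThread &operator=(const ScopedThread &) = delete;
  ~ScopedThread() {
    if (thread_.joinable()) thread_.join();
  }

 private:
  std::thread thread_;
};

// Starts a worker held by value (not created with new) and lets the guard
// join it. Returns whether the worker's work is visible after the join.
bool RunWorkerWithGuard() {
  std::atomic<bool> done{false};
  {
    ScopedThread guard{std::thread([&done] { done = true; })};
  }  // ~ScopedThread joins the worker here
  return done.load();
}
```

Because the join happens in the destructor, every exit path from the scope (including an early return or an exception) releases the thread.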
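Next, let's look at what actually happens.
3.2 Checking the thread leak log
When the test case is tapped, test_thread starts test_thread_2 and test_thread_1, sleeps for 10 seconds, and then calls detach() on test_thread_1 and join() on test_thread_2.
First, logcat prints the following: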
2022-08-09 09:57:13.334 13961-26723/com.kwai.koom.demo I/ThreadLeakTest: test_thread run
2022-08-09 09:57:13.335 13961-26726/com.kwai.koom.demo I/ThreadLeakTest: test_thread_2 run
2022-08-09 09:57:13.335 13961-26724/com.kwai.koom.demo I/ThreadLeakTest: test_thread_1 run
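KOOM monitors threads: if a thread exits (pthread_exit) while neither join() nor detach() has been called on it, KOOM records it as a possibly leaked thread.
The interval from 2022-08-09 09:57:13 to 2022-08-09 09:57:18 is about 5 seconds, which matches the leak time configured in enableThreadLeakCheck(2 * 1000L, 5 * 1000L); once a thread has stayed in that state longer than this, its leak information is reported. The detailed log follows: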
2022-08-09 09:57:18.538 13961-14017/com.kwai.koom.demo I/ThreadLeakTest: tid: 26726
createTime: 284943128292812 Byte
startTime: 284943128354687
endTime: 284943128382916
name: test_thread_2
createCallStack:
#00 pc 0000000000002084 /data/app/~~UA5bVzbMO-QDKhKIgEGpxg==/com.kwai.koom.demo-Fnd4kAPgIstzWnyszO4chg==/lib/arm64/libnative-leak-test.so (BuildId: b3e2c22d2f281ecd24ed2bdd07577439)
#01 pc 00000000000da278 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+64) (BuildId: 1ca28d785d6567d2b225cf978ef04de5)
#02 pc 000000000007a448 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId: 1ca28d785d6567d2b225cf978ef04de5)
2022-08-09 09:57:18.538 13961-14017/com.kwai.koom.demo I/ThreadLeakTest: tid: 26724
createTime: 284943128217864 Byte
startTime: 284943128528853
endTime: 284943128671353
name: test_thread_1
createCallStack:
#00 pc 000000000000203c /data/app/~~UA5bVzbMO-QDKhKIgEGpxg==/com.kwai.koom.demo-Fnd4kAPgIstzWnyszO4chg==/lib/arm64/libnative-leak-test.so (BuildId: b3e2c22d2f281ecd24ed2bdd07577439)
#01 pc 00000000000da278 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+64) (BuildId: 1ca28d785d6567d2b225cf978ef04de5)
#02 pc 000000000007a448 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId: 1ca28d785d6567d2b225cf978ef04de5)
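KOOM's thread monitoring does not change the threads' own behavior: between 2022-08-09 09:57:13 and 2022-08-09 09:57:23, after its 10-second sleep, test_thread still calls detach() and join() on the two threads, as the following log confirms.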
2022-08-09 09:57:23.335 13961-26723/com.kwai.koom.demo I/ThreadLeakTest: test_thread_1 detach
2022-08-09 09:57:23.335 13961-26723/com.kwai.koom.demo I/ThreadLeakTest: test_thread_2 join
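The detection rule described in 3.2 can be modeled as a small per-thread state machine: a thread that has exited without join() or detach(), and is still in that state once the leak threshold has passed, is reported. The sketch below is a simplified, hypothetical model of that bookkeeping, not KOOM's implementation. In KOOM the events presumably come from hooked pthread calls (the xhook library is among the dependencies), whereas here they are plain method calls with explicit millisecond timestamps, and CollectLeaks is assumed to be called periodically. The model uses only the leak threshold (5 * 1000L in the demo); the article does not explain the first argument of enableThreadLeakCheck, so it is ignored here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model of a thread-leak monitor's bookkeeping.
class ThreadLeakTracker {
 public:
  void OnCreate(int tid) { threads_[tid] = State{}; }

  // join() or detach() was called: the thread's resources will be released.
  void OnDetachOrJoin(int tid) {
    auto it = threads_.find(tid);
    if (it != threads_.end()) it->second.released = true;
  }

  void OnExit(int tid, int64_t now_ms) {
    auto it = threads_.find(tid);
    if (it == threads_.end()) return;
    it->second.exited = true;
    it->second.exit_time_ms = now_ms;
  }

  // Reports threads that exited without join()/detach() at least
  // leak_threshold_ms ago. Reported and cleanly finished threads are
  // dropped from tracking, so each leak is reported once.
  std::vector<int> CollectLeaks(int64_t now_ms, int64_t leak_threshold_ms) {
    std::vector<int> leaks;
    for (auto it = threads_.begin(); it != threads_.end();) {
      const State &s = it->second;
      if (s.exited && s.released) {
        it = threads_.erase(it);  // finished normally
      } else if (s.exited && now_ms - s.exit_time_ms >= leak_threshold_ms) {
        leaks.push_back(it->first);
        it = threads_.erase(it);
      } else {
        ++it;
      }
    }
    return leaks;
  }

 private:
  struct State {
    bool released = false;  // join() or detach() has been called
    bool exited = false;
    int64_t exit_time_ms = 0;
  };
  std::map<int, State> threads_;  // keyed by tid
};
```

Feeding it the demo's timeline (both worker threads exit right away and a check runs about 5.2 seconds later with a 5000 ms threshold) reports tids 26724 and 26726, matching the log above.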