Android 12 Watchdog 案例分析集
Posted pecuyu
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Android 12 Watchdog 案例分析集相关的知识,希望对你有一定的参考价值。
文章托管在gitee上 Android Notes , 同步csdn
通常判断是否是发送Watchdog,可以通过以下log:
- 查看 /data/anr 目录下是否生成 Watchdog trace 文件
- events log中查看 watchdog event. 在Watchdog#run的方法中打印event事件
// If we got here, that means that the system is most likely hung.
// First collect stack traces from all threads of the system process.
// Then kill this process so that the system will restart.
EventLog.writeEvent(EventLogTags.WATCHDOG, subject);
m
- system log中查看 watchdog 事件log. 在Watchdog#run的方法中打印Watchdog重启事件
Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
Slog.w(TAG, "*** GOODBYE!");
案例一
这个案例是 system_server 发起binder调用到 surfaceflinger,而 surfaceflinger 继续提供 hwbinder 调用 hwc,最终卡在 hwc的相关函数。
查看events log中的Watchdog事件
从中可以发现,系统阻塞在android.fg等线程
03-01 23:29:48.089083 2339 2461 I watchdog: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (android.fg), Blocked in handler on ui thread (android.ui), Blocked in handler on display thread (android.display), Blocked in handler on animation thread (android.anim)
查看system log中的Watchdog重启事件
从log可以看出前后两次生成的trace文件,以及卡住的线程
03-01 23:29:56.357723 2339 2461 E Watchdog: First set of traces taken from /data/anr/anr_2022-03-01-23-29-14-351
03-01 23:29:56.524827 2339 2461 E Watchdog: Second set of traces taken from /data/anr/anr_2022-03-01-23-29-48-401
03-01 23:29:56.712212 2339 29320 I DropBoxManagerService: add tag=system_server_watchdog isTagEnabled=true flags=0x6
03-01 23:29:56.738777 2339 2461 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (android.fg), Blocked in handler on ui thread (android.ui), Blocked in handler on display thread (android.display), Blocked in handler on animation thread (android.anim)
03-01 23:29:56.739727 2339 2461 W Watchdog: android.fg annotated stack trace:
03-01 23:29:56.739791 2339 2461 W Watchdog: at com.android.server.am.ActivityManagerService.monitor(ActivityManagerService.java:16455)
03-01 23:29:56.740235 2339 2461 W Watchdog: - waiting to lock <0x0d483326> (a com.android.server.am.ActivityManagerService)
03-01 23:29:56.740377 2339 2461 W Watchdog: at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:276)
03-01 23:29:56.740448 2339 2461 W Watchdog: at android.os.Handler.handleCallback(Handler.java:938)
03-01 23:29:56.740476 2339 2461 W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:99)
03-01 23:29:56.740486 2339 2461 W Watchdog: at android.os.Looper.loopOnce(Looper.java:238)
03-01 23:29:56.740496 2339 2461 W Watchdog: at android.os.Looper.loop(Looper.java:379)
03-01 23:29:56.740506 2339 2461 W Watchdog: at android.os.HandlerThread.run(HandlerThread.java:67)
03-01 23:29:56.740516 2339 2461 W Watchdog: at com.android.server.ServiceThread.run(ServiceThread.java:44)
03-01 23:29:56.740750 2339 2461 W Watchdog: android.ui annotated stack trace:
03-01 23:29:56.740780 2339 2461 W Watchdog: at com.android.server.wm.RootWindowContainer.onDisplayChanged(RootWindowContainer.java:2916)
03-01 23:29:56.740850 2339 2461 W Watchdog: - waiting to lock <0x033bcac7> (a com.android.server.wm.WindowManagerGlobalLock)
03-01 23:29:56.740882 2339 2461 W Watchdog: at android.hardware.display.DisplayManagerGlobal$DisplayListenerDelegate.handleMessage(DisplayManagerGlobal.java:973)
03-01 23:29:56.740953 2339 2461 W Watchdog: - locked <0x09a04c63> (a android.hardware.display.DisplayManagerGlobal$DisplayListenerDelegate)
03-01 23:29:56.740977 2339 2461 W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:106)
03-01 23:29:56.740987 2339 2461 W Watchdog: at android.os.Looper.loopOnce(Looper.java:238)
03-01 23:29:56.740996 2339 2461 W Watchdog: at android.os.Looper.loop(Looper.java:379)
03-01 23:29:56.741005 2339 2461 W Watchdog: at android.os.HandlerThread.run(HandlerThread.java:67)
03-01 23:29:56.741025 2339 2461 W Watchdog: at com.android.server.ServiceThread.run(ServiceThread.java:44)
03-01 23:29:56.741053 2339 2461 W Watchdog: at com.android.server.UiThread.run(UiThread.java:45)
...
03-01 23:29:56.743657 2339 2461 W Watchdog: *** GOODBYE!
分析 traces_SystemServer_WDT01_03_23_29_56.357
查看Blocked的线程。可以发现main在等待thread 15
"main" prio=5 tid=1 Blocked
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x72a2dc18 self=0xb4000079fc150c00
| sysTid=2339 nice=-2 cgrp=foreground sched=0/0 handle=0x79fd7924f8
| state=S schedstat=( 291309050514 268264950373 1674947 ) utm=15535 stm=13595 core=4 HZ=100
| stack=0x7fdcded000-0x7fdcdef000 stackSize=8188KB
| held mutexes=
at com.android.server.wm.DisplayRotation.updateOrientationListener(DisplayRotation.java:1059)
- waiting to lock <0x033bcac7> (a com.android.server.wm.WindowManagerGlobalLock) held by thread 15
at com.android.server.policy.PhoneWindowManager.finishedGoingToSleep(PhoneWindowManager.java:5031)
at com.android.server.power.Notifier.lambda$handleLateInteractiveChange$3$Notifier(Notifier.java:523)
继续搜索tid=15, 从log看是进行binder通信,等待对端返回。发起的调用即 BpSurfaceComposer::setTransactionState
"android.anim" prio=5 tid=15 Native
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x1acc09c0 self=0xb400007934711400
| sysTid=2459 nice=-4 cgrp=top-app sched=0/0 handle=0x78d3678cb0
| state=S schedstat=( 13497493576 7900077917 56495 ) utm=941 stm=408 core=5 HZ=100
| stack=0x78d3575000-0x78d3577000 stackSize=1039KB
| held mutexes=
native: #00 pc 00000000000e802c /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+12)
native: #01 pc 00000000000a1b48 /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+156)
native: #02 pc 0000000000053f54 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+292)
native: #03 pc 0000000000055ac4 /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+64)
native: #04 pc 0000000000055808 /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+224)
native: #05 pc 000000000004ca1c /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+196)
native: #06 pc 00000000000c89d0 /system/lib64/libgui.so (android::BpSurfaceComposer::setTransactionState(android::FrameTimelineInfo const&, android::Vector<android::ComposerState> const&, android::Vector<android::DisplayState> const&, unsigned int, android::sp<android::IBinder> const&, android::InputWindowCommands const&, long, bool, android::client_cache_t const&, bool, std::__1::vector<android::ListenerCallbacks, std::__1::allocator<android::ListenerCallbacks> > const&, unsigned long)+1648)
native: #07 pc 00000000000f5fd8 /system/lib64/libgui.so (android::SurfaceComposerClient::Transaction::apply(bool)+2500)
at android.view.SurfaceControl.nativeApplyTransaction(Native method)
at android.view.SurfaceControl.access$2900(SurfaceControl.java:91)
at android.view.SurfaceControl$GlobalTransactionWrapper.applyGlobalTransaction(SurfaceControl.java:3747)
at android.view.SurfaceControl.closeTransaction(SurfaceControl.java:1726)
- locked <0x0818e060> (a java.lang.Class<android.view.SurfaceControl>)
at com.android.server.wm.WindowManagerService.closeSurfaceTransaction(WindowManagerService.java:1200)
at com.android.server.wm.RootWindowContainer.performSurfacePlacementNoTrace(RootWindowContainer.java:876)
at com.android.server.wm.RootWindowContainer.performSurfacePlacement(RootWindowContainer.java:825)
at com.android.server.wm.WindowSurfacePlacer.performSurfacePlacementLoop(WindowSurfacePlacer.java:177)
at com.android.server.wm.WindowSurfacePlacer.performSurfacePlacement(WindowSurfacePlacer.java:126)
at com.android.server.wm.WindowSurfacePlacer.performSurfacePlacement(WindowSurfacePlacer.java:115)
at com.android.server.wm.WindowSurfacePlacer$Traverser.run(WindowSurfacePlacer.java:57)
- locked <0x033bcac7> (a com.android.server.wm.WindowManagerGlobalLock)
at android.os.Handler.handleCallback(Handler.java:938)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:238)
at android.os.Looper.loop(Looper.java:379)
at android.os.HandlerThread.run(HandlerThread.java:67)
at com.android.server.ServiceThread.run(ServiceThread.java:44)
通过BpSurfaceComposer::setTransactionState可以知道,对端应该是SurfaceFlinger了。那么怎么确定对端是哪个线程在处理? 通过sysTid=2459 可以知道它在系统的线程id。 接下来,可以在binder通信的log搜索2459:
从log看,发起端是 2339进程的2459线程,接收端是 1205进程的2112线程
// 发起端 system_server
proc 2339
context binder
...
thread 2459: l 10 need_return 0 tr 0
outgoing transaction 365209941: 0000000000000000 from 2339:2459 to 1205:2112 code 8 flags 10 pri 0:116 r1
transaction complete
// 接收端 SurfaceFlinger
proc 1205
context binder
...
thread 2112: l 01 need_return 1 tr 0
incoming transaction 365209941: 0000000000000000 from 2339:2459 to 1205:2112 code 8 flags 10 pri 0:116 r1 node 2483 size 1676:48 data 0000000000000000
接下来看 1205进程的2112线程的调用栈, 从log看,SurfaceFlinger::setTransactionState 又发起了一个 hwbinder 调用 BpHwDisplayConfig::perform
"Binder:1205_3" sysTid=2112
#00 pc 00000000000e802c /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+12) (BuildId: fb76ba9523d8aaae2d41a1fbd415c1dc)
#01 pc 00000000000a1b48 /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+156) (BuildId: fb76ba9523d8aaae2d41a1fbd415c1dc)
#02 pc 0000000000066b40 /system/lib64/libhidlbase.so (android::hardware::IPCThreadState::transact(int, unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int)+1540) (BuildId: c608f6971797e3ae95b93820aa139249)
#03 pc 00000000000663dc /system/lib64/libhidlbase.so (android::hardware::BpHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+80) (BuildId: c608f6971797e3ae95b93820aa139249)
#04 pc 0000000000016340 /system_ext/lib64/vendor.display.config@2.0.so (vendor::display::config::V2_0::BpHwDisplayConfig::_hidl_perform(android::hardware::IInterface*, android::hardware::details::HidlInstrumentor*, unsigned long, unsigned int, android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::hidl_handle> const&, std::__1::function<void (int, android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::hidl_handle> const&)>) (.cfi)+608) (BuildId: a9e3ee6979c034ea920452c69d780029)
#05 pc 0000000000015ad4 /system_ext/lib64/vendor.display.config@2.0.so (vendor::display::config::V2_0::BpHwDisplayConfig::perform(unsigned long, unsigned int, android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::hidl_handle> const&, std::__1::function<void (int, android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::hidl_handle> const&)>)+188) (BuildId: a9e3ee6979c034ea920452c69d780029)
#06 pc 0000000000013fc8 /system_ext/lib64/libdisplayconfig.system.qti.so (DisplayConfig::ClientImpl::CreateVirtualDisplay(unsigned int, unsigned int, int)+188) (BuildId: 7a35e6af402a03a0989fd76de91c38b4)
#07 pc 00000000001ac040 /system/bin/surfaceflinger (android::SurfaceFlinger::setTransactionState(android::FrameTimelineInfo const&, android::Vector<android::ComposerState> const&, android::Vector<android::DisplayState> const&, unsigned int, android::sp<android::IBinder> const&, android::InputWindowCommands const&, long, bool, android::client_cache_t const&, bool, std::__1::vector<android::ListenerCallbacks, std::__1::allocator<android::ListenerCallbacks> > const&, unsigned long)+1184) (BuildId: f0bd600747bdf092064c4802e154f3ef)
#08 pc 00000000000c727c /system/lib64/libgui.so (android::BnSurfaceComposer::onTransact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+14572) (BuildId: 843237695aa14d88272bc2dcc327f092)
#09 pc 00000000001b9f3c /system/bin/surfaceflinger (android::SurfaceFlinger::onTransact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+156) (BuildId: f0bd600747bdf092064c4802e154f3ef)
#10 pc 000000000004a880 /system/lib64/libbinder.so (android::BBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+252) (BuildId: a82e3bba4ac6144031ff6985fc101a95)
#11 pc 0000000000055134 /system/lib64/libbinder.so (android::IPCThreadState::executeCommand(int)+3484) (BuildId: a82e3bba4ac6144031ff6985fc101a95)
#12 pc 00000000000542cc /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+164) (BuildId: a82e3bba4ac6144031ff6985fc101a95)
#13 pc 00000000000554b0 /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+72) (BuildId: a82e3bba4ac6144031ff6985fc101a95)
#14 pc 0000000000081e18 /system/lib64/libbinder.so (android::PoolThread::threadLoop()+28) (BuildId: a82e3bba4ac6144031ff6985fc101a95)
#15 pc 000000000001358c /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+264) (BuildId: 6fc8f24a666a9723e0829fb53721b425)
接下来继续查看对端,根据线程sysTid=2112 在binder log搜索2112确定调用链:
// 发起端 SurfaceFlinger
proc 1205
context hwbinder
...
thread 2112: l 10 need_return 0 tr 0
outgoing transaction 365210005: 0000000000000000 from 1205:2112 to 1124:1458 code 2 flags 10 pri 1:98 r1
// 服务端 hwc
proc 1124
context hwbinder
...
thread 1458: l 02 need_return 1 tr 0
incoming transaction 365210005: 0000000000000000 from 1205:2112 to 1124:1458 code 2 flags 10 pri 1:98 r1 node 1349 size 216:24 data 0000000000000000
接下来看 1124:1458,从堆栈看出是卡住在vendor.qti.hardware.display.composer-service的 HWCSession::WaitForResources。到此,已经定位到了问题点,将问题交由hwc继续分析。最后问题貌似是高通方面提供了patch解决。
"HwBinder:1124_1" sysTid=1458
#00 pc 00000000000940f0 /apex/com.android.runtime/lib64/bionic/libc.so (syscall+32) (BuildId: fb76ba9523d8aaae2d41a1fbd415c1dc)
#01 pc 00000000000989bc /apex/com.android.runtime/lib64/bionic/libc.so (__futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+148) (BuildId: fb76ba9523d8aaae2d41a1fbd415c1dc)
#02 pc 00000000000fc02c /apex/com.android.runtime/lib64/bionic/libc.so (pthread_cond_wait+84) (BuildId: fb76ba9523d8aaae2d41a1fbd415c1dc)
#03 pc 0000000000056ca0 /apex/com.android.vndk.v31/lib64/libc++.so (std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&)+24) (BuildId: 3800ebf760648abf274c4f36afc38008)
#04 pc 00000000000962e4 /vendor/bin/hw/vendor.qti.hardware.display.composer-service (sdm::HWCSession::WaitForResources(bool, unsigned long, unsigned long)+1096) (BuildId: 0ea515b02e255ad4a6681b5df410666b)
#05 pc 00000000000a3e44 /vendor/bin/hw/vendor.qti.hardware.display.composer-service (sdm::HWCSession::DisplayConfigImpl::CreateVirtualDisplay(unsigned int, unsigned int, int)+340) (BuildId: 0ea515b02e255ad4a6681b5df410666b)
#06 pc 000000000001c280 /vendor/lib64/libdisplayconfig.qti.so (DisplayConfig::DeviceImpl::DeviceClientContext::ParseCreateVirtualDisplay(android::hardware::hidl_vec<unsigned char> const&, std::__1::function<void (int, android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::hidl_handle> const&)>)+88) (BuildId: 14bf4b7d6c0063e6da991a1e3abdedff)
#07 pc 00000000000182ac /vendor/lib64/libdisplayconfig.qti.so (DisplayConfig::DeviceImpl::perform(unsigned long, unsigned int, android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::hidl_handle> const&, std::__1::function<void (int, android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::hidl_handle> const&)>)+8824) (BuildId: 14bf4b7d6c0063e6da991a1e3abdedff)
#08 pc 00000000000132d0 /vendor/lib64/vendor.display.config@2.0.so (vendor::display::config::V2_0::BnHwDisplayConfig::_hidl_perform(android::hidl::base::V1_0::BnHwBase*, android::hardware::Parcel const&, android::hardware::Parcel*, std::__1::function<void (android::hardware::Parcel&)>) (.cfi)+724) (BuildId: d203a85715a76890e73a85b38b115ec5)
#09 pc 0000000000012280 /vendor/lib64/vendor.display.config@2.0.so (vendor::display::config::V2_0::BnHwDisplayConfig::onTransact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+336) (BuildId: d203a85715a76890e73a85b38b115ec5)
#10 pc 000000000006d44c /apex/com.android.vndk.v31/lib64/libhidlbase.so (android::hardware::BHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+96) (BuildId: 54569e58303a962f81df8fe06e9274cb)
#11 pc 000000000006b9c0 /apex/com.android.vndk.v31/lib64/libhidlbase.so (android::hardware::IPCThreadState::getAndExecuteCommand()+1112) (BuildId: 54569e58303a962f81df8fe06e9274cb)
#12 pc 000000000006b484 /apex/com.android.vndk.v31/lib64/libhidlbase.so (android::hardware::IPCThreadState::joinThreadPool(bool)+100) (BuildId: 54569e58303a962f81df8fe06e9274cb)
#13 pc 000000000006b408 /apex/com.android.vndk.v31/lib64/libhidlbase.so (android::hardware::PoolThread::threadLoop()+28) (BuildId: 54569e58303a962f81df8fe06e9274cb)
#14 pc 000000000001358c /apex/com.android.vndk.v31/lib64/libutils.so (android::Thread::_threadLoop(void*)+264) (BuildId: 3246d7568b7735dff37d41c19433144c)
案例二
此案例是sensor hal被异常冻结,导致hwbinder通信卡住,导致的Watchdog
查看events log中的Watchdog事件
从中可以发现发生Watchdog,系统阻塞在android.fg等线程
03-05 01:31:13.268348 2144 2446 I watchdog: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (android.fg), Blocked in handler on main thread (main), Blocked in handler on display thread (android.display), Blocked in handler on animation thread (android.anim), Blocked in handler on ActivityManager (ActivityManager)
查看system log中的Watchdog事件
查看system log发现Watchdog并没有kill system, 因为是在跑monkey的原因。
03-05 01:30:36.694976 2144 2446 I Watchdog: WAITED_HALF
03-05 01:31:19.282718 2144 2446 I Watchdog: Collecting Binder Transaction Status Information
03-05 01:31:19.282889 2144 2446 I Watchdog: Collecting Binder state file from binderfs
03-05 01:31:24.533333 2144 2446 E Watchdog: First set of traces taken from /data/anr/anr_2022-03-05-01-30-36-809
03-05 01:31:24.719023 2144 2446 E Watchdog: Second set of traces taken from /data/anr/anr_2022-03-05-01-31-13-619
03-05 01:31:24.946323 2144 2446 I Watchdog: Reporting stuck state to activity controller
分析 traces_SystemServer_WDT04_03_17_31_24.533_pid2144
查看blocked 的 android.fg 线程调用栈,发现HandlerChecker在执行monitor时等锁,而锁被thread 53持有。
"android.fg" prio=5 tid=13 Blocked
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x13d407e0 self=0xb4000078f80f6800
| sysTid=2438 nice=0 cgrp=foreground sched=0/0 handle=0x786f585cb0
| state=S schedstat=( 302218682432 246835604021 994874 ) utm=19112 stm=11109 core=3 HZ=100
| stack=0x786f482000-0x786f484000 stackSize=1039KB
| held mutexes=
at com.android.server.am.ActivityManagerService.monitor(ActivityManagerService.java:16455)
- waiting to lock <0x0fec94e0> (a com.android.server.am.ActivityManagerService) held by thread 53
at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:276)
at android.os.Handler.handleCallback(Handler.java:938)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:238)
at android.os.Looper.loop(Looper.java:379)
at android.os.HandlerThread.run(HandlerThread.java:67)
at com.android.server.ServiceThread.run(ServiceThread.java:44)
搜索 system_server(pid2144) tid=53 线程,它持有AMS锁,但也在等另外一个锁WindowManagerGlobalLock对象,该锁被thread 231持有.
"Binder:2144_D" prio=5 tid=53 Blocked
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x13d5f858 self=0xb400007834b6b000
| sysTid=4973 nice=-2 cgrp=foreground sched=0/0 handle=0x7867a8ccb0
| state=S schedstat=( 1116118050555 488843391069 8552582 ) utm=71855 stm=39756 core=0 HZ=100
| stack=0x7867995000-0x7867997000 stackSize=991KB
| held mutexes=
at com.android.server.wm.ActivityTaskManagerService$LocalService.compatibilityInfoForPackage(ActivityTaskManagerService.java:6227)
- waiting to lock <0x0a6587e6> (a com.android.server.wm.WindowManagerGlobalLock) held by thread 231
at com.android.server.am.ActivityManagerService.compatibilityInfoForPackage(ActivityManagerService.java:3044)
at com.android.server.am.ActiveServices.realStartServiceLocked(ActiveServices.java:4384)
at com.android.server.am.ActiveServices.bringUpServiceLocked(ActiveServices.java:4247)
at com.android.server.am.ActiveServices.startServiceInnerLocked(ActiveServices.java:1284)
at com.android.server.am.ActiveServices.startServiceInnerLocked(ActiveServices.java:1073)
at com.android.server.am.ActiveServices.startServiceLocked(ActiveServices.java:971)
at com.android.server.am.ActiveServices.startServiceLocked(ActiveServices.java:703)
at com.android.server.am.ActivityManagerService.startService(ActivityManagerService.java:12968)
- locked <0x0fec94e0> (a com.android.server.am.ActivityManagerService)
at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:2749)
at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2765)
at android.os.Binder.execTransactInternal(Binder.java:1198)
at android.os.Binder.execTransact(Binder.java:1157)
搜索 system_server(pid2144) tid=231 线程, 它在执行SensorService::disable时在等锁(MutexLockWithTimeout),因此需要找到是哪个线程正在执行. 我们可以搜索/system/lib64/libsensorservice.so,查看此服务哪个线程在持锁执行任务.
"Binder:2144_14" prio=5 tid=231 Native
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x13d60698 self=0xb40000783c268800
| sysTid=8793 nice=-10 cgrp=foreground sched=0/0 handle=0x786746ccb0
| state=S schedstat=( 1132724004197 492411705656 8727320 ) utm=72806 stm=40465 core=5 HZ=100
| stack=0x7867375000-0x7867377000 stackSize=991KB
| held mutexes=
native: #00 pc 00000000000940f0 /apex/com.android.runtime/lib64/bionic/libc.so (syscall+32)
native: #01 pc 00000000000989bc /apex/com.android.runtime/lib64/bionic/libc.so (__futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+148)
native: #02 pc 00000000000fe04c /apex/com.android.runtime/lib64/bionic/libc.so (NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*)+224)
native: #03 pc 00000000000366b0 /system/lib64/libsensorservice.so (android::SensorService::disable(android::sp<android::SensorService::SensorEventConnection> const&, int)+68)
native: #04 pc 00000000000287b0 /system/lib64/libsensorservice.so (android::SensorService::SensorEventConnection::enableDisable(int, bool, long, long, int)+320)
native: #05 pc 00000000001ab8f8 /system/lib64/libandroid_runtime.so ((anonymous namespace)::nativeDisableSensor(_JNIEnv*, _jclass*, long, int)+120)
at android.hardware.SystemSensorManager$BaseEventQueue.nativeDisableSensor(Native method)
at android.hardware.SystemSensorManager$BaseEventQueue.disableSensor(SystemSensorManager.java:823)
at android.hardware.SystemSensorManager$BaseEventQueue.removeAllSensors(SystemSensorManager.java:743)
at android.hardware.SystemSensorManager.unregisterListenerImpl(SystemSensorManager.java:242)
- locked <0x0be8257d> (a java.util.HashMap)
at android.hardware.SensorManager.unregisterListener(SensorManager.java:679)
at com.android.server.wm.WindowOrientationListener.disable(WindowOrientationListener.java:209)
- locked <0x08f23c72> (a java.lang.Object)
at com.android.server.wm.DisplayRotation$OrientationListener.disable(DisplayRotation.java:1745)
at com.android.server.wm.DisplayRotation.updateOrientationListenerLw(DisplayRotation.java:1146)
at com.android.server.wm.DisplayRotation.updateOrientationListener(DisplayRotation.java:1060)
- locked <0x0a6587e6> (a com.android.server.wm.WindowManagerGlobalLock)
at com.android.server.wm.DisplayRotation.noteUpdateOrientationFromApp(DisplayRotation.java:1856)
at com.android.server.wm.RootWindowContainer.ensureVisibilityAndConfig(RootWindowContainer.java:1880)
at com.android.server.wm.Task.resumeTopActivityInnerLocked(Task.java:6745)
at com.android.server.wm.Task.resumeTopActivityUncheckedLocked(Task.java:6359)
at com.android.server.wm.Task.resumeTopActivityUncheckedLocked(Task.java:6372)
at com.android.server.wm.RootWindowContainer.resumeFocusedTasksTopActivities(RootWindowContainer.java:2660)
at com.android.server.wm.ActivityStarter.resumeTargetRootTaskIfNeeded(ActivityStarter.java:3123)
at com.android.server.wm.ActivityStarter.recycleTask(ActivityStarter.java:2399)
at com.android.server.wm.ActivityStarter.startActivityInner(ActivityStarter.java:2026)
at com.android.server.wm.ActivityStarter.startActivityUnchecked(ActivityStarter.java:1833)
at com.android.server.wm.ActivityStarter.executeRequest(ActivityStarter.java:1433)
at com.android.server.wm.ActivityStarter.execute(ActivityStarter.java:829)
- locked <0x0a6587e6> (a com.android.server.wm.WindowManagerGlobalLock)
at com.android.server.wm.ActivityTaskManagerService.startActivityAsUser(ActivityTaskManagerService.java:1510)
at com.android.server.wm.ActivityTaskManagerService.startActivityAsUser(ActivityTaskManagerService.java:1435)
at com.android.server.am.ActivityManagerService.startActivityAsUserWithFeature(ActivityManagerService.java:3200)
at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:4294)
at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2765)
at android.os.Binder.execTransactInternal(Binder.java:1198)
at android.os.Binder.execTransact(Binder.java:1157)
通过搜索发现SensorService的其他线程都在等锁,只有下面这个线程在进行hwbinder通信,其对端是sensor hal(android.hardware.sensors@2.0)
"Binder:2144_A" prio=5 tid=127 Native
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x13d5c810 self=0xb400007870095400
| sysTid=3855 nice=0 cgrp=foreground sched=0/0 handle=0x77e63cccb0
| state=S schedstat=( 1189398409434 521393263938 9255328 ) utm=76615 stm=42324 core=1 HZ=100
| stack=0x77e62d5000-0x77e62d7000 stackSize=991KB
| held mutexes=
native: #00 pc 00000000000e802c /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+12)
native: #01 pc 00000000000a1b48 /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+156)
native: #02 pc 00000000000714d4 /system/lib64/libhidlbase.so (android::hardware::IPCThreadState::transact(int, unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int)+424)
native: #03 pc 00000000000711cc /system/lib64/libhidlbase.so (android::hardware::BpHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+80)
native: #04 pc 000000000002245c /system/lib64/android.hardware.sensors@2.0.so (android::hardware::sensors::V2_0::BpHwSensors::_hidl_activate(android::hardware::IInterface*, android::hardware::details::HidlInstrumentor*, int, bool) (.cfi)+408)
native: #05 pc 000000000001aef4 /system/lib64/libsensorservice.so (android::SensorDevice::activateLocked(void*, int, int)+1144)
native: #06 pc 000000000001c67c /system/lib64/libsensorservice.so (android::SensorDevice::activate(void*, int, int)+76)
native: #07 pc 0000000000036730 /system/lib64/libsensorservice.so (android::SensorService::disable(android::sp<android::SensorService::SensorEventConnection> const&, int)+196)
native: #08 pc 00000000000287b0 /system/lib64/libsensorservice.so (android::SensorService::SensorEventConnection::enableDisable(int, bool, long, long, int)+320)
native: #09 pc 000000000000ccd0 /system/lib64/libsensor.so (android::BnSensorEventConnection::onTransact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+460)
native: #10 pc 000000000004a880 /system/lib64/libbinder.so (android::BBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+252)
native: #11 pc 00000000000551c4 /system/lib64/libbinder.so (android::IPCThreadState::executeCommand(int)+3628)
native: #12 pc 00000000000542cc /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+164)
native: #13 pc 000000000005553c /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+72)
native: #14 pc 0000000000081ea4 /system/lib64/libbinder.so (android::PoolThread::threadLoop()+28)
native: #15 pc 000000000001358c /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+264)
native: #16 pc 00000000000bf47c /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+148)
native: #17 pc 0000000000012de8 /system/lib64/libutils.so (thread_data_t::trampoline(thread_data_t const*)+408)
native: #18 pc 00000000000fcebc /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+400)
native: #19 pc 00000000000994bc /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68)
通过 sysTid=3855 在binder transaction log可以知道其通信对端是进程1137,线程号0(也就是没有线程处理),通常这种情况是因为该进程状态异常(比如进程已死或者处于D状态)或者binder线程没有空闲线程
proc 2144
context hwbinder
thread 3855: l 10 need_return 0 tr 0
outgoing transaction 625975487: 00000000777dc692 from 2144:3855 to 1137:0 code 3 flags 10 pri 0:120 r1
proc 1137
context hwbinder
thread 1137: l 12 need_return 0 tr 0
pending transaction 625975487: 00000000777dc692 from 2144:3855 to 1137:0 code 3 flags 10 pri 0:120 r1 node 7558 size 48:0 data 000000006f380266
接下来使用ps查看sensor hal进程,可以发现进程1137处于D状态:
# ps-A | grep 1137
USER PID PPID VSZ RSS WCHAN ADDR S NAME
system 1137 1 2911284 6680 __refrigerator 0 D android.hardware.sensors@2.1-service.mu1tihal
查看其cgroup如下,此进程应该是处于冻结状态
# cat /proc/1137/cgroup
5:freezer:/bg
4:memory:/
3:cpuset:/system-background
2:cpu:/
1:blkio:/
0::/uid_1000/pid_1137
# cat/proc/1137/stat
1137(sensors@2.1-ser) D 1 1137 0 0 -1 4260096 173126 5796 220 0 18273 21477 13 25 20 0 36 0 710 2981154816 1582 1844674 4073709551615 407914143744 407914214216 549746513600 0 0 0 2048 0 1073777912 00017 200400 407914227088 407914227 088 408491892736 549746516604 549746516665 549746516665 549746519995 0
此案与冻结实现同事沟通,原因是复用时刚好复用了一个之前解冻时没有被成功解冻的进程号,导致出现hwbinder通信卡住,此问题需要优化冻结相关的逻辑来解决此问题。
以上是关于Android 12 Watchdog 案例分析集的主要内容,如果未能解决你的问题,请参考以下文章