A Close Look at the ANR Mechanism in the Android Input System

Posted 2023-02-03 一代小强


"This article is based on the Android 13 source code and analyzes how the input system implements ANR detection."

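Before we start, two questions:

If an Activity runs a synchronous task that takes longer than 5s (the ANR timeout) inside any lifecycle callback (onCreate, onResume, and so on), and the user does not tap anything in the meantime, will an ANR be triggered?
If a button's onClick callback runs a synchronous task that takes longer than 5s, will a single click trigger an ANR? What about two clicks, or three?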

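1. ANR categories

First, the categories of ANR:

Input ANR: a key or touch event gets no response within 5s. This mostly shows up in activities and fragments.
Service ANR: the timeout is 20s for a foreground service and 200s for a background service.
Broadcast ANR: the timeout is 10s for a foreground broadcast and 60s for a background broadcast.
ContentProvider ANR: publishing the provider does not complete within 10s.
startForegroundService: the app calls startForegroundService() but does not call startForeground() within 5s, which results in an ANR or a crash.

You may wonder why there is no "Activity ANR" category. Strictly speaking, an Activity ANR is an ANR that the input system detects and then raises against the activity. This article therefore uses the input system to explain how Android triggers an ANR for an activity.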

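2. InputDispatcher

Before looking at how an input ANR works, here is a brief look at how InputDispatcher dispatches key events.

InputDispatcher runs dispatchOnce() on its own thread. Directly or through the functions it calls, each pass does the following:

dispatchOnceInnerLocked() takes an EventEntry out of mInboundQueue.
enqueueDispatchEntryLocked() creates a DispatchEntry and appends it to the connection's outboundQueue.
startDispatchCycleLocked() takes the DispatchEntry out of outboundQueue and moves it to the connection's waitQueue. It also calls inputPublisher.publishKeyEvent() to deliver the key event to the Java layer.
processAnrsLocked() decides whether an ANR needs to be raised.

Key events are held in three queues:

InputDispatcher's mInboundQueue: input events delivered by InputReader.
Connection's outboundQueue: input events that are about to be sent to the app.
Connection's waitQueue: events that have already been sent to the app but that the app has not finished processing.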
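The path of one event through the three queues can be sketched in a few lines of standalone C++. This is only an illustration: Event, ToyConnection and ToyDispatcher are hypothetical stand-ins, not the real EventEntry, Connection or InputDispatcher types, and locking, targets and the actual publish call are left out.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical, simplified stand-in for EventEntry / DispatchEntry.
struct Event {
    std::string name;
};

// Hypothetical stand-in for Connection: one per input channel.
struct ToyConnection {
    std::deque<Event> outboundQueue; // about to be sent to the app
    std::deque<Event> waitQueue;     // sent, waiting for the app to finish
};

struct ToyDispatcher {
    std::deque<Event> inboundQueue; // filled by the reader side
    ToyConnection connection;

    // Take the oldest inbound event and queue it for the connection.
    bool dispatchOne() {
        if (inboundQueue.empty()) return false;
        connection.outboundQueue.push_back(inboundQueue.front());
        inboundQueue.pop_front();
        return true;
    }

    // "Publish" the oldest outbound event and park it in waitQueue
    // until the app acknowledges it.
    bool publishOne() {
        if (connection.outboundQueue.empty()) return false;
        connection.waitQueue.push_back(connection.outboundQueue.front());
        connection.outboundQueue.pop_front();
        return true;
    }

    // The app reports that it finished the oldest delivered event.
    bool finishOne() {
        if (connection.waitQueue.empty()) return false;
        connection.waitQueue.pop_front();
        return true;
    }
};
```

An event that stays in waitQueue because finishOne() is never called is exactly the situation the ANR check looks for.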

2.1 dispatchOnce

dispatchOnce() mainly calls these two functions:

Event dispatch: dispatchOnceInnerLocked()
ANR check: processAnrsLocked()

> frameworks/native/services/inputflinger/dispatcher/InputDispatcher.cpp

void InputDispatcher::dispatchOnce() {
    nsecs_t nextWakeupTime = LONG_LONG_MAX;
    {
        ...
        // Run a dispatch loop if there are no pending commands. The dispatch loop
        // might enqueue commands to run afterwards.
        if (!haveCommandsLocked()) {
            dispatchOnceInnerLocked(&nextWakeupTime);
        }
        // Run all pending commands, if any. If any commands were run, force the
        // next poll to wake up immediately.
        if (runCommandsLockedInterruptable()) {
            nextWakeupTime = LONG_LONG_MIN;
        }
        ...
        // We may have to wake up earlier to check whether an app is in an ANR state.
        const nsecs_t nextAnrCheck = processAnrsLocked();
        nextWakeupTime = std::min(nextWakeupTime, nextAnrCheck);
    }
    // Wait for a callback, a timeout, or a wake-up.
    nsecs_t currentTime = now();
    int timeoutMillis = toMillisecondTimeoutDelay(currentTime, nextWakeupTime);
    mLooper->pollOnce(timeoutMillis);
}
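To make the wake-up logic concrete, here is a minimal sketch of how a poll timeout could be derived from the next dispatch wake-up time and the next ANR check. toPollTimeoutMillis() and nextPollTimeoutMillis() are hypothetical helpers written for this article; they are not the framework's toMillisecondTimeoutDelay(), and the rounding and the -1 "wait forever" convention are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

using nsecs_t = int64_t; // nanoseconds, as in the article's code

// Hypothetical helper: turn "wake up at wakeupTime" into a poll timeout in ms.
// Returns -1 for "wait indefinitely" and 0 for "do not wait at all".
int toPollTimeoutMillis(nsecs_t now, nsecs_t wakeupTime) {
    if (wakeupTime == std::numeric_limits<nsecs_t>::max()) return -1;
    if (wakeupTime <= now) return 0;
    const nsecs_t delayNs = wakeupTime - now;
    // Round up so that we never wake before the deadline.
    const nsecs_t delayMs = delayNs / 1000000 + (delayNs % 1000000 != 0 ? 1 : 0);
    return static_cast<int>(
            std::min<nsecs_t>(delayMs, std::numeric_limits<int>::max()));
}

// The dispatcher sleeps until whichever comes first: the next dispatch
// wake-up or the next ANR check.
int nextPollTimeoutMillis(nsecs_t now, nsecs_t nextWakeupTime, nsecs_t nextAnrCheck) {
    return toPollTimeoutMillis(now, std::min(nextWakeupTime, nextAnrCheck));
}
```

With nothing to dispatch and an event whose deadline is 5s away, the thread sleeps for 5000ms and then wakes only to run the ANR check.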

Let's first briefly review the event dispatch flow.

3. Event dispatch

3.1 dispatchOnceInnerLocked

This function mainly does two things:

Takes mPendingEvent out of mInboundQueue.
Uses the type of mPendingEvent to decide what kind of event it is and how to dispatch it, for example as a key event.

The key code is:

> frameworks/native/services/inputflinger/dispatcher/InputDispatcher.cpp

void InputDispatcher::dispatchOnceInnerLocked(nsecs_t* nextWakeupTime) {
    nsecs_t currentTime = now();
    ...
    // Optimize latency of app switches. Essentially we start a short timeout when
    // an app switch key (HOME) is pressed. When it expires, we preempt dispatch
    // and drop all other pending events.
    bool isAppSwitchDue = mAppSwitchDueTime <= currentTime;
    // If there is no pending event (an EventEntry), fetch one.
    if (!mPendingEvent) {
        ...
            // mInboundQueue is not empty: take the event at the front.
            mPendingEvent = mInboundQueue.front();
            mInboundQueue.pop_front();
            traceInboundQueueLengthLocked();
        ...
    }
    ...
}

3.2 enqueueDispatchEntryLocked

enqueueDispatchEntryLocked() creates a new DispatchEntry and appends it to the connection's outboundQueue.

> frameworks/native/services/inputflinger/dispatcher/InputDispatcher.cpp

void InputDispatcher::enqueueDispatchEntryLocked(const sp<Connection>& connection,
                                                 std::shared_ptr<EventEntry> eventEntry,
                                                 const InputTarget& inputTarget,
                                                 int32_t dispatchMode) {
    ...  // inputTargetFlags is computed in the elided code
    // This is a new event. Enqueue a new dispatch entry onto the outbound queue
    // for this connection.
    std::unique_ptr<DispatchEntry> dispatchEntry =
            createDispatchEntry(inputTarget, eventEntry, inputTargetFlags);
    ...
    // Append the new dispatchEntry to the connection's outboundQueue. release()
    // hands the raw pointer over to the queue.
    connection->outboundQueue.push_back(dispatchEntry.release());
    traceOutboundQueueLength(*connection);
}

3.3 startDispatchCycleLocked

This function publishes the event through the connection. At that point InputDispatcher has finished publishing, and the published event is kept in the connection's waitQueue.

> frameworks/native/services/inputflinger/dispatcher/InputDispatcher.cpp

void InputDispatcher::startDispatchCycleLocked(nsecs_t currentTime,
                                               const sp<Connection>& connection) {
    while (connection->status == Connection::Status::NORMAL && !connection->outboundQueue.empty()) {
        // Take the DispatchEntry at the front of outboundQueue.
        DispatchEntry* dispatchEntry = connection->outboundQueue.front();
        const std::chrono::nanoseconds timeout = getDispatchingTimeoutLocked(connection);
        // Set the deadline for this entry.
        dispatchEntry->timeoutTime = currentTime + timeout.count();
        // Publish the event.
        status_t status;
        const EventEntry& eventEntry = *(dispatchEntry->eventEntry);
        ...
        // Re-enqueue the event on the wait queue.
        connection->outboundQueue.erase(std::remove(connection->outboundQueue.begin(),
                                                    connection->outboundQueue.end(),
                                                    dispatchEntry));
        // Append it to the tail of waitQueue.
        connection->waitQueue.push_back(dispatchEntry);
        if (connection->responsive) {
            // Record the time at which this event must be checked for an ANR.
            mAnrTracker.insert(dispatchEntry->timeoutTime,
                               connection->inputChannel->getConnectionToken());
        }
    }
}
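The calls made on mAnrTracker in this article (insert, firstTimeout, firstToken, eraseToken) suggest an ordered collection of (deadline, token) pairs. The following is a minimal sketch of such a structure, not the framework's class: ToyAnrTracker is a hypothetical name, Token is a plain int standing in for the connection token, and returning -1 from firstToken() on an empty tracker is an assumption made to keep the example safe.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <limits>
#include <set>
#include <utility>

using nsecs_t = int64_t;
using Token = int; // hypothetical stand-in for the connection token

class ToyAnrTracker {
public:
    // Remember that `token` must acknowledge an event before `timeoutTime`.
    void insert(nsecs_t timeoutTime, Token token) {
        mEntries.insert({timeoutTime, token});
    }

    // Drop every pending deadline that belongs to `token`.
    void eraseToken(Token token) {
        for (auto it = mEntries.begin(); it != mEntries.end();) {
            it = (it->second == token) ? mEntries.erase(it) : std::next(it);
        }
    }

    bool empty() const { return mEntries.empty(); }

    // Earliest deadline, or "never" when nothing is being tracked.
    nsecs_t firstTimeout() const {
        return mEntries.empty() ? std::numeric_limits<nsecs_t>::max()
                                : mEntries.begin()->first;
    }

    // Token that owns the earliest deadline, or -1 when nothing is tracked.
    Token firstToken() const {
        return mEntries.empty() ? -1 : mEntries.begin()->second;
    }

private:
    // Pairs compare by deadline first, so begin() is always the next to expire.
    std::multiset<std::pair<nsecs_t, Token>> mEntries;
};
```

Keeping the entries sorted by deadline means the ANR check only ever has to look at the first element.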

3.4 ANR timeout

As startDispatchCycleLocked() shows, the timeout comes from getDispatchingTimeoutLocked().

> frameworks/native/services/inputflinger/dispatcher/InputDispatcher.cpp

// Default input dispatching timeout if there is no focused application or paused window
// from which to determine an appropriate dispatching timeout.
const std::chrono::duration DEFAULT_INPUT_DISPATCHING_TIMEOUT = std::chrono::milliseconds(
        android::os::IInputConstants::UNMULTIPLIED_DEFAULT_DISPATCHING_TIMEOUT_MILLIS *
        HwTimeoutMultiplier());

std::chrono::nanoseconds InputDispatcher::getDispatchingTimeoutLocked(
        const sp<Connection>& connection) {
    if (connection->monitor) {
        // Monitors use their own timeout.
        return mMonitorDispatchingTimeout;
    }
    const sp<WindowInfoHandle> window =
            getWindowHandleLocked(connection->inputChannel->getConnectionToken());
    if (window != nullptr) {
        // A window was found for this connection: use the window's timeout.
        return window->getDispatchingTimeout(DEFAULT_INPUT_DISPATCHING_TIMEOUT);
    }
    // Otherwise fall back to the default.
    return DEFAULT_INPUT_DISPATCHING_TIMEOUT;
}

WindowInfoHandle#getDispatchingTimeout() returns the following:

> libs/gui/include/gui/WindowInfo.h

class WindowInfoHandle : public RefBase {
    ...
    inline std::chrono::nanoseconds getDispatchingTimeout(
            std::chrono::nanoseconds defaultValue) const {
        return mInfo.token ? std::chrono::nanoseconds(mInfo.dispatchingTimeout) : defaultValue;
    }
    ...
};

struct WindowInfo : public Parcelable {
    ...
    std::chrono::nanoseconds dispatchingTimeout = std::chrono::seconds(5); // 5 seconds
    ...
};
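The lookup order described in this section (monitor first, then the window's own value, then the default) can be condensed into one small function. This is a sketch under simplifying assumptions: ToyWindow and resolveTimeout() are hypothetical, and the connection and window handle are flattened into plain parameters.

```cpp
#include <cassert>
#include <chrono>

using namespace std::chrono_literals;

// Hypothetical, flattened view of the window data that matters here.
struct ToyWindow {
    bool hasToken;                               // mirrors the mInfo.token check
    std::chrono::nanoseconds dispatchingTimeout; // the window's own timeout
};

// Resolve the dispatching timeout in the same order as the code above:
// monitor timeout, then the window's timeout, then the default.
std::chrono::nanoseconds resolveTimeout(bool isMonitor,
                                        std::chrono::nanoseconds monitorTimeout,
                                        const ToyWindow* window,
                                        std::chrono::nanoseconds defaultTimeout) {
    if (isMonitor) return monitorTimeout;
    if (window != nullptr && window->hasToken) return window->dispatchingTimeout;
    return defaultTimeout;
}
```

For example, a window that declares an 8s timeout overrides a 5s default, while a connection without a window falls back to the default.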

DEFAULT_INPUT_DISPATCHING_TIMEOUT is computed as UNMULTIPLIED_DEFAULT_DISPATCHING_TIMEOUT_MILLIS * HwTimeoutMultiplier().

UNMULTIPLIED_DEFAULT_DISPATCHING_TIMEOUT_MILLIS is defined as:

> android/os/IInputConstants.h

class IInputConstants : public ::android::IInterface {
public:
  enum : int32_t { UNMULTIPLIED_DEFAULT_DISPATCHING_TIMEOUT_MILLIS = 5000 };
  ....
};  // class IInputConstants

HwTimeoutMultiplier() is defined as follows: it reads the ro.hw_timeout_multiplier system property, which defaults to 1.

> system/libbase/include/android-base/properties.h

static inline int HwTimeoutMultiplier() {
  return android::base::GetIntProperty("ro.hw_timeout_multiplier", 1);
}
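As a quick check of the arithmetic, the default timeout can be reproduced with std::chrono alone. In this sketch the `multiplier` parameter stands in for the value read from ro.hw_timeout_multiplier; kUnmultipliedTimeoutMillis and defaultDispatchingTimeout() are names invented for the example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// The 5000 ms base value quoted above.
constexpr int32_t kUnmultipliedTimeoutMillis = 5000;

// `multiplier` stands in for the ro.hw_timeout_multiplier property value.
constexpr std::chrono::nanoseconds defaultDispatchingTimeout(int multiplier) {
    return std::chrono::milliseconds(kUnmultipliedTimeoutMillis * multiplier);
}

// With the default multiplier of 1 the timeout is the familiar 5 seconds.
static_assert(defaultDispatchingTimeout(1) == std::chrono::seconds(5),
              "default multiplier yields 5 seconds");
```

A device that sets the multiplier to 2 would therefore wait 10 seconds before the default input timeout expires.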

3.5 Call stack

The native-layer call stack for dispatching a key event, innermost frame first:

libs/input/InputTransport.cpp : InputPublisher::publishKeyEvent()
services/inputflinger/dispatcher/InputDispatcher.cpp : InputDispatcher::startDispatchCycleLocked()
services/inputflinger/dispatcher/InputDispatcher.cpp : InputDispatcher::enqueueDispatchEntriesLocked()
services/inputflinger/dispatcher/InputDispatcher.cpp : InputDispatcher::prepareDispatchCycleLocked()
services/inputflinger/dispatcher/InputDispatcher.cpp : InputDispatcher::dispatchKeyLocked()
services/inputflinger/dispatcher/InputDispatcher.cpp : InputDispatcher::dispatchOnceInnerLocked()
services/inputflinger/dispatcher/InputDispatcher.cpp : InputDispatcher::dispatchOnce()
services/inputflinger/dispatcher/InputDispatcher.cpp : InputDispatcher::start()

4. Triggering the ANR

dispatchOnce() calls processAnrsLocked() to decide whether an ANR needs to be raised.

> frameworks/native/services/inputflinger/dispatcher/InputDispatcher.cpp

void InputDispatcher::dispatchOnce() {
    ...
    // We may have to wake up earlier to check whether an app is in an ANR state.
    const nsecs_t nextAnrCheck = processAnrsLocked();
    ....
}

4.1 processAnrsLocked

This function checks whether any queue holds an event that is too old and, if so, raises an ANR.

> frameworks/native/services/inputflinger/dispatcher/InputDispatcher.cpp

// Check if any of the connections' wait queues have events that are too old.
// If we waited for events to be acknowledged for longer than the window timeout,
// raise an ANR. Return the time at which we should wake up next.
nsecs_t InputDispatcher::processAnrsLocked() {
    const nsecs_t currentTime = now();
    nsecs_t nextAnrCheck = LONG_LONG_MAX; // time of the next ANR check
    // Check if we are waiting for a focused window to appear. Raise an ANR if we waited too long.
    if (mNoFocusedWindowTimeoutTime.has_value() && mAwaitedFocusedApplication != nullptr) {
        if (currentTime >= *mNoFocusedWindowTimeoutTime) {
            // Scenario 1: raise the "no focused window" ANR.
            processNoFocusedWindowAnrLocked();
            mAwaitedFocusedApplication.reset();
            mNoFocusedWindowTimeoutTime = std::nullopt;
            return LONG_LONG_MIN;
        } else {
            // Keep waiting. We will drop the event when mNoFocusedWindowTimeoutTime comes.
            nextAnrCheck = *mNoFocusedWindowTimeoutTime;
        }
    }
    // Check if any connection ANRs are due. mAnrTracker holds the deadlines of all
    // dispatched events that have not been acknowledged yet.
    nextAnrCheck = std::min(nextAnrCheck, mAnrTracker.firstTimeout());
    if (currentTime < nextAnrCheck) { // most likely scenario
        // Everything is normal. Check again at nextAnrCheck.
        return nextAnrCheck;
    }
    // If we reach here, a connection is unresponsive.
    sp<Connection> connection = getConnectionLocked(mAnrTracker.firstToken());
    // Stop waking up for this unresponsive connection.
    mAnrTracker.eraseToken(connection->inputChannel->getConnectionToken());
    // Scenario 2: raise the ANR.
    onAnrLocked(connection);
    return LONG_LONG_MIN;
}
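The decision made by processAnrsLocked() can be isolated as a pure function of three inputs: the current time, the optional "no focused window" deadline, and the earliest connection deadline. The sketch below is a simplified model for illustration, not framework code; AnrCheck, AnrDecision and checkAnr() are hypothetical names, and the side effects (resetting state, calling onAnrLocked) are replaced by a return value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

using nsecs_t = int64_t;
constexpr nsecs_t kNever = std::numeric_limits<nsecs_t>::max();

enum class AnrCheck { Wait, NoFocusedWindowAnr, ConnectionAnr };

struct AnrDecision {
    AnrCheck result;
    nsecs_t nextCheck; // only meaningful when result == AnrCheck::Wait
};

// Simplified model of the ANR check: the "no focused window" deadline is
// examined first, then the earliest deadline among dispatched events.
AnrDecision checkAnr(nsecs_t now, std::optional<nsecs_t> noFocusedWindowDeadline,
                     nsecs_t firstConnectionDeadline) {
    nsecs_t nextCheck = kNever;
    if (noFocusedWindowDeadline.has_value()) {
        if (now >= *noFocusedWindowDeadline) {
            return {AnrCheck::NoFocusedWindowAnr, 0}; // scenario 1
        }
        nextCheck = *noFocusedWindowDeadline; // keep waiting for a focused window
    }
    nextCheck = std::min(nextCheck, firstConnectionDeadline);
    if (now < nextCheck) {
        return {AnrCheck::Wait, nextCheck}; // nothing is due yet
    }
    return {AnrCheck::ConnectionAnr, 0}; // scenario 2
}
```

Note that a deadline only exists once an event has been dispatched, so with no tracked events and no awaited window the function always answers Wait.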

Here, mAnrTracker records the deadline and connection token of every event that has been dispatched to an app; the entries are inserted in startDispatchCycleLocked().

mNoFocusedWindowTimeoutTime is assigned in findFocusedWindowTargetsLocked(), which is called while an event is being dispatched:

> frameworks/native/services/inputflinger/dispatcher/InputDispatcher.cpp

InputEventInjectionResult InputDispatcher::findFocusedWindowTargetsLocked(
        nsecs_t currentTime, const EventEntry& entry, std::vector<InputTarget>& inputTargets,
        nsecs_t* nextWakeupTime) {
    ...
    // Compatibility behavior: raise an ANR if there is a focused application but no
    // focused window. Only start counting when we have a focused event to dispatch.
    // The ANR is canceled if we start interacting with another application via touch
    // (app switch). This code can be removed if the "no focused window ANR" is moved
    // to the policy. Input does not know whether an app should have a focused window.
    if (focusedWindowHandle == nullptr && focusedApplicationHandle != nullptr) {
        if (!mNoFocusedWindowTimeoutTime.has_value()) {
            // We just discovered that there is no focused window: start the ANR timer.
            std::chrono::nanoseconds timeout = focusedApplicationHandle->getDispatchingTimeout(
                    DEFAULT_INPUT_DISPATCHING_TIMEOUT);
            mNoFocusedWindowTimeoutTime = currentTime + timeout.count();
            ....
            return InputEventInjectionResult::PENDING;
        }
        ...
    }

    // We have a focused window: cancel the ANR timer.
    resetNoFocusedWindowTimeoutLocked();
    ...
}

void InputDispatcher::resetNoFocusedWindowTimeoutLocked() {
    // Cancel the ANR timer.
    mNoFocusedWindowTimeoutTime = std::nullopt;
    mAwaitedFocusedApplication.reset();
}
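The timer handling above boils down to an optional deadline that is set once and cleared when a focused window shows up. Here is a minimal sketch of that pattern. ToyFocusTimer is a hypothetical type; the behavior for events that arrive while the timer is already running is an assumption (the event simply stays pending), because the article's excerpt elides that part of the function.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using nsecs_t = int64_t;

struct ToyFocusTimer {
    // Plays the role of mNoFocusedWindowTimeoutTime.
    std::optional<nsecs_t> deadline;

    // Returns true when the key event must stay pending because the focused
    // application has no focused window yet.
    bool onKeyEvent(nsecs_t now, bool hasFocusedWindow, bool hasFocusedApp,
                    nsecs_t timeout) {
        if (!hasFocusedWindow && hasFocusedApp) {
            if (!deadline.has_value()) {
                // First key with no focused window: start the timer once.
                deadline = now + timeout;
            }
            return true; // assumption: later events just keep waiting
        }
        if (hasFocusedWindow) {
            deadline.reset(); // a focused window exists: cancel the timer
        }
        return false;
    }
};
```

Because the deadline is only set when it is empty, repeated key presses do not push the ANR further into the future.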

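From the code above, there are two scenarios in which an ANR can be raised:

An application is waiting to gain focus: once the current time passes the timeout, processNoFocusedWindowAnrLocked() is called for a final check.
A window exists: once the current time passes the event's response deadline, onAnrLocked() is called for a final check.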

4.2 processNoFocusedWindowAnrLocked

This function raises an ANR only if both conditions hold:

The currently focused application is still the one we have been waiting for.
There is still no focused window.

processNoFocusedWindowAnrLocked() also ends up calling onAnrLocked().

> frameworks/native/services/inputflinger/dispatcher/InputDispatcher.cpp

// Raise an ANR if there is no focused window. Before raising the ANR, do a final state check:
void InputDispatcher::processNoFocusedWindowAnrLocked() {
    std::shared_ptr<InputApplicationHandle> focusedApplication =
            getValueByKey(mFocusedApplicationHandlesByDisplay, mAwaitedApplicationDisplayId);
    if (focusedApplication == nullptr ||
        focusedApplication->getApplicationToken() !=
                mAwaitedFocusedApplication->getApplicationToken()) {
        // Unexpected: the ANR timer should have been reset when the focused application changed.
        return;
    }
    const sp<WindowInfoHandle>& focusedWindowHandle =
            getFocusedWindowHandleLocked(mAwaitedApplicationDisplayId);
    if (focusedWindowHandle != nullptr) {
        // We now have a focused window. No need to raise an ANR.
        return;
    }
    onAnrLocked(mAwaitedFocusedApplication);
}

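onAnrLocked() has two overloads:

One for the case where the currently focused window can be found.
One for the case where no focused window can be found but the current foreground application can.

Let's look at case 1 first.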

4.3 onAnrLocked(connection)

> frameworks/native/services/inputflinger/dispatcher/InputDispatcher.cpp

// Case 1: the window can be found.
void InputDispatcher::onAnrLocked(const sp<Connection>& connection) {
    // Since the policy is allowed to extend the timeout, the waitQueue may have become
    // healthy again. Do not raise an ANR in that situation.
    if (connection->waitQueue.empty()) {
        return;
    }
    // The "oldest entry" is the entry that was first sent to the application. That entry,
    // however, may not be the one that caused the timeout. One possibility is that the
    // window timeout has changed, which could cause newer entries to time out before the
    // already dispatched ones. In that situation the newest entries caused the ANR. But in
    // all likelihood the app processes events linearly, so providing information about the
    // oldest entry seems to be the most useful.
    DispatchEntry* oldestEntry = *connection->waitQueue.begin();
    // How long we have been waiting for the oldest entry.
    const nsecs_t currentWait = now() - oldestEntry->deliveryTime;
    std::string reason =
            android::base::StringPrintf("%s is not responding. Waited %" PRId64 "ms for %s",
                                        connection->inputChannel->getName().c_str(),
                                        ns2ms(currentWait),
                                        oldestEntry->eventEntry->getDescription().c_str());
    sp<IBinder> connectionToken = connection->inputChannel->getConnectionToken();
    // Record the reason for the ANR report.
    updateLastAnrStateLocked(getWindowHandleLocked(connectionToken), reason);
    processConnectionUnresponsiveLocked(*connection, std::move(reason));
    // Stop waking up for events on this connection; it is already unresponsive.
    cancelEventsForAnrLocked(connection);
}

// Captures a record of the InputDispatcher state at the time of an ANR.
void InputDispatcher::updateLastAnrStateLocked(const std::string& windowLabel,
                                               const std::string& reason) {
    ....
    dumpDispatchStateLocked(mLastAnrState);
}
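The reason string is easy to reproduce outside the framework, which helps when reading ANR logs. The sketch below uses std::snprintf in place of android::base::StringPrintf and a plain division in place of ns2ms(); buildAnrReason() and its parameters are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

using nsecs_t = int64_t;

// Build the same kind of message as onAnrLocked(): who is unresponsive,
// how long we waited, and which event we were waiting on.
std::string buildAnrReason(const std::string& channelName, nsecs_t now,
                           nsecs_t deliveryTime, const std::string& eventDescription) {
    const int64_t waitedMs = (now - deliveryTime) / 1000000; // ns -> ms
    char buffer[512];
    std::snprintf(buffer, sizeof(buffer),
                  "%s is not responding. Waited %" PRId64 "ms for %s",
                  channelName.c_str(), waitedMs, eventDescription.c_str());
    return std::string(buffer);
}
```

The wait is measured from the oldest entry's delivery time, so the number in the message is how long the app has been sitting on that first unacknowledged event.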

4.3.1 dumpDispatchStateLocked

dumpDispatchStateLocked() mainly prints the current window and event-queue information. Running the dumpsys input command shows its output:

Input Dispatcher State:
    ....
  PendingEvent: <none> // the event currently being dispatched
  InboundQueue: <empty> // the inbound queue
  ReplacedKeys: <empty>
  Connections:
    317: channelName='cf1eda9 com.example.anrdemo/com.example.anrdemo.MainActivity (server

   