Honor of Kings suddenly shows "has stopped": analysis of a background app crash caused by a GC timeout
Posted by River_ly
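Preface
This problem is worth a close look for two reasons. First, it is not an ordinary application crash: the error is raised by the framework layer. Second, it is a good opportunity to walk through how the timeout detection for background GC (finalization) works, so that we can be more thorough whenever we override finalize at the application layer.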
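Background
Reproduction rate: intermittent
Affected version: Android R
Symptom: while WeChat is in the foreground, a dialog suddenly pops up saying that Honor of Kings has stopped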
Initial analysis
With the logs in hand, start with the stack trace of the error.
09-02 20:53:26.679 2073 2089 E AndroidRuntime: FATAL EXCEPTION: FinalizerWatchdogDaemon
09-02 20:53:26.679 2073 2089 E AndroidRuntime: Process: com.tencent.tmgp.sgame:xg_vip_service, PID: 2073
09-02 20:53:26.679 2073 2089 E AndroidRuntime: java.util.concurrent.TimeoutException: android.database.BulkCursorToCursorAdaptor.finalize() timed out after 10 seconds
// part of the stack omitted
09-02 20:53:26.679 2073 2089 E AndroidRuntime: at android.database.AbstractCursor.finalize(AbstractCursor.java:524)
09-02 20:53:26.679 2073 2089 E AndroidRuntime: at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:291)
09-02 20:53:26.679 2073 2089 E AndroidRuntime: at java.lang.Daemons$FinalizerDaemon.runInternal(Daemons.java:278)
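Judging from this stack alone, BulkCursorToCursorAdaptor spent more than 10 s in finalize, which made FinalizerWatchdogDaemon raise the error. By its name, FinalizerWatchdogDaemon looks like a daemon thread that watches for finalization timeouts.
Here is how the source code describes the role of FinalizerWatchdogDaemon.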
/**
 * The watchdog exits the VM if the finalizer ever gets stuck. We consider
 * the finalizer to be stuck if it spends more than MAX_FINALIZATION_MILLIS
 * on one instance.
 */
private static class FinalizerWatchdogDaemon extends Daemon {
    @UnsupportedAppUsage
    private static final FinalizerWatchdogDaemon INSTANCE = new FinalizerWatchdogDaemon();
    private boolean needToWork = true;  // Only accessed in synchronized methods.
    private long finalizerTimeoutNs = 0;  // Lazily initialized.

    FinalizerWatchdogDaemon() {
        super("FinalizerWatchdogDaemon");
    }
    // ...
}
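In short: if an object's finalize blocks for longer than the timeout, the process exits.
In this case the finalizer was closing a database cursor, but the same thing can happen in other scenarios: any object whose class overrides finalize is exposed to it. So when you meet another GC timeout error, the frames above the java.lang.Daemons$... lines in the stack may look different.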
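Since any class that overrides finalize is exposed, one common way to reduce the risk is to release resources explicitly, so that nothing slow is left for the finalizer thread. The following is only a minimal sketch under that assumption: TrackedResource and ExplicitCloseDemo are hypothetical names for illustration, not classes from AOSP or from the app in the log.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical resource holder (illustrative, not from AOSP or the log).
// Cleanup lives in close(), so there is no slow work left for finalize().
final class TrackedResource implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    boolean isClosed() {
        return closed.get();
    }

    @Override
    public void close() {
        // compareAndSet makes close() idempotent: the release runs at most once.
        if (closed.compareAndSet(false, true)) {
            // Release the underlying resource here, on the caller's thread.
        }
    }
}

final class ExplicitCloseDemo {
    // Uses the resource in try-with-resources and returns it for inspection.
    static TrackedResource useAndClose() {
        TrackedResource handle = new TrackedResource();
        try (TrackedResource r = handle) {
            // Work with r here; close() runs automatically when the block exits.
        }
        return handle;
    }

    public static void main(String[] args) {
        System.out.println("closed after try block: " + useAndClose().isClosed());
    }
}
```

For a database cursor the same idea would mean closing it as soon as the query result has been consumed, rather than leaving that to finalize.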
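So when the GC decides that an object overriding finalize can be collected, is it reclaimed right away?
No. It is first placed on a queue, where it waits for the FinalizerDaemon thread to call its finalize method. Only after that is it reclaimed.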
/**
 * This heap management thread moves elements from the garbage collector's
 * pending list to the managed reference queue.
 */
private static class ReferenceQueueDaemon extends Daemon {
    @UnsupportedAppUsage
    private static final ReferenceQueueDaemon INSTANCE = new ReferenceQueueDaemon();

    ReferenceQueueDaemon() {
        super("ReferenceQueueDaemon");
    }

    @Override public void runInternal() {
        while (isRunning()) {
            Reference<?> list;
            try {
                synchronized (ReferenceQueue.class) {
                    while (ReferenceQueue.unenqueued == null) {
                        ReferenceQueue.class.wait();
                    }
                    list = ReferenceQueue.unenqueued;
                    ReferenceQueue.unenqueued = null;
                }
            } catch (InterruptedException e) {
                continue;
            } catch (OutOfMemoryError e) {
                continue;
            }
            ReferenceQueue.enqueuePending(list);
        }
    }
}
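To make the "queued first, finalized later" idea concrete, here is a toy model. It is only a sketch: FinalizerQueueModel and its methods are hypothetical names, strings stand in for finalizable objects, and adding a name to a list stands in for calling finalize. It does not use the real ART queues.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Toy model (hypothetical names): a queue of objects awaiting finalization,
// drained by one worker thread that handles a single object at a time.
final class FinalizerQueueModel {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final List<String> finalized = new ArrayList<>();

    // "GC side": the object is only queued here, it is not reclaimed yet.
    void enqueue(String name) {
        pending.add(name);
    }

    int pendingCount() {
        return pending.size();
    }

    // "Daemon side": drain the queue in order on a single worker thread.
    List<String> drainOnWorker() {
        Thread worker = new Thread(() -> {
            String next;
            while ((next = pending.poll()) != null) {
                finalized.add(next); // stands in for calling finalize()
            }
        }, "ModelFinalizerDaemon");
        worker.start();
        try {
            worker.join(); // join() makes the worker's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(finalized);
    }

    public static void main(String[] args) {
        FinalizerQueueModel model = new FinalizerQueueModel();
        model.enqueue("cursorA");
        model.enqueue("cursorB");
        System.out.println("pending before drain: " + model.pendingCount());
        System.out.println("finalized in order: " + model.drainOnWorker());
    }
}
```

Because a single worker handles the queue one object at a time, one slow entry delays everything queued behind it, which is why a long backlog matters for the timeout discussed here.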
Timeout threshold
// This used to be final. IT IS NOW ONLY WRITTEN. We now update it when we look at the command
// line argument, for the benefit of mis-behaved apps that might read it. SLATED FOR REMOVAL.
// There is no reason to use this: Finalizers should not rely on the value. If a finalizer takes
// appreciable time, the work should be done elsewhere. Based on disassembly of Daemons.class,
// the value is effectively inlined, so changing the field never did have an effect.
// DO NOT USE. FOR ANYTHING. THIS WILL BE REMOVED SHORTLY.
@UnsupportedAppUsage
private static long MAX_FINALIZE_NANOS = 10L * 1000 * NANOS_PER_MILLI;
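According to the comment, this field is slated for removal. It no longer has any real effect in the code, and it is only updated so that apps reading it from outside still see the current value.
The real timeout threshold is obtained through VMRuntime.getFinalizerTimeoutMs, and its default value is 10 s.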
finalizer_timeout_ms_ = runtime_options.GetOrDefault(Opt::FinalizerTimeoutMs);
RUNTIME_OPTIONS_KEY (unsigned int, FinalizerTimeoutMs, 10000u)
Timeout detection
A watchdog mechanism checks whether the finalizer has finished finalizing the object within the timeout.
/**
 * The watchdog exits the VM if the finalizer ever gets stuck. We consider
 * the finalizer to be stuck if it spends more than MAX_FINALIZATION_MILLIS
 * on one instance.
 */
private static class FinalizerWatchdogDaemon extends Daemon {
    @UnsupportedAppUsage
    private static final FinalizerWatchdogDaemon INSTANCE = new FinalizerWatchdogDaemon();
    private boolean needToWork = true;  // Only accessed in synchronized methods.
    private long finalizerTimeoutNs = 0;  // Lazily initialized.

    FinalizerWatchdogDaemon() {
        super("FinalizerWatchdogDaemon");
    }

    @Override public void runInternal() {
        while (isRunning()) {
            if (!sleepUntilNeeded()) {  // (1)
                // We have been interrupted, need to see if this daemon has been stopped.
                continue;
            }
            final Object finalizing = waitForFinalization();  // (2)
            if (finalizing != null && !VMDebug.isDebuggerConnected()) {
                finalizerTimedOut(finalizing);  // (3)
                break;
            }
        }
    }
    // ...
}
- Step 1: the check before finalization
/**
 * Notify daemon that it's OK to sleep until notified that something is ready to be
 * finalized.
 */
private synchronized void goToSleep() {
    needToWork = false;
}

/**
 * Notify daemon that there is something ready to be finalized.
 */
private synchronized void wakeUp() {
    needToWork = true;
    notify();
}
Before finalization starts, needToWork is set to true. At that point sleepUntilNeeded returns true, so the watchdog thread does not wait.
@Override public void runInternal() {
    // This loop may be performance critical, since we need to keep up with mutator
    // generation of finalizable objects.
    // We minimize the amount of work we do per finalizable object. For example, we avoid
    // reading the current time here, since that involves a kernel call per object. We
    // limit fast path communication with FinalizerWatchDogDaemon to what's unavoidable: A
    // non-volatile store to communicate the current finalizable object, e.g. for
    // reporting, and a release store (lazySet) to a counter.
    // We do stop the FinalizerWatchDogDaemon if we have nothing to do for a
    // potentially extended period. This prevents the device from waking up regularly
    // during idle times.
    // Local copy of progressCounter; saves a fence per increment on ARM and MIPS.
    int localProgressCounter = progressCounter.get();
    while (isRunning()) {
        try {
            // Use non-blocking poll to avoid FinalizerWatchdogDaemon communication
            // when busy.
            FinalizerReference<?> finalizingReference = (FinalizerReference<?>)queue.poll();
            if (finalizingReference != null) {
                finalizingObject = finalizingReference.get();
                progressCounter.lazySet(++localProgressCounter);
            } else {
                finalizingObject = null;
                progressCounter.lazySet(++localProgressCounter);
                // Slow path; block.
                FinalizerWatchdogDaemon.INSTANCE.goToSleep();
                finalizingReference = (FinalizerReference<?>)queue.remove();
                finalizingObject = finalizingReference.get();
                progressCounter.set(++localProgressCounter);
                // Wake the watchdog thread before finalizing.
                FinalizerWatchdogDaemon.INSTANCE.wakeUp();
            }
            // Start the finalization itself.
            doFinalize(finalizingReference);
        } catch (InterruptedException ignored) {
        } catch (OutOfMemoryError ignored) {
        }
    }
}
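If the watchdog thread is interrupted or hits an OutOfMemoryError while it is in wait, sleepUntilNeeded returns false and runInternal goes back to the top of its loop to check isRunning() again, that is, whether the daemon's thread is null (the daemon has been stopped). If it is null, there is no point in running the loop body any further.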
/**
 * Wait until something is ready to be finalized.
 * Return false if we have been interrupted
 * See also http://code.google.com/p/android/issues/detail?id=22778.
 */
private synchronized boolean sleepUntilNeeded() {
    while (!needToWork) {
        try {
            wait();
        } catch (InterruptedException e) {
            // Daemon.stop may have interrupted us.
            return false;
        } catch (OutOfMemoryError e) {
            return false;
        }
    }
    return true;
}
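The goToSleep / wakeUp / sleepUntilNeeded trio is a plain wait/notify gate. The sketch below models it outside of ART so it can be run on its own. WorkGate and WorkGateDemo are hypothetical names, and the OutOfMemoryError branch of the original is left out for brevity.

```java
// Hypothetical stand-alone model of the needToWork gate (not the AOSP class).
final class WorkGate {
    private boolean needToWork = true; // Only accessed in synchronized methods.

    synchronized void goToSleep() {
        needToWork = false;
    }

    synchronized void wakeUp() {
        needToWork = true;
        notify();
    }

    // Blocks while there is nothing to watch; returns false if interrupted.
    synchronized boolean sleepUntilNeeded() {
        while (!needToWork) {
            try {
                wait();
            } catch (InterruptedException e) {
                return false;
            }
        }
        return true;
    }
}

final class WorkGateDemo {
    // Starts a watcher thread against a sleeping gate, then either wakes the
    // gate or interrupts the watcher, and returns what sleepUntilNeeded() reported.
    static boolean run(boolean interruptInsteadOfWake) {
        WorkGate gate = new WorkGate();
        gate.goToSleep();
        boolean[] result = new boolean[1];
        Thread watcher = new Thread(() -> result[0] = gate.sleepUntilNeeded());
        watcher.start();
        if (interruptInsteadOfWake) {
            watcher.interrupt();
        } else {
            gate.wakeUp();
        }
        try {
            watcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("woken: " + run(false));
        System.out.println("interrupted: " + run(true));
    }
}
```

The while loop around wait() is what keeps the gate correct whether wakeUp() runs before or after the watcher starts waiting: the flag is re-read under the same lock each time.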
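- Step 2: waiting for finalization to complete
This step sleeps for the length of the timeout while finalization runs. If the sleep is interrupted because the daemon is being stopped, waitForFinalization returns null rather than report a possibly spurious timeout. After a full sleep it also returns null when needToWork is false or progressCounter has changed, because either one means FinalizerDaemon made progress during the interval.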
/**
 * Return an object that took too long to finalize or return null.
 * Wait VMRuntime.getFinalizerTimeoutMs. If the FinalizerDaemon took essentially the
 * whole time processing a single reference, return that reference. Otherwise return
 * null. Only called from a single thread.
 */
private Object waitForFinalization() {
    if (finalizerTimeoutNs == 0) {
        finalizerTimeoutNs =
                NANOS_PER_MILLI * VMRuntime.getRuntime().getFinalizerTimeoutMs();
        // Temporary app backward compatibility. Remove eventually.
        MAX_FINALIZE_NANOS = finalizerTimeoutNs;
    }
    long startCount = FinalizerDaemon.INSTANCE.progressCounter.get();
    // Avoid remembering object being finalized, so as not to keep it alive.
    // Returns null here if the sleep was interrupted (daemon stopping); no timeout is reported.
    if (!sleepForNanos(finalizerTimeoutNs)) {
        // Don't report possibly spurious timeout if we are interrupted.
        return null;
    }
    if (getNeedToWork() && FinalizerDaemon.INSTANCE.progressCounter.get() == startCount) {
        // We assume that only remove() and doFinalize() may take time comparable to
        // the finalizer timeout.
        // We observed neither the effect of the gotoSleep() nor the increment preceding a
        // later wakeUp. Any remove() call by the FinalizerDaemon during our sleep
        // interval must have been followed by a wakeUp call before we checked needToWork.
        // But then we would have seen the counter increment. Thus there cannot have
        // been such a remove() call.
        // The FinalizerDaemon must not have progressed (from either the beginning or the
        // last progressCounter increment) to either the next increment or gotoSleep()
        // call. Thus we must have taken essentially the whole finalizerTimeoutMs in a
        // single doFinalize() call. Thus it's OK to time out. finalizingObject was set
        // just before the counter increment, which preceded the doFinalize call. Thus we
        // are guaranteed to get the correct finalizing value below, unless doFinalize()
        // just finished as we were timing out, in which case we may get null or a later
        // one. In this last case, we are very likely to discard it below.
        Object finalizing = FinalizerDaemon.INSTANCE.finalizingObject;
        sleepForNanos(500 * NANOS_PER_MILLI);
        // Recheck to make it even less likely we report the wrong finalizing object in
        // the case which a very slow finalization just finished as we were timing out.
        if (getNeedToWork()
                && FinalizerDaemon.INSTANCE.progressCounter.get() == startCount) {
            return finalizing;
        }
    }
    return null;
}
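The decision in waitForFinalization comes down to two signals sampled around the sleep: is the watchdog still supposed to be working, and did the progress counter move. The sketch below isolates just that comparison. ProgressModel, objectStarted and looksStuck are hypothetical names, and the sleep itself is left out so the logic can be checked deterministically.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the two signals the watchdog compares; the real
// code sleeps for the timeout between sampling startCount and checking.
final class ProgressModel {
    private final AtomicInteger progressCounter = new AtomicInteger();
    private boolean needToWork = true;

    synchronized void setNeedToWork(boolean value) {
        needToWork = value;
    }

    synchronized boolean getNeedToWork() {
        return needToWork;
    }

    int sampleCounter() {
        return progressCounter.get();
    }

    // Worker side: called each time the worker moves on to another object.
    void objectStarted() {
        progressCounter.incrementAndGet();
    }

    // Watchdog side: startCount is the value sampled before the sleep.
    boolean looksStuck(int startCount) {
        return getNeedToWork() && progressCounter.get() == startCount;
    }

    public static void main(String[] args) {
        ProgressModel model = new ProgressModel();
        int start = model.sampleCounter();
        System.out.println("no progress: stuck = " + model.looksStuck(start));
        model.objectStarted();
        System.out.println("after progress: stuck = " + model.looksStuck(start));
    }
}
```

Comparing a counter instead of timestamps keeps the worker's fast path cheap, which matches the intent stated in the FinalizerDaemon comments about avoiding a time read per object.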
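The sleepForNanos function is simple. On each loop iteration it subtracts the time already elapsed from the requested duration. Once that difference is less than or equal to 0, the full duration has passed and it returns true. It returns false only when it is interrupted (or hits an OutOfMemoryError) and the daemon is no longer running.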
/**
 * Sleep for the given number of nanoseconds, or slightly longer.
 * @return false if we were interrupted.
 */
private boolean sleepForNanos(long durationNanos) {
    // It's important to base this on nanoTime(), not currentTimeMillis(), since
    // the former stops counting when the processor isn't running.
    long startNanos = System.nanoTime();
    while (true) {
        long elapsedNanos = System.nanoTime() - startNanos;
        long sleepNanos = durationNanos - elapsedNanos;
        if (sleepNanos <= 0) {
            return true;
        }
        // Ensure the nano time is always rounded up to the next whole millisecond,
        // ensuring the delay is >= the requested delay.
        long sleepMillis = (sleepNanos + NANOS_PER_MILLI - 1) / NANOS_PER_MILLI;
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            if (!isRunning()) {
                return false;
            }
        } catch (OutOfMemoryError ignored) {
            if (!isRunning()) {
                return false;
            }
        }
    }
}
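The two pieces of arithmetic in this path are the millisecond-to-nanosecond conversion of the timeout and the round-up to whole milliseconds before Thread.sleep. A small sketch of both follows. SleepRounding is a hypothetical helper class, and NANOS_PER_MILLI is declared as a long here as an assumption to keep the example self-contained.

```java
// Hypothetical helper reproducing the arithmetic used around sleepForNanos().
final class SleepRounding {
    static final long NANOS_PER_MILLI = 1_000_000L;

    // Converts a timeout in milliseconds to nanoseconds.
    static long toNanos(long timeoutMs) {
        return NANOS_PER_MILLI * timeoutMs;
    }

    // Rounds a positive nanosecond duration up to the next whole millisecond,
    // so the resulting sleep is never shorter than requested.
    static long ceilMillis(long sleepNanos) {
        return (sleepNanos + NANOS_PER_MILLI - 1) / NANOS_PER_MILLI;
    }

    public static void main(String[] args) {
        System.out.println("10000 ms in ns: " + toNanos(10_000L));
        System.out.println("1 ns rounds to ms: " + ceilMillis(1L));
        System.out.println("1000001 ns rounds to ms: " + ceilMillis(1_000_001L));
    }
}
```

Rounding up matters because Thread.sleep takes milliseconds: truncating would let the loop spin through several zero-length sleeps near the end of the interval.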
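- Step 3: handling the timeout
If finalization did not complete within the timeout in step 2, waitForFinalization returns the object being finalized, which triggers finalizerTimedOut.
This is the outcome we least want to see: the system pops up the "application has stopped" error dialog.
Note that the process is not killed right away. That choice is handed to the user through the dialog, which leaves most users baffled.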
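Conclusion
Problems like this are actually fairly common, especially on low-memory devices. The root cause is that finalizing an object timed out. This is usually because too many objects are queued waiting for the FinalizerDaemon thread, or because system resources are extremely tight at that moment, for example under high CPU load or low memory.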
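Scenario tests
- Reproducing the scenario
By simulating a time-consuming operation during GC, the error dialog pops up 10 s after the app is sent to the background, and the timeout stack is logged.
This confirms that the timeout is indeed 10 s, and that a time-consuming operation during GC can trigger this symptom.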
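- Comparison device
The same scenario was simulated on a Xiaomi Note 9 Pro that was at hand, with GC taking 100 s.
On the Xiaomi device no dialog appears after the default 10 s, which indicates that Xiaomi changed the timeout. The first time, the device waited out the full 100 s and the object was then finalized normally, so the timeout must be set fairly high. The next time, at nearly 80 s, the process received signal 9 and was killed outright, and tapping the app afterwards resulted in a cold start.
Xiaomi changed the timeout threshold (to more than 100 s) and kills the process directly with signal 9 without any error dialog, so the user does not notice.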
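- Test device
The same 100 s GC delay was simulated on our own device.
Send the app to the background. The system then triggers GC, and at the 10 s mark the "has stopped" dialog pops up directly. The process is only killed once the user taps Close app.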
- Fix strategy
If finalization has not completed within the GC timeout, send signal 9 directly to the corresponding process.