Scheduling restart of crashed service: Solution and Source Code Analysis
Posted by 爱炒饭
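Testing turned up a bug: a method in a service threw a NullPointerException and crashed the process. The crash triggered the app's keep-alive mechanism, which restarted the process, but the faulty service was started first and accessed resources that had not been initialized yet, so the app kept crashing and restarting in a loop.
The code below simulates the crash by showing a Toast from a worker thread inside the service.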
import android.app.Service;
import android.content.Intent;
import android.os.IBinder;
import android.widget.Toast;

// LogUtil is a logging helper that the original post does not show.
public class MyService extends Service {
    @Override
    public int onStartCommand(Intent intent, int flags, int startId) {
        LogUtil.d("onStartCommand,flags="+super.onStartCommand(intent, flags, startId)+",START_NOT_STICKY="+START_NOT_STICKY);
        LogUtil.d("onStartCommand,super.onStartCommand(intent, flags, startId)="+super.onStartCommand(intent, flags, startId)+",super.onStartCommand(intent, Service.START_NOT_STICKY, startId)="+super.onStartCommand(intent, Service.START_NOT_STICKY, startId));
        // return super.onStartCommand(intent, flags, startId);
        //super.onStartCommand(intent, Service.START_NOT_STICKY, startId);
        return START_NOT_STICKY;
    }

    @Override
    public void onCreate() {
        LogUtil.d("onCreate");
        super.onCreate();
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    Thread.sleep(10_000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                LogUtil.d("run crash before");
                // Throws on this worker thread, which has no Looper; the uncaught exception kills the process.
                Toast.makeText(MyService.this,"Demo: updating the UI from a worker thread crashes",Toast.LENGTH_SHORT).show();
                LogUtil.d("run crash after");
            }
        }).start();
    }

    public MyService() {
        LogUtil.d("MyService");
    }

    @Override
    public void onDestroy() {
        LogUtil.d("onDestroy");
        super.onDestroy();
    }

    @Override
    public IBinder onBind(Intent intent) {
        // Started-only service: binding is not supported.
        return null;
    }
}
One message in the log is the key to the problem:
07-17 09:52:57.674 1022 1037 I ActivityManager: Process com.shan.mvvm (pid 13678) has died: prcp SVC
07-17 09:52:57.675 1022 1037 W ActivityManager: Scheduling restart of crashed service com.shan.mvvm/.MyService in 1000ms for start-requested
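1. Solution
The system restarted the service because that is what the service asked for when it was started (the "start-requested" in the log). This brings us to the start modes that a Service's onStartCommand method can return: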
/**
* Constant to return from {@link #onStartCommand}: compatibility
* version of {@link #START_STICKY} that does not guarantee that
* {@link #onStartCommand} will be called again after being killed.
*/
public static final int START_STICKY_COMPATIBILITY = 0;
/**
* Constant to return from {@link #onStartCommand}: if this service's
* process is killed while it is started (after returning from
* {@link #onStartCommand}), then leave it in the started state but
* don't retain this delivered intent. Later the system will try to
* re-create the service. Because it is in the started state, it will
* guarantee to call {@link #onStartCommand} after creating the new
* service instance; if there are not any pending start commands to be
* delivered to the service, it will be called with a null intent
* object, so you must take care to check for this.
*
* <p>This mode makes sense for things that will be explicitly started
* and stopped to run for arbitrary periods of time, such as a service
* performing background music playback.
*/
public static final int START_STICKY = 1;
/**
* Constant to return from {@link #onStartCommand}: if this service's
* process is killed while it is started (after returning from
* {@link #onStartCommand}), and there are no new start intents to
* deliver to it, then take the service out of the started state and
* don't recreate until a future explicit call to
* {@link Context#startService Context.startService(Intent)}. The
* service will not receive a {@link #onStartCommand(Intent, int, int)}
* call with a null Intent because it will not be restarted if there
* are no pending Intents to deliver.
*
* <p>This mode makes sense for things that want to do some work as a
* result of being started, but can be stopped when under memory pressure
* and will explicit start themselves again later to do more work. An
* example of such a service would be one that polls for data from
* a server: it could schedule an alarm to poll every N minutes by having
* the alarm start its service. When its {@link #onStartCommand} is
* called from the alarm, it schedules a new alarm for N minutes later,
* and spawns a thread to do its networking. If its process is killed
* while doing that check, the service will not be restarted until the
* alarm goes off.
*/
public static final int START_NOT_STICKY = 2;
/**
* Constant to return from {@link #onStartCommand}: if this service's
* process is killed while it is started (after returning from
* {@link #onStartCommand}), then it will be scheduled for a restart
* and the last delivered Intent re-delivered to it again via
* {@link #onStartCommand}. This Intent will remain scheduled for
* redelivery until the service calls {@link #stopSelf(int)} with the
* start ID provided to {@link #onStartCommand}. The
* service will not receive a {@link #onStartCommand(Intent, int, int)}
* call with a null Intent because it will only be restarted if
* it is not finished processing all Intents sent to it (and any such
* pending events will be delivered at the point of restart).
*/
public static final int START_REDELIVER_INTENT = 3;
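There are four modes in total:
START_STICKY (1): after the service's process is killed, the system restarts the service automatically, but the previous intent is not retained. START_STICKY_COMPATIBILITY (0) is the compatibility version of START_STICKY; it does not guarantee that onStartCommand is called again after the service has been killed.
START_NOT_STICKY (2): after the process is killed, the system does not restart the service unless there are new start intents to deliver.
START_REDELIVER_INTENT (3): after the process is killed, the system restarts the service automatically and re-delivers the previous intent.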
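Knowing what the four values mean, I passed START_NOT_STICKY to onStartCommand, yet the service was still restarted. Why? It turned out that I had passed START_NOT_STICKY the wrong way. My first, incorrect attempt looked like this: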
public int onStartCommand(Intent intent, int flags, int startId) {
    // Wrong: START_NOT_STICKY is passed as the flags argument, which the base class ignores.
    return super.onStartCommand(intent, Service.START_NOT_STICKY, startId);
}
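You can probably spot the mistake already. The second parameter of onStartCommand is the incoming flags value, not the start mode, so super.onStartCommand(intent, Service.START_NOT_STICKY, startId) still returns START_STICKY, as the log output confirms. The start mode is whatever the override returns, so the fix is simply to return START_NOT_STICKY.
The correct approach is: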
public int onStartCommand(Intent intent, int flags, int startId) {
    LogUtil.d("onStartCommand,flags="+super.onStartCommand(intent, flags, startId)+",START_NOT_STICKY="+START_NOT_STICKY);
    LogUtil.d("onStartCommand,super.onStartCommand(intent, flags, startId)="+super.onStartCommand(intent, flags, startId)+",super.onStartCommand(intent, Service.START_NOT_STICKY, startId)="+super.onStartCommand(intent, Service.START_NOT_STICKY, startId));
    // return super.onStartCommand(intent, flags, startId);
    //super.onStartCommand(intent, Service.START_NOT_STICKY, startId);
    return START_NOT_STICKY;
}
The log output is:
onStartCommand,flags=1,START_NOT_STICKY=2
onStartCommand,super.onStartCommand(intent, flags, startId)=1,super.onStartCommand(intent, Service.START_NOT_STICKY, startId)=1
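To see why the first attempt could not work, it may help to model the base class in isolation. The sketch below is plain Java rather than Android code: BaseService is a hypothetical stand-in for android.app.Service, and its onStartCommand simply reproduces the behaviour seen in the log above (the flags argument is ignored and START_STICKY is returned). WrongService, FixedService and StartModeDemo are likewise names made up for this illustration.

```java
// Plain-Java model; BaseService is a hypothetical stand-in for android.app.Service.
class BaseService {
    static final int START_STICKY = 1;
    static final int START_NOT_STICKY = 2;

    // Mirrors the observed behaviour: `flags` describes the incoming start
    // request, it is not the restart mode, so the result does not depend on it.
    int onStartCommand(Object intent, int flags, int startId) {
        return START_STICKY;
    }
}

// First attempt: START_NOT_STICKY is passed in as `flags` and silently ignored.
class WrongService extends BaseService {
    @Override
    int onStartCommand(Object intent, int flags, int startId) {
        return super.onStartCommand(intent, START_NOT_STICKY, startId);
    }
}

// Fix: the restart mode is whatever the override itself returns.
class FixedService extends BaseService {
    @Override
    int onStartCommand(Object intent, int flags, int startId) {
        return START_NOT_STICKY;
    }
}

class StartModeDemo {
    public static void main(String[] args) {
        System.out.println("wrong=" + new WrongService().onStartCommand(null, 0, 1)); // wrong=1
        System.out.println("fixed=" + new FixedService().onStartCommand(null, 0, 1)); // fixed=2
    }
}
```

The only thing the framework sees is the value returned by the override, which is why changing an argument of the super call has no effect.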
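2. Source Code Analysis
As described in the article on the startService flow, starting a service eventually reaches the realStartServiceLocked method, which uses sendServiceArgsLocked to deliver the arguments for onStartCommand.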
2.1 ActiveServices.realStartServiceLocked
// ActiveServices.java
private final void realStartServiceLocked(ServiceRecord r,
        ProcessRecord app, boolean execInFg) throws RemoteException {
    ……
    app.thread.scheduleCreateService(r, r.serviceInfo,
            mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
            app.repProcState); // create the service
    ……
    sendServiceArgsLocked(r, execInFg, true); // deliver the service's start arguments
    ……
}
2.2 ActiveServices.sendServiceArgsLocked
sendServiceArgsLocked calls the scheduleServiceArgs method of ActivityThread.
// ActiveServices.java
private final void sendServiceArgsLocked(ServiceRecord r, boolean execInFg,
        boolean oomAdjusted) throws TransactionTooLargeException {
    ……
    r.app.thread.scheduleServiceArgs(r, slice);
    ……
}
2.3 ActivityThread.scheduleServiceArgs
scheduleServiceArgs lives in ApplicationThread, an inner class of ActivityThread.java. It takes the list of start arguments for the service, iterates over it, and for each entry sends an H.SERVICE_ARGS message through the Handler.
// ActivityThread$ApplicationThread
public final void scheduleServiceArgs(IBinder token, ParceledListSlice args) {
    List<ServiceStartArgs> list = args.getList();
    for (int i = 0; i < list.size(); i++) {
        ServiceStartArgs ssa = list.get(i);
        ServiceArgsData s = new ServiceArgsData();
        s.token = token;
        s.taskRemoved = ssa.taskRemoved;
        s.startId = ssa.startId;
        s.flags = ssa.flags;
        s.args = ssa.args;
        sendMessage(H.SERVICE_ARGS, s);
    }
}
2.4 H.handleMessage
H is an inner class of ActivityThread.java and extends Handler. The H.SERVICE_ARGS message is handled in H's handleMessage method, which then calls handleServiceArgs.
// ActivityThread$H
public void handleMessage(Message msg) {
    ……
        case SERVICE_ARGS:
            if (Trace.isTagEnabled(Trace.TRACE_TAG_ACTIVITY_MANAGER)) {
                Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER,
                        ("serviceStart: " + String.valueOf(msg.obj)));
            }
            handleServiceArgs((ServiceArgsData)msg.obj);
            Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);
            break;
    ……
}
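The hand-off in 2.3 and 2.4 follows a common pattern: the scheduling call only enqueues one message per start request, and the handler later dispatches each message by its type code. Below is a minimal plain-Java sketch of that pattern, assuming a simple in-memory queue in place of Android's Looper and MessageQueue. MiniHandler, Msg, drain and the numeric value of SERVICE_ARGS are hypothetical and exist only for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical miniature of the H handler: messages are queued first, handled later.
class MiniHandler {
    static final int SERVICE_ARGS = 1; // arbitrary value chosen for this sketch

    // A message carries a type code and a payload, similar to Message.what / Message.obj.
    static final class Msg {
        final int what;
        final Object obj;

        Msg(int what, Object obj) {
            this.what = what;
            this.obj = obj;
        }
    }

    private final Queue<Msg> queue = new ArrayDeque<>();
    final List<Integer> handledStartIds = new ArrayList<>();

    // Counterpart of scheduleServiceArgs: one message per start request, nothing runs yet.
    void scheduleServiceArgs(List<Integer> startIds) {
        for (int startId : startIds) {
            queue.add(new Msg(SERVICE_ARGS, startId));
        }
    }

    // Counterpart of the message loop: drain the queue and dispatch by message type.
    void drain() {
        Msg m;
        while ((m = queue.poll()) != null) {
            handleMessage(m);
        }
    }

    private void handleMessage(Msg msg) {
        switch (msg.what) {
            case SERVICE_ARGS:
                // The real handler calls handleServiceArgs here; this sketch only records the id.
                handledStartIds.add((Integer) msg.obj);
                break;
            default:
                break;
        }
    }
}
```

Calling scheduleServiceArgs with start ids 1, 2, 3 leaves handledStartIds empty until drain() runs, after which the ids appear in delivery order.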
2.5 ActivityThread.handleServiceArgs
The handleServiceArgs method in ActivityThread.java first stores the return value of the service's onStartCommand method in the local int variable res, and then passes res as an argument to AMS's serviceDoneExecuting method.
// ActivityThread.java
private void handleServiceArgs(ServiceArgsData data) {
    Service s = mServices.get(data.token);
    if (s != null) {
        try {
            if (data.args != null) {
                data.args.setExtrasClassLoader(s.getClassLoader());
                data.args.prepareToEnterProcess();
            }
            int res;
            if (!data.taskRemoved) {
                // the return value of the service's onStartCommand is captured here
                res = s.onStartCommand(data.args, data.flags, data.startId);
            } else {
                s.onTaskRemoved(data.args);
                res = Service.START_TASK_REMOVED_COMPLETE;
            }
            QueuedWork.waitToFinish();
            try {
                // report the onStartCommand return value to AMS
                ActivityManager.getService().serviceDoneExecuting(
                        data.token, SERVICE_DONE_EXECUTING_START, data.startId, res);
            } catch (RemoteException e) {
                throw e.rethrowFromSystemServer();
            }
        } catch (Exception e) {
            if (!mInstrumentation.onException(s, e)) {
                throw new RuntimeException(
                        "Unable to start service " + s
                        + " with " + data.args + ": " + e.toString(), e);
            }
        }
    }
}
2.5 ActivityManagerService.serviceDoneExecuting
AMS's serviceDoneExecuting method calls the serviceDoneExecutingLocked method of ActiveServices.java.
// ActivityManagerService.java
public void serviceDoneExecuting(IBinder token, int type, int startId, int res) {
    synchronized(this) {
        if (!(token instanceof ServiceRecord)) {
            Slog.e(TAG, "serviceDoneExecuting: Invalid service token=" + token);
            throw new IllegalArgumentException("Invalid service token");
        }
        mServices.serviceDoneExecutingLocked((ServiceRecord)token, type, startId, res);
    }
}
2.6 ActiveServices.serviceDoneExecutingLocked
The serviceDoneExecutingLocked method in ActiveServices.java handles each possible return value of onStartCommand. The field to watch is r.stopIfKilled: for START_STICKY it is set to false, meaning the service is restarted after being killed; for START_NOT_STICKY it is set to true, meaning the service stays stopped once killed. The ServiceRecord is then passed on to another serviceDoneExecutingLocked overload.
// ActiveServices.java
void serviceDoneExecutingLocked(ServiceRecord r, int type, int startId, int res) {
    boolean inDestroying = mDestroyingServices.contains(r);
    if (r != null) {
        // for the start phase, only the ActivityThread.SERVICE_DONE_EXECUTING_START type matters
        if (type == ActivityThread.SERVICE_DONE_EXECUTING_START) {
            // This is a call from a service start... take care of
            // book-keeping.
            r.callStart = true;
            switch (res) {
                case Service.START_STICKY_COMPATIBILITY:
                case Service.START_STICKY: {
                    // We are done with the associated start arguments.
                    r.findDeliveredStart(startId, false, true);
                    // Don't stop if killed.
                    // START_STICKY: stopIfKilled is false, so the service is restarted after being killed
                    r.stopIfKilled = false;
                    break;
                }
                case Service.START_NOT_STICKY: {
                    // We are done with the associated start arguments.
                    r.findDeliveredStart(startId, false, true);
                    if (r.getLastStartId() == startId) {
                        // There is no more work, and this service
                        // doesn't want to hang around if killed.
                        // START_NOT_STICKY: stopIfKilled is true, so the service stays stopped once killed
                        r.stopIfKilled = true;
                    }
                    break;
                }
                case Service.START_REDELIVER_INTENT: {
                    // We'll keep this item until they explicitly
                    // call stop for it, but keep track of the fact
                    // that it was delivered.
                    ServiceRecord.StartItem si = r.findDeliveredStart(startId, false, false);
                    if (si != null) {
                        si.deliveryCount = 0;
                        si.doneExecutingCount++;
                        // Don't stop if killed.
                        r.stopIfKilled = true;
                    }
                    break;
                }
                case Service.START_TASK_REMOVED_COMPLETE: {
                    // Special processing for onTaskRemoved(). Don't
                    // impact normal onStartCommand() processing.
                    r.findDeliveredStart(startId, true, true);
                    break;
                }
                default:
                    throw new IllegalArgumentException(
                            "Unknown service start result: " + res);
            }
            if (res == Service.START_STICKY_COMPATIBILITY) {
                r.callStart = false;
            }
        }
        ……
}
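The effect of the switch on the two flags can be tried out in isolation. The following is a plain-Java model of it, not the framework code: Record is a hypothetical, stripped-down stand-in for ServiceRecord, applyStartResult is a made-up name, and the START_REDELIVER_INTENT and START_TASK_REMOVED_COMPLETE cases are left out to keep the sketch short.

```java
// Plain-Java model of the switch in serviceDoneExecutingLocked.
class StartResultModel {
    static final int START_STICKY_COMPATIBILITY = 0;
    static final int START_STICKY = 1;
    static final int START_NOT_STICKY = 2;

    // Hypothetical stand-in for the few ServiceRecord fields used here.
    static final class Record {
        boolean stopIfKilled; // true: the service may stay stopped after its process dies
        boolean callStart;    // false only for the compatibility mode
        int lastStartId;      // id of the most recent start request
    }

    // Applies the onStartCommand result `res` reported for start request `startId`.
    static void applyStartResult(Record r, int startId, int res) {
        r.callStart = true;
        switch (res) {
            case START_STICKY_COMPATIBILITY:
            case START_STICKY:
                r.stopIfKilled = false;
                break;
            case START_NOT_STICKY:
                // Only the result for the latest start request marks the service as stoppable.
                if (r.lastStartId == startId) {
                    r.stopIfKilled = true;
                }
                break;
            default:
                throw new IllegalArgumentException("Unknown service start result: " + res);
        }
        if (res == START_STICKY_COMPATIBILITY) {
            r.callStart = false;
        }
    }
}
```

One detail that is easy to miss: a START_NOT_STICKY result for an older start id leaves stopIfKilled untouched, so only the answer to the most recent start request can opt the service out of being restarted.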
2.7 ActivityManagerService.appDiedLocked
Searching for usages of r.stopIfKilled leads, for example, to the canStopIfKilled method in ServiceRecord.java, whose name already suggests a connection to restarting. The first line of the crash log shown earlier, "Process com.shan.mvvm (pid 13678) has died: prcp SVC", is printed by AMS's appDiedLocked method. That method goes on to call handleAppDiedLocked, and its third argument, allowRestart, is true, meaning a restart is allowed.
// ActivityManagerService.java
final void appDiedLocked(ProcessRecord app, int pid, IApplicationThread thread,
        boolean fromBinderDied, String reason) {
    ……
    // Clean up already done if the process has been re-started.
    if (app.pid == pid && app.thread != null &&
            app.thread.asBinder() == thread.asBinder()) {
        boolean doLowMem = app.getActiveInstrumentation() == null;
        boolean doOomAdj = doLowMem;
        if (!app.killedByAm) {
            // log that the app has died
            reportUidInfoMessageLocked(TAG,
                    "Process " + app.processName + " (pid " + pid + ") has died: "
                    + ProcessList.makeOomAdjString(app.setAdj, true) + " "
                    + ProcessList.makeProcStateString(app.setProcState), app.info.uid);
            mAllowLowerMemLevel = true;
        } else {
            // Note that we always want to do oom adj to update our state with the
            // new number of procs.
            mAllowLowerMemLevel = false;
            doLowMem = false;
        }
        EventLogTags.writeAmProcDied(app.userId, app.pid, app.processName, app.setAdj,
                app.setProcState);
        if (DEBUG_CLEANUP) Slog.v(TAG_CLEANUP,
                "Dying app: " + app + ", pid: " + pid + ", thread: " + thread.asBinder());
        // handle the app's death
        handleAppDiedLocked(app, false, true);
        ……
}
2.8 ActivityManagerService.handleAppDiedLocked
handleAppDiedLocked calls cleanUpApplicationRecordLocked to clean up the application record; allowRestart is still true at this point.
// ActivityManagerService.java
final void handleAppDiedLocked(ProcessRecord app,
        boolean restarting, boolean allowRestart) {
    int pid = app.pid;
    boolean kept = cleanUpApplicationRecordLocked(app, restarting, allowRestart, -1,
            false /*replacingPid*/);
    ……
}
2.9 ActivityManagerService.cleanUpApplicationRecordLocked
cleanUpApplicationRecordLocked in turn calls the killServicesLocked method of ActiveServices.
// ActivityManagerService.java
final boolean cleanUpApplicationRecordLocked(ProcessRecord app,
        boolean restarting, boolean allowRestart, int index, boolean replacingPid) {
    ……
    mServices.killServicesLocked(app, allowRestart);
    ……
}
2.10 ActiveServices.killServicesLocked
The killServicesLocked method in ActiveServices.java checks how many times the service has crashed. Because allowRestart is true here, as long as the crash count is below 16 the code falls through to the final else branch and calls scheduleServiceRestartLocked.
// ActiveServices.java
final void killServicesLocked(ProcessRecord app, boolean allowRestart) {
    // Report disconnected services.
    ……
        // Any services running in the application may need to be placed
        // back in the pending list.
        // allowRestart is true here; BOUND_SERVICE_MAX_CRASH_RETRY is 16
        if (allowRestart && sr.crashCount >= mAm.mConstants.BOUND_SERVICE_MAX_CRASH_RETRY
                && (sr.serviceInfo.applicationInfo.flags
                    &ApplicationInfo.FLAG_PERSISTENT) == 0) {
            Slog.w(TAG, "Service crashed " + sr.crashCount
                    + " times, stopping: " + sr);
            EventLog.writeEvent(EventLogTags.AM_SERVICE_CRASHED_TOO_MUCH,
                    sr.userId, sr.crashCount, sr.shortInstanceName, app.pid);
            bringDownServiceLocked(sr);
        } else if (!allowRestart
                || !mAm.mUserController.isUserRunning(sr.userId, 0)) {
            bringDownServiceLocked(sr);
        } else {
            // try to restart the service here
            final boolean scheduled = scheduleServiceRestartLocked(sr, true /* allowCancel */);
            ……
        }
    }
    ……
}
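The three-way branch above is easier to reason about as a pure function. The sketch below is a plain-Java restatement of the conditions in the listing; KillDecision, Action and decide are hypothetical names, and the constant 16 is the value quoted in the text for BOUND_SERVICE_MAX_CRASH_RETRY.

```java
// Plain-Java restatement of the branch in killServicesLocked.
class KillDecision {
    enum Action { BRING_DOWN_TOO_MANY_CRASHES, BRING_DOWN, SCHEDULE_RESTART }

    // Value quoted in the text for BOUND_SERVICE_MAX_CRASH_RETRY.
    static final int MAX_CRASH_RETRY = 16;

    static Action decide(boolean allowRestart, int crashCount,
                         boolean persistentApp, boolean userRunning) {
        if (allowRestart && crashCount >= MAX_CRASH_RETRY && !persistentApp) {
            // Crashed too often and the app is not persistent: give up.
            return Action.BRING_DOWN_TOO_MANY_CRASHES;
        } else if (!allowRestart || !userRunning) {
            // Restart not permitted, or the owning user is no longer running.
            return Action.BRING_DOWN;
        } else {
            // Corresponds to the call to scheduleServiceRestartLocked.
            return Action.SCHEDULE_RESTART;
        }
    }
}
```

Note that a persistent app never takes the first branch, so under these conditions its services keep being scheduled for restart regardless of the crash count.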
2.11 ActiveServices.scheduleServiceRestartLocked
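scheduleServiceRestartLocked makes use of the canStopIfKilled method. As shown above, stopIfKilled is false for START_STICKY, so canStopIfKilled returns false, whereas for START_NOT_STICKY it can return true. For a START_STICKY service the method therefore continues with the restart logic and prints the "Scheduling restart of crashed service" log message.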
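As a rough illustration of this last decision, here is a plain-Java sketch; it is not the framework code. It assumes, as a simplification, that a started service with no bound clients may stay stopped after its process dies only when it was explicitly started, has stopIfKilled set, and has no start requests still waiting for delivery. ServiceState, its fields and restartAfterCrash are hypothetical names.

```java
// Simplified model of the restart decision for a started, unbound service.
class RestartDecision {
    // Hypothetical stand-in for the ServiceRecord state relevant to this decision.
    static final class ServiceState {
        boolean startRequested; // the service was started and not yet stopped
        boolean stopIfKilled;   // set from the onStartCommand result
        int pendingStarts;      // start requests not yet delivered
    }

    // Assumed simplification of canStopIfKilled: all three conditions must hold.
    static boolean canStopIfKilled(ServiceState s) {
        return s.startRequested && s.stopIfKilled && s.pendingStarts == 0;
    }

    // In this model, a restart is scheduled exactly when the service may not stay stopped.
    static boolean restartAfterCrash(ServiceState s) {
        return !canStopIfKilled(s);
    }
}
```

In this model a START_STICKY service (stopIfKilled false) is always restarted, while a START_NOT_STICKY service is restarted only if a start request is still pending, which matches the behaviour described in the Javadoc for the two constants.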