Scheduling restart of crashed service: Solution and Source Code Analysis
Posted by 爱炒饭
Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers the solution to, and a source code analysis of, "Scheduling restart of crashed service". We hope you find it useful.
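Testing turned up a bug: a method in a Service crashed the app with a NullPointerException. The crash triggered the app's keep-alive mechanism, which restarted the app. However, the crashed Service was started first and accessed resources that had not been initialized yet, so the app kept crashing and restarting in a loop.
The code below simulates the crash by showing a Toast from a worker thread inside the Service: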
import android.app.Service;
import android.content.Intent;
import android.os.IBinder;
import android.widget.Toast;

// LogUtil is the author's own logging helper and is not shown here
public class MyService extends Service {
    @Override
    public int onStartCommand(Intent intent, int flags, int startId) {
        LogUtil.d("onStartCommand,flags="+super.onStartCommand(intent, flags, startId)+",START_NOT_STICKY="+START_NOT_STICKY);
        LogUtil.d("onStartCommand,super.onStartCommand(intent, flags, startId)="+super.onStartCommand(intent, flags, startId)+",super.onStartCommand(intent, Service.START_NOT_STICKY, startId)="+super.onStartCommand(intent, Service.START_NOT_STICKY, startId));
        // return super.onStartCommand(intent, flags, startId);
        //super.onStartCommand(intent, Service.START_NOT_STICKY, startId);
        return START_NOT_STICKY;
    }

    @Override
    public void onCreate() {
        LogUtil.d("onCreate");
        super.onCreate();
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    Thread.sleep(10_000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                LogUtil.d("run crash before");
                // Showing a Toast on a thread without a Looper throws a RuntimeException and kills the process
                Toast.makeText(MyService.this,"Demo: crash caused by updating the UI from a worker thread",Toast.LENGTH_SHORT).show();
                LogUtil.d("run crash after");
            }
        }).start();
    }

    public MyService() {
        LogUtil.d("MyService");
    }

    @Override
    public void onDestroy() {
        LogUtil.d("onDestroy");
        super.onDestroy();
    }

    // onBind() is abstract in Service, so it has to be implemented for this class to compile
    @Override
    public IBinder onBind(Intent intent) {
        return null;
    }
}
The log contains a key piece of information:
07-17 09:52:57.674 1022 1037 I ActivityManager: Process com.shan.mvvm (pid 13678) has died: prcp SVC
07-17 09:52:57.675 1022 1037 W ActivityManager: Scheduling restart of crashed service com.shan.mvvm/.MyService in 1000ms for start-requested
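1. Solution
The system restarted the Service because that is what the Service had asked for when it was started: the restart behavior is determined by the value returned from Service.onStartCommand. Service.java defines the following constants for it: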
/**
 * Constant to return from {@link #onStartCommand}: compatibility
 * version of {@link #START_STICKY} that does not guarantee that
 * {@link #onStartCommand} will be called again after being killed.
 */
public static final int START_STICKY_COMPATIBILITY = 0;
/**
 * Constant to return from {@link #onStartCommand}: if this service's
 * process is killed while it is started (after returning from
 * {@link #onStartCommand}), then leave it in the started state but
 * don't retain this delivered intent. Later the system will try to
 * re-create the service. Because it is in the started state, it will
 * guarantee to call {@link #onStartCommand} after creating the new
 * service instance; if there are not any pending start commands to be
 * delivered to the service, it will be called with a null intent
 * object, so you must take care to check for this.
 *
 * <p>This mode makes sense for things that will be explicitly started
 * and stopped to run for arbitrary periods of time, such as a service
 * performing background music playback.
 */
public static final int START_STICKY = 1;
/**
 * Constant to return from {@link #onStartCommand}: if this service's
 * process is killed while it is started (after returning from
 * {@link #onStartCommand}), and there are no new start intents to
 * deliver to it, then take the service out of the started state and
 * don't recreate until a future explicit call to
 * {@link Context#startService Context.startService(Intent)}. The
 * service will not receive a {@link #onStartCommand(Intent, int, int)}
 * call with a null Intent because it will not be restarted if there
 * are no pending Intents to deliver.
 *
 * <p>This mode makes sense for things that want to do some work as a
 * result of being started, but can be stopped when under memory pressure
 * and will explicit start themselves again later to do more work. An
 * example of such a service would be one that polls for data from
 * a server: it could schedule an alarm to poll every N minutes by having
 * the alarm start its service. When its {@link #onStartCommand} is
 * called from the alarm, it schedules a new alarm for N minutes later,
 * and spawns a thread to do its networking. If its process is killed
 * while doing that check, the service will not be restarted until the
 * alarm goes off.
 */
public static final int START_NOT_STICKY = 2;
/**
 * Constant to return from {@link #onStartCommand}: if this service's
 * process is killed while it is started (after returning from
 * {@link #onStartCommand}), then it will be scheduled for a restart
 * and the last delivered Intent re-delivered to it again via
 * {@link #onStartCommand}. This Intent will remain scheduled for
 * redelivery until the service calls {@link #stopSelf(int)} with the
 * start ID provided to {@link #onStartCommand}. The
 * service will not receive a {@link #onStartCommand(Intent, int, int)}
 * call with a null Intent because it will only be restarted if
 * it is not finished processing all Intents sent to it (and any such
 * pending events will be delivered at the point of restart).
 */
public static final int START_REDELIVER_INTENT = 3;
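There are four modes in total:
- START_STICKY (1): after the service is killed, the system restarts it automatically, but the previously delivered Intent is not retained (onStartCommand may be called with a null Intent).
- START_STICKY_COMPATIBILITY (0): the compatibility version of START_STICKY; it does not guarantee that onStartCommand is called again after the service is killed and recreated.
- START_NOT_STICKY (2): after the service is killed, the system does not restart it automatically (unless new start Intents are pending).
- START_REDELIVER_INTENT (3): after the service is killed, the system restarts it automatically and redelivers the last delivered Intent.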
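For a side-by-side view, here is a small plain-Java summary (not Android code) of the four modes as described in the Javadoc above. The enum StartMode and its fields are hypothetical names for illustration; only the integer values are taken from Service.java.

```java
import java.util.Arrays;

/**
 * Hypothetical summary (not an Android API) of the four onStartCommand return values.
 * The integer values match Service.java; the flags paraphrase its Javadoc.
 */
enum StartMode {
    // Like START_STICKY, but onStartCommand is not guaranteed to be called again after a restart.
    START_STICKY_COMPATIBILITY(0, true, false),
    // Restarted even with nothing pending; onStartCommand then gets a null Intent.
    START_STICKY(1, true, false),
    // Restarted only if new start Intents are pending.
    START_NOT_STICKY(2, false, false),
    // Restarted only while delivered Intents are unfinished; those Intents are redelivered.
    START_REDELIVER_INTENT(3, false, true);

    final int value;
    /** Whether the service is recreated after being killed even when no Intents are pending. */
    final boolean restartsWithNothingPending;
    /** Whether the last delivered Intent is delivered again on restart. */
    final boolean redeliversLastIntent;

    StartMode(int value, boolean restartsWithNothingPending, boolean redeliversLastIntent) {
        this.value = value;
        this.restartsWithNothingPending = restartsWithNothingPending;
        this.redeliversLastIntent = redeliversLastIntent;
    }

    /** Maps a raw return value to a mode; unknown values are rejected. */
    static StartMode fromValue(int value) {
        return Arrays.stream(values())
                .filter(m -> m.value == value)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("Unknown service start result: " + value));
    }

    public static void main(String[] args) {
        for (StartMode m : values()) {
            System.out.println(m + " (" + m.value + "): restartsWithNothingPending="
                    + m.restartsWithNothingPending + ", redeliversLastIntent=" + m.redeliversLastIntent);
        }
    }
}
```

StartMode.fromValue(2) yields START_NOT_STICKY, and an unknown value throws, much like the default branch of serviceDoneExecutingLocked.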
With these four values in mind, I passed START_NOT_STICKY into onStartCommand, yet the app still restarted. Why? It turned out that I had passed START_NOT_STICKY the wrong way. My first, incorrect attempt looked like this:
public int onStartCommand(Intent intent, int flags, int startId) {
    // Wrong: the second argument is the incoming flags parameter, not the restart mode
    return super.onStartCommand(intent, Service.START_NOT_STICKY, startId);
}
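Experienced readers will already see the problem. The second parameter of onStartCommand is flags, which describes how this start request was delivered (for example START_FLAG_REDELIVERY or START_FLAG_RETRY); it is not the restart mode. The default Service.onStartCommand implementation ignores it and returns START_STICKY (or START_STICKY_COMPATIBILITY for apps targeting API level 4 or lower), so super.onStartCommand(intent, Service.START_NOT_STICKY, startId) still returns START_STICKY, as the log below confirms. The fix is simply to return START_NOT_STICKY directly.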
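The mistake is easier to see with a stripped-down model of the default implementation. This is plain Java, not android.app.Service; the class names are hypothetical, and the body reflects our reading of the AOSP default (call onStart(), then return START_STICKY, or START_STICKY_COMPATIBILITY when the compatibility flag is set). Nothing in it reads flags when choosing the return value.

```java
/**
 * Hypothetical plain-Java model (not android.app.Service) of the default
 * Service.onStartCommand(): the flags argument plays no part in the result.
 */
class DefaultStartCommandModel {
    static final int START_STICKY_COMPATIBILITY = 0;
    static final int START_STICKY = 1;
    static final int START_NOT_STICKY = 2;

    // In AOSP this corresponds to targetSdkVersion < ECLAIR (API level 5).
    private final boolean startCompatibility;

    DefaultStartCommandModel(boolean startCompatibility) {
        this.startCompatibility = startCompatibility;
    }

    /** Default behavior: forward to the legacy onStart(), then return a sticky mode. */
    int onStartCommand(Object intent, int flags, int startId) {
        onStart(intent, startId);
        return startCompatibility ? START_STICKY_COMPATIBILITY : START_STICKY;
    }

    /** Legacy callback; a no-op in this model. */
    void onStart(Object intent, int startId) {
    }

    public static void main(String[] args) {
        DefaultStartCommandModel service = new DefaultStartCommandModel(false);
        // Passing START_NOT_STICKY as "flags" changes nothing.
        System.out.println(service.onStartCommand(null, START_NOT_STICKY, 1)); // prints 1
        // Overriding and returning the constant directly is what works.
        System.out.println(new NotStickyModel().onStartCommand(null, 0, 1));   // prints 2
    }
}

/** The fix from the article: override onStartCommand and return the mode you want. */
class NotStickyModel extends DefaultStartCommandModel {
    NotStickyModel() {
        super(false);
    }

    @Override
    int onStartCommand(Object intent, int flags, int startId) {
        return START_NOT_STICKY;
    }
}
```

The subclass captures the design point: the only way to opt out of stickiness is to return the desired constant from your own override.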
The correct version looks like this:
public int onStartCommand(Intent intent, int flags, int startId) {
    LogUtil.d("onStartCommand,flags="+super.onStartCommand(intent, flags, startId)+",START_NOT_STICKY="+START_NOT_STICKY);
    LogUtil.d("onStartCommand,super.onStartCommand(intent, flags, startId)="+super.onStartCommand(intent, flags, startId)+",super.onStartCommand(intent, Service.START_NOT_STICKY, startId)="+super.onStartCommand(intent, Service.START_NOT_STICKY, startId));
    // return super.onStartCommand(intent, flags, startId);
    //super.onStartCommand(intent, Service.START_NOT_STICKY, startId);
    return START_NOT_STICKY;
}
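The log output is as follows. Note that the value labeled "flags" in the first line is really the return value of super.onStartCommand(intent, flags, startId), i.e. START_STICKY (1):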
onStartCommand,flags=1,START_NOT_STICKY=2
onStartCommand,super.onStartCommand(intent, flags, startId)=1,super.onStartCommand(intent, Service.START_NOT_STICKY, startId)=1
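2. Source code analysis
As covered in an earlier article on the startService flow, starting a service eventually reaches realStartServiceLocked, which calls sendServiceArgsLocked to deliver the arguments for onStartCommand.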
2.1 ActiveServices.realStartServiceLocked
//ActiveServices.java
private final void realStartServiceLocked(ServiceRecord r,
        ProcessRecord app, boolean execInFg) throws RemoteException {
    ……
    app.thread.scheduleCreateService(r, r.serviceInfo,
            mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
            app.repProcState); // create the service
    ……
    sendServiceArgsLocked(r, execInFg, true); // deliver the service's start arguments
    ……
}
2.2 ActiveServices.sendServiceArgsLocked
sendServiceArgsLocked calls scheduleServiceArgs on the app's ApplicationThread (a Binder call into the app process's ActivityThread).
//ActiveServices.java
private final void sendServiceArgsLocked(ServiceRecord r, boolean execInFg,
        boolean oomAdjusted) throws TransactionTooLargeException {
    ……
    r.app.thread.scheduleServiceArgs(r, slice);
    ……
}
2.3 ActivityThread.scheduleServiceArgs
scheduleServiceArgs lives in ApplicationThread, an inner class of ActivityThread.java. It unpacks the list of start arguments for the service, iterates over it, and sends one H.SERVICE_ARGS message through the Handler for each entry.
//ActivityThread$ApplicationThread
public final void scheduleServiceArgs(IBinder token, ParceledListSlice args) {
    List<ServiceStartArgs> list = args.getList();

    for (int i = 0; i < list.size(); i++) {
        ServiceStartArgs ssa = list.get(i);
        ServiceArgsData s = new ServiceArgsData();
        s.token = token;
        s.taskRemoved = ssa.taskRemoved;
        s.startId = ssa.startId;
        s.flags = ssa.flags;
        s.args = ssa.args;

        sendMessage(H.SERVICE_ARGS, s);
    }
}
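To see what scheduleServiceArgs does with a batch, here is a plain-Java model in which a Queue stands in for the main thread's message queue. StartArgs and ArgsData are simplified, hypothetical stand-ins for ServiceStartArgs and ServiceArgsData. The point it illustrates: one Binder call may carry several starts, but each becomes its own message, so onStartCommand later runs once per start, in order, on the main thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical plain-Java model of ApplicationThread.scheduleServiceArgs() fanning a batch out into messages. */
class ServiceArgsFanOut {
    /** Simplified stand-in for ServiceStartArgs. */
    static final class StartArgs {
        final boolean taskRemoved;
        final int startId;
        final int flags;
        final String args;

        StartArgs(boolean taskRemoved, int startId, int flags, String args) {
            this.taskRemoved = taskRemoved;
            this.startId = startId;
            this.flags = flags;
            this.args = args;
        }
    }

    /** Simplified stand-in for ActivityThread.ServiceArgsData (the message payload). */
    static final class ArgsData {
        Object token;
        boolean taskRemoved;
        int startId;
        int flags;
        String args;
    }

    /** Stand-in for the main thread's message queue. */
    private final Queue<ArgsData> queue = new ArrayDeque<>();
    /** Records the order in which messages were handled. */
    final List<String> handled = new ArrayList<>();

    /** Copies each entry into a new payload and enqueues it (the real code calls sendMessage(H.SERVICE_ARGS, s)). */
    void scheduleServiceArgs(Object token, List<StartArgs> list) {
        for (StartArgs ssa : list) {
            ArgsData s = new ArgsData();
            s.token = token;
            s.taskRemoved = ssa.taskRemoved;
            s.startId = ssa.startId;
            s.flags = ssa.flags;
            s.args = ssa.args;
            queue.add(s);
        }
    }

    /** Drains the queue like the main Looper would: one "handleServiceArgs" per message, FIFO. */
    void drain() {
        ArgsData d;
        while ((d = queue.poll()) != null) {
            handled.add("startId=" + d.startId + " args=" + d.args);
        }
    }

    public static void main(String[] argv) {
        ServiceArgsFanOut thread = new ServiceArgsFanOut();
        List<StartArgs> batch = new ArrayList<>();
        batch.add(new StartArgs(false, 1, 0, "first"));
        batch.add(new StartArgs(false, 2, 0, "second"));
        thread.scheduleServiceArgs("token", batch);
        thread.drain();
        System.out.println(thread.handled); // prints [startId=1 args=first, startId=2 args=second]
    }
}
```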
2.4 H.handleMessage
H is an inner class of ActivityThread.java that extends Handler. The H.SERVICE_ARGS message is handled in H.handleMessage, which then calls handleServiceArgs.
//ActivityThread$H
public void handleMessage(Message msg) {
    ……
        case SERVICE_ARGS:
            if (Trace.isTagEnabled(Trace.TRACE_TAG_ACTIVITY_MANAGER)) {
                Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER,
                        ("serviceStart: " + String.valueOf(msg.obj)));
            }
            handleServiceArgs((ServiceArgsData)msg.obj);
            Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);
            break;
    ……
}
2.5 ActivityThread.handleServiceArgs
handleServiceArgs in ActivityThread.java first stores the return value of the service's onStartCommand in the local int variable res, then passes res to AMS's serviceDoneExecuting method.
//ActivityThread.java
private void handleServiceArgs(ServiceArgsData data) {
    Service s = mServices.get(data.token);
    if (s != null) {
        try {
            if (data.args != null) {
                data.args.setExtrasClassLoader(s.getClassLoader());
                data.args.prepareToEnterProcess();
            }
            int res;
            if (!data.taskRemoved) {
                // Get the return value of the service's onStartCommand
                res = s.onStartCommand(data.args, data.flags, data.startId);
            } else {
                s.onTaskRemoved(data.args);
                res = Service.START_TASK_REMOVED_COMPLETE;
            }

            QueuedWork.waitToFinish();

            try {
                // Pass the onStartCommand return value (res) to AMS
                ActivityManager.getService().serviceDoneExecuting(
                        data.token, SERVICE_DONE_EXECUTING_START, data.startId, res);
            } catch (RemoteException e) {
                throw e.rethrowFromSystemServer();
            }
        } catch (Exception e) {
            if (!mInstrumentation.onException(s, e)) {
                throw new RuntimeException(
                        "Unable to start service " + s
                        + " with " + data.args + ": " + e.toString(), e);
            }
        }
    }
}
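The important property of handleServiceArgs is that the return value of onStartCommand is forwarded to AMS unchanged. Below is a plain-Java model of that; FakeService is a hypothetical stand-in for the app's Service, and the two framework constants are given assumed values for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-Java model of ActivityThread.handleServiceArgs(): res goes to AMS unchanged. */
class HandleServiceArgsModel {
    // Assumed values of the framework constants, for illustration only.
    static final int SERVICE_DONE_EXECUTING_START = 1;
    static final int START_TASK_REMOVED_COMPLETE = 1000;

    /** Minimal stand-in for the app's Service. */
    interface FakeService {
        int onStartCommand(String intent, int flags, int startId);

        void onTaskRemoved(String rootIntent);
    }

    /** Stand-in for the AMS side: records each serviceDoneExecuting() call as "type/startId/res". */
    final List<String> reported = new ArrayList<>();

    void serviceDoneExecuting(Object token, int type, int startId, int res) {
        reported.add(type + "/" + startId + "/" + res);
    }

    void handleServiceArgs(FakeService s, Object token, String args,
                           int flags, int startId, boolean taskRemoved) {
        int res;
        if (!taskRemoved) {
            // The override's return value is all AMS learns about the desired restart mode.
            res = s.onStartCommand(args, flags, startId);
        } else {
            s.onTaskRemoved(args);
            res = START_TASK_REMOVED_COMPLETE;
        }
        serviceDoneExecuting(token, SERVICE_DONE_EXECUTING_START, startId, res);
    }

    public static void main(String[] argv) {
        HandleServiceArgsModel thread = new HandleServiceArgsModel();
        FakeService notSticky = new FakeService() {
            @Override
            public int onStartCommand(String intent, int flags, int startId) {
                return 2; // START_NOT_STICKY
            }

            @Override
            public void onTaskRemoved(String rootIntent) {
            }
        };
        thread.handleServiceArgs(notSticky, "token", "intent", 0, 7, false);
        System.out.println(thread.reported); // prints [1/7/2]
    }
}
```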
2.6 ActivityManagerService.serviceDoneExecuting
AMS's serviceDoneExecuting method calls serviceDoneExecutingLocked in ActiveServices.java.
//ActivityManagerService.java
public void serviceDoneExecuting(IBinder token, int type, int startId, int res) {
    synchronized(this) {
        if (!(token instanceof ServiceRecord)) {
            Slog.e(TAG, "serviceDoneExecuting: Invalid service token=" + token);
            throw new IllegalArgumentException("Invalid service token");
        }
        mServices.serviceDoneExecutingLocked((ServiceRecord)token, type, startId, res);
    }
}
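2.7 ActiveServices.serviceDoneExecutingLocked
serviceDoneExecutingLocked in ActiveServices.java handles each possible return value of onStartCommand. The field to watch is r.stopIfKilled: for START_STICKY it is set to false, meaning the service is restarted if it is killed; for START_NOT_STICKY it is set to true (only when startId is the most recent start ID), meaning the service stays stopped if it is killed. The ServiceRecord is then passed on to another overload of serviceDoneExecutingLocked (in the elided part).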
//ActiveServices.java
void serviceDoneExecutingLocked(ServiceRecord r, int type, int startId, int res) {
    boolean inDestroying = mDestroyingServices.contains(r);
    if (r != null) {
        // For the start phase, only the ActivityThread.SERVICE_DONE_EXECUTING_START case is analyzed
        if (type == ActivityThread.SERVICE_DONE_EXECUTING_START) {
            // This is a call from a service start... take care of
            // book-keeping.
            r.callStart = true;
            switch (res) {
                case Service.START_STICKY_COMPATIBILITY:
                case Service.START_STICKY: {
                    // We are done with the associated start arguments.
                    r.findDeliveredStart(startId, false, true);
                    // Don't stop if killed.
                    // START_STICKY: stopIfKilled is false, so the service is restarted after being killed
                    r.stopIfKilled = false;
                    break;
                }
                case Service.START_NOT_STICKY: {
                    // We are done with the associated start arguments.
                    r.findDeliveredStart(startId, false, true);
                    if (r.getLastStartId() == startId) {
                        // There is no more work, and this service
                        // doesn't want to hang around if killed.
                        // START_NOT_STICKY: stopIfKilled is true, so the service stays stopped after being killed
                        r.stopIfKilled = true;
                    }
                    break;
                }
                case Service.START_REDELIVER_INTENT: {
                    // We'll keep this item until they explicitly
                    // call stop for it, but keep track of the fact
                    // that it was delivered.
                    ServiceRecord.StartItem si = r.findDeliveredStart(startId, false, false);
                    if (si != null) {
                        si.deliveryCount = 0;
                        si.doneExecutingCount++;
                        // Don't stop if killed.
                        r.stopIfKilled = true;
                    }
                    break;
                }
                case Service.START_TASK_REMOVED_COMPLETE: {
                    // Special processing for onTaskRemoved(). Don't
                    // impact normal onStartCommand() processing.
                    r.findDeliveredStart(startId, true, true);
                    break;
                }
                default:
                    throw new IllegalArgumentException(
                            "Unknown service start result: " + res);
            }
            if (res == Service.START_STICKY_COMPATIBILITY) {
                r.callStart = false;
            }
        }
        ……
    }
}
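A simplified plain-Java model of the stopIfKilled bookkeeping in serviceDoneExecutingLocked. It keeps only the fields the switch touches; modeling deliveredStarts as a list of start IDs is an assumption for illustration (the real ServiceRecord stores StartItem objects). The usage in main shows why START_NOT_STICKY only sets stopIfKilled for the most recent start ID.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, heavily simplified model of the stopIfKilled bookkeeping for SERVICE_DONE_EXECUTING_START. */
class StopIfKilledModel {
    static final int START_STICKY_COMPATIBILITY = 0;
    static final int START_STICKY = 1;
    static final int START_NOT_STICKY = 2;
    static final int START_REDELIVER_INTENT = 3;

    boolean stopIfKilled; // false by default, like a fresh ServiceRecord
    boolean callStart;
    private int lastStartId;
    /** Start IDs delivered to onStartCommand and still kept for possible redelivery. */
    final List<Integer> deliveredStarts = new ArrayList<>();

    /** Simulates delivering a start to the service. */
    void deliver(int startId) {
        lastStartId = startId;
        deliveredStarts.add(startId);
    }

    /** Mirrors the switch on the onStartCommand result. */
    void doneExecutingStart(int startId, int res) {
        callStart = true;
        switch (res) {
            case START_STICKY_COMPATIBILITY:
            case START_STICKY:
                deliveredStarts.remove(Integer.valueOf(startId)); // done with these arguments
                stopIfKilled = false;                             // restart if killed
                break;
            case START_NOT_STICKY:
                deliveredStarts.remove(Integer.valueOf(startId));
                if (lastStartId == startId) {
                    stopIfKilled = true; // no newer start outstanding: stay stopped if killed
                }
                break;
            case START_REDELIVER_INTENT:
                // The item is kept for redelivery; AOSP still sets stopIfKilled = true here.
                if (deliveredStarts.contains(startId)) {
                    stopIfKilled = true;
                }
                break;
            default:
                throw new IllegalArgumentException("Unknown service start result: " + res);
        }
        if (res == START_STICKY_COMPATIBILITY) {
            callStart = false;
        }
    }

    public static void main(String[] argv) {
        StopIfKilledModel r = new StopIfKilledModel();
        r.deliver(1);
        r.deliver(2);
        r.doneExecutingStart(1, START_NOT_STICKY);
        System.out.println(r.stopIfKilled); // prints false: start 2 is still outstanding
        r.doneExecutingStart(2, START_NOT_STICKY);
        System.out.println(r.stopIfKilled); // prints true
    }
}
```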
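2.8 ActivityManagerService.appDiedLocked
Searching for where r.stopIfKilled is used turns up, for example, the canStopIfKilled method in ServiceRecord.java; as its name suggests, it is related to restarting. The first line of the crash log above, Process com.shan.mvvm (pid 13678) has died: prcp SVC, is printed by AMS's appDiedLocked method, which then calls handleAppDiedLocked with its third argument, allowRestart, set to true, meaning a restart is allowed.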
//ActivityManagerService.java
final void appDiedLocked(ProcessRecord app, int pid, IApplicationThread thread,
        boolean fromBinderDied, String reason) {
    ……
    // Clean up already done if the process has been re-started.
    if (app.pid == pid && app.thread != null &&
            app.thread.asBinder() == thread.asBinder()) {
        boolean doLowMem = app.getActiveInstrumentation() == null;
        boolean doOomAdj = doLowMem;
        if (!app.killedByAm) {
            // Print the "Process ... has died" message
            reportUidInfoMessageLocked(TAG,
                    "Process " + app.processName + " (pid " + pid + ") has died: "
                            + ProcessList.makeOomAdjString(app.setAdj, true) + " "
                            + ProcessList.makeProcStateString(app.setProcState), app.info.uid);
            mAllowLowerMemLevel = true;
        } else {
            // Note that we always want to do oom adj to update our state with the
            // new number of procs.
            mAllowLowerMemLevel = false;
            doLowMem = false;
        }
        EventLogTags.writeAmProcDied(app.userId, app.pid, app.processName, app.setAdj,
                app.setProcState);
        if (DEBUG_CLEANUP) Slog.v(TAG_CLEANUP,
                "Dying app: " + app + ", pid: " + pid + ", thread: " + thread.asBinder());
        // Handle the app's death: restarting = false, allowRestart = true
        handleAppDiedLocked(app, false, true);
        ……
    }
    ……
}
2.9 ActivityManagerService.handleAppDiedLocked
handleAppDiedLocked calls cleanUpApplicationRecordLocked to clean up the application record; allowRestart is still true at this point.
//ActivityManagerService.java
final void handleAppDiedLocked(ProcessRecord app,
        boolean restarting, boolean allowRestart) {
    int pid = app.pid;
    boolean kept = cleanUpApplicationRecordLocked(app, restarting, allowRestart, -1,
            false /*replacingPid*/);
    ……
}
2.10 ActivityManagerService.cleanUpApplicationRecordLocked
cleanUpApplicationRecordLocked calls ActiveServices.killServicesLocked.
//ActivityManagerService.java
final boolean cleanUpApplicationRecordLocked(ProcessRecord app,
        boolean restarting, boolean allowRestart, int index, boolean replacingPid) {
    ……
    mServices.killServicesLocked(app, allowRestart);
    ……
}
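appDiedLocked, handleAppDiedLocked and cleanUpApplicationRecordLocked mostly pass allowRestart down the chain. The sketch below reduces them to that flow; the method names mirror AMS, but the bodies are hypothetical and keep only the "has died" log (simplified, without the oom_adj and process-state suffix) and the allowRestart value that finally reaches killServicesLocked.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical trace of how allowRestart travels from appDiedLocked() down to killServicesLocked(). */
class AppDiedChainModel {
    final List<String> trace = new ArrayList<>();

    void appDiedLocked(String processName, int pid, boolean killedByAm) {
        if (!killedByAm) {
            // Only deaths that AMS did not cause itself produce the "has died" line seen in the log.
            trace.add("Process " + processName + " (pid " + pid + ") has died");
        }
        handleAppDiedLocked(false, true); // restarting = false, allowRestart = true
    }

    void handleAppDiedLocked(boolean restarting, boolean allowRestart) {
        cleanUpApplicationRecordLocked(restarting, allowRestart);
    }

    void cleanUpApplicationRecordLocked(boolean restarting, boolean allowRestart) {
        killServicesLocked(allowRestart);
    }

    void killServicesLocked(boolean allowRestart) {
        trace.add("killServicesLocked(allowRestart=" + allowRestart + ")");
    }

    public static void main(String[] argv) {
        AppDiedChainModel ams = new AppDiedChainModel();
        ams.appDiedLocked("com.shan.mvvm", 13678, false);
        ams.trace.forEach(System.out::println);
    }
}
```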
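2.11 ActiveServices.killServicesLocked
killServicesLocked in ActiveServices.java checks how many times each service has crashed (sr.crashCount). Because allowRestart is true here, as long as the crash count is below BOUND_SERVICE_MAX_CRASH_RETRY (16) or the app is persistent, and the service's user is still running, execution reaches the final else branch and calls scheduleServiceRestartLocked.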
//ActiveServices.java
final void killServicesLocked(ProcessRecord app, boolean allowRestart) {
    // Report disconnected services.
    ……
        // Any services running in the application may need to be placed
        // back in the pending list.
        // allowRestart is true here; BOUND_SERVICE_MAX_CRASH_RETRY is 16
        if (allowRestart && sr.crashCount >= mAm.mConstants.BOUND_SERVICE_MAX_CRASH_RETRY
                && (sr.serviceInfo.applicationInfo.flags
                    &ApplicationInfo.FLAG_PERSISTENT) == 0) {
            Slog.w(TAG, "Service crashed " + sr.crashCount
                    + " times, stopping: " + sr);
            EventLog.writeEvent(EventLogTags.AM_SERVICE_CRASHED_TOO_MUCH,
                    sr.userId, sr.crashCount, sr.shortInstanceName, app.pid);
            bringDownServiceLocked(sr);
        } else if (!allowRestart
                || !mAm.mUserController.isUserRunning(sr.userId, 0)) {
            bringDownServiceLocked(sr);
        } else {
            // Try to restart the service here
            final boolean scheduled = scheduleServiceRestartLocked(sr, true /* allowCancel */);
            ……
        }
    ……
}
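The branch in killServicesLocked can be read as a pure decision function. In this hypothetical plain-Java model, the parameters stand for allowRestart, sr.crashCount, the FLAG_PERSISTENT check and isUserRunning; the limit of 16 is the default stated above.

```java
/** Hypothetical model of the three-way branch in ActiveServices.killServicesLocked() for one service. */
class KillServicesDecision {
    enum Action { BRING_DOWN_CRASHED_TOO_MUCH, BRING_DOWN, SCHEDULE_RESTART }

    // Default of mAm.mConstants.BOUND_SERVICE_MAX_CRASH_RETRY as stated in the article.
    static final int BOUND_SERVICE_MAX_CRASH_RETRY = 16;

    static Action decide(boolean allowRestart, int crashCount,
                         boolean persistentApp, boolean userRunning) {
        if (allowRestart && crashCount >= BOUND_SERVICE_MAX_CRASH_RETRY && !persistentApp) {
            return Action.BRING_DOWN_CRASHED_TOO_MUCH; // "Service crashed N times, stopping"
        } else if (!allowRestart || !userRunning) {
            return Action.BRING_DOWN;
        } else {
            return Action.SCHEDULE_RESTART; // scheduleServiceRestartLocked(sr, true /* allowCancel */)
        }
    }

    public static void main(String[] argv) {
        for (int crashCount : new int[] {0, 15, 16}) {
            System.out.println(crashCount + " -> " + decide(true, crashCount, false, true));
        }
    }
}
```

Reaching SCHEDULE_RESTART only means scheduleServiceRestartLocked is called; that method can still conclude there is nothing to restart, based on canStopIfKilled.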
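2.12 ActiveServices.scheduleServiceRestartLocked
scheduleServiceRestartLocked calls canStopIfKilled, which depends on the stopIfKilled field set above. For START_STICKY, stopIfKilled is false, so canStopIfKilled returns false and the method goes on with the restart logic and prints the Scheduling restart of crashed service log. For START_NOT_STICKY, stopIfKilled is true, so (with no pending starts) canStopIfKilled returns true and no restart is scheduled.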
//ActiveServices.java
/** @return {@code true} if the restart is scheduled. */
private final boolean scheduleServiceRestartL……
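The body of scheduleServiceRestartLocked is cut off above, so the following is a hedged plain-Java reconstruction of only its early "nothing to restart" check. The canStopIfKilled expression (startRequested && (stopIfKilled || isStartCanceled) && pendingStarts.isEmpty()) and the hasAutoCreateConnections exception are assumptions based on AOSP sources of roughly this version; verify them against the source you are reading. The model also assumes that delivered but unfinished Intents have already been moved back into pendingStarts, which would explain why START_REDELIVER_INTENT is still restarted even though it sets stopIfKilled to true.

```java
/** Hypothetical model of the early "nothing to restart?" check in scheduleServiceRestartLocked(). */
class RestartCheckModel {
    /** Assumed shape of ServiceRecord.canStopIfKilled(). */
    static boolean canStopIfKilled(boolean startRequested, boolean stopIfKilled,
                                   boolean isStartCanceled, int pendingStarts) {
        return startRequested && (stopIfKilled || isStartCanceled) && pendingStarts == 0;
    }

    /**
     * Returns true if the method would go on to schedule a restart (and, in the real
     * code, log "Scheduling restart of crashed service"); false means nothing to restart.
     */
    static boolean wouldScheduleRestart(boolean startRequested, boolean stopIfKilled,
                                        boolean isStartCanceled, int pendingStarts,
                                        boolean hasAutoCreateConnections) {
        boolean shouldStop = canStopIfKilled(startRequested, stopIfKilled, isStartCanceled, pendingStarts);
        return !(shouldStop && !hasAutoCreateConnections);
    }

    public static void main(String[] argv) {
        // START_STICKY: stopIfKilled == false, so a restart is scheduled.
        System.out.println(wouldScheduleRestart(true, false, false, 0, false)); // prints true
        // START_NOT_STICKY with no pending starts: nothing to restart.
        System.out.println(wouldScheduleRestart(true, true, false, 0, false));  // prints false
        // START_REDELIVER_INTENT: stopIfKilled == true, but the unfinished Intent is pending again.
        System.out.println(wouldScheduleRestart(true, true, false, 1, false));  // prints true
    }
}
```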