Android 12 Java trace 生成过程分析
Posted pecuyu
Overview
When analyzing Android problems such as ANRs or Watchdog-triggered freezes, we need the Java traces of the relevant processes to figure out where things went wrong. But how is a Java trace generated? Java processes on Android generally run on top of the ART virtual machine, and it is ART that performs the dump that produces a process's Java trace. The code bears this out: ART runs a dedicated SignalCatcher thread just to handle this logic.
Once the SignalCatcher thread starts, it loops waiting for SIGQUIT (signal 3). When SIGQUIT arrives, it triggers ART to perform the dump. After the dump completes, it connects to tombstoned over a socket and writes the trace to the designated path (usually under /data/anr), then goes back to waiting for the next signal. Because ART intercepts SIGQUIT in order to emit traces, the process does not exit on this signal by default the way an ordinary Linux process would.
In the earlier article "Watchdog (4): Trace generation" we analyzed the flow of capturing Java traces when a Watchdog fires: system_server sends SIGQUIT to the target process, which triggers that process's SignalCatcher to perform the dump; once done, SignalCatcher connects to tombstoned to write out the trace contents. This article focuses on how SignalCatcher receives SIGQUIT and generates the Java trace.
Commands to generate a trace
When a device exhibiting the problem is still on hand, we may want to capture a fresh Java trace to inspect the current state. The following commands are common, though they usually require root:
- debuggerd -j $pid
Dumps the Java traces of the given process; the output can be redirected.
usage: debuggerd [-bj] PID
-b, --backtrace just a backtrace rather than a full tombstone
-j collect java traces
- kill -3 $pid
Sends SIGQUIT directly; the trace is written under /data/anr/.
Example:
$ adb root
restarting adbd as root
$ adb shell pidof system_server
516
$ adb shell debuggerd -j 516 > system_server_trace.txt
$ adb shell kill -3 516
$ adb shell ls /data/anr
trace_02
$ adb pull /data/anr/trace_02
Trace generation flow
As described above, SignalCatcher is the key link in generating a Java trace, so we analyze it first.
Starting SignalCatcher
On Android, apps and system_server are started by the zygote process. After forking the new process, zygote executes SpecializeCommon to specialize it, and it is on this path that SignalCatcher gets started. Let's look at this flow:
/// @frameworks/base/core/jni/com_android_internal_os_Zygote.cpp
// Utility routine to specialize a zygote child process.
static void SpecializeCommon(JNIEnv* env, uid_t uid, gid_t gid, jintArray gids, jint runtime_flags,
                             jobjectArray rlimits, jlong permitted_capabilities,
                             jlong effective_capabilities, jint mount_external,
                             jstring managed_se_info, jstring managed_nice_name,
                             bool is_system_server, bool is_child_zygote,
                             jstring managed_instruction_set, jstring managed_app_data_dir,
                             bool is_top_app, jobjectArray pkg_data_info_list,
                             jobjectArray allowlisted_data_info_list, bool mount_data_dirs,
                             bool mount_storage_dirs) {
    const char* process_name = is_system_server ? "system_server" : "zygote";
    auto fail_fn = std::bind(ZygoteFailure, env, process_name, managed_nice_name, _1);
    auto extract_fn = std::bind(ExtractJString, env, process_name, managed_nice_name, _1);
    ...
    SetGids(env, gids, is_child_zygote, fail_fn);
    SetRLimits(env, rlimits, fail_fn);
    ...
    if (setresgid(gid, gid, gid) == -1) {
        fail_fn(CREATE_ERROR("setresgid(%d) failed: %s", gid, strerror(errno)));
    }

    // Must be called when the new process still has CAP_SYS_ADMIN, in this case,
    // before changing uid from 0, which clears capabilities. The other
    // alternative is to call prctl(PR_SET_NO_NEW_PRIVS, 1) afterward, but that
    // breaks SELinux domain transition (see b/71859146). As the result,
    // privileged syscalls used below still need to be accessible in app process.
    SetUpSeccompFilter(uid, is_child_zygote);

    // Must be called before losing the permission to set scheduler policy.
    SetSchedulerPolicy(fail_fn, is_top_app);

    if (setresuid(uid, uid, uid) == -1) {
        fail_fn(CREATE_ERROR("setresuid(%d) failed: %s", uid, strerror(errno)));
    }

    // "dumpable" is related to core dump capture
    // The "dumpable" flag of a process, which controls core dump generation, is
    // overwritten by the value in /proc/sys/fs/suid_dumpable when the effective
    // user or group ID changes. See proc(5) for possible values. In most cases,
    // the value is 0, so core dumps are disabled for zygote children. However,
    // when running in a Chrome OS container, the value is already set to 2,
    // which allows the external crash reporter to collect all core dumps. Since
    // only system crashes are interested, core dump is disabled for app
    // processes. This also ensures compliance with CTS.
    int dumpable = prctl(PR_GET_DUMPABLE);
    if (dumpable == -1) {
        ALOGE("prctl(PR_GET_DUMPABLE) failed: %s", strerror(errno));
        RuntimeAbort(env, __LINE__, "prctl(PR_GET_DUMPABLE) failed");
    }

    if (dumpable == 2 && uid >= AID_APP) {
        if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) == -1) {
            ALOGE("prctl(PR_SET_DUMPABLE, 0) failed: %s", strerror(errno));
            RuntimeAbort(env, __LINE__, "prctl(PR_SET_DUMPABLE, 0) failed");
        }
    }

    // Set process properties to enable debugging if required.
    if ((runtime_flags & RuntimeFlags::DEBUG_ENABLE_JDWP) != 0) {  // JDWP
        EnableDebugger();
    }
    if ((runtime_flags & RuntimeFlags::PROFILE_FROM_SHELL) != 0) {
        // simpleperf needs the process to be dumpable to profile it.
        if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) == -1) {
            ALOGE("prctl(PR_SET_DUMPABLE) failed: %s", strerror(errno));
            RuntimeAbort(env, __LINE__, "prctl(PR_SET_DUMPABLE, 1) failed");
        }
    }

    HeapTaggingLevel heap_tagging_level;
    ...
    // GWP-ASan
    bool forceEnableGwpAsan = false;
    switch (runtime_flags & RuntimeFlags::GWP_ASAN_LEVEL_MASK) {
        default:
        case RuntimeFlags::GWP_ASAN_LEVEL_NEVER:
            break;
        case RuntimeFlags::GWP_ASAN_LEVEL_ALWAYS:
            forceEnableGwpAsan = true;
            [[fallthrough]];
        case RuntimeFlags::GWP_ASAN_LEVEL_LOTTERY:
            android_mallopt(M_INITIALIZE_GWP_ASAN, &forceEnableGwpAsan, sizeof(forceEnableGwpAsan));
    }
    // Now that we've used the flag, clear it so that we don't pass unknown flags to the ART
    // runtime.
    runtime_flags &= ~RuntimeFlags::GWP_ASAN_LEVEL_MASK;
    ...
    SetCapabilities(permitted_capabilities, effective_capabilities, permitted_capabilities,
                    fail_fn);
    ...
    const char* se_info_ptr = se_info.has_value() ? se_info.value().c_str() : nullptr;
    const char* nice_name_ptr = nice_name.has_value() ? nice_name.value().c_str() : nullptr;
    // SELinux
    if (selinux_android_setcontext(uid, is_system_server, se_info_ptr, nice_name_ptr) == -1) {
        fail_fn(CREATE_ERROR("selinux_android_setcontext(%d, %d, \"%s\", \"%s\") failed", uid,
                             is_system_server, se_info_ptr, nice_name_ptr));
    }

    // Set the thread name
    // Make it easier to debug audit logs by setting the main thread's name to the
    // nice name rather than "app_process".
    if (nice_name.has_value()) {
        SetThreadName(nice_name.value());
    } else if (is_system_server) {
        SetThreadName("system_server");
    }

    // Unset the SIGCHLD handler, but keep ignoring SIGHUP (rationale in SetSignalHandlers).
    UnsetChldSignalHandler();  // Sets the SIGCHLD handler back to default behavior in zygote children.
    ...
    // Call Zygote#callPostForkChildHooks
    env->CallStaticVoidMethod(gZygoteClass, gCallPostForkChildHooks, runtime_flags,
                              is_system_server, is_child_zygote, managed_instruction_set);

    // Reset the process priority to the default value.
    setpriority(PRIO_PROCESS, 0, PROCESS_PRIORITY_DEFAULT);
    ...
}
Zygote#callPostForkChildHooks
/// @frameworks/base/core/java/com/android/internal/os/Zygote.java
// This function is called from native code in com_android_internal_os_Zygote.cpp
@SuppressWarnings("unused")
private static void callPostForkChildHooks(int runtimeFlags, boolean isSystemServer,
        boolean isZygote, String instructionSet) {
    ZygoteHooks.postForkChild(runtimeFlags, isSystemServer, isZygote, instructionSet);
}
ZygoteHooks#postForkChild
/// @dalvik/system/ZygoteHooks.java
/**
 * Called by the zygote in the child process after every fork.
 *
 * @param runtimeFlags The runtime flags to apply to the child process.
 * @param isSystemServer Whether the child process is system server.
 * @param isChildZygote Whether the child process is a child zygote.
 * @param instructionSet The instruction set of the child, used to determine
 *                       whether to use a native bridge.
 *
 * @hide
 */
@SystemApi(client = MODULE_LIBRARIES)
@libcore.api.CorePlatformApi(status = libcore.api.CorePlatformApi.Status.STABLE)
public static void postForkChild(int runtimeFlags, boolean isSystemServer,
        boolean isChildZygote, String instructionSet) {
    // Enter the native method
    nativePostForkChild(token, runtimeFlags, isSystemServer, isChildZygote, instructionSet);

    Math.setRandomSeedInternal(System.currentTimeMillis());

    // Enable memory-mapped coverage if JaCoCo is in the boot classpath. system_server is
    // skipped due to being persistent and having its own coverage writing mechanism.
    if (!isSystemServer && enableMemoryMappedDataMethod != null) {
        try {
            enableMemoryMappedDataMethod.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}

// Hook for all child processes post forking.
private static native void nativePostForkChild(long token, int runtimeFlags,
                                               boolean isSystemServer, boolean isZygote,
                                               String instructionSet);
ZygoteHooks_nativePostForkChild
/// @art/runtime/native/dalvik_system_ZygoteHooks.cc
static void ZygoteHooks_nativePostForkChild(JNIEnv* env,
                                            jclass,
                                            jlong token,
                                            jint runtime_flags,
                                            jboolean is_system_server,
                                            jboolean is_zygote,
                                            jstring instruction_set) {
  ...
  Runtime* runtime = Runtime::Current();
  ...
  if (instruction_set != nullptr && !is_system_server) {
    ScopedUtfChars isa_string(env, instruction_set);
    InstructionSet isa = GetInstructionSetFromString(isa_string.c_str());
    Runtime::NativeBridgeAction action = Runtime::NativeBridgeAction::kUnload;
    if (isa != InstructionSet::kNone && isa != kRuntimeISA) {
      action = Runtime::NativeBridgeAction::kInitialize;
    }
    runtime->InitNonZygoteOrPostFork(env, is_system_server, is_zygote, action, isa_string.c_str());
  } else {
    // Call the runtime's InitNonZygoteOrPostFork
    runtime->InitNonZygoteOrPostFork(
        env,
        is_system_server,
        is_zygote,
        Runtime::NativeBridgeAction::kUnload,
        /*isa=*/ nullptr,
        profile_system_server);
  }
}
Runtime::InitNonZygoteOrPostFork
/// @art/runtime/runtime.cc
void Runtime::InitNonZygoteOrPostFork(JNIEnv* env, bool is_system_server,
    // This is true when we are initializing a child-zygote. It requires
    // native bridge initialization to be able to run guest native code in
    // doPreload().
    bool is_child_zygote, NativeBridgeAction action, const char* isa, bool profile_system_server) {
  ...
  // Create the thread pools.
  heap_->CreateThreadPool();
  // Avoid creating the runtime thread pool for system server since it will not be used and would
  // waste memory.
  if (!is_system_server) {
    ScopedTrace timing("CreateThreadPool");
    constexpr size_t kStackSize = 64 * KB;
    constexpr size_t kMaxRuntimeWorkers = 4u;
    const size_t num_workers =
        std::min(static_cast<size_t>(std::thread::hardware_concurrency()), kMaxRuntimeWorkers);
    MutexLock mu(Thread::Current(), *Locks::runtime_thread_pool_lock_);
    CHECK(thread_pool_ == nullptr);
    thread_pool_.reset(new ThreadPool("Runtime", num_workers, /*create_peers=*/false, kStackSize));
    thread_pool_->StartWorkers(Thread::Current());
  }

  // Reset the gc performance data and metrics at zygote fork so that the events from
  // before fork aren't attributed to an app.
  heap_->ResetGcPerformanceInfo();
  GetMetrics()->Reset();
  ...
  StartSignalCatcher();  // Start the SignalCatcher
  ...
  // Start the JDWP thread. If the command-line debugger flags specified "suspend=y",
  // this will pause the runtime (in the internal debugger implementation), so we probably want
  // this to come last.
  GetRuntimeCallbacks()->StartDebugger();
}
Runtime::StartSignalCatcher
/// @art/runtime/runtime.cc
void Runtime::StartSignalCatcher() {
  if (!is_zygote_) {
    signal_catcher_ = new SignalCatcher();
  }
}
SignalCatcher
/// @art/runtime/signal_catcher.cc
SignalCatcher::SignalCatcher()
    : lock_("SignalCatcher lock"),
      cond_("SignalCatcher::cond_", lock_),
      thread_(nullptr) {
  SetHaltFlag(false);

  // Create the "Signal Catcher" thread; once started it runs the Run() routine.
  // Create a raw pthread; its start routine will attach to the runtime.
  CHECK_PTHREAD_CALL(pthread_create, (&pthread_, nullptr, &Run, this), "signal catcher thread");

  Thread* self = Thread::Current();
  MutexLock mu(self, lock_);
  while (thread_ == nullptr) {  // wait for the new thread to be created and attach to the runtime
    cond_.Wait(self);
  }
}
SignalCatcher::Run
/// @art/runtime/signal_catcher.cc
void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  CHECK(signal_catcher != nullptr);

  Runtime* runtime = Runtime::Current();
  CHECK(runtime->AttachCurrentThread("Signal Catcher", true, runtime->GetSystemThreadGroup(),
                                     !runtime->IsAotCompiler()));  // attach to the runtime

  Thread* self = Thread::Current();
  DCHECK_NE(self->GetState(), kRunnable);
  {
    MutexLock mu(self, signal_catcher->lock_);
    signal_catcher->thread_ = self;
    signal_catcher->cond_.Broadcast(self);
  }

  // Set up mask with signals we want to handle.
  SignalSet signals;
  signals.Add(SIGQUIT);  // add SIGQUIT to the signal set
  signals.Add(SIGUSR1);
  while (true) {
    // Wait for one of the signals in the set to arrive
    int signal_number = signal_catcher->WaitForSignal(self, signals);
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return nullptr;
    }

    switch (signal_number) {
      case SIGQUIT:
        signal_catcher->HandleSigQuit();  // handle SIGQUIT (signal 3)
        break;
      case SIGUSR1:
        signal_catcher->HandleSigUsr1();
        break;
      default:
        LOG(ERROR) << "Unexpected signal %d" << signal_number;
        break;
    }
  }
}
SignalCatcher handling SIGQUIT
When SignalCatcher receives SIGQUIT, it calls signal_catcher->HandleSigQuit() to perform the dump:
SignalCatcher::HandleSigQuit
/// @art/runtime/signal_catcher.cc
void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  std::ostringstream os;
  os << "\n" << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";

  DumpCmdLine(os);  // dump /proc/self/cmdline

  // Note: The strings "Build fingerprint:" and "ABI:" are chosen to match the format used by
  // debuggerd. This allows, for example, the stack tool to work.
  std::string fingerprint = runtime->GetFingerprint();  // dump the fingerprint
  os << "Build fingerprint: '" << (fingerprint.empty() ? "unknown" : fingerprint) << "'\n";
  os << "ABI: '" << GetInstructionSetString(runtime->GetInstructionSet()) << "'\n";

  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";

  runtime->DumpForSigQuit(os);  // the runtime dump operation

  if ((false)) {
    std::string maps;
    if (android::base::ReadFileToString("/proc/self/maps", &maps)) {
      os << "/proc/self/maps:\n" << maps;
    }
  }
  os << "----- end " << getpid() << " -----\n";
  ...
}