Android 12: How a Java Trace Is Generated

Posted by pecuyu


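Overview

When analyzing Android problems such as an ANR or a Watchdog-triggered freeze, you need the Java trace of the affected process to find out what went wrong. But how is that Java trace produced? Java processes on Android generally run on the ART virtual machine, so obtaining a process's Java trace requires ART itself to perform the dump. The code confirms this: ART runs a dedicated SignalCatcher thread for exactly this purpose.

Once started, the SignalCatcher thread loops waiting for SIGQUIT (signal 3). When SIGQUIT arrives, it triggers ART to perform the dump. After the dump completes, the thread connects to tombstoned over a socket and writes the trace to the designated path (usually under /data/anr), then goes back to waiting. Because ART intercepts SIGQUIT and uses it to output a trace, the process does not exit, even though termination is the default SIGQUIT action for an ordinary Linux process.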
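To make the "intercepted, so the process does not exit" point concrete, here is a minimal sketch using plain POSIX calls rather than ART code. The function name InterceptSigquitOnce is hypothetical. It blocks SIGQUIT, sends the signal to the calling thread, and then collects it with sigwait(), so the default terminate-and-dump-core action never runs.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Hypothetical helper, not part of ART.
// Blocks SIGQUIT for the calling thread, sends SIGQUIT to that thread, and
// collects it with sigwait(). Returns the signal number reported by
// sigwait(), or -1 if any call failed.
int InterceptSigquitOnce() {
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGQUIT);

  // While blocked, a delivered SIGQUIT stays pending instead of running the
  // default action (terminate the process and dump core).
  if (pthread_sigmask(SIG_BLOCK, &set, nullptr) != 0) return -1;

  // The same signal that `kill -3 <pid>` sends, directed at this thread.
  if (pthread_kill(pthread_self(), SIGQUIT) != 0) return -1;

  // sigwait() consumes the pending signal and reports its number; the
  // process keeps running.
  int received = 0;
  if (sigwait(&set, &received) != 0) return -1;
  return received;
}
```

A caller could check the behaviour with `assert(InterceptSigquitOnce() == SIGQUIT);`. Note that the sketch leaves SIGQUIT blocked in the calling thread afterwards.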

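In the earlier article "Watchdog (4): Trace generation process" we analyzed how a Java trace is captured when a Watchdog fires. In that implementation, system_server sends SIGQUIT to the target process, which triggers SignalCatcher to perform the dump; once the dump is done, SignalCatcher connects to tombstoned to write out the trace. This article focuses on how SignalCatcher receives SIGQUIT and produces the Java trace.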

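Commands for generating a trace

When the device that reproduced the problem is still at hand, we may want to capture another Java trace to see the latest state. The following commands are the common ones, although they usually require root: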

  • debuggerd -j $pid
    Prints the Java traces of the given process; the output can be redirected.
    usage: debuggerd [-bj] PID

    -b, --backtrace    just a backtrace rather than a full tombstone
    -j                 collect java traces
  • kill -3 $pid
    Sends SIGQUIT directly; the trace is written under /data/anr/.

Example:

$ adb root
restarting adbd as root
$ adb shell pidof system_server
516
$ adb shell debuggerd -j 516 > system_server_trace.txt
$ adb shell kill -3 516
$ adb shell ls /data/anr
trace_02
$ adb pull /data/anr/trace_02

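Trace generation flow

As described above, SignalCatcher is the key link in generating a Java trace, so we analyze it first.

Starting SignalCatcher

On Android, app processes and system_server are started by the zygote process. After the fork, the child runs SpecializeCommon to specialize the new process, and SignalCatcher is started along that path. Let's walk through it: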

/// @frameworks/base/core/jni/com_android_internal_os_Zygote.cpp
// Utility routine to specialize a zygote child process.
static void SpecializeCommon(JNIEnv* env, uid_t uid, gid_t gid, jintArray gids, jint runtime_flags,
                             jobjectArray rlimits, jlong permitted_capabilities,
                             jlong effective_capabilities, jint mount_external,
                             jstring managed_se_info, jstring managed_nice_name,
                             bool is_system_server, bool is_child_zygote,
                             jstring managed_instruction_set, jstring managed_app_data_dir,
                             bool is_top_app, jobjectArray pkg_data_info_list,
                             jobjectArray allowlisted_data_info_list, bool mount_data_dirs,
                             bool mount_storage_dirs) {
    const char* process_name = is_system_server ? "system_server" : "zygote";
    auto fail_fn = std::bind(ZygoteFailure, env, process_name, managed_nice_name, _1);
    auto extract_fn = std::bind(ExtractJString, env, process_name, managed_nice_name, _1);
    ...
    SetGids(env, gids, is_child_zygote, fail_fn);
    SetRLimits(env, rlimits, fail_fn);
    ...
    if (setresgid(gid, gid, gid) == -1) {
        fail_fn(CREATE_ERROR("setresgid(%d) failed: %s", gid, strerror(errno)));
    }

    // Must be called when the new process still has CAP_SYS_ADMIN, in this case,
    // before changing uid from 0, which clears capabilities.  The other
    // alternative is to call prctl(PR_SET_NO_NEW_PRIVS, 1) afterward, but that
    // breaks SELinux domain transition (see b/71859146).  As the result,
    // privileged syscalls used below still need to be accessible in app process.
    SetUpSeccompFilter(uid, is_child_zygote);

    // Must be called before losing the permission to set scheduler policy.
    SetSchedulerPolicy(fail_fn, is_top_app);

    if (setresuid(uid, uid, uid) == -1) {
        fail_fn(CREATE_ERROR("setresuid(%d) failed: %s", uid, strerror(errno)));
    }
    // dumpable: controls whether core dumps can be collected
    // The "dumpable" flag of a process, which controls core dump generation, is
    // overwritten by the value in /proc/sys/fs/suid_dumpable when the effective
    // user or group ID changes. See proc(5) for possible values. In most cases,
    // the value is 0, so core dumps are disabled for zygote children. However,
    // when running in a Chrome OS container, the value is already set to 2,
    // which allows the external crash reporter to collect all core dumps. Since
    // only system crashes are interested, core dump is disabled for app
    // processes. This also ensures compliance with CTS.
    int dumpable = prctl(PR_GET_DUMPABLE);
    if (dumpable == -1) {
        ALOGE("prctl(PR_GET_DUMPABLE) failed: %s", strerror(errno));
        RuntimeAbort(env, __LINE__, "prctl(PR_GET_DUMPABLE) failed");
    }

    if (dumpable == 2 && uid >= AID_APP) {
        if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) == -1) {
            ALOGE("prctl(PR_SET_DUMPABLE, 0) failed: %s", strerror(errno));
            RuntimeAbort(env, __LINE__, "prctl(PR_SET_DUMPABLE, 0) failed");
        }
    }

    // Set process properties to enable debugging if required.
    if ((runtime_flags & RuntimeFlags::DEBUG_ENABLE_JDWP) != 0) {  // JDWP
        EnableDebugger();
    }
    if ((runtime_flags & RuntimeFlags::PROFILE_FROM_SHELL) != 0) {
        // simpleperf needs the process to be dumpable to profile it.
        if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) == -1) {
            ALOGE("prctl(PR_SET_DUMPABLE) failed: %s", strerror(errno));
            RuntimeAbort(env, __LINE__, "prctl(PR_SET_DUMPABLE, 1) failed");
        }
    }

    HeapTaggingLevel heap_tagging_level;
    ...
    // GWP-ASan
    bool forceEnableGwpAsan = false;
    switch (runtime_flags & RuntimeFlags::GWP_ASAN_LEVEL_MASK) {
        default:
        case RuntimeFlags::GWP_ASAN_LEVEL_NEVER:
            break;
        case RuntimeFlags::GWP_ASAN_LEVEL_ALWAYS:
            forceEnableGwpAsan = true;
            [[fallthrough]];
        case RuntimeFlags::GWP_ASAN_LEVEL_LOTTERY:
            android_mallopt(M_INITIALIZE_GWP_ASAN, &forceEnableGwpAsan, sizeof(forceEnableGwpAsan));
    }
    // Now that we've used the flag, clear it so that we don't pass unknown flags to the ART
    // runtime.
    runtime_flags &= ~RuntimeFlags::GWP_ASAN_LEVEL_MASK;
    ...
    SetCapabilities(permitted_capabilities, effective_capabilities, permitted_capabilities,
                    fail_fn);
    ...
    const char* se_info_ptr = se_info.has_value() ? se_info.value().c_str() : nullptr;
    const char* nice_name_ptr = nice_name.has_value() ? nice_name.value().c_str() : nullptr;
    // SELinux
    if (selinux_android_setcontext(uid, is_system_server, se_info_ptr, nice_name_ptr) == -1) {
        fail_fn(CREATE_ERROR("selinux_android_setcontext(%d, %d, \"%s\", \"%s\") failed", uid,
                             is_system_server, se_info_ptr, nice_name_ptr));
    }
    // Set the thread name
    // Make it easier to debug audit logs by setting the main thread's name to the
    // nice name rather than "app_process".
    if (nice_name.has_value()) {
        SetThreadName(nice_name.value());
    } else if (is_system_server) {
        SetThreadName("system_server");
    }

    // Unset the SIGCHLD handler, but keep ignoring SIGHUP (rationale in SetSignalHandlers).
    UnsetChldSignalHandler(); // Sets the SIGCHLD handler back to default behavior in zygote children.
    ...
    // Call Zygote's callPostForkChildHooks
    env->CallStaticVoidMethod(gZygoteClass, gCallPostForkChildHooks, runtime_flags,
                              is_system_server, is_child_zygote, managed_instruction_set);

    // Reset the process priority to the default value.
    setpriority(PRIO_PROCESS, 0, PROCESS_PRIORITY_DEFAULT);
    ...
}

Zygote#callPostForkChildHooks

/// @frameworks/base/core/java/com/android/internal/os/Zygote.java
// This function is called from native code in com_android_internal_os_Zygote.cpp
@SuppressWarnings("unused")
private static void callPostForkChildHooks(int runtimeFlags, boolean isSystemServer,
        boolean isZygote, String instructionSet) {
    ZygoteHooks.postForkChild(runtimeFlags, isSystemServer, isZygote, instructionSet);
}

ZygoteHooks#postForkChild

/// @dalvik/system/ZygoteHooks.java
/**
 * Called by the zygote in the child process after every fork.
 *
 * @param runtimeFlags The runtime flags to apply to the child process.
 * @param isSystemServer Whether the child process is system server.
 * @param isChildZygote Whether the child process is a child zygote.
 * @param instructionSet The instruction set of the child, used to determine
 *                       whether to use a native bridge.
 *
 * @hide
 */
@SystemApi(client = MODULE_LIBRARIES)
@libcore.api.CorePlatformApi(status = libcore.api.CorePlatformApi.Status.STABLE)
public static void postForkChild(int runtimeFlags, boolean isSystemServer,
        boolean isChildZygote, String instructionSet) {
    // Enter the native method
    nativePostForkChild(token, runtimeFlags, isSystemServer, isChildZygote, instructionSet);

    Math.setRandomSeedInternal(System.currentTimeMillis());

    // Enable memory-mapped coverage if JaCoCo is in the boot classpath. system_server is
    // skipped due to being persistent and having its own coverage writing mechanism.
    if (!isSystemServer && enableMemoryMappedDataMethod != null) {
      try {
        enableMemoryMappedDataMethod.invoke(null);
      } catch (ReflectiveOperationException e) {
        throw new RuntimeException(e);
      }
    }
}

// Hook for all child processes post forking.
private static native void nativePostForkChild(long token, int runtimeFlags,
                                               boolean isSystemServer, boolean isZygote,
                                               String instructionSet);

ZygoteHooks_nativePostForkChild

/// @art/runtime/native/dalvik_system_ZygoteHooks.cc
static void ZygoteHooks_nativePostForkChild(JNIEnv* env,
                                            jclass,
                                            jlong token,
                                            jint runtime_flags,
                                            jboolean is_system_server,
                                            jboolean is_zygote,
                                            jstring instruction_set) {
  ...
  Runtime* runtime = Runtime::Current();
  ...
  if (instruction_set != nullptr && !is_system_server) {
    ScopedUtfChars isa_string(env, instruction_set);
    InstructionSet isa = GetInstructionSetFromString(isa_string.c_str());
    Runtime::NativeBridgeAction action = Runtime::NativeBridgeAction::kUnload;
    if (isa != InstructionSet::kNone && isa != kRuntimeISA) {
      action = Runtime::NativeBridgeAction::kInitialize;
    }
    runtime->InitNonZygoteOrPostFork(env, is_system_server, is_zygote, action, isa_string.c_str());
  } else {  // Call the runtime's InitNonZygoteOrPostFork
    runtime->InitNonZygoteOrPostFork(
        env,
        is_system_server,
        is_zygote,
        Runtime::NativeBridgeAction::kUnload,
        /*isa=*/ nullptr,
        profile_system_server);
  }
}

Runtime::InitNonZygoteOrPostFork

/// @art/runtime/runtime.cc
void Runtime::InitNonZygoteOrPostFork(JNIEnv* env, bool is_system_server,
    // This is true when we are initializing a child-zygote. It requires
    // native bridge initialization to be able to run guest native code in
    // doPreload().
    bool is_child_zygote, NativeBridgeAction action, const char* isa, bool profile_system_server) {
  ...
  // Create the thread pools.
  heap_->CreateThreadPool();
  // Avoid creating the runtime thread pool for system server since it will not be used and would
  // waste memory.
  if (!is_system_server) {
    ScopedTrace timing("CreateThreadPool");
    constexpr size_t kStackSize = 64 * KB;
    constexpr size_t kMaxRuntimeWorkers = 4u;
    const size_t num_workers =
        std::min(static_cast<size_t>(std::thread::hardware_concurrency()), kMaxRuntimeWorkers);
    MutexLock mu(Thread::Current(), *Locks::runtime_thread_pool_lock_);
    CHECK(thread_pool_ == nullptr);
    thread_pool_.reset(new ThreadPool("Runtime", num_workers, /*create_peers=*/false, kStackSize));
    thread_pool_->StartWorkers(Thread::Current());
  }

  // Reset the gc performance data and metrics at zygote fork so that the events from
  // before fork aren't attributed to an app.
  heap_->ResetGcPerformanceInfo();
  GetMetrics()->Reset();
  ...
  StartSignalCatcher(); // Start SignalCatcher

  ...
  // Start the JDWP thread. If the command-line debugger flags specified "suspend=y",
  // this will pause the runtime (in the internal debugger implementation), so we probably want
  // this to come last.
  GetRuntimeCallbacks()->StartDebugger();
}

Runtime::StartSignalCatcher

/// @art/runtime/runtime.cc
void Runtime::StartSignalCatcher() {
  if (!is_zygote_) {
    signal_catcher_ = new SignalCatcher();
  }
}

SignalCatcher

/// @art/runtime/signal_catcher.cc
SignalCatcher::SignalCatcher()
    : lock_("SignalCatcher lock"),
      cond_("SignalCatcher::cond_", lock_),
      thread_(nullptr) {
  SetHaltFlag(false);
  // Create the Signal Catcher thread; once started it calls Run
  // Create a raw pthread; its start routine will attach to the runtime.
  CHECK_PTHREAD_CALL(pthread_create, (&pthread_, nullptr, &Run, this), "signal catcher thread");

  Thread* self = Thread::Current();
  MutexLock mu(self, lock_);
  while (thread_ == nullptr) {  // Wait for the new thread to be created and attached to the runtime
    cond_.Wait(self);
  }
}
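The constructor does not return until the new thread has attached to the runtime and stored itself in thread_ (that part happens in SignalCatcher::Run). A minimal sketch of the same start-then-wait handshake follows. It uses standard C++ primitives instead of ART's Mutex and ConditionVariable, and the class name CatcherSketch and its members are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for SignalCatcher: the constructor starts a worker
// thread and returns only after that thread has published its identity.
class CatcherSketch {
 public:
  CatcherSketch() {
    worker_ = std::thread(&CatcherSketch::Run, this);
    std::unique_lock<std::mutex> lock(mutex_);
    // Plays the role of `while (thread_ == nullptr) cond_.Wait(self);`
    cond_.wait(lock, [this] { return ready_; });
  }

  ~CatcherSketch() { worker_.join(); }

  // Valid as soon as the constructor has returned.
  std::thread::id worker_id() const { return worker_id_; }

 private:
  void Run() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      worker_id_ = std::this_thread::get_id();  // plays the role of thread_ = self
      ready_ = true;
    }
    cond_.notify_all();  // plays the role of cond_.Broadcast(self)
    // The real thread would now enter its signal-wait loop; this one exits.
  }

  std::mutex mutex_;
  std::condition_variable cond_;
  bool ready_ = false;
  std::thread::id worker_id_;
  std::thread worker_;
};
```

The point of the handshake is ordering: any code that runs after the constructor can rely on the worker's identity being set. For example, `CatcherSketch c; assert(c.worker_id() != std::thread::id());` holds without further synchronization.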

SignalCatcher::Run

/// @art/runtime/signal_catcher.cc
void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  CHECK(signal_catcher != nullptr);

  Runtime* runtime = Runtime::Current();
  CHECK(runtime->AttachCurrentThread("Signal Catcher", true, runtime->GetSystemThreadGroup(),
                                     !runtime->IsAotCompiler())); // Attach to the runtime

  Thread* self = Thread::Current();
  DCHECK_NE(self->GetState(), kRunnable);
  {
    MutexLock mu(self, signal_catcher->lock_);
    signal_catcher->thread_ = self;
    signal_catcher->cond_.Broadcast(self);
  }

  // Set up mask with signals we want to handle.
  SignalSet signals;
  signals.Add(SIGQUIT); // Add SIGQUIT to the signal set
  signals.Add(SIGUSR1);

  while (true) {
    // Wait for one of the signals in the set to arrive
    int signal_number = signal_catcher->WaitForSignal(self, signals);
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return nullptr;
    }

    switch (signal_number) {
    case SIGQUIT:
      signal_catcher->HandleSigQuit(); // Handle SIGQUIT (signal 3)
      break;
    case SIGUSR1:
      signal_catcher->HandleSigUsr1();
      break;
    default:
      LOG(ERROR) << "Unexpected signal %d" << signal_number;
      break;
    }
  }
}
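The shape of this loop (wait, check the halt flag, dispatch) can be reproduced with plain POSIX calls. The sketch below is an illustration, not ART code. The class SignalLoopSketch and its counters are hypothetical, and it assumes that the wait boils down to sigwait() and that the signals are blocked in the creating thread before the catcher thread is spawned, so the new thread inherits the mask.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <signal.h>
#include <thread>

// Hypothetical miniature of the Run() loop: one thread collects SIGQUIT and
// SIGUSR1 via sigwait(), counts them, and exits when the halt flag is set.
class SignalLoopSketch {
 public:
  SignalLoopSketch() {
    sigemptyset(&signals_);
    sigaddset(&signals_, SIGQUIT);
    sigaddset(&signals_, SIGUSR1);
    // Block before creating the thread: the new thread inherits the mask.
    int rc = pthread_sigmask(SIG_BLOCK, &signals_, nullptr);
    assert(rc == 0);
    (void)rc;
    thread_ = std::thread(&SignalLoopSketch::Run, this);
  }

  ~SignalLoopSketch() {
    halt_.store(true);
    Send(SIGQUIT);  // wake the loop so that it observes the halt flag
    thread_.join();
  }

  // Directs the signal at the catcher thread itself.
  void Send(int sig) { pthread_kill(thread_.native_handle(), sig); }

  int quit_count() const { return quit_count_.load(); }
  int usr1_count() const { return usr1_count_.load(); }

 private:
  void Run() {
    while (true) {
      int sig = 0;
      if (sigwait(&signals_, &sig) != 0) return;
      if (halt_.load()) return;  // same position as the ShouldHalt() check
      switch (sig) {
        case SIGQUIT: quit_count_.fetch_add(1); break;  // where the dump would run
        case SIGUSR1: usr1_count_.fetch_add(1); break;
        default: break;
      }
    }
  }

  sigset_t signals_;
  std::atomic<bool> halt_{false};
  std::atomic<int> quit_count_{0};
  std::atomic<int> usr1_count_{0};
  std::thread thread_;
};
```

Because the halt flag is checked right after the wait returns, shutting the thread down means setting the flag and then sending one more signal to wake it, which is what the destructor in this sketch does.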

How SignalCatcher handles SIGQUIT

When SignalCatcher receives SIGQUIT, it calls signal_catcher->HandleSigQuit to perform the dump.

SignalCatcher::HandleSigQuit

/// @art/runtime/signal_catcher.cc
void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  std::ostringstream os;
  os << "\n" << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";

  DumpCmdLine(os);  // Print /proc/self/cmdline

  // Note: The strings "Build fingerprint:" and "ABI:" are chosen to match the format used by
  // debuggerd. This allows, for example, the stack tool to work.
  std::string fingerprint = runtime->GetFingerprint(); // Print the fingerprint
  os << "Build fingerprint: '" << (fingerprint.empty() ? "unknown" : fingerprint) << "'\n";
  os << "ABI: '" << GetInstructionSetString(runtime->GetInstructionSet()) << "'\n";

  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";

  runtime->DumpForSigQuit(os); // The runtime dump itself

  if ((false)) {
    std::string maps;
    if (android::base::ReadFileToString("/proc/self/maps", &maps)) {
      os << "/proc/self/maps:\n" << maps;
    }
  }
  os << "----- end " << getpid()
  ...
}
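The header lines that HandleSigQuit writes before the runtime dump are easy to see in isolation. The following sketch rebuilds only those lines from the listing above. FormatTraceHeader is a hypothetical helper that takes every value as a parameter instead of querying the runtime, and it leaves out the command line because DumpCmdLine's output format is not shown here.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper, not part of ART: formats the trace header from
// explicit inputs, following the stream statements in HandleSigQuit.
std::string FormatTraceHeader(int pid,
                              const std::string& iso_date,
                              const std::string& fingerprint,
                              const std::string& abi,
                              bool is_debug_build) {
  std::ostringstream os;
  os << "\n" << "----- pid " << pid << " at " << iso_date << " -----\n";
  // An empty fingerprint is reported as 'unknown', as in the original code.
  os << "Build fingerprint: '"
     << (fingerprint.empty() ? std::string("unknown") : fingerprint) << "'\n";
  os << "ABI: '" << abi << "'\n";
  os << "Build type: " << (is_debug_build ? "debug" : "optimized") << "\n";
  return os.str();
}
```

Keeping the "Build fingerprint:" and "ABI:" prefixes stable matters because, as the source comment notes, they match the format used by debuggerd, which tools that parse these files rely on.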
