Android ANR之traces日志线程参数解析大全

Posted Winston Wood

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Android ANR之traces日志线程参数解析大全相关的知识,希望对你有一定的参考价值。

        ANR故障是android开发中的重点难点问题,而traces.txt日志的分析则是解决ANR问题的关键所在。有很多人对于traces.txt日志中的诸多线程状态参数一知半解,不知所云。本文章对这些状态参数结合源码进行全面的解读,相信对于解决ANR问题大有裨益(本文并不提供解决ANR问题本身的思路和方法)。

        执行 adb shell kill -3 pid或者当进程发生anr、native crash等故障的时候,系统会生成traces日志文件,默认输出到 /data/anr/ 目录下。traces日志主要由发生anr时的资源使用情况以及各个线程的状态组成。

----- pid 8072 at 2020-03-22 10:23:06 -----
Cmd line: com.zte.camera
Build fingerprint: 'ZTE/GEN_CN_P439S01/P439S01:9/PKQ1.180929.001/20181106.134337:userdebug/test-keys'
ABI: 'arm'
...
Heap: 41% free, 11MB/20MB; 186365 objects
...
Total memory 20MB
Max memory 512MB
Zygote space size 1584KB
...
suspend all histogram:	Sum: 191us 99% C.I. 4us-62us Avg: 21.222us Max: 62us
DALVIK THREADS (24):
"Signal Catcher" daemon prio=5 tid=3 Runnable
  | group="system" sCount=0 dsCount=0 flags=0 obj=0x14540088 self=0xeb6cb200
  | sysTid=8078 nice=0 cgrp=default sched=0/0 handle=0xe5dcb970
  | state=R schedstat=( 59094373 1072863 12 ) utm=4 stm=1 core=4 HZ=100
  | stack=0xe5cd0000-0xe5cd2000 stackSize=1010KB
  | held mutexes= "mutator lock"(shared held)
  native: #00 pc 002d975f  /system/lib/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, int, BacktraceMap*, char const*, art::ArtMethod*, void*, bool)+134)
  native: #01 pc 0036e91b  /system/lib/libart.so (art::Thread::DumpStack(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, bool, BacktraceMap*, bool) const+210)
  native: #02 pc 0036b0d3  /system/lib/libart.so (art::Thread::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, bool, BacktraceMap*, bool) const+34)
  native: #03 pc 00383d89  /system/lib/libart.so (art::DumpCheckpoint::Run(art::Thread*)+624)
  native: #04 pc 0037e06f  /system/lib/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*)+314)
  native: #05 pc 0037d767  /system/lib/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, bool)+758)
  native: #06 pc 0037d39f  /system/lib/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char>>&)+614)
  native: #07 pc 00357669  /system/lib/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char>>&)+120)
  native: #08 pc 00360689  /system/lib/libart.so (art::SignalCatcher::HandleSigQuit()+1040)
  native: #09 pc 0035f81b  /system/lib/libart.so (art::SignalCatcher::Run(void*)+246)
  native: #10 pc 00071db1  /system/lib/libc.so (__pthread_start(void*)+22)
  native: #11 pc 0001de65  /system/lib/libc.so (__start_thread+24)
  (no managed stack frames)

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x74cb56d8 self=0xeb6ca000
  | sysTid=8072 nice=-10 cgrp=default sched=0/0 handle=0xeffe9494
  | state=S schedstat=( 1457862132 142332556 1571 ) utm=114 stm=31 core=4 HZ=100
  | stack=0xff195000-0xff197000 stackSize=8MB
  | held mutexes=
  kernel: __switch_to+0x90/0xe8
  kernel: futex_wait_queue_me+0xc4/0x13c
  kernel: futex_wait+0xe4/0x204
  kernel: do_futex+0x168/0x9a0
  kernel: compat_SyS_futex+0xf0/0x174
  kernel: __sys_trace+0x4c/0x4c
  native: #00 pc 00019e8c  /system/lib/libc.so (syscall+28)

其中以下片段为线程的各项状态参数,是本文将要解析的部分。

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x74cb56d8 self=0xeb6ca000
  | sysTid=8072 nice=-10 cgrp=default sched=0/0 handle=0xeffe9494
  | state=S schedstat=( 1457862132 142332556 1571 ) utm=114 stm=31 core=4 HZ=100
  | stack=0xff195000-0xff197000 stackSize=8MB
  | held mutexes=

        输出线程的状态参数所涉及的核心代码为Thread::DumpState函数(代码位于/art/runtime/thread.cc文件),以下深入全面解析各行参数的涵义:

(1)"main" prio=5 tid=1 Native

  • "main" 

       线程名。"main"表示该线程为主线程;"Binder:8072_2"表示为线程与进程8072关联的Binder线程;"Signal Catcher" daemon表示为守护线程Signal Catcher。

os << '"' << *thread->tlsPtr_.name << '"';
  • prio=5

       线程优先级。通过java.lang.Thread.setPriority设置,取值范围 [1, 10],低优先级 -> 高优先级。

os << " prio=" << priority
  • tid=1

       进程内部线程id。取值一般比较小。

os << " tid=" << thread->GetThreadId();
  • Native

       线程状态。具体取值及意义如下图。

os  << " " << thread->GetState();

(2)group="main" sCount=1 dsCount=0 flags=1 obj=0x74cb56d8 self=0xeb6ca000

  • group="main"

       线程所属的线程组。Java中使用ThreadGroup来表示线程组,它可以对一批线程进行分类管理,Java允许程序直接对线程组进行控制。默认情况下,所有的线程都属于主线程组。我们也可以给线程设置分组。

os << "  | group=\\"" << group_name << "\\"";
  • sCount=1

       线程挂起次数。

  os << " sCount=" << thread->tls32_.suspend_count;
  • dsCount=0 

       用于调试的线程挂起次数。

os << " dsCount=" << thread->tls32_.debug_suspend_count;
  • flags=1
os << " flags=" << thread->tls32_.state_and_flags.as_struct.flags;
  • obj=0x74cb56d8

       当前线程关联的java线程对象。

os << " obj=" << reinterpret_cast<void*>(thread->tlsPtr_.opeer);
  • self=0xeb6ca000

       当前线程地址。

os << " self=" << reinterpret_cast<const void*>(thread) << "\\n";

(3)sysTid=8072 nice=-10 cgrp=default sched=0/0 handle=0xeffe9494

  • sysTid=8072

      线程号,Linux内核分配的id。在系统中是唯一编号的。

  os << "  | sysTid=" << tid;
  • nice=-10

       线程的调度优先级。通过android.os.Process.setThreadPriority设置,取值范围 [-20, 19],高优先级 -> 低优先级。

    os << " nice=" << getpriority(PRIO_PROCESS, tid);
  • cgrp=default

       程所属的进程调度组。Control groups,是Linux内核的一个功能,用来限制,控制与分离一个进程组群的资源(如CPU、内存、磁盘输入输出等)。

os << " cgrp=" << scheduler_group_name;
  • sched=0/0

      分别标志了线程的调度策略和优先级

os << " sched=" << policy << "/" << sp.sched_priority;
  • handle=0xeffe9494

       线程的处理函数地址

  os << " handle=" << reinterpret_cast<void*>(thread->tlsPtr_.pthread_self);

(4)state=S schedstat=( 1457862132 142332556 1571 ) utm=114 stm=31 core=4 HZ=100

  • state=S

       线程的调度状态。  

  os << "  | state=" << native_thread_state;
  • schedstat=( 1457862132 142332556 1571 )

       CPU调度时间统计,分别表示线程在cpu上执行的时间、线程的等待时间和线程执行的时间片长度。可通过adb shell cat /proc/[pid]/task/[tid]/schedstat查看。

  os << " schedstat=( " << scheduler_stats << " )";
  • utm=114 stm=31

       用户态/内核态下使用CPU时间(单位是jiffies)。adb shell cat /proc/[pid]/task/[tid]/stat查看。

  os << " utm=" << utime
     << " stm=" << stime;
  • core=4

      最后执行该线程的cpu核的序号 。

  os << " core=" << task_cpu;
  • HZ=100

      全局变量jiffies用来记录自系统启动以来产生的中断总数。系统启动时,内核将该变量初始化为0,之后每次时钟中断处理程序都会增加该变量的值。一秒内时钟中断的次数等于Hz。Tick是HZ的倒数,表示每发生一次时钟中断所需的时间。HZ=100,那么Tick=10ms。每秒增加的jiffies数等于Hz。系统运行时间以秒为单位,等于jiffies/Hz。

  os << " HZ=" << sysconf(_SC_CLK_TCK) << "\\n";

(5)stack=0xff195000-0xff197000 stackSize=8MB

  • stack=0xff195000-0xff197000

        线程栈的地址区间

    os << "  | stack=" << reinterpret_cast<void*>(thread->tlsPtr_.stack_begin) << "-"
        << reinterpret_cast<void*>(thread->tlsPtr_.stack_end);
  • stackSize=8MB

       线程栈的大小。stack地址区间的距离(0xff197000-0xff195000 = 8192KB)。Android中主线程栈大小默认为8M,子线程栈大小稍微小于1M。

    os << PrettySize(thread->tlsPtr_.stack_size) << "\\n";

(6)held mutexes=

 线程所持有mutex类型。分为独占锁exclusive和共享锁shared两类。

  os << "  | held mutexes=";
    for (size_t i = 0; i < kLockLevelCount; ++i) 
      if (i != kMonitorLock) 
        BaseMutex* mutex = thread->GetHeldMutex(static_cast<LockLevel>(i));
        if (mutex != nullptr) 
          os << " \\"" << mutex->GetName() << "\\"";
          if (mutex->IsReaderWriterMutex()) 
            ReaderWriterMutex* rw_mutex = down_cast<ReaderWriterMutex*>(mutex);
            if (rw_mutex->GetExclusiveOwnerTid() == tid) 
              os << "(exclusive held)";
             else 
              os << "(shared held)";
            
          
        
      
    

以下为/art/runtime/thread.cc->Thread::DumpState函数的主要代码。如需查看完整源码,请移步http://androidxref.com/9.0.0_r3/xref/art/runtime/thread.cc

void Thread::DumpState(std::ostream& os, const Thread* thread, pid_t tid) 
  std::string group_name;
  int priority;
  bool is_daemon = false;
  Thread* self = Thread::Current();
  if (thread != nullptr) 
    os << '"' << *thread->tlsPtr_.name << '"';
    if (is_daemon) 
      os << " daemon";
    
    os << " prio=" << priority
       << " tid=" << thread->GetThreadId()
       << " " << thread->GetState();
    if (thread->IsStillStarting()) 
      os << " (still starting up)";
    
    os << "\\n";
   else 
    os << '"' << ::art::GetThreadName(tid) << '"'
       << " prio=" << priority
       << " (not attached)\\n";
  

  if (thread != nullptr) 
    MutexLock mu(self, *Locks::thread_suspend_count_lock_);
    os << "  | group=\\"" << group_name << "\\""
       << " sCount=" << thread->tls32_.suspend_count
       << " dsCount=" << thread->tls32_.debug_suspend_count
       << " flags=" << thread->tls32_.state_and_flags.as_struct.flags
       << " obj=" << reinterpret_cast<void*>(thread->tlsPtr_.opeer)
       << " self=" << reinterpret_cast<const void*>(thread) << "\\n";
  

  os << "  | sysTid=" << tid
     << " nice=" << getpriority(PRIO_PROCESS, tid)
     << " cgrp=" << scheduler_group_name;
  if (thread != nullptr) 
    int policy;
    sched_param sp;
#if !defined(__APPLE__)
    policy = sched_getscheduler(tid);
#else
    CHECK_PTHREAD_CALL(pthread_getschedparam, (thread->tlsPtr_.pthread_self, &policy, &sp), __FUNCTION__);
#endif
    os << " sched=" << policy << "/" << sp.sched_priority
       << " handle=" << reinterpret_cast<void*>(thread->tlsPtr_.pthread_self);
  
  os << "\\n";
  // Grab the scheduler stats for this thread.
  std::string scheduler_stats;
  if (ReadFileToString(StringPrintf("/proc/self/task/%d/schedstat", tid), &scheduler_stats)) 
    scheduler_stats.resize(scheduler_stats.size() - 1);  // Lose the trailing '\\n'.
   else 
    scheduler_stats = "0 0 0";
  

  char native_thread_state = '?';
  int utime = 0;
  int stime = 0;
  int task_cpu = 0;
  GetTaskStats(tid, &native_thread_state, &utime, &stime, &task_cpu);

  os << "  | state=" << native_thread_state
     << " schedstat=( " << scheduler_stats << " )"
     << " utm=" << utime
     << " stm=" << stime
     << " core=" << task_cpu
     << " HZ=" << sysconf(_SC_CLK_TCK) << "\\n";
  if (thread != nullptr) 
    os << "  | stack=" << reinterpret_cast<void*>(thread->tlsPtr_.stack_begin) << "-"
        << reinterpret_cast<void*>(thread->tlsPtr_.stack_end) << " stackSize="
        << PrettySize(thread->tlsPtr_.stack_size) << "\\n";
    // Dump the held mutexes.
    os << "  | held mutexes=";
    for (size_t i = 0; i < kLockLevelCount; ++i) 
      if (i != kMonitorLock) 
        BaseMutex* mutex = thread->GetHeldMutex(static_cast<LockLevel>(i));
        if (mutex != nullptr) 
          os << " \\"" << mutex->GetName() << "\\"";
          if (mutex->IsReaderWriterMutex()) 
            ReaderWriterMutex* rw_mutex = down_cast<ReaderWriterMutex*>(mutex);
            if (rw_mutex->GetExclusiveOwnerTid() == tid) 
              os << "(exclusive held)";
             else 
              os << "(shared held)";
            
          
        
      
    
    os << "\\n";
  

希望能给大家带来帮助。

如有错误,欢迎指正。


参考资料:

http://gityuan.com/2016/11/26/art-trace/

http://androidxref.com/9.0.0_r3/xref/art/runtime/thread.cc

https://www.jianshu.com/p/cf504cf98750

https://blog.csdn.net/a740169405/article/details/98178287

https://www.cnblogs.com/hebao0514/p/4510498.html

https://blog.csdn.net/l460133921/article/details/51134213

https://blog.csdn.net/u010154760/article/details/45312471

 

以上是关于Android ANR之traces日志线程参数解析大全的主要内容,如果未能解决你的问题,请参考以下文章

Android 如何处理Anr (借用Logcat和Trace 日志)

Android ANR log trace日志文件分析

如何分析解决Android ANR

Android 进阶——性能优化之因Handler引起句柄泄漏导致ANR的定位和解决

Android 进阶——性能优化之因Handler引起句柄泄漏导致ANR的定位和解决

Android 进阶——性能优化之因Handler引起句柄泄漏导致ANR的定位和解决