MTK Sensor越界导致的系统重启问题分析报告

Posted YYPapa

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了MTK Sensor越界导致的系统重启问题分析报告相关的知识,希望对你有一定的参考价值。

【NE现场】

打开12306应用后做一些操作,和容易出现系统重启。dropbox中有好多system_server的tombstone文件:

./[email protected]1449222028760.txt:12:pid: 10466, tid: 10493, name: android.bg  >>> system_server <<<
./[email protected]1449455808867.txt:12:pid: 5992, tid: 6053, name: AlarmManager  >>> system_server <<<
./[email protected]1449222028730.txt:12:pid: 10466, tid: 10494, name: ActivityManager  >>> system_server <<<
./[email protected]1449455808843.txt:12:pid: 5992, tid: 6014, name: SensorService  >>> system_server <<<
./[email protected]1449457509508.txt:12:pid: 11012, tid: 11887, name: Binder_E  >>> system_server <<<
./[email protected]1449229865122.txt:12:pid: 18238, tid: 18260, name: SensorService  >>> system_server <<<

可以看到每次crash的线程都不一样!甚至backtrace也不一样:

@[email protected]
pid: 10466, tid: 10493, name: android.bg  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xa00000070
...
backtrace:
    #00 pc 0000000000029f70  /system/lib64/libbinder.so (android::IPCThreadState::flushCommands()+4)
    #01 pc 0000000000009c60  /data/dalvik-cache/arm64/[email protected]@boot.oat
@[email protected]
pid: 5992, tid: 6053, name: AlarmManager  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x553d89e3c0
...
backtrace:
    #00 pc 0000000000030bc4  /system/lib64/libbinder.so (int android::Parcel::writeAligned<int>(int)+80)
    #01 pc 00000000000d3e0c  /system/lib64/libandroid_runtime.so
    #02 pc 0000000000109630  /data/dalvik-cache/arm64/[email protected]@boot.oat
@[email protected]
pid: 10466, tid: 10494, name: ActivityManager  >>> system_server <<<
signal 7 (SIGBUS), code 1 (BUS_ADRALN), fault addr 0x7f0000000a
...
backtrace:
    #00 pc 0000007f0000000a  <unknown>
@[email protected]
pid: 5992, tid: 6014, name: SensorService  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xa00000068
backtrace:
    #00 pc 000000000000f508  /system/lib64/libsensorservice.so
    #01 pc 0000000000010e90  /system/lib64/libsensorservice.so
    #02 pc 00000000000179c0  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+188)
    #03 pc 000000000009277c  /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+96)
    #04 pc 0000000000017224  /system/lib64/libutils.so
    #05 pc 000000000001cbb0  /system/lib64/libc.so (__pthread_start(void*)+52)    #06 pc 0000000000019044  /system/lib64/libc.so (__start_thread+16)
@[email protected]
pid: 11012, tid: 11887, name: Binder_E  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
backtrace:
    #00 pc 0000000000014070  /system/lib64/libutils.so (android::RefBase::decStrong(void const*) const+236)
    #01 pc 000000000002f694  /system/lib64/libbinder.so (android::Parcel::releaseObjects()+84)
    #02 pc 000000000002f6e8  /system/lib64/libbinder.so (android::Parcel::freeDataNoInit()+60)
    #03 pc 000000000002f744  /system/lib64/libbinder.so (android::Parcel::ipcSetDataReference(unsigned char const*, unsigned long, unsigned long long const*, unsigned long, void (*)(android::Parcel*, unsigned char const*, unsigned long, unsigned long long const*, unsigned long, void*), void*)+40)
    #04 pc 000000000002a44c  /system/lib64/libbinder.so (android::IPCThreadState::executeCommand(int)+700)
    #05 pc 000000000002a6c8  /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+92)
    #06 pc 000000000002a73c  /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+76)
    #07 pc 0000000000031d68  /system/lib64/libbinder.so
    #08 pc 00000000000179c0  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+188)
    #09 pc 000000000009277c  /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+96)
    #10 pc 0000000000017224  /system/lib64/libutils.so
    #11 pc 000000000001cbb0  /system/lib64/libc.so (__pthread_start(void*)+52)
    #12 pc 0000000000019044  /system/lib64/libc.so (__start_thread+16)
@[email protected]
pid: 18238, tid: 18260, name: SensorService  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7f3d798d38
backtrace:
    #00 pc 000000000000dbc0  /system/lib64/libsensorservice.so
    #01 pc 000000000000ed44  /system/lib64/libsensorservice.so
    #02 pc 000000000000f3a8  /system/lib64/libsensorservice.so
    #03 pc 0000000000011078  /system/lib64/libsensorservice.so
    #04 pc 00000000000179c0  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+188)
    #05 pc 000000000009277c  /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+96)
    #06 pc 0000000000017224  /system/lib64/libutils.so
    #07 pc 000000000001cbb0  /system/lib64/libc.so (__pthread_start(void*)+52)
    #08 pc 0000000000019044  /system/lib64/libc.so (__start_thread+16)

这种backtrace都不一样的问题很可能就是内存问题了,所谓内存问题指的就是野指针或内存越界。

 

【问题分析】

分析内存问题的第一步就是排查NE现场寄存器指向的内存值附近有没有规律。比如:

@[email protected]
pid: 10466, tid: 10493, name: android.bg  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xa00000070
    x0   000000557e996710  x1   0000000a00000068  x2   0000007f79c31ba0  x3   0000000000000000
    x4   000000006fc46a80  x5   0000000000000001  x6   0000000000000000  x7   000000557e4f527c
    x8   0000000000000000  x9   000000557e4f5278  x10  0000000000000000  x11  0000000000000000
    x12  0000000000000000  x13  0000000000430000  x14  0000000000550000  x15  0000000000430000
    x16  0000007f8f640320  x17  0000007f8eff4f6c  x18  0000007f8c1d0470  x19  000000000000000a
    x20  0000007f8f590e9c  x21  000000557e9b14d0  x22  000000001354bf40  x23  000000006fdf61f0
    x24  0000007f79c31b20  x25  0000000012da2040  x26  0000000000000000  x27  00000000000f52be
    x28  0000000000000000  x29  0000000000358c82  x30  0000000072639c64
    sp   0000007f79c31680  pc   0000007f8eff4f70  pstate 0000000080000000
backtrace:
    #00 pc 0000000000029f70  /system/lib64/libbinder.so (android::IPCThreadState::flushCommands()+4)
    #01 pc 0000000000009c60  /data/dalvik-cache/arm64/[email protected]@boot.oat

查看#00层代码:

$ aarch64-linux-android-objdump -D symbols/system/lib64/libbinder.so
0000000000029f6c <_ZN7android14IPCThreadState13flushCommandsEv>:
   29f6c:       f9400001        ldr     x1, [x0]
=> 29f70:       b9400822        ldr     w2, [x1,#8]
   29f74:       6b1f005f        cmp     w2, wzr
   29f78:       5400006d        b.le    29f84 <_ZN7android14IPCThreadState13flushCommandsEv+0x18>
   29f7c:       52800001        mov     w1, #0x0                        // #0
   29f80:       17ffd9ec        b       20730 <[email protected]>
   29f84:       d65f03c0        ret

x1值是x0地址中取来的,0x0000000a00000068显然不是一个合法地址。可能是x0地址被覆盖了。

查看x0附近的内存值:

memory near x0:
    000000557e9966f0 0000007f8f01e748 0000000000000000
    000000557e996700 0000000000000000 0000000000000007
    000000557e996710 0000000a00000068 0000007f0000000a
    000000557e996720 00000438234cda2a 3de6de183dc20f78
    000000557e996730 000000003d9e2680 0000000000000008
    000000557e996740 0000000000000000 0000000000000000
    000000557e996750 0000000000000000 0000000000000000
    000000557e996760 0000000000010001 0000000000000000
    000000557e996770 000000557e996840 0000000a00000068
    000000557e996780 000000550000000a 000004382647caaa
    000000557e996790 3de6de183dc20f78 000000003d9e2680
    000000557e9967a0 0000000000000000 0000000000000000
    000000557e9967b0 0000000000000000 0000000000000000
    000000557e9967c0 0000007f8e010001 0000000000000000
    000000557e9967d0 0000000000000000 000028e200000040
    000000557e9967e0 0000000a00000068 000000550000000a

发现一个明显的规律:

0x0000000a00000068出现了多次,而且间隔都是13*8=104字节,每个块的结构都很相似,很可能这个数据是个结构体数组。

 

继续分析下一个tombstone:

@[email protected]
pid: 5992, tid: 6053, name: AlarmManager  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x553d89e3c0
    x0   00000055837f4dc0  x1   0000000000000004  x2   0000000000000440  x3   0000000000020000
    x4   0000000000000444  x5   000000553d89df80  x6   0000000000000000  x7   000000558336b27c
    x8   0000000000000000  x9   000000558336b278  x10  0000000000000000  x11  0000000000000000
    x12  0000000000000000  x13  0000000000430000  x14  0000000000550000  x15  0000000000430000
    x16  0000007f7e502478  x17  0000007f7e4dbb74  x18  0000007f7b6b0470  x19  00000055837f4dc0
    x20  000000008080005c  x21  0000005583838590  x22  000000008080005c  x23  000000006fd15b08
    x24  000000008080005c  x25  000000003000003a  x26  0000000012d5fe80  x27  0000000012c47400
    x28  000000000000005c  x29  0000007f68090bd0  x30  0000007f7ea66e10
    sp   0000007f68090bd0  pc   0000007f7e4dbbc4  pstate 0000000080000000

 backtrace:
    #00 pc 0000000000030bc4  /system/lib64/libbinder.so (int android::Parcel::writeAligned<int>(int)+80)
    #01 pc 00000000000d3e0c  /system/lib64/libandroid_runtime.so
    #02 pc 0000000000109630  /data/dalvik-cache/arm64/[email protected]@boot.oat

查看#00层代码:

$ aarch64-linux-android-objdump -D symbols/system/lib64/libbinder.so
0000000000030b74 <_ZN7android6Parcel12writeAlignedIiEEiT_>:
   30b74:       a9be7bfd        stp     x29, x30, [sp,#-32]!
   30b78:       910003fd        mov     x29, sp
   30b7c:       a90153f3        stp     x19, x20, [sp,#16]
   30b80:       aa0003f3        mov     x19, x0
   30b84:       2a0103f4        mov     w20, w1
   30b88:       f9401002        ldr     x2, [x0,#32]
   30b8c:       f9400c03        ldr     x3, [x0,#24]
   30b90:       91001044        add     x4, x2, #0x4
   30b94:       eb03009f        cmp     x4, x3
   30b98:       54000109        b.ls    30bb8 <_ZN7android6Parcel12writeAlignedIiEEiT_+0x44>
   30b9c:       d2800081        mov     x1, #0x4                        // #4
   30ba0:       97ffbbe4        bl      1fb30 <[email protected]>
   30ba4:       34000080        cbz     w0, 30bb4 <_ZN7android6Parcel12writeAlignedIiEEiT_+0x40>
   30ba8:       a94153f3        ldp     x19, x20, [sp,#16]
   30bac:       a8c27bfd        ldp     x29, x30, [sp],#32
   30bb0:       d65f03c0        ret
   30bb4:       f9401262        ldr     x2, [x19,#32]
   30bb8:       f9400665        ldr     x5, [x19,#8]
   30bbc:       aa1303e0        mov     x0, x19
   30bc0:       d2800081        mov     x1, #0x4                        // #4
=> 30bc4:       b82268b4        str     w20, [x5,x2]
   30bc8:       a94153f3        ldp     x19, x20, [sp,#16]
   30bcc:       a8c27bfd        ldp     x29, x30, [sp],#32
   30bd0:       17ffbf3c        b       208c0 <[email protected]>

NE的原因是x5值非法,而x5是从x19中取出来的,查看x19值:

memory near x19:
    00000055837f4da0 0000000200000004 0000000a00000068
    00000055837f4db0 000000550000000a 0000015263c04a64
    00000055837f4dc0 3d5762703e10ca1c 000000553d89df80
    00000055837f4dd0 0000000000000440 0000000000020000
    00000055837f4de0 0000000000000440 0000000000000000
    00000055837f4df0 0000000000000000 0000000000000000
    00000055837f4e00 0000000000000000 0000000000010001
    00000055837f4e10 0000000a00000068 000000550000000a
    00000055837f4e20 0000015266bb3ae4 3d5762703e10ca1c
    00000055837f4e30 0000007f3d89df80 000000558379c020
    00000055837f4e40 000000558382ea70 0000656c62617300
    00000055837f4e50 0000000000000000 00000000000000d3
    00000055837f4e60 0000007f7eb1dc28 0000000000000000
    00000055837f4e70 0000007f7c8f2348 0000000a00000068
    00000055837f4e80 000000020000000a 0000015269b62b64
    00000055837f4e90 3d5762703e10ca1c 000000003d89df80

被破坏的现场合前面一个现场几乎相同!

后面几个就直接在tombstone里搜索0000000a00000068,发现每一个都是有0000000a00000068, 且间隔都是104字节。

@[email protected]
memory near x1:
    000000557e996a50 0000000a00000068 000000000000000a
    000000557e996a60 000004383b245e2a 3de6de183dc20f78
    000000557e996a70 000000003d9e2680 0000000000000000
    000000557e996a80 0000000000000007 00000000000f8327
    000000557e996a90 0000007f89b05070 0000000000000000
    000000557e996aa0 0000007f79a26000 0000007f00000000
    000000557e996ab0 0000000000000000 0000000a00000068
    000000557e996ac0 000000000000000a 000004383e1f4eaa
    000000557e996ad0 3de6de183dc20f78 000000003d9e2680
    000000557e996ae0 0000000012d94a90 0000000000000000
    000000557e996af0 0000007f79a24000 0000000000103000
    000000557e996b00 0000000000000000 0000000000000000
    000000557e996b10 0000000000000000 0000000000000000
    000000557e996b20 0000000a00000068 000000000000000a
    000000557e996b30 00000438411a3f2a 3de6de183dc20f78
    000000557e996b40 000000553d9e2680 000000557e915720
@[email protected]
memory near x9:
    000000558380c828 000000007614b620 7461003b72656e65
    000000558380c838 0000000000000000 000000558380d530
    000000558380c848 0000000a00000068 000000550000000a
    000000558380c858 0000015d3d53dc64 3d5762703e10ca1c
    000000558380c868 000000003d89df80 0000007f7c8f1fd8
    000000558380c878 000000020000008d 0000000200000004
    000000558380c888 00000000753a770c 0000000000000000
    000000558380c898 0000000000000234 0000000000000000
    000000558380c8a8 0000000000000000 0000000000000000
    000000558380c8b8 0000000000000000 0000007f7c7f9860
    000000558380c8c8 0000000000000101 00000000753a770c
    000000558380c8d8 0000000000000000 0000000000000234
    000000558380c8e8 0000000000000000 0000000000000000
    000000558380c8f8 0000000000000000 000000558337a320
    000000558380c908 0000000000000001 00000000d6000025
    000000558380c918 0000000000000000 0000000000000000
@[email protected]
memory near x19:
    0000005594c52190 3d7e9cb83e33b9be 000000003d567500
    0000005594c521a0 0000000000000000 0000000000000000
    0000005594c521b0 0000000100000000 0000000000000000
    0000005594c521c0 0000000000000000 0000000000000160
    0000005594c521d0 0000000000000000 0000007f9d76b470
    0000005594c521e0 0000000a00000068 000000110000000a
    0000005594c521f0 000001d7e54f0ade 3d7e9cb83e33b9be
    0000005594c52200 000000003d567500 0000000000000000
    0000005594c52210 0000000000000000 0000005594bc6700
    0000005594c52220 0000000000000000 0000000000000000
    0000005594c52230 0000000000000081 0000005500000000
    0000005594c52240 0000007f9e6baf60 0000000a00000068
    0000005594c52250 0000007f0000000a 000001d7e849fb5e
    0000005594c52260 3d7e9cb83e33b9be 0000007f3d567500
    0000005594c52270 0000007f9e6baf00 0000000000110000
    0000005594c52280 0000000000000000 0000000000000000
@[email protected]
memory near x0:
    0000005599644de0 0000000a00000068 000000550000000a
    0000005599644df0 0000066c78c6a111 3d8b65883e0ed844
    0000005599644e00 0000007f3d798d00 0000005599648310
    0000005599644e10 0000005599644ca8 0000005599644cd8
    0000005599644e20 0000000400000004 420ba3d700000000
    0000005599644e30 40c333333c2f4f0e 0000000300000000
    0000005599644e40 0000000000000000 0000000a00000068
    0000005599644e50 000000550000000a 0000066c7bc19191
    0000005599644e60 3d8b65883e0ed844 0000007f3d798d00
    0000005599644e70 0000000000000080 0000000000000053
    0000005599644e80 0000000000000051 0000000000000046
    0000005599644e90 0000005599644ed0 0000007f97c18000
    0000005599644ea0 0000000000000000 0000007f97c18000
    0000005599644eb0 0000000a00000068 000000010000000a
    0000005599644ec0 0000066c7ebc8211 3d8b65883e0ed844
    0000005599644ed0 61642f613d798d00 6361632d6b69766c

至此基本上能确定就是这个大小为104字节,带有0x0000000a00000068这个pattern的结构体数组覆盖正常内存导致的。

下一步就是要确定这个0x0000000a00000068属于哪个结构体。

 

从tombstone的maps数据中可以知道每次出现问题的地址都是堆内存,因此覆盖和被覆盖的内存也应该都是malloc出来的。

@[email protected]
    ...
    00000055996ba000-0000005599d24fff rw-  6729728  [heap]
    ...

 

现在我们可以用hook工具hook free函数,然后再free时判断内存里是否有0000000a00000068,如果有这个pattern就打印调用栈。

代码如下:

#if defined(__aarch64__)
extern "C" void free(void* p);
extern "C" void inject_free(void* p) {
    if(p != NULL) {
        size_t* head = (size_t*)((char*)p-sizeof(size_t));    //这里通过heap chunk的head信息得出指针的大小
        size_t size = *head & ~7,i=0;

        int* p_int = (int*)p;
        while(i < size/sizeof(int)) {
            if(*(p_int+i)== 0x00000068 && *(p_int+i+1)==0x0000000a ) {    //这里判断内存中是否有pattern
                LOGD("size=%llu",size);
                size_t j = 0;
                while(j < size/sizeof(int)) {
                    LOGD("p_int[%d]=%08x",j,*(p_int+j));
                    j++;
                }
                dump_java_stack();
                 dump_native_stack();
            }
            i++;
        }
    }
    free(p);  //这里再调用真正的free
}
#endif

打开12306复现问题,发现退出应用时,有如下日志:

# logcat |grep INJECT
D/INJECT  (  912): size=102352
D/INJECT  (  912): p_int[0]=00000068
D/INJECT  (  912): p_int[1]=0000000a
D/INJECT  (  912): p_int[2]=0000000a
D/INJECT  (  912): p_int[3]=00000055
D/INJECT  (  912): p_int[4]=0eaa916c
D/INJECT  (  912): p_int[5]=0000493a
D/INJECT  (  912): p_int[6]=bdaadb00
D/INJECT  (  912): p_int[7]=bcd10f00
D/INJECT  (  912): p_int[8]=be057140
D/INJECT  (  912): p_int[9]=00000000
...
D/INJECT  (  912): p_int[25586]=00007041
D/INJECT  (  912): p_int[25587]=00000000
D/INJECT  (  912): #00 pc 0000000000000a24  /system/lib64/libinjectapis.so (inject_free+236)
D/INJECT  (  912): #01 pc 000000000000fc28  /system/lib64/libsensorservice.so
D/INJECT  (  912): #02 pc 000000000000fcf4  /system/lib64/libsensorservice.so
D/INJECT  (  912): #03 pc 00000000000141b4  /system/lib64/libutils.so (android::RefBase::decStrong(void const*) const+560)
D/INJECT  (  912): #04 pc 0000000000029858  /system/lib64/libbinder.so (android::IPCThreadState::processPendingDerefs()+140)
D/INJECT  (  912): #05 pc 000000000002a734  /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+68)
D/INJECT  (  912): #06 pc 0000000000031d68  /system/lib64/libbinder.so
D/INJECT  (  912): #07 pc 00000000000179c0  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+188)
D/INJECT  (  912): #08 pc 000000000009277c  /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+96)
D/INJECT  (  912): #09 pc 0000000000017224  /system/lib64/libutils.so
D/INJECT  (  912): #10 pc 000000000001cbb0  /system/lib64/libc.so (__pthread_start(void*)+52)
D/INJECT  (  912): #11 pc 0000000000019044  /system/lib64/libc.so (__start_thread+16)

看来是抓到这个罪魁祸首了!

用addr2line看看是哪个文件哪一行:

$ aarch64-linux-android-addr2line -f -e symbols/system/lib64/libsensorservice.so fc28
_ZN7android13SensorService21SensorEventConnectionD1Ev
/home/mi/disk/2-v6-l-hermes-dev/frameworks/native/services/sensorservice/SensorService.cpp:1021
@frameworks/native/services/sensorservice/SensorService.cpp
SensorService::SensorEventConnection::
~SensorEventConnection() { ALOGD_IF(DEBUG_CONNECTIONS, "~SensorEventConnection(%p)", this); mService->cleanupConnection(this); if (mEventCache != NULL) { delete mEventCache; } }

就是delete mEventCache语句释放带有0000000a00000068的内存,看看它是被创建的代码:

@frameworks/native/services/sensorservice/SensorService.cpp

status_t SensorService::SensorEventConnection::sendEvents(
        sensors_event_t const* buffer, size_t numEvents,
        sensors_event_t* scratch,
        SensorEventConnection const * const * mapFlushEventsToConnections) {
            ...
            mEventCache = new sensors_event_t[mMaxCacheSize];

这个mEventCache确实是个结构体数组指针。这个new sensors_event_t的定义如下:

@hardware/libhardware/include/hardware/sensors.h

typedef struct sensors_event_t {
    /* must be sizeof(struct sensors_event_t) */
    int32_t version;
    /* sensor identifier */
    int32_t sensor;
    /* sensor type */
    int32_t type;
    /* reserved */
    int32_t reserved0;
    /* time is in nanosecond */
    int64_t timestamp;
    union {
        ...
    }
    /* Reserved flags for internal use. Set to zero. */
    uint32_t flags;
     uint32_t reserved1[3];
} sensors_event_t;

0x0000000a00000068中的68就是version,也就是这个结构体的大小,0x68=104字节,完全符合pattern。

接下来就是排查mEventCache,首先排除野指针,因为复现问题前并没有打印free的log,那只能是内存越界了。

再SensorService.cpp中所有能写mEventCache的地方加了log,发现复现问题时也没有相关log打印出来。

又在原生机器试了一下,发现原生没有这个问题,所以可能不是framework的SensorService代码问题。

 sensors_event_t是hardware的头文件里定义的,由次可以判断sensors_event_t这个数据应该是hardware层给framework的buffer里写的。

有是只有MTk机器上出现的问题,所以问题很可能是出在hardware层。

 

在vendor下搜索sensors_event_t,会发现有很多sensor的实现,不太好定位问题:

vendor$ cgrep sensors_event_t
./mediatek/proprietary/hardware/sensor/InPocket.h:45:    sensors_event_t mPendingEvent;
./mediatek/proprietary/hardware/sensor/InPocket.h:50:    virtual int readEvents(sensors_event_t* data, int count);
./mediatek/proprietary/hardware/sensor/Activity.cpp:45:    mPendingEvent.version = sizeof(sensors_event_t);
./mediatek/proprietary/hardware/sensor/Activity.cpp:225:int ActivitySensor::readEvents(sensors_event_t* data, int count)
./mediatek/proprietary/hardware/sensor/Shake.cpp:46:    mPendingEvent.version = sizeof(sensors_event_t);
./mediatek/proprietary/hardware/sensor/Shake.cpp:216:int ShakeSensor::readEvents(sensors_event_t* data, int count)
./mediatek/proprietary/hardware/sensor/PickUp.h:45:    sensors_event_t mPendingEvent;
./mediatek/proprietary/hardware/sensor/PickUp.h:50:    virtual int readEvents(sensors_event_t* data, int count);
./mediatek/proprietary/hardware/sensor/Hwmsen.h:96:    sensors_event_t mPendingEvents[numSensors];
...
./mediatek/proprietary/hardware/sensor/PickUp.cpp:215:int PickUpSensor::readEvents(sensors_event_t* data, int count)
./mediatek/proprietary/hardware/sensor/GlanceGesture.cpp:47:    mPendingEvent.version = sizeof(sensors_event_t);
./mediatek/proprietary/hardware/sensor/GlanceGesture.cpp:208:int GlanceGestureSensor::readEvents(sensors_event_t* data, int count)
./mediatek/proprietary/hardware/sensor/FaceDown.cpp:44:    mPendingEvent.version = sizeof(sensors_event_t);
./mediatek/proprietary/hardware/sensor/FaceDown.cpp:213:int FaceDownSensor::readEvents(sensors_event_t* data, int count)
./mediatek/proprietary/hardware/sensor/AmbienteLight.cpp:62:    mPendingEvent.version = sizeof(sensors_event_t);
./mediatek/proprietary/hardware/sensor/AmbienteLight.cpp:250:int AmbiLightSensor::readEvents(sensors_event_t* data, int count)
./mediatek/proprietary/hardware/sensor/HeartRate.h:44:    sensors_event_t mPendingEvent;
./mediatek/proprietary/hardware/sensor/HeartRate.h:49:    virtual int readEvents(sensors_event_t* data, int count);
./mediatek/proprietary/hardware/sensor/Proximity.h:64:    sensors_event_t mPendingEvent;
./mediatek/proprietary/hardware/sensor/Proximity.h:69:    virtual int readEvents(sensors_event_t* data, int count);
./mediatek/proprietary/hardware/sensor/GlanceGesture.h:45:    sensors_event_t mPendingEvent;
./mediatek/proprietary/hardware/sensor/GlanceGesture.h:50:    virtual int readEvents(sensors_event_t* data, int count);
./mediatek/proprietary/hardware/sensor/Proximity.cpp:64:    mPendingEvent.version = sizeof(sensors_event_t);
./mediatek/proprietary/hardware/sensor/Proximity.cpp:253:int ProximitySensor::readEvents(sensors_event_t* data, int count)

0x0000000a00000068中的0xa是sensors_event_t结构中的sensor的值,这个看起来是sensor ID,这些ID是在device下定义的。

@device/mediatek/common/kernel-headers/linux/hwmsensor.h
#define SENSOR_TYPE_LINEAR_ACCELERATION 10
#define SENSOR_TYPE_ROTATION_VECTOR     11    
...
#define SENSOR_TYPE_SIGNIFICANT_MOTION  17
#define SENSOR_TYPE_STEP_DETECTOR       18
...
#define ID_BASE                         0
...
#define ID_ACCELEROMETER               (ID_BASE+SENSOR_TYPE_ACCELEROMETER-1)
#define ID_LINEAR_ACCELERATION         (ID_BASE+SENSOR_TYPE_LINEAR_ACCELERATION-1)
#define ID_ROTATION_VECTOR             (ID_BASE+SENSOR_TYPE_ROTATION_VECTOR-1)            
#define ID_GRAVITY                     (ID_BASE+SENSOR_TYPE_GRAVITY-1)
#define ID_GYROSCOPE                   (ID_BASE+SENSOR_TYPE_GYROSCOPE-1)

这个ID是ID_ROTATION_VECTOR,这样可以进一步缩小范围

vendor$ cgrep ID_ROTATION_VECTOR
./mediatek/proprietary/hardware/sensor/nusensors.cpp:145:            case ID_ROTATION_VECTOR:
./mediatek/proprietary/hardware/sensor/hwmsen_chip_info.c:766:            .handle     = ID_ROTATION_VECTOR+ID_OFFSET,
./mediatek/proprietary/hardware/sensor/BstSensor.cpp:207:        case ID_ROTATION_VECTOR:
./mediatek/proprietary/hardware/sensor/BstSensor.cpp:265:        case ID_ROTATION_VECTOR:
./mediatek/proprietary/hardware/sensor/BstSensor.cpp:352:            case ID_ROTATION_VECTOR:
./mediatek/proprietary/hardware/sensor/BstSensor.cpp:411:            case ID_ROTATION_VECTOR:

这里由于对sensors_event_t不熟悉,所以后面对ID的判断其实是很虚的,所以没有继续往下跟。决定直接上coredump。

由于此时已经找到了稳定的复现路径,所以抓取coredump也容易了,一共抓了5组coredump。

对每一份做了内存分析,发现都是大量的数据被覆盖,至少是几百K,

而且看起来出问题的时候还再写(出问题时结束位置——104字节对齐不明显),起始位置也没有堆chunk的head。

其中有一个对应的tombstone很可疑:

pid: 15788, tid: 15810, name: SensorService  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xa00000080
backtrace:
    #00 pc 00000000000096a0  /system/lib64/hw/sensors.mt6795.so (sensors_poll_context_t::pollEvents(sensors_event_t*, int)+400)
    #01 pc 000000000000a474  /system/lib64/libsensorservice.so
    #02 pc 0000000000010eb0  /system/lib64/libsensorservice.so
    #03 pc 00000000000179c0  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+188)
    #04 pc 000000000009277c  /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+96)
    #05 pc 0000000000017224  /system/lib64/libutils.so
    #06 pc 000000000001cbb0  /system/lib64/libc.so (__pthread_start(void*)+52)
    #07 pc 0000000000019044  /system/lib64/libc.so (__start_thread+16)

这个backtrace #00的sensors.mt6795.so库就是sensor hardware的代码,而当前还是确实也在操作sensors_event_t结构的指针。

加载该tombstone对应的coredump:

(gdb) core core-SensorService-15788

(gdb) bt
#0  0x0000007f6a79d6a0 in sensors_poll_context_t::pollEvents (this=0x5575a43b20, data=0x5575a60c78, count=-653) at vendor/mediatek/proprietary/hardware/sensor/nusensors.cpp:470
#1  0x0000007f6a7e6478 in android::SensorDevice::poll ([email protected]=0x5575a863a0, buffer=0x5575a49b30, [email protected]=256) at frameworks/native/services/sensorservice/SensorDevice.cpp:123
#2  0x0000007f6a7eceb4 in android::SensorService::threadLoop (this=0x5575a43880) at frameworks/native/services/sensorservice/SensorService.cpp:409
#3  0x0000007f7f99c9c4 in android::Thread::_threadLoop ([email protected]=0x5575a438a0) at system/core/libutils/Threads.cpp:776
#4  0x0000007f7fc9f780 in android::AndroidRuntime::javaThreadShell (args=<optimized out>) at frameworks/base/core/jni/AndroidRuntime.cpp:1258
#5  0x0000007f7f99c228 in thread_data_t::trampoline (t=<optimized out>) at system/core/libutils/Threads.cpp:101
#6  0x0000007f7fe56bb4 in __pthread_start (arg=0x7f7ba4e000, [email protected]=<error reading variable: value has been optimized out>) at bionic/libc/bionic/pthread_create.cpp:141
#7  0x0000007f7fe53048 in __start_thread (fn=<optimized out>, arg=<optimized out>) at bionic/libc/bionic/clone.cpp:41
#8  0x0000000000000000 in ?? ()

backtrace中,#0里的count是-653,而#1的count值是256。

也 就是说android::SensorDevice::poll()调用sensors_poll_context_t::pollEvents() 时,count还是256,但在 sensors_poll_context_t::pollEvents()做处理时变成了负数。

一般count不应该是负数,所以很可能 sensors_poll_context_t::pollEvents()的逻辑有问题,

加上这个函数又是上面列出的重点怀疑的文件nusensors.cpp中定义的。所以这里出问题的可能性很大。

@vendor/mediatek/proprietary/hardware/sensor/nusensors.cpp

int sensors_poll_context_t::pollEvents(sensors_event_t* data, int count)
{
    int nbEvents = 0;
    int n = 0;
    //ALOGE("pollEvents count =%d",count );
    do {
        // see if we have some leftover from the last poll()
        for (int i=0 ; count && i<numSensorDrivers ; i++) {
            SensorBase* const sensor(mSensors[i]);
            if ((mPollFds[i].revents & POLLIN) || (sensor->hasPendingEvents())) {
                int nb = sensor->readEvents(data, count);
                ...
                //if(nb < 0||nb > count)
                    //ALOGE("pollEvents count error nb:%d, count:%d, nbEvents:%d", nb, count, nbEvents);//for sensor NE debug
                count -= nb;
                nbEvents += nb;
                data += nb;
                //if(nb < 0||nb > count)
                //    ALOGE("pollEvents count error nb:%d, count:%d, nbEvents:%d", nb, count, nbEvents);//for sensor NE debug
            }
        }
        ...
    } while (n && count);
    return nbEvents;
}

关键是下面这一句:

nb = sensor->readEvents(data, count);

这句话的意思应该是往data指向的地址中填写event,count是要读取的event个数,nb返回的是已读取的event个数。

这种方法类似于read()函数的,while循环能保证读取到count个数的event。

而这时如果nb返回异常呢?当然,从gdb中可以确定nb确实异常了。

然而我们sensors_poll_context_t::pollEvents()中没有对这个异常值做任何的保护,所以就会出问题。

当count为负数时sensors_poll_context_t::pollEvents()的while循环很难会退出,所以越界覆盖的buffer也应该很长。

当覆盖的内存刚好被别的线程访问,那个线程就会crash,而此时应该有线程在执行sensors_poll_context_t::pollEvents()函数。

 

再看看其他core:

(gdb) core core-InputReader-32381

(gdb) bt
#0  art::Mutex::ExclusiveLock ([email protected]=0xa00000068, [email protected]=0x5597cc1f80) at art/runtime/base/mutex.cc:311
#1  0x0000007fafb410e0 in MutexLock (mu=..., self=0x5597cc1f80, this=<synthetic pointer>) at art/runtime/base/mutex.h:423
#2  art::gc::allocator::RosAlloc::AllocFromRun (this=0x5597c1efe0, [email protected]=0x5597cc1f80, [email protected]=20, bytes_allocated=0x75f8c7f8, [email protected]=0x7f9c42f088)
    at art/runtime/gc/allocator/rosalloc.cc:672
#3  0x0000007fafd57650 in Alloc<true> (bytes_allocated=0x7f9c42f088, size=20, self=0x5597cc1f80, this=<optimized out>) at art/runtime/gc/allocator/rosalloc-inl.h:33
#4  AllocCommon<true> (usable_size=0x7f9c42f080, bytes_allocated=0x7f9c42f078, num_bytes=20, self=0x5597cc1f80, this=<optimized out>) at art/runtime/gc/space/rosalloc_space-inl.h:57
#5  AllocNonvirtual (usable_size=0x7f9c42f080, bytes_allocated=0x7f9c42f078, num_bytes=20, self=0x5597cc1f80, this=<optimized out>) at art/runtime/gc/space/rosalloc_space.h:71
#6  TryToAllocate<false, false> (usable_size=0x7f9c42f080, bytes_allocated=0x7f9c42f078, alloc_size=20, allocator_type=art::gc::kAllocatorTypeRosAlloc, self=0x5597cc1f80, this=<optimized out>)
    at art/runtime/gc/heap-inl.h:208
#7  AllocObjectWithAllocator<false, false, art::VoidFunctor> (pre_fence_visitor=..., allocator=art::gc::kAllocatorTypeRosAlloc, byte_count=20, klass=0x6fce4650, self=0x5597cc1f80, this=<optimized out>)
    at art/runtime/gc/heap-inl.h:80
#8  Alloc<false, false> (allocator_type=art::gc::kAllocatorTypeRosAlloc, self=0x5597cc1f80, this=<optimized out>) at art/runtime/mirror/class-inl.h:561
#9  AllocObjectFromCodeInitialized<false> (method=<optimized out>, allocator_type=art::gc::kAllocatorTypeRosAlloc, self=0x5597cc1f80, klass=<optimized out>)
    at art/runtime/entrypoints/entrypoint_utils-inl.h:170

上面这个是被覆盖的线程,可以用info thread查看同一时刻其他线程的状态:

(gdb) info thread
  Id   Target Id         Frame
  100  LWP 1525          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  99   LWP 32456         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  98   LWP 1586          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  97   LWP 581           nanosleep () at bionic/libc/arch-arm64/syscalls/nanosleep.S:7
  96   LWP 2156          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  95   LWP 32465         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  94   LWP 32464         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  ...
  21   LWP 32483         __accept4 () at bionic/libc/arch-arm64/syscalls/__accept4.S:7
  20   LWP 32444         recvmsg () at bionic/libc/arch-arm64/syscalls/recvmsg.S:7
  19   LWP 32403         sensors_poll_context_t::pollEvents (this=0x5597c20eb0, data=0x5597c21ca0, count=-402) at vendor/mediatek/proprietary/hardware/sensor/nusensors.cpp:470
  18   LWP 32413         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  ...
  5    LWP 1106          __ppoll () at bionic/libc/arch-arm64/syscalls/__ppoll.S:7
  4    LWP 923           __ioctl () at bionic/libc/arch-arm64/syscalls/__ioctl.S:7
  3    LWP 32441         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  2    LWP 32467         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
* 1    LWP 32442         art::Mutex::ExclusiveLock ([email protected]=0xa00000068, [email protected]=0x5597cc1f80) at art/runtime/base/mutex.cc:311

上面19号线程可以看出,它正在执行sensors_poll_context_t::pollEvents(),同时count值也是负数。

 

再看一个:

(gdb) core  core-SensorEventAckR-15999

(gdb) bt
#0  android::Looper::pollInner (this=0x558c9e0bd0, timeoutMillis=<optimized out>) at system/core/libutils/Looper.cpp:286
#1  0x0000007f9e1cc884 in android::Looper::pollOnce (this=0x558c9e0bd0, [email protected]=-1, [email protected]=0x0, [email protected]=0x0, [email protected]=0x0)
    at system/core/libutils/Looper.cpp:191
#2  0x0000007f89014708 in pollOnce (timeoutMillis=-1, this=<optimized out>) at system/core/include/utils/Looper.h:265
#3  android::SensorService::SensorEventAckReceiver::threadLoop (this=0x7f88fbf9e8) at frameworks/native/services/sensorservice/SensorService.cpp:559
#4  0x0000007f9e1c89c4 in android::Thread::_threadLoop ([email protected]=0x7f88fbf9e8) at system/core/libutils/Threads.cpp:776
#5  0x0000007f9e4cb780 in android::AndroidRuntime::javaThreadShell (args=<optimized out>) at frameworks/base/core/jni/AndroidRuntime.cpp:1258
#6  0x0000007f9e1c8228 in thread_data_t::trampoline (t=<optimized out>) at system/core/libutils/Threads.cpp:101
#7  0x0000007f9e682bb4 in __pthread_start (arg=0x7f95f99000, [email protected]=<error reading variable: value has been optimized out>) at bionic/libc/bionic/pthread_create.cpp:141
#8  0x0000007f9e67f048 in __start_thread (fn=<optimized out>, arg=<optimized out>) at bionic/libc/bionic/clone.cpp:41
#9  0x0000000000000000 in ?? ()
(gdb) info thread
  Id   Target Id         Frame
  101  LWP 17316         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  100  LWP 17588         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  99   LWP 16075         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  98   LWP 16080         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  97   LWP 20823         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  96   LWP 16082         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  95   LWP 19034         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  ...
  11   LWP 16036         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  10   LWP 16392         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  9    LWP 16021         0x0000007f88fc96a0 in sensors_poll_context_t::pollEvents (this=0x558c5cbe40, data=0x558ca1fcc0, count=-3472) at vendor/mediatek/proprietary/hardware/sensor/nusensors.cpp:470
  8    LWP 16107         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  7    LWP 16031         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  6    LWP 16034         0x0000007f9e738104 in ?? ()
  5    LWP 16104         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  4    LWP 16061         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  3    LWP 16060         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  2    LWP 16085         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
* 1    LWP 16024         android::Looper::pollInner (this=0x558c9e0bd0, timeoutMillis=<optimized out>) at system/core/libutils/Looper.cpp:286

加入debug code:

int sensors_poll_context_t::pollEvents(sensors_event_t* data, int count)
{
    int nbEvents = 0;
    int n = 0;
    //ALOGE("pollEvents count =%d",count );do {
        // see if we have some leftover from the last poll()
        for (int i=0 ; count && i<numSensorDrivers ; i++) {
            SensorBase* const sensor(mSensors[i]);
            if ((mPollFds[i].revents & POLLIN) || (sensor->hasPendingEvents())) {
                int nb = sensor->readEvents(data, count);
                if(nb < 0 || nb > count) {
                    nb += *(int*)0;    //运行到这里时,触发native crash
                }
                ...

重新抓取coredump:

(gdb) bt
#0  sensors_poll_context_t::pollEvents (this=0x55827f7610, data=0x5582c7b170, count=256) at vendor/mediatek/proprietary/hardware/sensor/nusensors.cpp:474
#1  0x0000007f92352478 in android::SensorDevice::poll (this=this@entry=0x5582be0ee0, buffer=0x5582c7b170, [email protected]=256) at frameworks/native/services/sensorservice/SensorDevice.cpp:123
#2  0x0000007f92358eb4 in android::SensorService::threadLoop (this=0x55827f7370) at frameworks/native/services/sensorservice/SensorService.cpp:409
#3  0x0000007fa75089c4 in android::Thread::_threadLoop ([email protected]=0x55827f7390) at system/core/libutils/Threads.cpp:776
#4  0x0000007fa780b780 in android::AndroidRuntime::javaThreadShell (args=<optimized out>) at frameworks/base/core/jni/AndroidRuntime.cpp:1258
#5  0x0000007fa7508228 in thread_data_t::trampoline (t=<optimized out>) at system/core/libutils/Threads.cpp:101
#6  0x0000007fa79c2bb4 in __pthread_start (arg=0x7fa35b9000, [email protected]=<error reading variable: value has been optimized out>) at bionic/libc/bionic/pthread_create.cpp:141
#7  0x0000007fa79bf048 in __start_thread (fn=<optimized out>, arg=<optimized out>) at bionic/libc/bionic/clone.cpp:41
#8  0x0000000000000000 in ?? ()

查看变量:

(gdb) info locals
nb = 2235
sensor = <optimized out>
i = 5
nbEvents = 0
n = 1

nb值(2235)确实比count值(256)大!

再看看是哪个sensor:

(gdb) p mSensors

$2 = {0x5582c2db80, 0x55827f80c0, 0x5582c22dd0, 0x5582c23ec0, 0x5582c27fa0, 0x5582be02b0, 0x5582c268b0, 0x5582c251c0, 0x5582c296a0, 0x5582c2ada0, 0x5582c2c490, 0x5582c30560, 0x5582c33340, 0x5582c31c50,

  0x5582c70a60, 0x5582c71b40, 0x5582c73230, 0x5582c74920, 0x5582c76010, 0x5582c77700, 0x5582c78df0}

i值是5,那sensor就是0x5582be02b0了。

(gdb) p *(SensorBase*) 0x5582be02b0

$3 = {_vptr.SensorBase = 0x7f923317c0 <vtable for BstSensor+16>, dev_name = 0x0, data_name = 0x7f9231c698 "/data/misc/sensor/fifo_dat", dev_fd = -1, data_fd = 40}

找到虚函数表指针为0x7f923317c0,看下虚函数表中的内容:

(gdb) x /16gx 0x7f923317c0
0x7f923317c0 <_ZTV9BstSensor+16>:    0x0000007f92317e58    0x0000007f92317e94
0x7f923317d0 <_ZTV9BstSensor+32>:    0x0000007f923181a4    0x0000007f92309ab8
0x7f923317e0 <_ZTV9BstSensor+48>:    0x0000007f92309ab0    0x0000007f92317eb8
0x7f923317f0 <_ZTV9BstSensor+64>:    0x0000007f92318020    0x0000007f92317e18
0x7f92331800 <_ZTV9BstSensor+80>:    0x0000007f92317e20    0x0000000000000001
0x7f92331810:    0x000000000000283c    0x0000000000000001
0x7f92331820:    0x0000000000002844    0x0000000000000001
0x7f92331830:    0x0000000000002851    0x0000000000000001

可以知道,这个sensor类是BstSensor。又挨个试得知对应的readEvents为0x0000007f923181a4。

(gdb) disassemble 0x0000007f923181a4
Dump of assembler code for function BstSensor::readEvents(sensors_event_t*, int):
   0x0000007f923181a4 <+0>:    stp    x29, x30, [sp,#-176]!
   0x0000007f923181a8 <+4>:    cmp    w2, wzr
   0x0000007f923181ac <+8>:    mov    x29, sp
   0x0000007f923181b0 <+12>:    stp    x19, x20, [sp,#16]

看源码:

@vendor/mediatek/proprietary/hardware/sensor/BstSensor.cpp

int BstSensor::readEvents(sensors_event_t *pdata, int count)
{
        ...while (rslt < count) {
 
                err = read(data_fd, &sensor_data, sizeof(sensor_data));
                ...
                delta_mod= (float)(time-bst_last_ts[index])/(float)(bst_delay[index]);
 
                /* if delta_mode > x.5, loopcount = x+1 */
                loopcount = (int)(delta_mod+0.5);
                ...for(int i=1; i<=loopcount; i++) {
 
                    pdata_cur->version = sizeof(*pdata_cur);
                    pdata_cur->sensor = sensor;
                    pdata_cur->timestamp = time - (loopcount-i)*bst_delay[index];
 
                    switch (sensor) {
                    ...case ID_ROTATION_VECTOR:
                            pdata_cur->data[0] = sensor_data.data.data[0];
                            pdata_cur->data[1] = sensor_data.data.data[1];
                            pdata_cur->data[2] = sensor_data.data.data[2];
                            pdata_cur->data[3] = sensor_data.data.data[3];
                            pdata_cur->data[4] = sensor_data.data.data[4];
                            pdata_cur->type = SENSOR_TYPE_ROTATION_VECTOR;
                            break;
                    default:
                            ALOGE("<BST> " "Invalid data pkt");
                            return rslt;
                    }
 
                    rslt++;
                    pdata_cur++;
                }
                bst_last_ts[index] = time;
        }
 
        return rslt;
}

关键是这个loopcount大于传入buffer的大小count的时候,返回值有可能大于count。这里可能需要做保护!

 

【解决方案】

diff --git a/proprietary/hardware/sensor/BstSensor.cpp b/proprietary/hardware/sensor/BstSensor.cpp
index cc21b37..ea84bea 100755
--- a/proprietary/hardware/sensor/BstSensor.cpp
+++ b/proprietary/hardware/sensor/BstSensor.cpp
@@ -320,7 +320,9 @@ int BstSensor::readEvents(sensors_event_t *pdata, int count)
         sensors_event_t *pdata_cur;
         int loopcount;
         float delta_mod;
+        int count_bk;
 
+        count_bk = count;
         if (count > 1) {
                 count = 1;
         }
@@ -388,7 +390,10 @@ int BstSensor::readEvents(sensors_event_t *pdata, int count)
                         bst_last_ts[index] = time;
                         loopcount = 1;
                 }
-
+                if(loopcount >=count_bk)
+                {
+                        loopcount = count_bk;
+                }
                 for(int i=1; i<=loopcount; i++) {

 

以上是关于MTK Sensor越界导致的系统重启问题分析报告的主要内容,如果未能解决你的问题,请参考以下文章

MTK平台Sensor Bring Up

MTK

高通sensor库和Linker的死锁问题分析报告

MTK 平台上如何给 camera 添加一种 preview size

mtk camera faq

I2C协议和驱动框架分析