Understanding the Dalvik Virtual Machine in Depth: How the Interpreter Works

Posted threepigs

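Dalvik executes instructions with an interpreter plus a JIT. With the interpreter, the virtual machine itself decodes and executes the bytecode (javac output converted to dex format) instead of translating it into the CPU's instruction set and letting the CPU decode and execute it. Interpretation is therefore comparatively slow, which is why the JIT (Just In Time) compiler was introduced: it compiles frequently executed functions into native code at run time, and can be seen as a supplementary optimization on top of the interpreter. Later, the ART virtual machine introduced AOT (Ahead Of Time) mode, which compiles the bytecode statically when the APK is installed, bringing performance close to that of statically compiled languages.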

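Every Java method belongs to a class, so Dalvik executes only two kinds of bytecode. The first is a class's Method, static or non-static; the only difference between the two is whether there is a this argument. The second is class initialization code: the initialization of static member variables and the explicit static initializer blocks that run when the class is loaded.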

The class initialization code is in dvmInitClass in dalvik/vm/oo/Class.cpp:
bool dvmInitClass(ClassObject* clazz)
{
    ...
    dvmLockObject(self, (Object*) clazz);
    ...
    android_atomic_release_store(CLASS_INITIALIZING,
                                 (int32_t*)(void*)&clazz->status);
    dvmUnlockObject(self, (Object*) clazz);
    ...
    initSFields(clazz);

    /* Execute any static initialization code.
     */
    method = dvmFindDirectMethodByDescriptor(clazz, "<clinit>", "()V");
    if (method == NULL) {
        LOGVV("No <clinit> found for %s", clazz->descriptor);
    } else {
        LOGVV("Invoking %s.<clinit>", clazz->descriptor);
        JValue unused;
        dvmCallMethod(self, method, NULL, &unused);
    }
    ...
}

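As the code shows, class initialization consists mainly of the following steps:

    Lock the class object, so that only one thread initializes a given class

    Initialize the static members (initSFields)

    Call <clinit>, which holds the static initializer blocks

The code of the static initializer blocks ends up in the <clinit> method. So interpreting Dalvik bytecode is, in essence, always interpreting methods of a class.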
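
The order of those steps can be sketched in a few lines of C++. This is only an illustration under simplifying assumptions: SketchClass, SketchStatus, and initClassSketch are hypothetical names, the static fields are plain ints, and a second thread arriving during initialization is not handled, whereas the real dvmInitClass covers many more states.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical, simplified stand-ins for ClassObject and its status values.
enum class SketchStatus { Verified, Initializing, Initialized, Error };

struct SketchClass {
    std::mutex lock;                    // plays the role of the class object's monitor
    SketchStatus status = SketchStatus::Verified;
    std::vector<int> staticDefaults;    // constant initial values of the static fields
    std::vector<int> staticFields;      // live static field storage
    std::function<bool()> clinit;       // body of <clinit>; empty when the class has none
};

// Mirrors the order of the excerpt: lock, mark as initializing, unlock,
// set up the static fields, then run <clinit> without holding the lock.
bool initClassSketch(SketchClass& clazz) {
    {
        std::lock_guard<std::mutex> guard(clazz.lock);
        if (clazz.status == SketchStatus::Initialized) return true;
        if (clazz.status == SketchStatus::Error) return false;
        // Simplification: a concurrent initializer is not supported here.
        assert(clazz.status == SketchStatus::Verified);
        clazz.status = SketchStatus::Initializing;
    }

    clazz.staticFields = clazz.staticDefaults;     // corresponds to initSFields
    bool ok = !clazz.clinit || clazz.clinit();     // corresponds to calling <clinit>

    std::lock_guard<std::mutex> guard(clazz.lock);
    clazz.status = ok ? SketchStatus::Initialized : SketchStatus::Error;
    return ok;
}
```

Calling initClassSketch a second time on the same object returns immediately, which is the property that lets the VM request initialization from many call sites without running <clinit> twice.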

The virtual machine uses the Method as the interpreter's unit of execution, and the single entry point is dvmCallMethod, defined in dalvik/vm/interp/Stack.cpp.

void dvmCallMethod(Thread* self, const Method* method, Object* obj,
    JValue* pResult, ...)
{
    va_list args;
    va_start(args, pResult);
    dvmCallMethodV(self, method, obj, false, pResult, args);
    va_end(args);
}

void dvmCallMethodV(Thread* self, const Method* method, Object* obj,
    bool fromJni, JValue* pResult, va_list args)
{
   ...
    if (dvmIsNativeMethod(method)) {
        TRACE_METHOD_ENTER(self, method);
        /*
         * Because we leave no space for local variables, "curFrame" points
         * directly at the method arguments.
         */
        (*method->nativeFunc)((u4*)self->interpSave.curFrame, pResult,
                              method, self);
        TRACE_METHOD_EXIT(self, method);
    } else {
        dvmInterpret(self, method, pResult);
    }
   …
}

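A Java Method is either native or non-native. The code of a native method lives in a .so shared library and consists of native machine instructions rather than the virtual machine's bytecode.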

If the method is native, the function pointer nativeFunc is called directly; otherwise dvmInterpret is called, which is the entry point of the interpreter.

Drawing out the call stack of a Dalvik method invocation makes the whole flow clearer. Consider the following program:

public class HelloWorld {

    public int foo(int i, int j){
        int k = i + j;
        return k;
    }

    public static void main(String[] args) {
        System.out.print(new HelloWorld().foo(1, 2));
    }
}



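The Dalvik virtual machine has two stacks: the Java stack and the VM's native stack. The native stack is the ordinary function call stack provided by the OS, while the Java stack is managed by the VM. On every dvmCallMethod, before the Method runs, the VM calls dvmPushInterpFrame (java→java) or dvmPushJNIFrame (java→native). A JNI frame differs from an interpreter frame in that it has no space for local variables: the locals of a native function live on the VM's native stack, where they are pushed and popped as part of ordinary native function calls. When dvmCallMethod finishes, it calls dvmPopFrame to pop the Java stack.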

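So executing a Java Method means that dvmInterpret interprets the Method's bytecode, with both arguments and local variables read from the Java stack. SaveBlock is a StackSaveArea data structure that holds the stack bookkeeping of the current method, including the return address. Executing a native Method means executing the Method's nativeFunc; the native code's arguments (converted from the JNI frame by the bridge) and local variables are on the VM's native stack.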
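
The difference between the two frame kinds can be made concrete with a small model. It is a sketch only: SketchMethod, SketchFrame, and SketchJavaStack are hypothetical types, and the real frames are laid out in one contiguous memory region together with a StackSaveArea rather than in a vector of objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a method's frame requirements.
struct SketchMethod {
    uint16_t registersSize;  // locals plus arguments (interpreted methods)
    uint16_t insSize;        // arguments only, including "this" if non-static
    bool isNative;
};

struct SketchFrame {
    const SketchMethod* method;   // part of what the save area remembers
    std::vector<uint32_t> regs;   // register slots owned by this frame
};

class SketchJavaStack {
public:
    // Interpreted call: room for locals and arguments, with the arguments in
    // the highest-numbered registers. JNI call: room for the arguments only.
    const SketchFrame& push(const SketchMethod& m, const std::vector<uint32_t>& args) {
        assert(args.size() == m.insSize);
        assert(m.isNative || m.registersSize >= m.insSize);
        size_t slots = m.isNative ? m.insSize : m.registersSize;
        SketchFrame frame{&m, std::vector<uint32_t>(slots, 0)};
        size_t firstArg = slots - m.insSize;
        for (size_t i = 0; i < args.size(); ++i) frame.regs[firstArg + i] = args[i];
        frames_.push_back(std::move(frame));
        return frames_.back();
    }
    const SketchFrame& top() const { return frames_.back(); }
    void pop() { frames_.pop_back(); }
    size_t depth() const { return frames_.size(); }

private:
    std::vector<SketchFrame> frames_;
};
```

For foo(int, int) from the example, insSize is 3 (this, i, j). Assuming the compiler gave it 4 registers, an interpreter frame would hold the arguments in v1..v3 and leave v0 for the local k, while a JNI frame for a native method with the same signature would hold just the three arguments.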

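A Method's nativeFunc is the entry point of a native function. Java method hooking techniques on Dalvik all work by changing the Method's attributes: SET_METHOD_FLAG(method, ACC_NATIVE) disguises the method as native, and nativeFunc is then set to the hook function. Clearly, a hooked method no longer behaves polymorphically.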
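
A minimal sketch of that hooking idea follows. The types and functions (SketchMethod, callSketch, hookSketch, addHook) are hypothetical and only imitate the shape of the dispatch in dvmCallMethodV; the one value taken from the platform is the ACC_NATIVE access flag, 0x0100.

```cpp
#include <cassert>
#include <cstdint>

struct SketchMethod;
using SketchNativeFunc = void (*)(const uint32_t* args, int32_t* result,
                                  const SketchMethod* method);

constexpr uint32_t kAccNative = 0x0100;   // ACC_NATIVE access flag

struct SketchMethod {
    uint32_t accessFlags;
    SketchNativeFunc nativeFunc;   // used only when ACC_NATIVE is set
    int32_t interpretedResult;     // stand-in for whatever the bytecode computes
};

// Stand-in for the dispatch: native function pointer or interpreter.
int32_t callSketch(const SketchMethod& m, const uint32_t* args) {
    int32_t result = 0;
    if (m.accessFlags & kAccNative) {
        assert(m.nativeFunc != nullptr);
        m.nativeFunc(args, &result, &m);
    } else {
        result = m.interpretedResult;   // the real VM would run dvmInterpret here
    }
    return result;
}

// Hook: mark the method as native and point nativeFunc at the replacement.
// Returns a copy of the original so that a caller could restore it.
SketchMethod hookSketch(SketchMethod& m, SketchNativeFunc hook) {
    SketchMethod backup = m;
    m.accessFlags |= kAccNative;
    m.nativeFunc = hook;
    return backup;
}

// Example hook: adds its two arguments.
void addHook(const uint32_t* args, int32_t* result, const SketchMethod*) {
    *result = static_cast<int32_t>(args[0] + args[1]);
}
```

Because the hook is attached to one specific Method structure, a call that resolves to an overriding method in a subclass uses a different structure and bypasses the hook, which is one way to read the remark about lost polymorphism.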

The default value of nativeFunc is dvmResolveNativeMethod (vm/Native.cpp):


void dvmResolveNativeMethod(const u4* args, JValue* pResult,
    const Method* method, Thread* self)
{
    ClassObject* clazz = method->clazz;

    /*
     * If this is a static method, it could be called before the class
     * has been initialized.
     */
    if (dvmIsStaticMethod(method)) {
        if (!dvmIsClassInitialized(clazz) && !dvmInitClass(clazz)) {
            assert(dvmCheckException(dvmThreadSelf()));
            return;
        }
    } else {
        assert(dvmIsClassInitialized(clazz) ||
               dvmIsClassInitializing(clazz));
    }

    /* start with our internal-native methods */
    DalvikNativeFunc infunc = dvmLookupInternalNativeMethod(method);
    if (infunc != NULL) {
        /* resolution always gets the same answer, so no race here */
        IF_LOGVV() {
            char* desc = dexProtoCopyMethodDescriptor(&method->prototype);
            LOGVV("+++ resolved native %s.%s %s, invoking",
                clazz->descriptor, method->name, desc);
            free(desc);
        }
        if (dvmIsSynchronizedMethod(method)) {
            ALOGE("ERROR: internal-native can't be declared 'synchronized'");
            ALOGE("Failing on %s.%s", method->clazz->descriptor, method->name);
            dvmAbort();     // harsh, but this is VM-internal problem
        }
        DalvikBridgeFunc dfunc = (DalvikBridgeFunc) infunc;
        dvmSetNativeFunc((Method*) method, dfunc, NULL);
        dfunc(args, pResult, method, self);
        return;
    }

    /* now scan any DLLs we have loaded for JNI signatures */
    void* func = lookupSharedLibMethod(method);
    if (func != NULL) {
        /* found it, point it at the JNI bridge and then call it */
        dvmUseJNIBridge((Method*) method, func);
        (*method->nativeFunc)(args, pResult, method, self);
        return;
    }

    IF_ALOGW() {
        char* desc = dexProtoCopyMethodDescriptor(&method->prototype);
        ALOGW("No implementation found for native %s.%s:%s",
            clazz->descriptor, method->name, desc);
        free(desc);
    }

    dvmThrowUnsatisfiedLinkError("Native method not found", method);
}

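dvmResolveNativeMethod first calls dvmLookupInternalNativeMethod to check whether the method is one of the VM's built-in native methods, which mainly means searching the following set: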

static DalvikNativeClass gDvmNativeMethodSet[] = {
    { "Ljava/lang/Object;",               dvm_java_lang_Object, 0 },
    { "Ljava/lang/Class;",                dvm_java_lang_Class, 0 },
    { "Ljava/lang/Double;",               dvm_java_lang_Double, 0 },
    { "Ljava/lang/Float;",                dvm_java_lang_Float, 0 },
    { "Ljava/lang/Math;",                 dvm_java_lang_Math, 0 },
    { "Ljava/lang/Runtime;",              dvm_java_lang_Runtime, 0 },
    { "Ljava/lang/String;",               dvm_java_lang_String, 0 },
    { "Ljava/lang/System;",               dvm_java_lang_System, 0 },
    { "Ljava/lang/Throwable;",            dvm_java_lang_Throwable, 0 },
    { "Ljava/lang/VMClassLoader;",        dvm_java_lang_VMClassLoader, 0 },
    { "Ljava/lang/VMThread;",             dvm_java_lang_VMThread, 0 },
    { "Ljava/lang/reflect/AccessibleObject;",
            dvm_java_lang_reflect_AccessibleObject, 0 },
    { "Ljava/lang/reflect/Array;",        dvm_java_lang_reflect_Array, 0 },
    { "Ljava/lang/reflect/Constructor;",
            dvm_java_lang_reflect_Constructor, 0 },
    { "Ljava/lang/reflect/Field;",        dvm_java_lang_reflect_Field, 0 },
    { "Ljava/lang/reflect/Method;",       dvm_java_lang_reflect_Method, 0 },
    { "Ljava/lang/reflect/Proxy;",        dvm_java_lang_reflect_Proxy, 0 },
    { "Ljava/util/concurrent/atomic/AtomicLong;",
            dvm_java_util_concurrent_atomic_AtomicLong, 0 },
    { "Ldalvik/bytecode/OpcodeInfo;",     dvm_dalvik_bytecode_OpcodeInfo, 0 },
    { "Ldalvik/system/VMDebug;",          dvm_dalvik_system_VMDebug, 0 },
    { "Ldalvik/system/DexFile;",          dvm_dalvik_system_DexFile, 0 },
    { "Ldalvik/system/VMRuntime;",        dvm_dalvik_system_VMRuntime, 0 },
    { "Ldalvik/system/Zygote;",           dvm_dalvik_system_Zygote, 0 },
    { "Ldalvik/system/VMStack;",          dvm_dalvik_system_VMStack, 0 },
    { "Lorg/apache/harmony/dalvik/ddmc/DdmServer;",
            dvm_org_apache_harmony_dalvik_ddmc_DdmServer, 0 },
    { "Lorg/apache/harmony/dalvik/ddmc/DdmVmInternal;",
            dvm_org_apache_harmony_dalvik_ddmc_DdmVmInternal, 0 },
    { "Lorg/apache/harmony/dalvik/NativeTestTarget;",
            dvm_org_apache_harmony_dalvik_NativeTestTarget, 0 },
    { "Lsun/misc/Unsafe;",                dvm_sun_misc_Unsafe, 0 },
    { NULL, NULL, 0 },
};
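
A lookup over a sentinel-terminated table of this shape might look like the sketch below. The entry for Math.abs, the per-class method table, and the linear scan are assumptions made for illustration; they are not taken from the Dalvik sources, and the real dvmLookupInternalNativeMethod may search differently.

```cpp
#include <cassert>
#include <cstring>

using SketchNativeFn = int (*)(int);

struct SketchNativeMethod { const char* name; const char* signature; SketchNativeFn fn; };
struct SketchNativeClass  { const char* classDescriptor; const SketchNativeMethod* methods; };

static int absImpl(int x) { return x < 0 ? -x : x; }

// Hypothetical entries; both tables end with an all-null sentinel,
// in the same way gDvmNativeMethodSet does.
static const SketchNativeMethod kMathMethods[] = {
    { "abs", "(I)I", absImpl },
    { nullptr, nullptr, nullptr },
};
static const SketchNativeClass kNativeSet[] = {
    { "Ljava/lang/Math;", kMathMethods },
    { nullptr, nullptr },
};

// Returns the built-in implementation, or nullptr if the method is not built in.
SketchNativeFn lookupInternalSketch(const char* classDescriptor,
                                    const char* name, const char* signature) {
    assert(classDescriptor != nullptr && name != nullptr && signature != nullptr);
    for (const SketchNativeClass* c = kNativeSet; c->classDescriptor != nullptr; ++c) {
        if (std::strcmp(c->classDescriptor, classDescriptor) != 0) continue;
        for (const SketchNativeMethod* m = c->methods; m->name != nullptr; ++m) {
            if (std::strcmp(m->name, name) == 0 &&
                std::strcmp(m->signature, signature) == 0) {
                return m->fn;
            }
        }
    }
    return nullptr;
}
```

A nullptr result corresponds to the point in dvmResolveNativeMethod where the VM gives up on the internal set and moves on to the loaded shared libraries.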

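If the method is not built in, the VM searches the loaded .so libraries for the matching native function. The lookup follows the familiar JNI naming rule: com.xx.Helloworld.foobar maps to Java_com_xx_Helloworld_foobar. Note that this function does not itself become nativeFunc. In the dvmUseJNIBridge call that follows, dvmCallJNIMethod is installed as nativeFunc; its main job is to convert the ins arguments in the Java stack frame mentioned earlier into the arguments of the JNI function call. Xposed and Dexposed set their own nativeFunc so that they take over execution of the method themselves.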
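
The JNI short-name rule can be written down as a small function. The sketch below follows the escaping rules of the JNI specification (a Java_ prefix, '/' becomes '_', '_' becomes "_1", ';' becomes "_2", '[' becomes "_3", other characters become "_0xxxx"), but it handles ASCII input only and ignores the long form with an appended signature; mangleSketch and jniShortNameSketch are hypothetical names, not VM functions.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Escapes one name component following the JNI short-name rules.
// Simplification: ASCII input only.
static std::string mangleSketch(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        assert(c < 0x80);
        if (std::isalnum(c)) {
            out += static_cast<char>(c);
        } else if (c == '/' || c == '.') {
            out += '_';                       // package separator
        } else if (c == '_') {
            out += "_1";
        } else if (c == ';') {
            out += "_2";
        } else if (c == '[') {
            out += "_3";
        } else {
            char buf[8];
            std::snprintf(buf, sizeof buf, "_0%04x", static_cast<unsigned>(c));
            out += buf;                       // e.g. '$' becomes "_00024"
        }
    }
    return out;
}

// classDescriptor is in the "Lcom/xx/Helloworld;" form used inside the VM.
std::string jniShortNameSketch(const std::string& classDescriptor,
                               const std::string& methodName) {
    assert(classDescriptor.size() >= 3);
    assert(classDescriptor.front() == 'L' && classDescriptor.back() == ';');
    std::string className = classDescriptor.substr(1, classDescriptor.size() - 2);
    return "Java_" + mangleSketch(className) + "_" + mangleSketch(methodName);
}
```

The resulting string is the symbol name that a shared library would have to export for the method to be found without an explicit registration.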

dvmInterpret is the entry point of the interpreter code; it is located in interp/Interp.cpp:
void dvmInterpret(Thread* self, const Method* method, JValue* pResult)
{
    InterpSaveState interpSaveState;
    ExecutionSubModes savedSubModes;
    . . . 
    interpSaveState = self->interpSave;
    self->interpSave.prev = &interpSaveState; 
    . . . 

    self->interpSave.method = method;
    self->interpSave.curFrame = (u4*) self->interpSave.curFrame;
    self->interpSave.pc = method->insns;
    . . .
    typedef void (*Interpreter)(Thread*);
    Interpreter stdInterp;
    if (gDvm.executionMode == kExecutionModeInterpFast)
        stdInterp = dvmMterpStd;
#if defined(WITH_JIT)
    else if (gDvm.executionMode == kExecutionModeJit ||
             gDvm.executionMode == kExecutionModeNcgO0 ||
             gDvm.executionMode == kExecutionModeNcgO1)
        stdInterp = dvmMterpStd;
#endif
    else
        stdInterp = dvmInterpretPortable;

    // Call the interpreter
    (*stdInterp)(self);
    *pResult = self->interpSave.retval;

    /* Restore interpreter state from previous activation */
    self->interpSave = interpSaveState;
#if defined(WITH_JIT)
    dvmJitCalleeRestore(calleeSave);
#endif
    if (savedSubModes != kSubModeNormal) {
        dvmEnableSubMode(self, savedSubModes);
    }
}

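One very important field of Thread is interpSave, of type InterpSaveState. It holds key state such as the current method, the pc, and the current stack frame, and dvmInterpret initializes it at the start of each call.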

Dalvik has two interpreters: dvmInterpretPortable and dvmMterpStd. The difference is that the former is implemented in C++ and the latter in assembly.
dvmInterpretPortable is defined in vm/mterp/out/InterpC-portable.cpp:


void dvmInterpretPortable(Thread* self)
{
    . . .
    DvmDex* methodClassDex;     // curMethod->clazz->pDvmDex
    JValue retval;

    /* core state */
    const Method* curMethod;    // method we're interpreting
    const u2* pc;               // program counter
    u4* fp;                     // frame pointer
    u2 inst;                    // current instruction
    /* instruction decoding */
    u4 ref;                     // 16 or 32-bit quantity fetched directly
    u2 vsrc1, vsrc2, vdst;      // usually used for register indexes
    /* method call setup */
    const Method* methodToCall;
    bool methodCallRange;

    /* static computed goto table */
    DEFINE_GOTO_TABLE(handlerTable);
    /* copy state in */
    curMethod = self->interpSave.method;
    pc = self->interpSave.pc;
    fp = self->interpSave.curFrame;
    retval = self->interpSave.retval;   

    methodClassDex = curMethod->clazz->pDvmDex;

    . . . 
   
    FINISH(0);                  /* fetch and execute first instruction */
/*--- start of opcodes ---*/

/* File: c/OP_NOP.cpp */
HANDLE_OPCODE(OP_NOP)
    FINISH(1);
OP_END

/* File: c/OP_MOVE.cpp */
HANDLE_OPCODE(OP_MOVE /*vA, vB*/)
    vdst = INST_A(inst);
    vsrc1 = INST_B(inst);
    ILOGV("|move%s v%d,v%d %s(v%d=0x%08x)",
        (INST_INST(inst) == OP_MOVE) ? "" : "-object", vdst, vsrc1,
        kSpacing, vdst, GET_REGISTER(vsrc1));
    SET_REGISTER(vdst, GET_REGISTER(vsrc1));
    FINISH(1);
OP_END
…..
}

The interpreter dispatches instructions through a jump table: DEFINE_GOTO_TABLE(handlerTable) defines the goto table for the opcodes.
FINISH(0) starts execution at the first instruction:


# define FINISH(_offset) {                                                  \
        ADJUST_PC(_offset);                                                 \
        inst = FETCH(0);                                                    \
        if (self->interpBreak.ctl.subMode) {                                \
            dvmCheckBefore(pc, fp, self);                                   \
        }                                                                   \
        goto *handlerTable[INST_INST(inst)];                                \
    }

#define FETCH(_offset)     (pc[(_offset)])

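FETCH(0) fetches the instruction to execute, and the lookup in the jump table handlerTable jumps to the code that executes it, namely the matching HANDLE_OPCODE definition later in the function.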
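
The same dispatch pattern can be reproduced in a self-contained toy interpreter. This is a sketch under stated assumptions: the four opcodes and their numbering are made up for the example, FINISH_SKETCH only imitates the fetch-and-jump part of FINISH, and the labels-as-values syntax (&&label, goto *ptr) is a GNU extension available in GCC and Clang rather than standard C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical four-opcode instruction set. The low 8 bits of the first code
// unit hold the opcode and the high 8 bits hold register AA.
enum : uint16_t { OP_NOP = 0, OP_CONST = 1, OP_ADD = 2, OP_RETURN = 3 };

int32_t interpretSketch(const uint16_t* pc) {
    // One entry per opcode, indexed by opcode number.
    static void* const handlerTable[] = { &&op_nop, &&op_const, &&op_add, &&op_return };
    int32_t reg[256] = {0};   // large enough for any 8-bit register index
    uint16_t inst = 0;

#define FINISH_SKETCH(offset)                                   \
    do {                                                        \
        pc += (offset);              /* like ADJUST_PC */       \
        inst = pc[0];                /* like FETCH(0)  */       \
        assert((inst & 0xff) <= OP_RETURN);                     \
        goto *handlerTable[inst & 0xff];                        \
    } while (0)

    FINISH_SKETCH(0);                /* fetch and execute the first instruction */

op_nop:
    FINISH_SKETCH(1);

op_const:                            /* const vAA, #+BBBB (2 code units) */
    reg[inst >> 8] = static_cast<int16_t>(pc[1]);
    FINISH_SKETCH(2);

op_add:                              /* add vAA, vBB, vCC (2 code units) */
    reg[inst >> 8] = reg[pc[1] & 0xff] + reg[pc[1] >> 8];
    FINISH_SKETCH(2);

op_return:                           /* return vAA */
    return reg[inst >> 8];

#undef FINISH_SKETCH
}
```

A program equivalent to foo(1, 2) would be the code units 0x0001, 1, 0x0101, 2, 0x0202, 0x0100, 0x0203: two constants, one add, one return. Each handler ends by jumping straight to the next handler, so there is no central loop and no switch statement to return to.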

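dvmMterpStd, by contrast, is an interpreter optimized for each platform.
It is optimized at the assembly level. Its entry point, dvmMterpStdRun, has a separate implementation for each target instruction set; for example, the code for ARMv7 with NEON is in mterp/out/InterpAsm-armv7-a-neon.S.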

dvmMterpStdRun:
#define MTERP_ENTRY1 \
    .save {r4-r10,fp,lr}; \
    stmfd   sp!, {r4-r10,fp,lr}         @ save 9 regs
#define MTERP_ENTRY2 \
    .pad    #4; \
    sub     sp, sp, #4                  @ align 64

    .fnstart
    MTERP_ENTRY1
    MTERP_ENTRY2

    /* save stack pointer, add magic word for debuggerd */
    str     sp, [r0, #offThread_bailPtr]  @ save SP for eventual return

    /* set up "named" registers, figure out entry point */
    mov     rSELF, r0                   @ set rSELF
    LOAD_PC_FP_FROM_SELF()              @ load rPC and rFP from "thread"
    ldr     rIBASE, [rSELF, #offThread_curHandlerTable] @ set rIBASE
    . . .
    /* start executing the instruction at rPC */
    FETCH_INST()                        @ load rINST from rPC
    GET_INST_OPCODE(ip)                 @ extract opcode from rINST
    GOTO_OPCODE(ip)                     @ jump to next instruction
    . . .

#define rPC     r4
#define rFP     r5
#define rSELF   r6
#define rINST   r7
#define rIBASE  r8

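Without the JIT, FETCH_INST first loads the instruction at pc into the rINST register. GET_INST_OPCODE then extracts the opcode with and _reg, rINST, #255, which puts the low 8 bits of rINST into the ip register, and GOTO_OPCODE jumps to the corresponding address.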

#define GOTO_OPCODE(_reg)       add     pc, rIBASE, _reg, lsl #6

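rIBASE holds curHandlerTable, the base address of the handler table. Each handler is aligned to 64 bytes (.balign 64), so GOTO_OPCODE(ip) sets pc to rIBASE + opcode * 64, the address of the handler for that opcode.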
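
The arithmetic performed by those two macros is small enough to restate in C++. The function names below are hypothetical; the constants come from the macros themselves (the #255 mask and the lsl #6 shift).

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kHandlerShift = 6;   // lsl #6: each handler slot is 64 bytes

// GET_INST_OPCODE: keep the low 8 bits of the 16-bit code unit.
constexpr uint32_t opcodeOf(uint16_t inst) { return inst & 0xffu; }

// GOTO_OPCODE: base of the handler area plus opcode * 64.
inline uintptr_t handlerAddress(uintptr_t ibase, uint32_t opcode) {
    assert(opcode <= 0xff);
    return ibase + (static_cast<uintptr_t>(opcode) << kHandlerShift);
}
```

For a new-instance instruction (opcode 0x22) with the handlers starting at a made-up base of 0x1000, the handler would sit at 0x1000 + 0x22 * 64 = 0x1880. No memory load is needed for the dispatch, because the target address is computed rather than read from a table of pointers.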

static Thread* allocThread(int interpStackSize)
#ifndef DVM_NO_ASM_INTERP
    thread->mainHandlerTable = dvmAsmInstructionStart;
    thread->altHandlerTable = dvmAsmAltInstructionStart;
    thread->interpBreak.ctl.curHandlerTable = thread->mainHandlerTable;
#endif

So dvmAsmInstructionStart is the start of the jump table. It is defined together with dvmMterpStdRun,
and there you can find the interpreter code for every bytecode instruction.

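For example, the code for the new operator (new-instance) is shown below. It first loads Thread.interpSave.methodClassDex, a DvmDex pointer, and then loads the DvmDex's pResClasses to check whether the class has already been resolved. If it has not, the code jumps to LOP_NEW_INSTANCE_resolve to load the class; if it has, the code continues with class initialization and the dvmAllocObject call. LOP_NEW_INSTANCE_resolve loads the class by calling dvmResolveClass with the current method's clazz.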
/* ------------------------------ */
    .balign 64
.L_OP_NEW_INSTANCE: /* 0x22 */
/* File: armv5te/OP_NEW_INSTANCE.S */
    /*
     * Create a new instance of a class.
     */
    /* new-instance vAA, class@BBBB */
    ldr     r3, [rSELF, #offThread_methodClassDex]    @ r3<- pDvmDex
    FETCH(r1, 1)                        @ r1<- BBBB
    ldr     r3, [r3, #offDvmDex_pResClasses]    @ r3<- pDvmDex->pResClasses
    ldr     r0, [r3, r1, lsl #2]        @ r0<- resolved class
#if defined(WITH_JIT)
    add     r10, r3, r1, lsl #2         @ r10<- &resolved_class
#endif
    EXPORT_PC()                         @ req'd for init, resolve, alloc
    cmp     r0, #0                      @ already resolved?
    beq     .LOP_NEW_INSTANCE_resolve         @ no, resolve it now
.LOP_NEW_INSTANCE_resolved:   @ r0=class
    ldrb    r1, [r0, #offClassObject_status]    @ r1<- ClassStatus enum
    cmp     r1, #CLASS_INITIALIZED      @ has class been initialized?
    bne     .LOP_NEW_INSTANCE_needinit        @ no, init class now
.LOP_NEW_INSTANCE_initialized: @ r0=class
    mov     r1, #ALLOC_DONT_TRACK       @ flags for alloc call
    bl      dvmAllocObject              @ r0<- new object
    b       .LOP_NEW_INSTANCE_finish          @ continue


.LOP_NEW_INSTANCE_needinit:
    mov     r9, r0                      @ save r0
    bl      dvmInitClass                @ initialize class
    cmp     r0, #0                      @ check boolean result
    mov     r0, r9                      @ restore r0
    bne     .LOP_NEW_INSTANCE_initialized     @ success, continue
    b       common_exceptionThrown      @ failed, deal with init exception

    /*
     * Resolution required.  This is the least-likely path.
     *
     *  r1 holds BBBB
     */
.LOP_NEW_INSTANCE_resolve:
    ldr     r3, [rSELF, #offThread_method] @ r3<- self->method
    mov     r2, #0                      @ r2<- false
    ldr     r0, [r3, #offMethod_clazz]  @ r0<- method->clazz
    bl      dvmResolveClass             @ r0<- resolved ClassObject ptr
    cmp     r0, #0                      @ got null?
    bne     .LOP_NEW_INSTANCE_resolved        @ no, continue
    b       common_exceptionThrown      @ yes, handle exception
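
The control flow of this handler (check the resolved-class cache, resolve on a miss, initialize if needed, then allocate) can be summarized in C++. Everything here is a hypothetical model: SketchDex stands in for DvmDex, the counters exist only to make the behaviour observable, and the cache update that the sketch does inline is assumed to happen inside dvmResolveClass in the real VM.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct SketchClass { bool initialized = false; };
struct SketchObject { const SketchClass* clazz; };

struct SketchDex {
    std::vector<SketchClass*> resClasses;   // like pResClasses: null until resolved
    std::vector<SketchClass> classPool;     // what resolution would find
    int resolveCalls = 0;
    int initCalls = 0;
};

std::unique_ptr<SketchObject> newInstanceSketch(SketchDex& dex, size_t classIdx) {
    assert(classIdx < dex.resClasses.size());
    assert(dex.classPool.size() == dex.resClasses.size());

    SketchClass* clazz = dex.resClasses[classIdx];
    if (clazz == nullptr) {                   // path of .LOP_NEW_INSTANCE_resolve
        ++dex.resolveCalls;
        clazz = &dex.classPool[classIdx];     // stand-in for dvmResolveClass
        dex.resClasses[classIdx] = clazz;     // assumed: resolution fills the cache
    }
    if (!clazz->initialized) {                // path of .LOP_NEW_INSTANCE_needinit
        ++dex.initCalls;
        clazz->initialized = true;            // stand-in for dvmInitClass
    }
    // Stand-in for dvmAllocObject.
    return std::unique_ptr<SketchObject>(new SketchObject{clazz});
}
```

After the first allocation of a class, both slow paths are skipped, which matches the comment in the assembly that resolution is the least likely path.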


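About the author:

Tian Li (田力), founder of the NetEase Lottery Android client and of Xiaomi Video, currently technical manager and video cloud technical director at roobo

You are welcome to follow the WeChat official account 磨剑石, which regularly publishes technical notes and source code analysis. Thank you.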