Learning Solidity Inline Assembly

Posted 2023-02-28 MateZero

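Many people skip the inline assembly chapter when learning Solidity, and I was no exception. But as development work gets deeper, dealing with inline assembly sometimes becomes unavoidable. At that point, tackling it head-on may be the better choice. Most things yield to careful attention: once we study them seriously, problems that look hard at first glance gradually stop being hard.

Today we use the official Solidity 0.8.7 documentation and study its first simple inline assembly example: GetCode.sol.

First, the source code from the official documentation: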

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.4.16 <0.9.0;

library GetCode {
    function at(address _addr) public view returns (bytes memory o_code) {
        assembly {
            // retrieve the size of the code, this needs assembly
            let size := extcodesize(_addr)
            // allocate output byte array - this could also be done without assembly
            // by using o_code = new bytes(size)
            o_code := mload(0x40)
            // new "memory end" including padding
            mstore(0x40, add(o_code, and(add(add(size, 0x20), 0x1f), not(0x1f))))
            // store length in memory
            mstore(o_code, size)
            // actually retrieve the code, this needs assembly
            extcodecopy(_addr, add(o_code, 0x20), 0, size)
        }
    }
}

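This library fetches the code at another address and stores it in a bytes variable.

Because this example involves the dynamic type bytes, we need to know a little about Solidity's memory layout. Here is the relevant passage from the official documentation: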

Solidity reserves four 32-byte slots, with specific byte ranges (inclusive of endpoints) being used as follows:

0x00 - 0x3f (64 bytes): scratch space for hashing methods
0x40 - 0x5f (32 bytes): currently allocated memory size (aka. free memory pointer)
0x60 - 0x7f (32 bytes): zero slot

Scratch space can be used between statements (i.e. within inline assembly). The zero slot is used as initial value for dynamic memory arrays and should never be written to (the free memory pointer points to 0x80 initially).

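All we need to know here is that the free memory pointer is stored in the fixed 32-byte word that starts at 0x40. Its initial value is 0x80, which means memory allocation starts at address 0x80.

The library function above is inconvenient to test, so we rewrote it slightly with Hardhat's help to make testing easier: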

pragma solidity ^0.8.0;
import "hardhat/console.sol";

// Empty contract whose deployed code we read in the test
contract A {}

contract GetCode {
    function getCodeTest(address _addr) public view returns(bytes memory o_code) {
        uint pointer;
        uint length;
        bytes32 value1;
        bytes32 value2;
        assembly {
            // retrieve the size of the code, this needs assembly
            let size := extcodesize(_addr)
            // allocate output byte array - this could also be done without assembly
            // by using o_code = new bytes(size)
            o_code := mload(0x40)   // 0x80
            // new "memory end" including padding
            // and(add(add(size, 0x20), 0x1f), not(0x1f)) = trunc((code_size + 32 + 32 - 1) / 32) * 32
            mstore(0x40, add(o_code, and(add(add(size, 0x20), 0x1f), not(0x1f))))
            // store length in memory
            mstore(o_code, size)
            // actually retrieve the code, this needs assembly
            extcodecopy(_addr, add(o_code, 0x20), 0, size)
        }
        // read back the words written above so they can be printed
        assembly {
            pointer := mload(0x40)
            length := mload(0x80)
            value1 := mload(0xa0)
            value2 := mload(0xc0)
        }
        /**  0x40 => 0xe0 (0x80 + 0x60) // free memory pointer
         *   0x60 => zero slot
         *   0x80 => 63 // length prefix
         *   0xa0 => // first part  0x6080604052600080fdfea2646970667358221220ad1bbad09d41f2213b969ef0
         *   0xc0 => // second part 0x728767ee6ac4b4ed5af6a01c4511fa370f5e8c6d64736f6c6343000804003300
         *   0xe0 => start of newly allocated memory
         */
        console.log("pointer :%s",pointer);
        console.log("length :%s",length);
        console.logBytes32(value1);
        console.logBytes32(value2);
    }
}

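Now let's go through the inline assembly operations one by one.

Read the code size of the external address. Note that it is measured in bytes; in this example it is the code size of contract A, which is 63.
Read the location the free memory pointer points to, 0x80 in this example. Note that mload reads 32 bytes starting at the given address. Why read it instead of using 0x80 directly? Because other memory allocations may already have happened, or the function parameters may include memory data, in which case the value is no longer 0x80. This example happens to have no such operations or data, so the pointer still holds its initial value of 0x80. That is why it must always be obtained with mload(0x40).
Because a bytes value in memory carries a 32-byte length prefix, we add 32 to size and then round up to a multiple of 32 (the smallest multiple of 32 that can hold it). In this example 63 + 32 = 95, and a quick mental calculation shows that 32 * 3 = 96 bytes are needed to store o_code. The formula is trunc((code_size + 32 + 32 - 1) / 32) * 32, which in inline assembly becomes and(add(add(size, 0x20), 0x1f), not(0x1f)). This gives 96.
Next, add the newly computed size (96) to the old pointer address to get the new pointer address, and store it with mstore in the 32-byte word starting at 0x40.
o_code is stored starting at the old pointer address. The length prefix comes first, so the length is written into the first 32-byte word.
Copy the code from the external address. Because the length prefix occupies one 32-byte word, the destination starts at add(o_code, 0x20). See the official documentation for a description of extcodecopy.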
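
The steps above can be replayed outside the EVM. The following is only a rough JavaScript sketch under stated assumptions: memory is modeled as a zero-filled byte array, and mload, mstore, getCodeAt and fakeCode are hypothetical helpers and data invented for this illustration, not part of Solidity or Hardhat.

```javascript
// Toy model of EVM memory: a zero-filled array addressed in bytes.
// All helper names below are hypothetical and exist only for this sketch.
const memory = new Uint8Array(0x200);

// Read the 32-byte big-endian word that starts at `offset`.
function mload(offset) {
  let word = 0n;
  for (let i = 0; i < 32; i++) {
    word = (word << 8n) | BigInt(memory[offset + i]);
  }
  return word;
}

// Write `value` as a 32-byte big-endian word starting at `offset`.
function mstore(offset, value) {
  let v = BigInt(value);
  for (let i = 31; i >= 0; i--) {
    memory[offset + i] = Number(v & 0xffn);
    v >>= 8n;
  }
}

// Solidity initialises the free memory pointer to 0x80.
mstore(0x40, 0x80);

// Mirrors the assembly block; `code` stands in for the external account's code.
function getCodeAt(code) {
  const size = code.length;                    // extcodesize(_addr)
  const oCode = Number(mload(0x40));           // o_code := mload(0x40)
  const padded = (size + 0x20 + 0x1f) & ~0x1f; // and(add(add(size, 0x20), 0x1f), not(0x1f))
  mstore(0x40, oCode + padded);                // move the free memory pointer
  mstore(oCode, size);                         // length prefix
  memory.set(code, oCode + 0x20);              // extcodecopy(_addr, add(o_code, 0x20), 0, size)
  return oCode;
}

// Placeholder for the 63 bytes of contract A's code (assumed content).
const fakeCode = new Uint8Array(63).fill(0xab);
const oCode = getCodeAt(fakeCode);

console.log(oCode.toString(16));       // 80
console.log(mload(0x40).toString(16)); // e0
console.log(mload(oCode).toString());  // 63
```

With a 63-byte input the sketch reproduces the numbers discussed in the steps: the value starts at 0x80, the length word holds 63, and the free memory pointer moves to 0xe0.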

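Here we focus on the operation and(add(add(size, 0x20), 0x1f), not(0x1f)). Look at it in two parts:

First, what the operation does. It finds the smallest number divisible by 32 that can hold the given size. This is easy to understand: suppose the size is 95 bytes (including the length prefix). How many bytes do we need to store it, given that Solidity usually operates on one word of 32 bytes at a time, so the total must be a multiple of 32? Obviously 96. But how is that computed? In JavaScript we would write Math.ceil(v/32) * 32. Integer division in Solidity rounds down, though, so the calculation becomes Math.floor((v + 31)/32) * 32, that is, y = (x + 31) / 32 * 32. When x is exactly divisible by 32 the result is x itself; when there is any remainder, the result is the smallest multiple of 32 greater than x.
As for why we add 31: it guarantees that the floor division gains 1 whenever there is a remainder. If you added 30 instead, a remainder of 1 would give the wrong result.
Second, why the and and not operations appear. Continue from y = (x + 31) / 32 * 32. Let z = x + 31, so the formula simplifies to y = z / 32 * 32. For a uint in Solidity, dividing by 2 is a right shift by one bit, dividing by 32 is a right shift by 5 bits, and multiplying by 32 is a left shift by 5 bits. What happens when a uint is shifted right by 5 bits and then left by 5 bits? Its low five bits are all cleared. A simple example: z = 0xFF = 0b11111111. Shifting right by 5 gives 0b111, and shifting left by 5 gives 0b11100000, which is z with its low five bits cleared. So clearing the low five bits of z has the same effect as z / 32 * 32. The quickest way to clear a bit is to AND it with 0, while ANDing every other bit with 1 to keep it. So we only need to AND z with 111...11100000, and not(0x1f) is exactly that number: every higher bit is 1 and the low five bits are 0. The formula is therefore optimized to y = and(z, not(0x1f)), replacing the divide-then-multiply with directly clearing the low five bits.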
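
The equivalence of the two forms can be checked numerically. This is a small JavaScript sketch using BigInt; WORD_MASK, NOT_1F, roundUpByDivision and roundUpByMask are names made up for the illustration, and the 256-bit mask is there only because BigInt has no fixed width the way an EVM word does.

```javascript
// Emulate a 256-bit EVM word, since BigInt is unbounded.
const WORD_MASK = (1n << 256n) - 1n;

// not(0x1f) on a 256-bit word: every bit is 1 except the low five.
const NOT_1F = ~0x1fn & WORD_MASK;

// Rounding by integer division, as in y = (x + 31) / 32 * 32.
function roundUpByDivision(x) {
  return ((x + 31n) / 32n) * 32n; // BigInt division truncates
}

// Rounding by bit mask, as in and(add(x, 0x1f), not(0x1f)).
function roundUpByMask(x) {
  return (x + 0x1fn) & NOT_1F;
}

// Print both results side by side for a few sizes around word boundaries.
for (const x of [0n, 1n, 31n, 32n, 33n, 95n, 96n, 97n]) {
  console.log(x, roundUpByDivision(x), roundUpByMask(x));
}
```

For x = 95, the size from the example including the length prefix, both functions return 96.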

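The second inline assembly block, which we added, reads the relevant values so we can verify them. They are:

Free memory pointer: 0x80 + 0x60 = 0xe0 here (initial value 0x80 plus 96).
Code length: read from the length-prefix word (32 bytes), 63 here.
value1: the first part of the code, i.e. its first 32 bytes.
value2: the second part of the code, i.e. its last 31 bytes plus zero padding.

Finally we print these values to check them.

Our unit test file is: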

const { ethers } = require("hardhat");

describe("GetCode", () => {
  it("GetCode Test", async () => {
    // Deploy the empty contract A whose code we want to read.
    const A = await ethers.getContractFactory("A");
    const a = await A.deploy();
    await a.deployed();
    // Deploy the GetCode contract and read A's code through it.
    const GetCode = await ethers.getContractFactory("GetCode");
    const instance = await GetCode.deploy();
    await instance.deployed();
    const result = await instance.getCodeTest(a.address);
    console.log();
    console.log(result);
  });
});

Running the unit test gives output similar to this:

Compiled 1 Solidity file successfully


  GetCode
pointer :224
length :63
0x6080604052600080fdfea2646970667358221220e5457b554ed9901ed12a8d40
0xc71d05b4591105f6c4f1304ca9e68525d329e35664736f6c6343000804003300

0x6080604052600080fdfea2646970667358221220e5457b554ed9901ed12a8d40c71d05b4591105f6c4f1304ca9e68525d329e35664736f6c63430008040033
    ✔ GetCode Test (916ms)


  1 passing (918ms)

As we can see, the returned result matches the printed values.
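
The match can also be checked mechanically rather than by eye. The sketch below is plain JavaScript with no Hardhat dependency: the three hex strings are copied from the test output above, and the variable names are chosen only for this illustration. It pads the returned bytes with zeros to a multiple of 32 bytes and compares the resulting words with value1 and value2.

```javascript
// Values copied from the test output above.
const result =
  "0x6080604052600080fdfea2646970667358221220e5457b554ed9901ed12a8d40" +
  "c71d05b4591105f6c4f1304ca9e68525d329e35664736f6c63430008040033";
const value1 = "0x6080604052600080fdfea2646970667358221220e5457b554ed9901ed12a8d40";
const value2 = "0xc71d05b4591105f6c4f1304ca9e68525d329e35664736f6c6343000804003300";

// Drop the 0x prefix; two hex characters encode one byte.
const hex = result.slice(2);
const byteLength = hex.length / 2;

// Right-pad with zeros to a multiple of 32 bytes (64 hex characters),
// matching how the bytes content occupies whole words in memory.
const paddedHex = hex.padEnd(Math.ceil(hex.length / 64) * 64, "0");
const words = paddedHex.match(/.{64}/g).map((w) => "0x" + w);

console.log(byteLength);          // 63
console.log(words[0] === value1); // true
console.log(words[1] === value2); // true
```

The trailing 00 byte in value2 is the padding; it is not part of the 63 bytes returned by getCodeTest.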

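That's it for today. The key points are Solidity's memory allocation, how a bytes variable is stored in memory (the variable holds a start address, and because of the length prefix the actual content begins 32 bytes after it), and updating the free memory pointer (otherwise corrupted data may be read later).

My knowledge is limited and mistakes are inevitable; corrections from readers are welcome.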
