How to compare addresses in x86, using the end pointer of a static array as a loop condition?

Posted 2023-02-16

【Published】: 2021-07-21 04:17:46 【Question】:

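One of the challenge questions in Programming from the Ground Up is "Modify the program to use an ending address rather than the number 0 to know when to stop."

I find this hard to do because so far the book has only introduced the movl, cmpl, incl (plus the addressing modes) and jmp instructions. Basically everything in the code snippet below is all that has been covered so far. Every solution I have found uses instructions the book has not introduced yet. The code below finds the maximum value in a set.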

.section .data
data_items:             #These are the data items
.long 3,67,34,222,45,75,54,34,44,33,22,11,66,0

.section .text
.globl _start
_start:
    movl $0, %edi                   # move 0 into the index register
    movl data_items(,%edi,4), %eax  # load the first byte of data
    movl %eax, %ebx                 # since this is the first item, %eax is
                                    # the biggest
start_loop:                     # start loop
    cmpl $0, %eax                   # check to see if we’ve hit the end
    je loop_exit
    incl %edi                       # load next value
    movl data_items(,%edi,4), %eax
    cmpl %ebx, %eax                 # compare values
    jle start_loop                  # jump to loop beginning if the new
                                    # one isn’t bigger
    movl %eax, %ebx                 # move the value as the largest
    jmp start_loop                  # jump to loop beginning
loop_exit:
    # %ebx is the status code for the exit system call
    # and it already has the maximum number
    movl $1, %eax   #1 is the exit() syscall
    int $0x80

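Note that this question is distinct from the one that follows it in the book, which asks you to modify the program to use a length count rather than the number 0. It seems to me that the address of the last number in the array should be stored in a register and then compared against the pointer's address. I can't think of a way to do this that fits the book's progression, since it has only introduced the bare basics so far.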

【Comments】:

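It hasn't introduced LEA, which could easily compute a one-past-the-end pointer? Why are loops always compiled into "do...while" style (tail jump)? has an example with a loop condition of while(ptr < end_ptr). This would be a duplicate of Using ending address to stop a loop (which at least has a properly formatted version of the source and states that it comes from the PGU book), except that Q&A has no good answer, just a hard-coded length done the wrong way.

He hasn't introduced lea in the text yet, but most of the solutions I found before posting this question involve it. The next question is "Modify the program to use a length count rather than the number 0 to know when to stop." That seems more similar to the link you provided.

Probably he wants you to put another label at the end of the array and use cmp $data_end, %edi / jne .loop as the loop condition. (Using EDI as a pointer instead of an index.)

OK, at first glance the second link seems to answer my question, thanks.

【Answer 1】: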

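You can do this with just mov and cmp; you don't need lea to compute an end pointer. (You don't have a length available anywhere for LEA to use anyway.)

You should add a new label at the end of the array so that you can refer to that location in memory (i.e. that address). Also remove the terminating 0 from the array, since we're using an address instead of a sentinel value.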

.section .data
data_items:
  .long 3,67,34,222,45,75,54,34,44,33,22,11,66     # ,0   remove the sentinel / terminator
data_items_end:                                  # and add this new label
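
For readers who think in C, the label placed after the last element corresponds to a one-past-the-end pointer. The sketch below is only an illustration of that idea, not part of the book's program: the names `data_items_count` and `data_item_at` are hypothetical, and the end pointer is built with `sizeof` because C has no direct equivalent of an assembler label.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as the listing above, without the trailing 0 sentinel. */
static const int data_items[] = {3, 67, 34, 222, 45, 75, 54, 34, 44, 33, 22, 11, 66};

/* Plays the role of the data_items_end label: the address just past the
   last element. It may be compared against, but never dereferenced. */
static const int *const data_items_end =
    data_items + sizeof data_items / sizeof data_items[0];

/* Element count recovered purely from the two addresses. */
size_t data_items_count(void) {
    return (size_t)(data_items_end - data_items);
}

/* Hypothetical accessor: reads element i only if it lies before the end. */
int data_item_at(size_t i) {
    assert(i < data_items_count());
    return data_items[i];
}
```

The point is that the array's extent is known from two addresses alone, so no sentinel value has to be reserved inside the data.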

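You don't need that address in a register; you can use it as an immediate with cmp $data_items_end, %reg, and the linker will fill the right bytes into the machine code, just as it does for your mov data_items(,%edi,4), %eax. (cmp symbol, %reg would compare against the memory at that address. In AT&T syntax, $symbol is the address itself as an immediate.)

What you do need in a register is the start address, so that you can increment it and dereference it. (For a function that takes a pointer + length, you would compute the end address in a register.)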

_start:
    mov  $data_items, %edi       # int *ptr = &data_items[0]
    mov  (%edi), %ebx            # current max
   # setting %eax is unnecessary here, it's always written before being read in this and the original version
loop_start:
    add  $4, %edi                # ptr++  (4 byte elements)
    cmp  $data_items_end, %edi
    je   loop_exit               # if (ptr == endp) break
    ...                  # compare with (%edi) and update %ebx if greater.
    jmp  loop_start
  ...
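
As a cross-check of the control flow, here is a rough C rendering of that loop shape: increment, test against the end address, break, otherwise run the body and jump back. It is a sketch under assumptions, not the book's solution: `max_until_end` is a hypothetical name, and the body that the listing leaves as `...` is filled in with a plain branchy max update.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest value in [begin, end), where end is the
   address one past the last element (the data_items_end label). */
int max_until_end(const int *begin, const int *end) {
    assert(begin != end);      /* the first load below is unconditional */
    const int *ptr = begin;    /* mov $data_items, %edi */
    int max = *ptr;            /* mov (%edi), %ebx      */
    for (;;) {
        ptr++;                 /* add $4, %edi          */
        if (ptr == end)        /* cmp $data_items_end, %edi / je loop_exit */
            break;
        if (*ptr > max)        /* loop body: compare and maybe update max  */
            max = *ptr;
    }                          /* jmp loop_start        */
    return max;
}
```

Because the end check happens before every load inside the loop, this shape also works for an array with exactly one element.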

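Even more efficient is a do{}while loop structure like compilers use, especially since you know the array contains more than 1 element, so you don't need to check for the case where the loop body should run 0 times. Notice that there is no unconditional jmp that has to execute on every iteration in addition to the cmp/jcc.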

_start:
    mov  $data_items, %edi       # int *ptr = &data_items[0]
    mov  (%edi), %ebx            # current max
    add  $4, %edi                # ptr++;  the array has at least 2 elements

loop_start:                    # do {
  ## maybe update max:
    mov  (%edi), %eax            #   tmp = *ptr;
    cmp  %ebx, %eax
    cmovg %eax, %ebx             #   max = (tmp > max) ? tmp : max;
  ## end of loop body

    add  $4, %edi                #   ptr++;  (4 byte elements), so (%edi) is never read at data_items_end
    cmp  $data_items_end, %edi
    jne  loop_start            # } while (ptr != endp);
## end of loop, but nothing jumps here so no label is needed.

    mov  $1, %eax
    int  $0x80             # SYS_exit(%ebx)

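I used cmp/cmovg (conditional move) instead of branching because it is fewer instructions to type, and with no branch inside the loop the loop structure is easier to see.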
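
The same do{}while shape can be sketched in C. This is an illustration under assumptions rather than a translation of any particular listing: `max_do_while` is a hypothetical name, the ternary stands in for cmp/cmovg (a compiler may or may not emit a conditional move for it), and the pointer is advanced after the load so that it is never dereferenced once it equals the end address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest value in [begin, end) using a bottom-tested
   loop. Requires at least 2 elements, since the body always runs once. */
int max_do_while(const int *begin, const int *end) {
    assert(end - begin >= 2);
    int max = *begin;              /* first element is the initial max */
    const int *ptr = begin + 1;    /* start the loop at the second one */
    do {
        int tmp = *ptr;                    /* load the current element */
        max = (tmp > max) ? tmp : max;     /* branch-free style update */
        ptr++;                             /* advance by one element   */
    } while (ptr != end);                  /* single compare-and-branch */
    return max;
}
```

The only control flow per iteration is the loop condition at the bottom, which is what makes this form cheaper than a top-tested loop with an extra unconditional jump.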

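Other examples of loops and pointers:

Assembly Language (x86): How to create a loop to calculate Fibonacci sequence - a function that takes a pointer + length as arguments and uses LEA to compute the end pointer. (x86-64 NASM syntax)

How to check an "array's length" in Assembly Language (ASM), - defining an assemble-time constant from the length of a static .long array, instead of putting a label at the end.

Copying to arrays in NASM - some tricks for writing efficient loops over two arrays, such as indexing one relative to the other so that only one increment is needed while still avoiding indexed addressing modes. Or counting a negative index up toward zero, so that you still loop forward through memory but don't need a separate cmp instruction, just inc / jnz.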
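
The last trick, counting a negative index up toward zero, can be sketched in C as follows. This is only an illustration with a hypothetical name (`max_negative_index`); whether a compiler turns the loop condition into a bare inc / jnz pair depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest of len elements, indexing relative to the
   end pointer with an index that runs from -len up to 0. */
int max_negative_index(const int *begin, size_t len) {
    assert(len > 0);
    const int *end = begin + len;      /* one past the last element   */
    ptrdiff_t i = -(ptrdiff_t)len;     /* end[i] is begin[0] at start */
    int max = end[i];
    do {
        if (end[i] > max)              /* still walks forward in memory */
            max = end[i];
    } while (++i != 0);                /* reaching 0 ends the loop, no separate cmp needed */
    return max;
}
```

The index reaching zero doubles as the termination test, so the end address only has to be computed once, outside the loop.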

