Why does a dlmalloc allocated chunk header contain 4 bytes of the previous allocated chunk? [closed]

Posted: 2016-08-04 08:37:30

Question:

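I am working with a dynamic memory allocator called the Doug Lea memory allocator (dlmalloc), which uses a best-fit approach to allocate memory on the heap. This algorithm is the basis for several other allocators. I found that, for an allocated chunk, the header of that chunk contains the last 4 bytes of data of the previous chunk. I checked the explanation of the algorithm but could not find the reason. What is the purpose of placing 4 bytes of the previous chunk there? I also came up with one explanation, that the .dtors section is allocated in other chunks for synchronization and proper use of space, but I would like to know the details.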

[Figure: chunk layout in the dlmalloc algorithm]

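The figure above shows the structure of allocated and free chunks. In a free chunk, the first 4 bytes hold the size of the previous chunk, but in an allocated chunk the first four bytes hold the last four bytes of user data of the previous allocated chunk. This seems confusing to me. What is the purpose of placing just four bytes of the previous allocated chunk inside the current chunk?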

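Comments:

Could you post more details, specifically the memory address of the allocated block, the address of the header (probably within a few bytes of the allocated block), and a hex dump of the header? I would initially suspect an alignment problem rather than dlmalloc being broken. (When allocating space for (arrays of) structs, be sure to let the sizeof operator determine the required size rather than second-guessing it, e.g. by summing the sizes of the components "by hand".)

Answer 1: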

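Yes, the chunks do overlap. Once upon a time, memory was very expensive. This is a feature of dlmalloc, ptmalloc, and glibc malloc.

There is a pretty good explanation in the code: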

This struct declaration is misleading (but accurate and necessary).
It declares a "view" into memory allowing access to necessary
fields at known offsets from a given base. See explanation below.

struct malloc_chunk {

 INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
 INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

 struct malloc_chunk* fd;         /* double links -- used only if free. */
 struct malloc_chunk* bk;
};

malloc_chunk details:

(The following includes lightly edited explanations by Colin Plumb.)

Chunks of memory are maintained using a `boundary tag' method as
described in e.g., Knuth or Standish.  (See the paper by Paul
Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
survey of such techniques.)  Sizes of free chunks are stored both
in the front of each chunk and at the end.  This makes
consolidating fragmented chunks into bigger chunks very fast.  The
size fields also hold bits representing whether chunks are free or
in use.

An allocated chunk looks like this:


 chunk->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of previous chunk, if free                 | |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of chunk, in bytes                         |P|
  mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             User data starts here...                          .
        .                                                               .
        .             (malloc_usable_space() bytes)                     .
        .                                                               |
next  ->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of chunk                                     |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Where "chunk" is the front of the chunk for the purpose of most of
the malloc code, but "mem" is the pointer that is returned to the
user.  "Nextchunk" is the beginning of the next contiguous chunk.

Chunks always begin on even word boundaries, so the mem portion
(which is returned to the user) is also on an even word boundary, and
thus at least double-word aligned.
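The relationship between "chunk" and "mem" can be sketched in a few lines of C. This is a minimal illustration, not the allocator's actual code: `chunk_hdr`, `HDR_BYTES`, `chunk_to_mem` and `mem_to_chunk` are hypothetical names, `size_t` stands in for INTERNAL_SIZE_T, and the sketch assumes the user pointer sits exactly two size words past the chunk pointer, as the diagram suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for malloc_chunk; size_t plays the role of
   INTERNAL_SIZE_T. */
struct chunk_hdr {
    size_t prev_size;        /* meaningful only while the previous chunk is free */
    size_t size;             /* chunk size in bytes, flag bits in the low bits */
    struct chunk_hdr *fd;    /* list links, used only while this chunk is free */
    struct chunk_hdr *bk;
};

/* Assumption: user memory starts right after the two size words. */
#define HDR_BYTES (2 * sizeof(size_t))

/* "chunk" pointer -> "mem" pointer handed back to the caller. */
static void *chunk_to_mem(struct chunk_hdr *c) {
    return (char *)c + HDR_BYTES;
}

/* "mem" pointer given to free() -> "chunk" pointer used internally. */
static struct chunk_hdr *mem_to_chunk(void *mem) {
    assert(mem != NULL);
    return (struct chunk_hdr *)((char *)mem - HDR_BYTES);
}
```

Note that under these assumptions "mem" lands on the `fd` field: the list links occupy the same bytes that hold user data while the chunk is in use.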

Free chunks are stored in circular doubly-linked lists, and look like this:

chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of previous chunk                            |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
`head:' |             Size of chunk, in bytes                         |P|
  mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Forward pointer to next chunk in list             |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Back pointer to previous chunk in list            |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Unused space (may be 0 bytes long)                .
        .                                                               .
        .                                                               |
 next-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
`foot:' |             Size of chunk, in bytes                           |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The P (PREV_INUSE) bit, stored in the unused low-order bit of the
chunk size (which is always a multiple of two words), is an in-use
bit for the *previous* chunk.  If that bit is *clear*, then the
word before the current chunk size contains the previous chunk
size, and can be used to find the front of the previous chunk.
The very first chunk allocated always has this bit set,
preventing access to non-existent (or non-owned) memory. If
prev_inuse is set for any given chunk, then you CANNOT determine
the size of the previous chunk, and might even get a memory
addressing fault when trying to do so.
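A small sketch of how the P bit gates access to prev_size may help here. The flag names come from the comment above; the struct, the helper names and the sanity check are hypothetical, and the sketch assumes only the two lowest bits of the size word are used as flags.

```c
#include <assert.h>
#include <stddef.h>

#define PREV_INUSE ((size_t)1)   /* lowest bit: previous chunk is in use */
#define IS_MMAPPED ((size_t)2)   /* second-lowest bit: chunk came from mmap */
#define FLAG_BITS  (PREV_INUSE | IS_MMAPPED)

/* Hypothetical two-word view of a chunk header. */
struct chunk_hdr {
    size_t prev_size;
    size_t size;
};

/* Size with the flag bits masked off. */
static size_t chunk_size(const struct chunk_hdr *c) {
    return c->size & ~FLAG_BITS;
}

static int prev_in_use(const struct chunk_hdr *c) {
    return (c->size & PREV_INUSE) != 0;
}

/* Returns the previous chunk, or NULL when P is set and prev_size
   therefore holds the neighbour's user data rather than a size. */
static struct chunk_hdr *prev_chunk(struct chunk_hdr *c) {
    if (prev_in_use(c))
        return NULL;
    /* Illustrative sanity check: a free chunk is at least a header long. */
    assert(c->prev_size >= sizeof(struct chunk_hdr));
    return (struct chunk_hdr *)((char *)c - c->prev_size);
}
```

The point of the NULL branch is the one the comment makes: when P is set, the word in front of the size field belongs to the previous chunk's user, so it must not be interpreted as a size.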

Note that the `foot' of the current chunk is actually represented
as the prev_size of the NEXT chunk. This makes it easier to
deal with alignments etc but can be very confusing when trying
to extend or adapt this code.
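This overlap is what the question observed, and it can be made concrete with a short sketch. The helper names (`next_chunk`, `mark_free`, `usable_bytes`) and the two-word struct are hypothetical; the sketch assumes a chunk in use may store user data in the next chunk's prev_size word, so that only its own size word counts as overhead.

```c
#include <assert.h>
#include <stddef.h>

#define PREV_INUSE ((size_t)1)
#define IS_MMAPPED ((size_t)2)
#define FLAG_BITS  (PREV_INUSE | IS_MMAPPED)

/* Hypothetical two-word view of a chunk header. */
struct chunk_hdr {
    size_t prev_size;
    size_t size;
};

static size_t chunk_size(const struct chunk_hdr *c) {
    return c->size & ~FLAG_BITS;
}

/* The next contiguous chunk starts chunk_size(c) bytes further on. */
static struct chunk_hdr *next_chunk(struct chunk_hdr *c) {
    return (struct chunk_hdr *)((char *)c + chunk_size(c));
}

/* Freeing c: its "foot" is written into the NEXT chunk's prev_size,
   and the next chunk's P bit is cleared so the foot can be trusted. */
static void mark_free(struct chunk_hdr *c) {
    assert(chunk_size(c) % (2 * sizeof(size_t)) == 0);
    struct chunk_hdr *n = next_chunk(c);
    n->prev_size = chunk_size(c);
    n->size &= ~PREV_INUSE;
}

/* While c is in use, user data runs up to the next chunk's size word,
   covering the next chunk's prev_size field. */
static size_t usable_bytes(const struct chunk_hdr *c) {
    return chunk_size(c) - sizeof(size_t);
}
```

With 4-byte size fields this reproduces the observation in the question: the first 4 bytes of an allocated chunk's header hold the last 4 bytes of the previous chunk's user data whenever that previous chunk is in use.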

The two exceptions to all this are

 1. The special chunk `top' doesn't bother using the
    trailing size field since there is no next contiguous chunk
    that would have to index off it. After initialization, `top'
    is forced to always exist.  If it would become less than
    MINSIZE bytes long, it is replenished.

 2. Chunks allocated via mmap, which have the second-lowest-order
    bit (IS_MMAPPED) set in their size fields.  Because they are
    allocated one-by-one, each must contain its own trailing size field.

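Comments:

No: the chunk overlap has nothing to do with saving memory. The malloc_chunk struct points at the last word of the previous chunk purely and simply because that is the easiest way to represent the boundary tag structure in C.

Answer 2: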

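I have not studied dlmalloc specifically, but here is a possible explanation:

On architectures with objects that require 16-byte alignment (as is the case with Intel SSE), the returned address must be a multiple of 16. If the header holds 12 bytes of information, containing the size of the chunk and some link information for merging the chunk with the previous one, the header may be defined to be 16 bytes long, with the first four bytes used for the end of the previous allocated chunk. If that previous chunk is free, the allocator can use this space for optimization.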
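To make the alignment arithmetic concrete, here is a sketch of how a request might be rounded up to a chunk size. It is not taken from this answer: it assumes the double-word alignment mentioned in the quoted source comment rather than a fixed 16 bytes, assumes one size word of overhead per in-use chunk, and assumes pointers are as wide as `size_t`. All names (`WORD`, `ALIGNMENT`, `MIN_CHUNK`, `pad_request`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD       (sizeof(size_t))   /* stands in for the size-field width */
#define ALIGNMENT  (2 * WORD)         /* assumption: double-word alignment */
#define ALIGN_MASK (ALIGNMENT - 1)
#define MIN_CHUNK  (4 * WORD)         /* room for prev_size, size, fd, bk */

/* Round a user request up to a chunk size: add one word of overhead
   (the chunk's own size field), then round up to the alignment. */
static size_t pad_request(size_t req) {
    assert(req <= SIZE_MAX - WORD - ALIGN_MASK);   /* avoid wrap-around */
    size_t padded = (req + WORD + ALIGN_MASK) & ~ALIGN_MASK;
    return padded < MIN_CHUNK ? MIN_CHUNK : padded;
}
```

Under these assumptions a request of three words fits in a four-word chunk, because the fourth word of user data lives in the next chunk's prev_size field.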
