JVM In-Depth Series, Technical Translation: “A FIRST LOOK INTO ZGC”, a First Look at the JVM’s ZGC Garbage Collector

Posted by 浩宇の天尚


ZGC introduction

Original text

ZGC is a new garbage collector recently open-sourced by Oracle for the OpenJDK. It was mainly written by Per Liden.

ZGC is similar to Shenandoah and Azul’s C4 in that it focuses on reducing pause times while still compacting the heap. Although I won’t give a full introduction here, “compacting the heap” just means moving the still-alive objects to the start (or some other region) of the heap. This helps to reduce fragmentation, but it usually also means that the whole application (including all of its threads) needs to be halted while the GC does its magic. This is usually referred to as stopping the world. Only when the GC is finished can the application be resumed. In GC literature the application is often called the mutator, since from the GC’s point of view the application mutates the heap. Depending on the size of the heap such a pause could take several seconds, which could be quite problematic for interactive applications.


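Translation

ZGC is a new garbage collector that Oracle recently open-sourced as part of OpenJDK. It was developed mainly by Per Liden.

ZGC resembles Shenandoah and Azul’s C4: all three aim to reduce GC pause times while still compacting the heap. A full introduction is out of scope here; “compacting the heap” simply means moving the objects that are still alive into another region of the heap. Doing so greatly reduces memory fragmentation, but the price is usually that the whole application (including all of its threads) has to be halted, which is commonly known as “stop-the-world” (STW). Only once the GC has finished can the application threads resume. In GC literature, to distinguish them from GC threads, application threads are often called “mutator threads”, because from the collector’s point of view the application mutates the heap. Depending on the heap size such a pause may last several seconds, which is a serious problem for interactive applications.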


ZGC technical options

Original text

There are several ways to reduce pause times:

  • The GC can employ multiple threads while compacting (parallel compaction).
  • Compaction work can also be split across multiple pauses (incremental compaction).
  • Compact the heap concurrently with the running application, without stopping it (or stopping it only for a short time) (concurrent compaction).
  • No compaction of the heap at all (an approach taken by e.g. Go’s GC).

ZGC uses concurrent compaction to keep pauses to a minimum. This is certainly not obvious to implement, so I want to describe how it works. Why is this complicated?

  • You need to copy an object to another memory address while, at the same time, another thread could read from or write into the old object.
  • Even if copying succeeded, there might still be arbitrarily many references somewhere in the heap to the old object address that need to be updated to the new address.

I should also mention that although concurrent compaction seems to be the best solution to reduce pause time of the alternatives given above, there are definitely some tradeoffs involved. So if you don’t care about pause times, you might be better off using a GC that focuses on throughput instead.

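Translation

There are several ways to reduce pause times:

  1. The GC can use multiple threads while compacting (parallel compaction).
  2. Compaction work can be split across several shorter pauses (incremental compaction).
  3. The heap can be compacted concurrently with the running application threads, with no pause or only a very short one (concurrent compaction).
  4. Compaction can be dropped altogether, at the cost of fragmentation (the approach taken by Go’s GC, for example).

ZGC uses concurrent compaction to keep application pauses to a minimum. The phases involved and their implementation are far from simple, so the following sections describe how it works. Why is it so complicated?

  • When the GC reclaims memory with a copying or compacting algorithm, it has to copy an object to another memory address, while an application thread may read from or write to the old copy at the same time.
  • Even after the copy succeeds, arbitrarily many references to the old address may remain elsewhere in the heap, and each of them must be updated to the new address.

Although concurrent compaction looks like the best of the options above for reducing pause times, it comes with tradeoffs that have to be weighed against the actual workload. If you do not care about GC pause times or user-perceived latency, you are better off with a collector that focuses on throughput.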


GC barriers

Original text

The key to understanding how ZGC does concurrent compaction is the load barrier (often called read barrier in GC literature). Although there is a separate section about ZGC’s load barrier, I want to give a short overview since not all readers might be familiar with the concept. If a GC has load barriers, the GC needs to do some additional action when reading a reference from the heap. Basically, in Java this happens every time you see some code like obj.field. A GC could also need a write/store barrier for operations like obj.field = value. Both operations are special since they read from or write into the heap. The names are a bit confusing, but GC barriers are different from the memory barriers used in CPUs or compilers.

Both reading and writing in the heap are extremely common, so both GC barriers need to be super efficient. That means just a few assembly instructions in the common case. Reads are an order of magnitude more frequent than writes (although this can certainly vary depending on the application), so read barriers are even more performance-sensitive. Generational GCs, for example, usually get by with just a write barrier, no read barrier needed. ZGC needs a read barrier but no write barrier. For concurrent compaction I haven’t seen a solution without read barriers.

Another factor to consider: Even if a GC needs some type of barrier, they might “only” be required when reading or writing references in the heap. Reading or writing primitives like int or double might not require the barrier.

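Translation

The key to understanding how ZGC performs concurrent compaction is the load barrier (usually called a read barrier in GC literature). The author of the original article covers ZGC’s load barrier in a separate section (see dinfuehr’s article), but a short overview is given here because not every developer is familiar with the concept. If a GC has load barriers, it must perform some extra work whenever a reference is read from the heap, much like an AOP “before” advice that runs a fixed action ahead of the intercepted method.

In everyday Java code, an expression such as “obj.field” that reads a reference from the heap triggers the load barrier.

A GC may also need a write/store barrier for operations such as “obj.field = value”. Both operations are special because they read from or write to the heap. Note that GC barriers are not the same thing as the memory barriers used in CPUs or compilers.

Reads and writes of heap objects are extremely common, so both kinds of GC barrier must be very efficient: only a few assembly instructions in the common case. Reads are an order of magnitude more frequent than writes (although this varies with the application), so read barriers are even more performance-sensitive. Generational GCs, for example, usually get by with only a write barrier and no read barrier. ZGC, in contrast, needs a read barrier but no write barrier. For concurrent compaction, the author has not seen a solution that works without read barriers.

Another factor to consider: even if a GC needs some kind of barrier, it may be required “only” when reading or writing references in the heap. Reading or writing primitives such as “int” or “double” may not need a barrier.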


Reference coloring (colored pointers)

Original text

The key to understanding ZGC is reference coloring. ZGC stores additional metadata in heap references. On x64 a reference is 64 bits wide (ZGC doesn’t support compressed oops or class pointers at the moment), but today’s hardware actually limits virtual memory addresses to 48 bits. To be exact, only 47 bits are usable, since bit 47 determines the value of bits 48-63 (for our purpose those bits are always 0).

ZGC reserves the first 42 bits for the actual address of the object (referred to as offset in the source code). 42-bit addresses give you a theoretical heap limitation of 4TB in ZGC. The remaining bits are used for these flags: finalizable, remapped, marked1 and marked0 (one bit is reserved for future use). There is a really nice ASCII drawing in ZGC’s source that shows all these bits:

 6                 4 4 4  4 4                                             0
 3                 7 6 5  2 1                                             0
+-------------------+-+----+-----------------------------------------------+
|00000000 00000000 0|0|1111|11 11111111 11111111 11111111 11111111 11111111|
+-------------------+-+----+-----------------------------------------------+
|                   | |    |
|                   | |    * 41-0 Object Offset (42-bits, 4TB address space)
|                   | |
|                   | * 45-42 Metadata Bits (4-bits)  0001 = Marked0
|                   |                                 0010 = Marked1
|                   |                                 0100 = Remapped
|                   |                                 1000 = Finalizable
|                   |
|                   * 46-46 Unused (1-bit, always zero)
|
* 63-47 Fixed (17-bits, always zero)
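
The layout above can be turned into a few bit masks. The following is a minimal Java sketch that only mirrors the drawing; the class name `ColoredPointer` and its methods are hypothetical and are not taken from ZGC’s source.

```java
// Illustrative sketch: the constants follow the bit layout drawn above.
// Class and method names are hypothetical, not ZGC's own.
final class ColoredPointer {
    static final int  OFFSET_BITS   = 42;
    static final long OFFSET_MASK   = (1L << OFFSET_BITS) - 1;  // bits 41-0
    static final long MARKED0       = 0b0001L << OFFSET_BITS;   // bit 42
    static final long MARKED1       = 0b0010L << OFFSET_BITS;   // bit 43
    static final long REMAPPED      = 0b0100L << OFFSET_BITS;   // bit 44
    static final long FINALIZABLE   = 0b1000L << OFFSET_BITS;   // bit 45
    static final long METADATA_MASK = 0b1111L << OFFSET_BITS;   // bits 45-42

    /** Combines an object offset with one metadata ("color") bit. */
    static long withColor(long offset, long colorBit) {
        return (offset & OFFSET_MASK) | colorBit;
    }

    /** Strips the metadata bits and returns the plain 42-bit offset. */
    static long offset(long ref) {
        return ref & OFFSET_MASK;
    }

    /** Returns only the metadata bits of a reference. */
    static long metadata(long ref) {
        return ref & METADATA_MASK;
    }

    public static void main(String[] args) {
        long ref = withColor(0x1234L, MARKED1);
        System.out.printf("ref=0x%016x offset=0x%x marked1=%b%n",
                ref, offset(ref), (ref & MARKED1) != 0);
    }
}
```

Since the offset occupies 42 bits, `OFFSET_MASK + 1` equals 4TB, which is where the theoretical heap limit mentioned above comes from.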

Having metadata information in heap references does make dereferencing more expensive, since the address needs to be masked to get the real address (without metainformation). ZGC employs a nice trick to avoid this: when a reference is read from memory, exactly one of the marked0, marked1 or remapped bits is set. When allocating a page at offset x, ZGC maps the same page to 3 different addresses:

  1. for marked0: (0b0001 << 42) | x
  2. for marked1: (0b0010 << 42) | x
  3. for remapped: (0b0100 << 42) | x

ZGC therefore just reserves 16TB of address space (but does not actually use all of this memory) starting at address 4TB. Here is another nice drawing from ZGC’s source:

  +--------------------------------+ 0x0000140000000000 (20TB)
  |         Remapped View          |
  +--------------------------------+ 0x0000100000000000 (16TB)
  |     (Reserved, but unused)     |
  +--------------------------------+ 0x00000c0000000000 (12TB)
  |         Marked1 View           |
  +--------------------------------+ 0x0000080000000000 (8TB)
  |         Marked0 View           |
  +--------------------------------+ 0x0000040000000000 (4TB)

At any point in time only one of these 3 views is in use. So for debugging, the unused views can be unmapped to better verify correctness.
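
As a quick check of the arithmetic, the three formulas above can be evaluated for a page offset. This is a small Java sketch with hypothetical names (`HeapViews` and its methods); it only reproduces the address computation and maps no memory.

```java
// Illustrative sketch: computes the address of a page at offset x in each view.
// Names are hypothetical; only the shift-and-or formulas come from the text.
final class HeapViews {
    static final long TB = 1L << 40;

    static long marked0View(long x)  { return (0b0001L << 42) | x; }
    static long marked1View(long x)  { return (0b0010L << 42) | x; }
    static long remappedView(long x) { return (0b0100L << 42) | x; }

    public static void main(String[] args) {
        long x = 2L * 1024 * 1024; // a page at offset 2MB
        System.out.printf("marked0=0x%016x marked1=0x%016x remapped=0x%016x%n",
                marked0View(x), marked1View(x), remappedView(x));
    }
}
```

For offset 0 the three results are the view base addresses shown in the drawing: 4TB, 8TB and 16TB.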

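Translation

The key to understanding ZGC is reference coloring (colored pointers). ZGC stores additional metadata inside the heap references themselves. On x64 a reference is 64 bits wide (ZGC currently supports neither compressed oops nor compressed class pointers), but today’s hardware limits virtual memory addresses to 48 bits. Strictly speaking only 47 bits are usable, since bit 47 determines the value of bits 48-63, which for our purposes are always 0, as the layout below shows.

ZGC reserves the first 42 bits for the actual address of the object (called offset in the source code). 42-bit addresses give ZGC a theoretical heap limit of 4TB. The remaining bits hold the flags finalizable, remapped, marked1 and marked0 (one bit is reserved for future use). ZGC’s source contains a very nice ASCII drawing that shows all of these bits: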

 6                 4 4 4  4 4                                             0
 3                 7 6 5  2 1                                             0
+-------------------+-+----+-----------------------------------------------+
|00000000 00000000 0|0|1111|11 11111111 11111111 11111111 11111111 11111111|
+-------------------+-+----+-----------------------------------------------+
|                   | |    |
|                   | |    * 41-0 Object Offset (42-bits, 4TB address space)
|                   | |
|                   | * 45-42 Metadata Bits (4-bits)  0001 = Marked0
|                   |                                 0010 = Marked1
|                   |                                 0100 = Remapped
|                   |                                 1000 = Finalizable
|                   |
|                   * 46-46 Unused (1-bit, always zero)
|
* 63-47 Fixed (17-bits, always zero)

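Keeping metadata in heap references does make dereferencing more expensive, because the address has to be masked to obtain the real address (without the metadata). ZGC uses a neat trick to avoid this: when a reference is read from memory, exactly one of the “marked0”, “marked1” or “remapped” bits is set. When allocating a page at offset “x”, ZGC maps the same page to 3 different addresses: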

  1. for marked0: (0b0001 << 42) | x
  2. for marked1: (0b0010 << 42) | x
  3. for remapped: (0b0100 << 42) | x

ZGC therefore only reserves 16TB of address space, starting at address 4TB (without actually using all of that memory).

  +--------------------------------+ 0x0000140000000000 (20TB)
  |         Remapped View          |
  +--------------------------------+ 0x0000100000000000 (16TB)
  |     (Reserved, but unused)     |
  +--------------------------------+ 0x00000c0000000000 (12TB)
  |         Marked1 View           |
  +--------------------------------+ 0x0000080000000000 (8TB)
  |         Marked0 View           |
  +--------------------------------+ 0x0000040000000000 (4TB)

Pages & Physical & Virtual Memory

Original text

Shenandoah separates the heap into a large number of equally-sized regions. An object usually does not span multiple regions, except for large objects that do not fit into a single region. Those large objects need to be allocated in multiple contiguous regions. I quite like this approach because it is so simple.

ZGC is quite similar to Shenandoah in this regard. In ZGC’s parlance regions are called pages. The major difference to Shenandoah: pages in ZGC can have different sizes (but always a multiple of 2MB on x64). There are 3 different page types in ZGC: small (2MB size), medium (32MB size) and large (some multiple of 2MB). Small objects (up to 256KB size) are allocated in small pages, medium-sized objects (up to 4MB) are allocated in medium pages. Objects larger than 4MB are allocated in large pages. Large pages can only store exactly one object, in contrast to small or medium pages. Somewhat confusingly, large pages can actually be smaller than medium pages (e.g. for a large object with a size of 6MB).
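
The size classes described above can be expressed as a small lookup. The sketch below is in Java and uses only the thresholds quoted in the paragraph; the names (`PageSizing`, `pageTypeFor`, `pageSizeFor`) are hypothetical, and rounding a large page up to the next 2MB multiple is an assumption based on the statement that large pages are “some multiple of 2MB”.

```java
// Illustrative sketch of the size classes described above.
// Names are hypothetical; thresholds are the ones quoted in the text.
final class PageSizing {
    static final long KB = 1024;
    static final long MB = 1024 * KB;
    static final long GRANULE = 2 * MB;   // all page sizes are multiples of 2MB

    enum PageType { SMALL, MEDIUM, LARGE }

    static PageType pageTypeFor(long objectSize) {
        if (objectSize <= 256 * KB) return PageType.SMALL;
        if (objectSize <= 4 * MB)   return PageType.MEDIUM;
        return PageType.LARGE;
    }

    static long pageSizeFor(long objectSize) {
        switch (pageTypeFor(objectSize)) {
            case SMALL:  return 2 * MB;
            case MEDIUM: return 32 * MB;
            default:
                // Assumption: a large page holds one object, rounded up to a 2MB multiple.
                return (objectSize + GRANULE - 1) / GRANULE * GRANULE;
        }
    }
}
```

With these rules a 6MB object gets a 6MB large page, which is indeed smaller than the 32MB medium page.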

Another nice property of ZGC is that it also differentiates between physical and virtual memory. The idea behind this is that there usually is plenty of virtual memory available (always 4TB in ZGC) while physical memory is more scarce. Physical memory can be expanded up to the maximum heap size (set with -Xmx for the JVM), so this tends to be much less than the 4TB of virtual memory. Allocating a page of a certain size in ZGC means allocating both physical and virtual memory. With ZGC the physical memory doesn’t need to be contiguous, only the virtual memory space. So why is this actually a nice property?

Allocating a contiguous range of virtual memory should be easy, since we usually have more than enough of it. But it is quite easy to imagine a situation where we have 3 free pages with size 2MB somewhere in the physical memory, but we need 6MB of contiguous memory for a large object allocation. There is enough free physical memory, but unfortunately this memory is non-contiguous. ZGC is able to map these non-contiguous physical pages to a single contiguous virtual memory space. If this wasn’t possible, we would have run out of memory.
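
The 6MB example can be modelled with a tiny page table. This Java sketch is a pure simulation under stated assumptions: memory is tracked in 2MB granules identified by integer indices, and `GranuleMapper` is a hypothetical name; it does not touch real memory.

```java
import java.util.Deque;

// Illustrative simulation: virtual granules 0..n-1 are contiguous, while the
// physical granules backing them can come from anywhere. Names are hypothetical.
final class GranuleMapper {
    /**
     * Returns a table where table[v] is the physical granule that backs
     * virtual granule v. Consumes granules from the free list.
     */
    static int[] mapContiguous(Deque<Integer> freePhysicalGranules, int granulesNeeded) {
        if (freePhysicalGranules.size() < granulesNeeded) {
            throw new IllegalStateException("not enough free physical memory");
        }
        int[] table = new int[granulesNeeded];
        for (int v = 0; v < granulesNeeded; v++) {
            // Any free physical granule will do; contiguity is not required.
            table[v] = freePhysicalGranules.pollFirst();
        }
        return table;
    }
}
```

Given free physical granules 7, 2 and 19, a 3-granule (6MB) request succeeds even though none of them are adjacent.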

On Linux the physical memory is basically an anonymous file that is only stored in RAM (and not on disk); ZGC uses memfd_create to create it. The file can then be extended with ftruncate, and ZGC is allowed to extend the physical memory (= the anonymous file) up to the maximum heap size. Physical memory is then mmapped into the virtual address space.
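
The effect of mapping one backing file at several addresses can be observed from plain Java, as a loose analogue only. The sketch below assumes a temporary file as a stand-in for the memfd-backed anonymous file and uses `FileChannel.map` twice on the same region; `MultiMapDemo` is a hypothetical name, and the immediate visibility between the two views relies on how mainstream operating systems implement shared file mappings rather than on a guarantee of the Java API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Loose analogue of multi-mapping: one backing file, two views of the same bytes.
// A temp file stands in for the anonymous memfd file; names are hypothetical.
final class MultiMapDemo {
    static final int SIZE = 4096;

    static long writeViaOneViewReadViaAnother(long value) {
        try {
            Path backing = Files.createTempFile("heap-backing", ".bin");
            backing.toFile().deleteOnExit();
            try (FileChannel ch = FileChannel.open(backing,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Grow the file to SIZE bytes (the role ftruncate plays in the text).
                ch.write(ByteBuffer.allocate(SIZE), 0);
                // Map the same file region twice: two addresses, one set of pages.
                MappedByteBuffer viewA = ch.map(FileChannel.MapMode.READ_WRITE, 0, SIZE);
                MappedByteBuffer viewB = ch.map(FileChannel.MapMode.READ_WRITE, 0, SIZE);
                viewA.putLong(0, value);
                return viewB.getLong(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A value written through the first view is read back through the second, which is the property ZGC relies on when it maps a page into the marked0, marked1 and remapped views.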

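Translation

Shenandoah divides the heap into a large number of equally sized regions. An object usually does not span multiple regions, except for large objects that do not fit into a single region. Those large objects have to be allocated in several contiguous regions. The author likes this approach because it is so simple.

ZGC is quite similar to Shenandoah in this respect. In ZGC’s terminology the regions are called pages. The main difference from Shenandoah is that ZGC pages can have different sizes (always a multiple of 2MB on x64). There are 3 page types in ZGC: small (2MB), medium (32MB) and large (some multiple of 2MB).

Small objects (up to 256KB) are allocated in small pages, medium-sized objects (up to 4MB) in medium pages, and objects larger than 4MB in large pages. Unlike small and medium pages, a large page holds exactly one object. Somewhat confusingly, a large page can actually be smaller than a medium page (for example, for a large object of 6MB).

Allocating a contiguous range of virtual memory should be easy, because there is usually more than enough of it. But it is easy to imagine a situation in which physical memory contains 3 free pages of 2MB each, while a large object allocation needs 6MB of contiguous memory. There is enough free physical memory, but unfortunately it is not contiguous. ZGC can map these non-contiguous physical pages into a single contiguous range of virtual memory. If that were not possible, the allocation would fail for lack of memory (an OOM, or frequent full GCs).

On Linux the physical memory is essentially an anonymous file that lives only in RAM (not on disk); ZGC creates it with memfd_create. The file can then be extended with ftruncate, and ZGC may extend the physical memory up to the maximum heap size. The physical memory is then mmapped into the virtual address space.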

Marking & Relocating objects

Original text

A collection is split into two major phases: marking & relocating. (Actually there are more than those two phases but see the source for more details).

A GC cycle starts with the marking phase, which marks all reachable objects. At the end of this phase we know which objects are still alive and which are garbage. ZGC stores this information in the so called live map for each page. A live map is a bitmap that stores whether the object at the given index is strongly-reachable and/or final-reachable (for objects with a finalize-method).

During the marking-phase the load-barrier in application-threads pushes unmarked references into a thread-local marking buffer. As soon as this buffer is full, the GC threads can take ownership of this buffer and recursively traverse all reachable objects from this buffer. Marking in an application thread just pushes the reference into a buffer, the GC threads are responsible for walking the object graph and updating the live map.
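
The hand-off between application threads and GC threads can be sketched as follows. This is a simplified Java model, not ZGC’s data structure: `MarkBuffers`, the buffer capacity and the use of `ConcurrentLinkedQueue` are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Simplified model of thread-local marking buffers. All names are hypothetical.
final class MarkBuffers {
    private final int capacity;
    private final ConcurrentLinkedQueue<List<Long>> fullBuffers = new ConcurrentLinkedQueue<>();
    private final ThreadLocal<List<Long>> local = ThreadLocal.withInitial(ArrayList::new);

    MarkBuffers(int capacity) {
        this.capacity = capacity;
    }

    /** Called on the application thread (from the load barrier): just record the reference. */
    void push(long unmarkedRef) {
        List<Long> buffer = local.get();
        buffer.add(unmarkedRef);
        if (buffer.size() >= capacity) {
            fullBuffers.add(buffer);           // hand the full buffer over to the GC threads
            local.set(new ArrayList<>());      // and start a fresh one for this thread
        }
    }

    /** Called on a GC thread: take ownership of one full buffer, or null if none is ready. */
    List<Long> takeFullBuffer() {
        return fullBuffers.poll();
    }
}
```

The application thread never walks the object graph here; it only appends to its own buffer, which keeps the barrier’s marking work cheap.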

After marking, ZGC needs to relocate all live objects in the relocation set. The relocation set is a set of pages that were chosen to be evacuated based on some criteria after marking (e.g. the pages with the most garbage). An object is relocated either by a GC thread or by an application thread (again through the load barrier). ZGC allocates a forwarding table for each page in the relocation set. The forwarding table is basically a hash map that stores the address an object has been relocated to (if the object has already been relocated).

The advantage with ZGC’s approach is that we only need to allocate space for the forwarding pointer for pages in the relocation set. Shenandoah in comparison stores the forwarding pointer in the object itself for each and every object, which has some memory overhead.

The GC threads walk over the live objects in the relocation set and relocate all those objects that haven’t been relocated yet. It could even happen that an application thread and a GC thread try to relocate the same object at the same time; in this case the first thread to relocate the object wins. ZGC uses an atomic CAS operation to determine a winner.
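
The race between a GC thread and an application thread can be illustrated with a concurrent map. The Java sketch below is only a model under stated assumptions: `ForwardingTable` is a hypothetical class, `ConcurrentHashMap.putIfAbsent` stands in for the atomic CAS mentioned above, and a bump pointer stands in for allocating the copy in a target page.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of a per-page forwarding table. Names are hypothetical.
final class ForwardingTable {
    private final ConcurrentHashMap<Long, Long> forwarding = new ConcurrentHashMap<>();
    private final AtomicLong nextFree;   // bump pointer into a (simulated) target page

    ForwardingTable(long targetPageStart) {
        this.nextFree = new AtomicLong(targetPageStart);
    }

    /** Relocates the object unless another thread already did; returns its new offset. */
    long relocate(long oldOffset, long size) {
        Long existing = forwarding.get(oldOffset);
        if (existing != null) {
            return existing;                                   // already relocated
        }
        long candidate = nextFree.getAndAdd(size);             // space for our copy
        // (A real collector would copy the object's bytes to 'candidate' here.)
        Long winner = forwarding.putIfAbsent(oldOffset, candidate);  // atomic: first writer wins
        return winner != null ? winner : candidate;            // the loser uses the winner's copy
    }

    /** Remapping is a pure lookup: new offset if relocated, otherwise the old one. */
    long remap(long oldOffset) {
        Long to = forwarding.get(oldOffset);
        return to != null ? to : oldOffset;
    }
}
```

In this model a losing thread simply abandons the space it reserved; how a real collector reclaims such a speculative copy is outside the scope of the sketch.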

When not marking, the load barrier relocates or remaps all references loaded from the heap. This ensures that every new reference the mutator sees already points to the newest copy of an object. Remapping an object means looking up the new object address in the forwarding table.

The relocation phase is finished as soon as the GC threads are finished walking the relocation set. Although that means all objects have been relocated, there will generally still be references into the relocation set that need to be remapped to their new addresses. These references will then be healed by trapping load barriers or, if this doesn’t happen soon enough, by the next marking cycle. That means marking also needs to inspect the forwarding table to remap (but not relocate, since all objects are guaranteed to be relocated) objects to their new addresses.

This also explains why there are two marking bits (marked0 and marked1) in an object reference. The marking phase alternates between the marked0 and marked1 bit. After the relocation phase there may still be references that haven’t been remapped and thus have still the bit from the last marking cycle set. If the new marking phase would use the same marking bit, the load-barrier would detect this reference as already marked.
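
The need for two marking bits can be shown with a three-bit colour. This is a minimal Java sketch with hypothetical names (`MarkBitFlip`, `needsMarking`); it models only the bit test, not an actual marking cycle.

```java
// Minimal model of alternating mark bits. Names are hypothetical.
final class MarkBitFlip {
    static final int MARKED0  = 0b001;
    static final int MARKED1  = 0b010;
    static final int REMAPPED = 0b100;

    /** Each marking cycle uses the bit the previous cycle did not use. */
    static int nextMarkBit(int currentMarkBit) {
        return currentMarkBit == MARKED0 ? MARKED1 : MARKED0;
    }

    /** A reference still needs marking if it lacks the current cycle's mark bit. */
    static boolean needsMarking(int refColor, int currentMarkBit) {
        return (refColor & currentMarkBit) == 0;
    }
}
```

A stale reference that still carries marked0 from the previous cycle is correctly reported as unmarked when the new cycle uses marked1; if the new cycle reused marked0, the same reference would look already marked.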

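Translation

A collection is split into two major phases: marking and relocating. (There are actually more than these two phases.) A GC cycle starts with the marking phase, which marks all reachable objects. At the end of this phase we know which objects are still alive and which are garbage.

ZGC stores this information in a so-called live map for each page. A live map is a bitmap that records whether the object at a given index is strongly reachable and/or final-reachable (for objects with a “finalize” method).

During the marking phase, the load barrier in application threads pushes unmarked references into a thread-local marking buffer. As soon as the buffer is full, the GC threads can take ownership of it and recursively traverse all objects reachable from it. Marking in an application thread only pushes the reference into a buffer; the GC threads are responsible for walking the object graph and updating the live map.

After marking, ZGC needs to relocate all live objects in the relocation set. The relocation set is a set of pages chosen for evacuation after marking according to some criteria (for example, the pages containing the most garbage). An object is relocated either by a GC thread or by an application thread, again through the [load barrier](https://dinfuehr.github.io/#load-barrier).

ZGC allocates a forwarding table for each page in the relocation set. The forwarding table is essentially a hash map that stores the address an object has been relocated to (if it has already been relocated).

The advantage of ZGC’s approach is that space for forwarding pointers is needed only for pages in the relocation set. Shenandoah, in comparison, stores a forwarding pointer in every single object, which costs some memory.

The GC threads walk over the live objects in the relocation set and relocate every object that has not been relocated yet. An application thread and a GC thread may even try to relocate the same object at the same time; in that case the first thread to relocate the object wins. ZGC uses an atomic CAS operation to determine the winner.

When not marking, the [load barrier](https://dinfuehr.github.io/#load-barrier) relocates or remaps every reference loaded from the heap. This ensures that every new reference the mutator sees already points to the newest copy of the object. Remapping an object means looking up its new address in the forwarding table.

The relocation phase ends as soon as the GC threads have finished walking the relocation set. Although all objects have been relocated at that point, there will generally still be references into the relocation set that need to be remapped to the new addresses. These references are healed by load barriers that trap on them or, if that does not happen soon enough, by the next marking cycle. This means marking also has to consult the forwarding table to remap objects to their new addresses (remap, not relocate: all objects are guaranteed to have been relocated already).

This also explains why an object reference has two marking bits (marked0 and marked1). The marking phase alternates between the “marked0” and “marked1” bits. After the relocation phase there may still be references that have not been remapped and therefore still carry the bit set by the previous marking cycle. If the new marking phase used the same bit, the load barrier would wrongly treat such a reference as already marked.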

Load-Barrier

ZGC needs a so-called load barrier (also referred to as read barrier) when reading a reference from the heap. We need to insert this load barrier each time the Java program accesses a field of object type, e.g. obj.field. Accessing fields of a primitive type does not need a barrier, e.g. obj.anInt or obj.anDouble. ZGC doesn’t need store/write barriers for obj.field = someValue.

Depending on the stage the GC is currently in (stored in the global variable ZGlobalPhase), the barrier either marks the object or relocates it if the reference isn’t already marked or remapped.

The global variables ZAddressGoodMask and ZAddressBadMask store the mask that determines if a reference is already considered good (that means already marked or remapped/relocated) or if there is still some action necessary. These variables are only changed at the start of marking- and relocation-phase and both at the same time. This table from ZGC’s source gives a nice overview in which state these masks can be:

               GoodMask         BadMask          WeakGoodMask     WeakBadMask
               --------------------------------------------------------------
Marked0        001              110              101              010
Marked1        010              101              110              001
Remapped       100              011              100              011
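
The rows of the table follow a regular pattern that can be checked in code. The relationships used in this Java sketch (bad mask as the complement of the good mask, weak good mask as the good mask plus the remapped bit) are inferred from the table itself, not taken from ZGC’s source, and `AddressMasks` is a hypothetical name.

```java
// Reproduces the table above from the good mask alone.
// The formulas are inferred from the table; names are hypothetical.
final class AddressMasks {
    static final int MARKED0  = 0b001;
    static final int MARKED1  = 0b010;
    static final int REMAPPED = 0b100;
    static final int ALL      = 0b111;

    static int badMask(int goodMask)      { return goodMask ^ ALL; }
    static int weakGoodMask(int goodMask) { return goodMask | REMAPPED; }
    static int weakBadMask(int goodMask)  { return weakGoodMask(goodMask) ^ ALL; }

    /** Formats a mask as three binary digits, matching the table's notation. */
    static String bits(int mask) {
        String s = Integer.toBinaryString(mask & ALL);
        return "000".substring(s.length()) + s;
    }

    public static void main(String[] args) {
        for (int good : new int[] { MARKED0, MARKED1, REMAPPED }) {
            System.out.println(bits(good) + " " + bits(badMask(good)) + " "
                    + bits(weakGoodMask(good)) + " " + bits(weakBadMask(good)));
        }
    }
}
```

A reference is “good” exactly when none of its colour bits overlap the current bad mask, which is the test the barrier performs.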

Assembly code for the barrier can be seen in the MacroAssembler for x64; I will only show some pseudo assembly code for this barrier:

mov rax, [r10 + some_field_offset]
test rax, [address of ZAddressBadMask]
jnz load_barrier_mark_or_relocate

# otherwise reference in rax is considered good

The first assembly instruction reads a reference from the heap: r10 stores the object reference and some_field_offset is some constant field offset. The loaded reference is stored in the rax register. This reference is then tested (this is just a bitwise AND) against the current bad mask. Synchronization isn’t necessary here since ZAddressBadMask only gets updated when the world is stopped. If the result is non-zero, we need to execute the barrier. The barrier needs to either mark or relocate the object depending on which GC phase we are currently in. After this action it needs to update the reference stored in r10 + some_field_offset with the good reference. This is necessary so that subsequent loads from this field return a good reference. Since we might need to update the stored reference, we need two registers: r10 for the object’s address and rax for the loaded reference. The good reference also needs to be stored into register rax, so that execution can continue just as if we had loaded a good reference.
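
The fast path and the self-healing store can be mimicked in Java on a simulated heap. This is a sketch under stated assumptions: the heap is a `long[]` of coloured references, the slow path only recolours the reference instead of really marking or relocating, and `LoadBarrierSketch` with all its members is hypothetical.

```java
// Simulation of the barrier's control flow on an array of coloured references.
// The slow path only recolours; real marking/relocation is out of scope.
final class LoadBarrierSketch {
    static final long MARKED0    = 1L << 42;
    static final long MARKED1    = 1L << 43;
    static final long REMAPPED   = 1L << 44;
    static final long COLOR_MASK = MARKED0 | MARKED1 | REMAPPED;

    long goodMask = REMAPPED;
    long badMask  = COLOR_MASK ^ REMAPPED;
    int slowPathCount = 0;

    /** Models the mask change that happens while the world is stopped. */
    void flipTo(long newGoodMask) {
        goodMask = newGoodMask;
        badMask  = COLOR_MASK ^ newGoodMask;
    }

    long loadReference(long[] heapSlots, int slot) {
        long ref = heapSlots[slot];          // mov rax, [r10 + some_field_offset]
        if ((ref & badMask) != 0) {          // test rax, bad mask; jnz slow path
            ref = slowPath(ref);
            heapSlots[slot] = ref;           // heal the field for later loads
        }
        return ref;                          // rax now holds a good reference
    }

    private long slowPath(long ref) {
        slowPathCount++;
        // Placeholder for "mark or relocate": keep the offset, apply the good colour.
        return (ref & ~COLOR_MASK) | goodMask;
    }
}
```

Loading the same field twice after a mask flip takes the slow path only once, because the first load writes the good reference back into the field. A null reference (all bits zero) never overlaps the bad mask, so it always stays on the fast path in this model.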

Since every single reference needs to be marked or relocated, throughput is likely to decrease right after starting a marking- or relocation-phase. This should get better quite fast when most references are healed.

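Translation (load barrier)

When reading a reference from the heap, ZGC needs a so-called load barrier (also called a read barrier). The barrier has to be inserted every time the Java program accesses a field of object type, such as “obj.field”. Accessing fields of a primitive type, such as “obj.anInt” or “obj.anDouble”, needs no barrier. ZGC needs no store/write barrier for “obj.field = someValue”.

Depending on the phase the GC is currently in (stored in the global variable ZGlobalPhase), the barrier either marks the object or relocates it if the reference has not yet been marked or remapped.

The global variables ZAddressGoodMask and ZAddressBadMask hold the masks that determine whether a reference is already considered good (that is, already marked or remapped/relocated) or whether some action is still required. These variables change only at the start of the marking phase and of the relocation phase, and both change at the same time. This table from ZGC’s source gives a good overview of the states these masks can be in: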

               GoodMask         BadMask          WeakGoodMask     WeakBadMask
               --------------------------------------------------------------
Marked0        001              110              101              010
Marked1        010              101              110              001
Remapped       100              011              100              011

The assembly code for the barrier can be found in the MacroAssembler for x64; only some pseudo assembly for the barrier is shown here:

mov rax, [r10 + some_field_offset]
test rax, [address of ZAddressBadMask]
jnz load_barrier_mark_or_relocate

# otherwise reference in rax is considered good

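The first assembly instruction reads a reference from the heap: r10 holds the object reference and some_field_offset is a constant field offset. The loaded reference is placed in the “rax” register. It is then tested (a plain bitwise AND) against the current bad mask. No synchronization is needed here, because ZAddressBadMask is only updated during a stop-the-world pause. If the result is non-zero, the barrier has to run. Depending on the current GC phase, the barrier marks or relocates the object. It then has to update the reference stored at “r10 + some_field_offset” with the good reference, so that subsequent loads of this field return a good reference. Because the stored reference may have to be updated, two registers are needed: “r10” for the object’s address and “rax” for the loaded reference. The good reference must also end up in “rax”, so that execution can continue exactly as if a good reference had been loaded in the first place.

Since every single reference has to be marked or relocated, throughput is likely to drop right after a marking or relocation phase starts. This should improve quickly once most references have been healed.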

Stop-the-World Pauses

ZGC doesn’t get rid of stop-the-world pauses completely. The collector needs pauses when starting marking, ending marking and starting relocation. But these pauses are usually quite short, only a few milliseconds.

When starting marking, ZGC traverses all thread stacks to mark the application’s root set. The root set is the set of object references from where traversing the object graph starts. It usually consists of local and global variables, but also other internal VM structures (e.g. JNI handles).

Another pause is required when ending the marking phase. In this pause the GC needs to empty and traverse all thread-local marking buffers. Since the GC could discover a large unmarked sub-graph this could take longer. ZGC tries to avoid this by stopping the end of marking phase after 1 millisecond. It returns into the concurrent marking phase until the whole graph is traversed, then the end of marking phase can be started again.
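
The time-boxed end of marking can be sketched as a drain loop with a deadline. The Java code below is a model only: `MarkEndSketch`, the work queue of `Runnable` items and the nanosecond budget parameter are assumptions for illustration, with the 1 millisecond limit from the text passed in by the caller.

```java
import java.util.Deque;

// Model of a time-limited "end of marking" pause. Names are hypothetical.
final class MarkEndSketch {
    /**
     * Drains pending marking work inside the pause. Returns true if all work
     * was finished, false if the budget ran out and marking must continue
     * concurrently before the pause is attempted again.
     */
    static boolean tryEndMarking(Deque<Runnable> pendingWork, long budgetNanos) {
        long deadline = System.nanoTime() + budgetNanos;
        while (!pendingWork.isEmpty()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;                 // too slow: leave the pause, resume concurrent marking
            }
            pendingWork.pollFirst().run();    // process one unit of marking work
        }
        return true;                          // graph fully traversed, marking can end
    }
}
```

A caller following the description above would pass a budget of 1,000,000 nanoseconds and, on a false result, go back to concurrent marking before retrying.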

Starting relocation phase pauses the application again. This phase is quite similar to starting marking, with the difference that this phase relocates the objects in the root set.

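Translation (stop-the-world pauses)

ZGC does not eliminate stop-the-world pauses completely. The collector needs a pause at three points, but these pauses are usually very short, only a few milliseconds:

  • Start of marking

    • When marking starts, ZGC traverses all thread stacks to mark the application’s root set (the GC roots). The root set is the set of object references from which the traversal of the object graph starts. It usually consists of local and global variables, but also of other internal VM structures (for example JNI handles).
  • End of marking

    • Another pause is required when the marking phase ends. In this pause the GC has to empty and traverse all thread-local marking buffers. Since the GC may discover a large unmarked sub-graph, this can take longer. ZGC tries to avoid that by aborting the end-of-marking pause after 1 millisecond. It returns to concurrent marking until the whole graph has been traversed, and then the end-of-marking pause can be started again.
  • Start of relocation

    • Starting the relocation phase pauses the application once more. This phase is very similar to the start of marking, except that it relocates the objects in the root set.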

Recommended reading from the expert:

  • http://cr.openjdk.java.net/~pliden/zgc/
