Java 方法直接调用 vs 单元素循环调用
Posted
技术标签:
【中文标题】Java 方法直接调用 vs 单元素循环调用【英文标题】:Java method direct invocation vs single element loop invocation 【发布时间】:2019-10-20 21:53:57 【问题描述】:最近我想看看直接在对象上调用方法与在同一对象上调用相同方法的性能差异是什么,如果该对象被添加到单个元素 ArrayList 中并且我们尝试循环该元素 List .
老实说,我的假设是会有小的差异,因为我希望展开单个元素循环,因此调用模拟直接调用。
为了测试我创建了以下简单的 JMH 示例:https://gist.github.com/nikkatsa/69a58c63c1b79b245d700f6465bce7f7
令我惊讶的是,直接在元素上调用该方法似乎快了约 2.5 倍。这是一个很大的数字,因为我的期望完全不同:
Benchmark Mode Cnt Score Error Units
SingleElementLoopBenchmark.directInvocation thrpt 10 332576972.176 ± 2740616.755 ops/s
SingleElementLoopBenchmark.singleElementLoopInvocation thrpt 10 130179329.978 ± 1303776.248 ops/s
我在推特上发了一点信息,希望得到一些指示,以了解为什么会发生这种情况。有人告诉我,运行perf
会给我一个线索。所以我确实使用-perf dtraceasm
运行,我得到了以下两个输出,第一个用于直接调用,第二个用于循环调用:
....[Hottest Region 1]..............................................................................
c2, com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub, version 115 (69 bytes)
0x000000010b73c92c: mov %rbp,%r9
0x000000010b73c92f: movzbl 0x94(%r9),%r10d ;*getfield isDone reexecute=0 rethrow=0 return_oop=0
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@27 (line 123)
; implicit exception: dispatches to 0x000000010b73ca5e
0x000000010b73c937: test %r10d,%r10d
0x000000010b73c93a: jne 0x000000010b73c9da ;*ifeq reexecute=0 rethrow=0 return_oop=0
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@30 (line 123)
0x000000010b73c940: mov $0x1,%ebp
0x000000010b73c945: data16 data16 nopw 0x0(%rax,%rax,1) ;*aload_1 reexecute=0 rethrow=0 return_oop=0
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@33 (line 124)
9.56% ↗ 0x000000010b73c950: mov 0x40(%rsp),%r10
1.00% │ 0x000000010b73c955: mov 0xc(%r10),%r10d ;*getfield dispatcher reexecute=0 rethrow=0 return_oop=0
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@1 (line 23)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
0.17% │ 0x000000010b73c959: mov 0xc(%r12,%r10,8),%r11d ; implicit exception: dispatches to 0x000000010b73ca12
11.18% │ 0x000000010b73c95e: test %r11d,%r11d
0.00% │ 0x000000010b73c961: je 0x000000010b73c9c9 ;*invokevirtual performAction reexecute=0 rethrow=0 return_oop=0
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invoke@5 (line 40)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@5 (line 23)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
10.69% │ 0x000000010b73c963: mov %r9,(%rsp)
0.65% │ 0x000000010b73c967: mov 0x38(%rsp),%rsi
0.00% │ 0x000000010b73c96c: mov $0x1,%edx
0.14% │ 0x000000010b73c971: xchg %ax,%ax
10.08% │ 0x000000010b73c973: callq 0x000000010b6c2900 ; ImmutableOopMap[48]=Oop [56]=Oop [64]=Oop [0]=Oop
│ ;*invokevirtual consume reexecute=0 rethrow=0 return_oop=0
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Listener::performAction@2 (line 53)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invoke@5 (line 40)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@5 (line 23)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
│ ; optimized virtual_call
1.44% │ 0x000000010b73c978: mov (%rsp),%r9
0.19% │ 0x000000010b73c97c: movzbl 0x94(%r9),%r8d ;*ifeq reexecute=0 rethrow=0 return_oop=0
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@30 (line 123)
9.77% │ 0x000000010b73c984: mov 0x108(%r15),%r10
0.99% │ 0x000000010b73c98b: add $0x1,%rbp ; ImmutableOopMapr9=Oop [48]=Oop [56]=Oop [64]=Oop
│ ;*ifeq reexecute=1 rethrow=0 return_oop=0
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@30 (line 123)
0.02% │ 0x000000010b73c98f: test %eax,(%r10) ; poll
0.28% │ 0x000000010b73c992: test %r8d,%r8d
0.00% ╰ 0x000000010b73c995: je 0x000000010b73c950 ;*aload_1 reexecute=0 rethrow=0 return_oop=0
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@33 (line 124)
0x000000010b73c997: movabs $0x10abd7d62,%r10
0x000000010b73c9a1: callq *%r10 ;*invokestatic nanoTime reexecute=0 rethrow=0 return_oop=0
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@34 (line 124)
0x000000010b73c9a4: mov 0x30(%rsp),%r10
0x000000010b73c9a9: mov %rbp,0x18(%r10) ;*putfield measuredOps reexecute=0 rethrow=0 return_oop=0
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@49 (line 126)
0x000000010b73c9ad: mov %rax,0x30(%r10) ;*putfield stopTime reexecute=0 rethrow=0 return_oop=0
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@37 (line 124)
0x000000010b73c9b1: movq $0x0,0x20(%r10) ;*invokevirtual directInvocation reexecute=0 rethrow=0 return_oop=0
....................................................................................................
....[Hottest Region 1]..............................................................................
c2, com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub, version 115 (279 bytes)
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@27 (line 123)
; implicit exception: dispatches to 0x000000011153fffe
0x000000011153fa81: test %r10d,%r10d
0x000000011153fa84: jne 0x000000011153fbe4 ;*ifeq reexecute=0 rethrow=0 return_oop=0
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@30 (line 123)
0x000000011153fa8a: mov 0x58(%rsp),%rdx
0x000000011153fa8f: test %rdx,%rdx
0x000000011153fa92: je 0x000000011153fd5e
0x000000011153fa98: mov $0x1,%ebx
╭ 0x000000011153fa9d: jmp 0x000000011153fad6
0.19% │ ↗ 0x000000011153fa9f: mov 0x58(%rsp),%r13
3.55% │ │ 0x000000011153faa4: mov (%rsp),%rcx
0.09% │ │ 0x000000011153faa8: mov 0x60(%rsp),%rdx
0.22% │ │ 0x000000011153faad: mov 0x50(%rsp),%r11
0.17% │ │ 0x000000011153fab2: mov 0x8(%rsp),%rbx ;*if_icmpge reexecute=0 rethrow=0 return_oop=0
│ │ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
│ │ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ │ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
3.55% │↗│ 0x000000011153fab7: movzbl 0x94(%r11),%r8d ;*goto reexecute=0 rethrow=0 return_oop=0
│││ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@35 (line 44)
│││ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.16% │││ 0x000000011153fabf: mov 0x108(%r15),%r10
0.28% │││ 0x000000011153fac6: add $0x1,%rbx ; ImmutableOopMapr11=Oop rcx=Oop rdx=Oop r13=Oop
│││ ;*ifeq reexecute=1 rethrow=0 return_oop=0
│││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@30 (line 123)
0.19% │││ 0x000000011153faca: test %eax,(%r10) ; poll
4.00% │││ 0x000000011153facd: test %r8d,%r8d
│││ 0x000000011153fad0: jne 0x000000011153fbe9 ;*aload_1 reexecute=0 rethrow=0 return_oop=0
│││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@33 (line 124)
0.07% ↘││ 0x000000011153fad6: mov 0xc(%rcx),%r8d ;*getfield dispatcher reexecute=0 rethrow=0 return_oop=0
││ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@1 (line 28)
││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.22% ││ 0x000000011153fada: mov 0x10(%r12,%r8,8),%r10d ;*getfield singleListenerList reexecute=0 rethrow=0 return_oop=0
││ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@4 (line 44)
││ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
││ ; implicit exception: dispatches to 0x000000011153ff2a
0.21% ││ 0x000000011153fadf: mov 0x8(%r12,%r10,8),%edi ; implicit exception: dispatches to 0x000000011153ff3e
4.39% ││ 0x000000011153fae4: cmp $0x237565,%edi ; metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')
││ 0x000000011153faea: jne 0x000000011153fc92
0.33% ││ 0x000000011153faf0: lea (%r12,%r10,8),%r9 ;*invokeinterface size reexecute=0 rethrow=0 return_oop=0
││ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@7 (line 44)
││ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.09% ││ 0x000000011153faf4: mov 0x10(%r9),%r9d
0.14% ││ 0x000000011153faf8: test %r9d,%r9d
╰│ 0x000000011153fafb: jle 0x000000011153fab7 ;*if_icmpge reexecute=0 rethrow=0 return_oop=0
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
3.98% │ 0x000000011153fafd: lea (%r12,%r8,8),%rdi ;*getfield dispatcher reexecute=0 rethrow=0 return_oop=0
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@1 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.06% │ 0x000000011153fb01: xor %r9d,%r9d ;*aload_0 reexecute=0 rethrow=0 return_oop=0
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@15 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.09% │ 0x000000011153fb04: mov 0x8(%r12,%r10,8),%esi ; implicit exception: dispatches to 0x000000011153ff4e
0.06% │ 0x000000011153fb09: cmp $0x237565,%esi ; metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')
0.00% │ 0x000000011153fb0f: jne 0x000000011153fcc2
3.93% │ 0x000000011153fb15: lea (%r12,%r10,8),%rax ;*invokeinterface get reexecute=0 rethrow=0 return_oop=0
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.06% │ 0x000000011153fb19: mov 0x10(%rax),%r10d ;*getfield size reexecute=0 rethrow=0 return_oop=0
│ ; - java.util.ArrayList::get@2 (line 458)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.11% │ 0x000000011153fb1d: test %r10d,%r10d
│ 0x000000011153fb20: jl 0x000000011153fcf6 ;*invokestatic checkIndex reexecute=0 rethrow=0 return_oop=0
│ ; - java.util.Objects::checkIndex@3 (line 372)
│ ; - java.util.ArrayList::get@5 (line 458)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.28% │ 0x000000011153fb26: cmp %r10d,%r9d
0.00% │ 0x000000011153fb29: jae 0x000000011153fc1c
3.97% │ 0x000000011153fb2f: mov 0x14(%rax),%r10d ;*getfield elementData reexecute=0 rethrow=0 return_oop=0
│ ; - java.util.ArrayList::elementData@1 (line 442)
│ ; - java.util.ArrayList::get@11 (line 459)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.05% │ 0x000000011153fb33: mov %r9d,%ebp ;*invokestatic checkIndex reexecute=0 rethrow=0 return_oop=0
│ ; - java.util.Objects::checkIndex@3 (line 372)
│ ; - java.util.ArrayList::get@5 (line 458)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.08% │ 0x000000011153fb36: mov 0xc(%r12,%r10,8),%esi ; implicit exception: dispatches to 0x000000011153ff62
1.27% │ 0x000000011153fb3b: cmp %esi,%ebp
│ 0x000000011153fb3d: jae 0x000000011153fc5a
3.94% │ 0x000000011153fb43: shl $0x3,%r10
0.05% │ 0x000000011153fb47: mov 0x10(%r10,%rbp,4),%r9d ;*aaload reexecute=0 rethrow=0 return_oop=0
│ ; - java.util.ArrayList::elementData@5 (line 442)
│ ; - java.util.ArrayList::get@11 (line 459)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
1.71% │ 0x000000011153fb4c: mov 0x8(%r12,%r9,8),%r10d ; implicit exception: dispatches to 0x000000011153ff72
17.85% │ 0x000000011153fb51: cmp $0x237522,%r10d ; metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Listener')
0.00% │ 0x000000011153fb58: jne 0x000000011153fef6 ;*checkcast reexecute=0 rethrow=0 return_oop=0
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@25 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
3.79% │ 0x000000011153fb5e: mov %rdi,0x18(%rsp)
0.02% │ 0x000000011153fb63: mov %r8d,0x10(%rsp)
0.02% │ 0x000000011153fb68: mov %rbx,0x8(%rsp)
0.19% │ 0x000000011153fb6d: mov %r11,0x50(%rsp)
3.95% │ 0x000000011153fb72: mov %rdx,0x60(%rsp)
0.02% │ 0x000000011153fb77: mov %rcx,(%rsp)
0.03% │ 0x000000011153fb7b: mov %r13,0x58(%rsp)
0.36% │ 0x000000011153fb80: mov %rdx,%rsi
3.78% │ 0x000000011153fb83: mov $0x1,%edx
0.01% │ 0x000000011153fb88: vzeroupper
4.05% │ 0x000000011153fb8b: callq 0x00000001114c2900 ; ImmutableOopMap[80]=Oop [88]=Oop [96]=Oop [0]=Oop [16]=NarrowOop [24]=Oop
│ ;*invokevirtual consume reexecute=0 rethrow=0 return_oop=0
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Listener::performAction@2 (line 53)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@29 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
│ ; optimized virtual_call
0.98% │ 0x000000011153fb90: mov 0x10(%rsp),%r8d
3.61% │ 0x000000011153fb95: mov 0x10(%r12,%r8,8),%r10d ;*getfield singleListenerList reexecute=0 rethrow=0 return_oop=0
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@4 (line 44)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.24% │ 0x000000011153fb9a: mov 0x8(%r12,%r10,8),%r9d ; implicit exception: dispatches to 0x000000011153ff9e
0.74% │ 0x000000011153fb9f: inc %ebp ;*iinc reexecute=0 rethrow=0 return_oop=0
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@32 (line 44)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.04% │ 0x000000011153fba1: cmp $0x237565,%r9d ; metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')
0.00% │ 0x000000011153fba8: jne 0x000000011153fd36
3.60% │ 0x000000011153fbae: lea (%r12,%r10,8),%r11 ;*invokeinterface size reexecute=0 rethrow=0 return_oop=0
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@7 (line 44)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.11% │ 0x000000011153fbb2: mov 0x10(%r11),%r9d
0.35% │ 0x000000011153fbb6: cmp %r9d,%ebp
╰ 0x000000011153fbb9: jge 0x000000011153fa9f ;*if_icmpge reexecute=0 rethrow=0 return_oop=0
; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0x000000011153fbbf: mov %ebp,%r9d
0x000000011153fbc2: mov 0x58(%rsp),%r13
0x000000011153fbc7: mov (%rsp),%rcx
0x000000011153fbcb: mov 0x60(%rsp),%rdx
0x000000011153fbd0: mov 0x50(%rsp),%r11
0x000000011153fbd5: mov 0x8(%rsp),%rbx
....................................................................................................
如果可能的话,我想要一些帮助来理解上述两个输出。从我很少了解的情况来看,似乎在循环时,由于列表泛型,程序要付出相当大的代价来检查数组列表的大小以执行循环,以及将对象转换为适当类型的成本.我无法真正理解循环是否展开,但我相信它会展开?
如果有人理解上述输出并能指出正在发生的事情,将不胜感激。
【问题讨论】:
vzeroupper
在这里疯了;循环内没有使用 YMM AVX 寄存器。 (它在 KNL 以外的 CPU 上并不慢,但这更多地表明 JIT 没有看穿很多东西)。显然这里有大量的优化缺失。
【参考方案1】:
您衡量一个几乎什么都不做的操作与一个涉及至少四个相关内存负载的操作:
ArrayList.size ArrayList.elementData elementData.length 元素数据[0]加上至少一个边界检查。我说“至少”,因为这可能是最好的优化编译器可以做的。然而,HotSpot 执行两个绑定检查:一个在 ArrayList.get() 中,另一个用于 elementData 数组访问。
此外,ArrayList 包含一个对象数组,因此为了从列表中获取Listener
的实例,需要将对象强制转换为目标类型。这是额外的内存负载加上类型检查。
从另一个角度看基准分数:单个元素列表不是慢 2.5 倍,而是多 3 纳秒:
Benchmark Mode Cnt Score Error Units
SingleElementLoopBenchmark.directInvocation avgt 5 2,340 ± 0,095 ns/op
SingleElementLoopBenchmark.singleElementLoopInvocation avgt 5 5,283 ± 0,074 ns/op
3 纳秒! 这只是 2 GHz 内核的 6 个 CPU 周期。光可以在 3 纳秒内传播不到 1 米。你真的期望 ArrayList 的开销比这少吗?开销再小,如果和完全没有开销相比,这个比例可能无限大。
【讨论】:
安德烈,首先感谢您的时间和详细的解释。我的主要目标是真正了解循环方法的开销是什么,您对此进行了彻底的解释。你说我可能从错误的角度看待结果是有道理的。我只是想权衡可能导致短期内调用过多的特定用例的利弊。再次感谢您。以上是关于Java 方法直接调用 vs 单元素循环调用的主要内容,如果未能解决你的问题,请参考以下文章