为啥以不同的顺序运行我的测试会导致性能截然不同? [复制]

Posted

技术标签:

【中文标题】为啥以不同的顺序运行我的测试会导致性能截然不同? [复制]【英文标题】:Why does running my tests in a different order result in drastically different performances? [duplicate]为什么以不同的顺序运行我的测试会导致性能截然不同? [复制] 【发布时间】:2014-03-02 00:44:34 【问题描述】:

我编写了一个程序,针对各种划分ints 并返回double 结果的方法,对0 到20(含)之间的分子和分母的所有可能组合运行此方法,它给出了这些运行所有这些组合所需时间的结果:

Casting numerator:*-                                       0.000029 ms average (2852 ms / 100000000 reps)
Multiplying numerator:*-                                   0.000594 ms average (5940 ms / 10000000 reps)
Making double variable out of numerator:                   0.002192 ms average (2192 ms / 1000000 reps)
Casting denominator:*+                                     0.002201 ms average (2201 ms / 1000000 reps)
Multiplying denominator:                                   0.002181 ms average (2181 ms / 1000000 reps)
Making double variable out of denominator:                 0.002183 ms average (2183 ms / 1000000 reps)
Passing double numerator:*--                               0.000041 ms average (408 ms / 10000000 reps)
Passing double denominator:*--                             0.000034 ms average (343 ms / 10000000 reps)

Casting numerator, dividing twice:                         0.002506 ms average (2506 ms / 1000000 reps)
Multiplying numerator, dividing twice:                     0.002505 ms average (2505 ms / 1000000 reps)
Making double variable out of numerator, dividing twice:   0.002517 ms average (2517 ms / 1000000 reps)
Casting denominator, dividing twice:                       0.002520 ms average (2520 ms / 1000000 reps)
Multiplying denominator, dividing twice:                   0.002517 ms average (2517 ms / 1000000 reps)
Making double variable out of denominator, dividing twice: 0.002508 ms average (2508 ms / 1000000 reps)
Passing double numerator, dividing twice:*-                0.000704 ms average (7040 ms / 10000000 reps)
Passing double denominator, dividing twice:*-              0.000594 ms average (5935 ms / 10000000 reps)

但是,以不同的顺序运行这些测试会导致某些测试花费相同的时间(如预期的那样),但其他测试花费的时间却大不相同(* 表示我得到的结果明显不同):

Casting numerator:                                         0.000029 ms average (2904 ms / 100000000 reps)
Casting denominator:                                       0.000726 ms average (7263 ms / 10000000 reps)
Multiplying numerator:                                     0.002220 ms average (2220 ms / 1000000 reps)
Multiplying denominator:                                   0.002224 ms average (2224 ms / 1000000 reps)
Making double variable out of numerator:                   0.002236 ms average (2236 ms / 1000000 reps)
Making double variable out of denominator:                 0.002249 ms average (2249 ms / 1000000 reps)
Passing double denominator:                                0.000036 ms average (3586 ms / 100000000 reps)
Passing double numerator:                                  0.001455 ms average (1455 ms / 1000000 reps)

Casting numerator, dividing twice:                         0.002542 ms average (2542 ms / 1000000 reps)
Casting denominator, dividing twice:                       0.002546 ms average (2546 ms / 1000000 reps)
Multiplying numerator, dividing twice:                     0.002545 ms average (2545 ms / 1000000 reps)
Multiplying denominator, dividing twice:                   0.002542 ms average (2542 ms / 1000000 reps)
Making double variable out of numerator, dividing twice:   0.002559 ms average (2559 ms / 1000000 reps)
Making double variable out of denominator, dividing twice: 0.002588 ms average (2588 ms / 1000000 reps)
Passing double denominator, dividing twice:                0.000700 ms average (7002 ms / 10000000 reps)
Passing double numerator, dividing twice:                  0.001581 ms average (1581 ms / 1000000 reps)

并且还运行一个“控制”测试,该测试循环遍历所有可能的组合,但实际上并没有除任何东西会产生更奇怪的结果,控制效果和过去的数字转换一样,但现在分子的效果要差 20 倍,并且稍后完成的控制比以前花费了更多的时间:

Control (Don't actually divide anything):                  0.000035 ms average (3494 ms / 100000000 reps)
Casting numerator:                                         0.000588 ms average (5880 ms / 10000000 reps)
Casting denominator:                                       0.002177 ms average (2177 ms / 1000000 reps)
Multiplying numerator:                                     0.002188 ms average (2188 ms / 1000000 reps)
Multiplying denominator:                                   0.002202 ms average (2202 ms / 1000000 reps)
Making double variable out of numerator:                   0.002186 ms average (2186 ms / 1000000 reps)
Making double variable out of denominator:                 0.002201 ms average (2201 ms / 1000000 reps)
Passing double denominator:                                0.002024 ms average (2024 ms / 1000000 reps)
Passing double numerator:                                  0.001456 ms average (1456 ms / 1000000 reps)

Control (Don't actually divide anything):                  0.000927 ms average (927 ms / 1000000 reps)
Casting numerator, dividing twice:                         0.002552 ms average (2552 ms / 1000000 reps)
Casting denominator, dividing twice:                       0.002556 ms average (2556 ms / 1000000 reps)
Multiplying numerator, dividing twice:                     0.002538 ms average (2538 ms / 1000000 reps)
Multiplying denominator, dividing twice:                   0.002554 ms average (2554 ms / 1000000 reps)
Making double variable out of numerator, dividing twice:   0.002546 ms average (2546 ms / 1000000 reps)
Making double variable out of denominator, dividing twice: 0.002535 ms average (2535 ms / 1000000 reps)
Passing double denominator, dividing twice:                0.002344 ms average (2344 ms / 1000000 reps)
Passing double numerator, dividing twice:                  0.001597 ms average (1597 ms / 1000000 reps)

这些结果对于重新运行相同的订单是一致的。改变重复次数通常对平均速度没有显着影响,但是在运行控制 10^8 次然后运行控制 10^7 次的情况下,我发现运行 cast numerator next to 与 control if it 一样好运行 10^7 次,但如果运行 10^6 或 10^8 次,运行速度会慢 20 倍。在 Windows 中让程序等待输入并将其进程优先级设置为Realtime 没有显着差异。 JVM 和/或 CPU 做了哪些不同的工作以使这些性能如此不同?

更新:“运行热身测试”

重试第一阶测试和第三阶测试,其中每个测试连续运行六次,同一测试的每次连续运行需要大约相同的时间来运行(最多 +- 15%,与数量级变化),除了控制和转换分子,在第 3 次到第 6 次连续测试中运行需要 0 毫秒,以及乘以分子,在第一次测试中需要 0 毫秒才能在第 4 次到第 6 次测试。此外,每个测试所花费的时间似乎大约等于在第三次测试中运行一次该测试所花费的时间。不知何故,在许多情况下,运行热身测试会降低性能!

测试程序

public class Tests 
    public static void main(String[] args) throws Exception 
        //number of repetitions set so that each takes 1-10 seconds total to run on my machine
        testPerformance(castNumerator, 8);
        testPerformance(multiplyNumerator, 7);
        testPerformance(makeNumerator, 6);
        testPerformance(castDenominator, 6);
        testPerformance(multiplyDenominator, 6);
        testPerformance(makeDenominator, 6);
        testPerformance(diDivider, 6);
        testPerformance(idDivider, 8);
        System.out.println();
        testPerformance(castNumerator2, 6);
        testPerformance(multiplyNumerator2, 6);
        testPerformance(makeNumerator2, 6);
        testPerformance(castDenominator2, 6);
        testPerformance(multiplyDenominator2, 6);
        testPerformance(makeDenominator2, 6);
        testPerformance(diDivider2, 6);
        testPerformance(idDivider2, 7);
    

    static void testPerformance(final Divider divider, final int logReps) 
        final int reps = (int)Math.pow(10, logReps);
        final long startTime;
        if (divider instanceof IntIntDivider) 
            final IntIntDivider iiDivider = ((IntIntDivider)divider);
            startTime = System.currentTimeMillis();
            for (int i = 0; i < reps; i++) 
                for (int n = 0; n < 20; n++) 
                    for (int d = 0; d < 20; d++) 
                        iiDivider.divide(n, d);
                    
                
            
         else if (divider instanceof DoubleIntDivider) //yucky repetition, but the only fair way to do it because generics can't do primitives
            final DoubleIntDivider diDivider = ((DoubleIntDivider)divider);
            startTime = System.currentTimeMillis();
            for (int i = 0; i < reps; i++) 
                for (int n = 0; n < 20; n++) 
                    for (int d = 0; d < 20; d++) 
                        diDivider.divide(n, d);
                    
                
            
         else if (divider instanceof IntDoubleDivider) 
            final IntDoubleDivider idDivider = ((IntDoubleDivider)divider);
            startTime = System.currentTimeMillis();
            for (int i = 0; i < reps; i++) 
                for (int n = 0; n < 20; n++) 
                    for (int d = 0; d < 20; d++) 
                        idDivider.divide(n, d);
                    
                
            
         else 
            throw new RuntimeException("Impossible divider");
        
        final long endTime = System.currentTimeMillis();
        final long time = (endTime - startTime);
        System.out.printf("    %-58s %f ms%n", divider + ":", time / (double) reps); //cast reps to double because casting time might result in precision loss
        //System.out.println(divider + ":");
        //System.out.println("\t" + time + " ms taken for " + reps + " runs");
        //System.out.printf("\tAverage of %f ms%n", time / (double) reps);
    
    static interface Divider 
    static abstract class IntIntDivider implements Divider 
        public abstract double divide(int n, int d);
    
    static abstract class DoubleIntDivider implements Divider 
        public abstract double divide(double n, int d);
    
    static abstract class IntDoubleDivider implements Divider 
        public abstract double divide(int n, double d);
    
    static final IntIntDivider control = new IntIntDivider() 
        @Override
        public double divide(int n, int d) 
            return 1;
        
        @Override
        public String toString() 
            return "Control (Don't actually divide anything)";
        
    ;
    static final IntIntDivider castNumerator = new IntIntDivider() 
        @Override
        public double divide(int n, int d) 
            return ((double)n) / d;
        
        @Override
        public String toString() 
            return "Casting numerator";
        
    ;
    static final IntIntDivider multiplyNumerator = new IntIntDivider() 
        @Override
        public double divide(int n, int d) 
            return n * 1.0 / d;
        
        @Override
        public String toString() 
            return "Multiplying numerator";
        
    ;
    static final IntIntDivider makeNumerator = new IntIntDivider() 
        @Override
        public double divide(int n, int d) 
            final double nDouble = n;
            return nDouble / d;
        
        @Override
        public String toString() 
            return "Making double variable out of numerator";
        
    ;
    static final IntIntDivider castDenominator = new IntIntDivider() 
        @Override
        public double divide(int n, int d) 
            return n / (double) d;
        
        @Override
        public String toString() 
            return "Casting denominator";
        
    ;
    static final IntIntDivider multiplyDenominator = new IntIntDivider() 
        @Override
        public double divide(int n, int d) 
            return n / (d * 1.0);
        
        @Override
        public String toString() 
            return "Multiplying denominator";
        
    ;
    static final IntIntDivider makeDenominator = new IntIntDivider() 
        @Override
        public double divide(int n, int d) 
            final double dDouble = d;
            return n / dDouble;
        
        @Override
        public String toString() 
            return "Making double variable out of denominator";
        
    ;
    static final DoubleIntDivider diDivider = new DoubleIntDivider() 
        @Override
        public double divide(double n, int d) 
            return n / d;
        
        @Override
        public String toString() 
            return "Passing double numerator";
        
    ;
    static final IntDoubleDivider idDivider = new IntDoubleDivider() 
        @Override
        public double divide(int n, double d) 
            return n / d;
        
        @Override
        public String toString() 
            return "Passing double denominator";
        
    ;
    static final IntIntDivider castNumerator2 = new IntIntDivider() 
        @Override
        public double divide(int n, int d) 
            return ((double)n) / d + ((double)n) / d;
        
        @Override
        public String toString() 
            return "Casting numerator, dividing twice";
        
    ;
    static final IntIntDivider multiplyNumerator2 = new IntIntDivider() 
        @Override
        public double divide(int n, int d) 
            return n * 1.0 / d + n * 1.0 / d;
        
        @Override
        public String toString() 
            return "Multiplying numerator, dividing twice";
        
    ;
    static final IntIntDivider makeNumerator2 = new IntIntDivider() 

        @Override
        public double divide(int n, int d) 
            final double nDouble = n;
            return nDouble / d + nDouble / d;
        
        @Override
        public String toString() 
            return "Making double variable out of numerator, dividing twice";
        
    ;
    static final IntIntDivider castDenominator2 = new IntIntDivider() 
        @Override
        public double divide(int n, int d) 
            return n / (double) d + n / (double) d;
        
        @Override
        public String toString() 
            return "Casting denominator, dividing twice";
        
    ;
    static final IntIntDivider multiplyDenominator2 = new IntIntDivider() 
        @Override
        public double divide(int n, int d) 
            return n / (d * 1.0) + n / (d * 1.0);
        
        @Override
        public String toString() 
            return "Multiplying denominator, dividing twice";
        
    ;
    static final IntIntDivider makeDenominator2 = new IntIntDivider() 
        @Override
        public double divide(int n, int d) 
            final double dDouble = d;
            return n / dDouble + n / dDouble;
        
        @Override
        public String toString() 
            return "Making double variable out of denominator, dividing twice";
        
    ;
    static final DoubleIntDivider diDivider2 = new DoubleIntDivider() 
        @Override
        public double divide(double n, int d) 
            return n / d + n / d;
        
        @Override
        public String toString() 
            return "Passing double numerator, dividing twice";
        
    ;
    static final IntDoubleDivider idDivider2 = new IntDoubleDivider() 
        @Override
        public double divide(int n, double d) 
            return n / d + n / d;
        
        @Override
        public String toString() 
            return "Passing double denominator, dividing twice";
        
    ;

【问题讨论】:

您是否在测试前进行了热身阶段?微基准测试可能会出现很多问题,但预热阶段是最关键的。这是必要的,因为 JIT 将在运行时进行编译和优化。这意味着它执行某些操作的前几次将比其他所有时间慢得多 @RichardTingle 首先运行“控制”算法以某种方式恶化实际算法的性能。 我对控件的理解(可能是错误的)是它与其他两个测试不同。如果是这样,这将不是一个很好的热身阶段,因为它不会触发您实际进行基准测试的JITing @RichardTingle 在帖子中更新 @BrianRoach 没有;问题是问 JVM 和/或 CPU 正在做什么使这成为不正确的微基准,而不是这是否是正确的微基准或哪种划分算法具有最佳性能。我对此感兴趣,如果可能的话,学习如何利用这里正在进行的任何优化,类似于如何重组代码以改进分支预测或缓存命中。 【参考方案1】:

我怀疑排序在这里有影响。问题在于,在更改顺序的结果中,运行测试的次数也发生了变化。这是一个问题,因为代码运行的次数越多,JIT 对其优化的次数就越多。

如果您只看数字,就会发现运行次数越多的测试速度越快。这是非常经典的 JIT 行为。优化后的编译运行次数更多并加权平均值。所以应该清楚的是,各个测试的运行次数需要相等。测试也应该考虑热身。

所以调整main 得到如下结果:

for(int i = 0; i < 2; i++) 
    testPerformance(castNumerator, 6);
    testPerformance(multiplyNumerator, 6);
    testPerformance(makeNumerator, 6);
    testPerformance(castDenominator, 6);
    testPerformance(multiplyDenominator, 6);
    testPerformance(makeDenominator, 6);
    testPerformance(diDivider, 6);
    testPerformance(idDivider, 6);
    System.out.println();
    testPerformance(castNumerator2, 6);
    testPerformance(multiplyNumerator2, 6);
    testPerformance(makeNumerator2, 6);
    testPerformance(castDenominator2, 6);
    testPerformance(multiplyDenominator2, 6);
    testPerformance(makeDenominator2, 6);
    testPerformance(diDivider2, 6);
    testPerformance(idDivider2, 6);
    System.out.println();

为了热身,这只是运行了两次测试,第一组结果可能意义不大。

但在这样做之后,我在第二次运行时看到如下内容:

Casting numerator:                                         0.003012 ms
Multiplying numerator:                                     0.003011 ms
Making double variable out of numerator:                   0.003025 ms
Casting denominator:                                       0.003040 ms
Multiplying denominator:                                   0.003015 ms
Making double variable out of denominator:                 0.003006 ms
Passing double numerator:                                  0.000000 ms
Passing double denominator:                                0.000000 ms

这有点奇怪,因为这表明最后两个测试需要 0 毫秒才能运行。这并不是因为这些划分真的非常快,因为更改程序以使其使用nanoTime 没有任何效果。 (nanoTime 总是被推荐,但在这里它可能不会有太大的不同,因为时间已经足够长了。)

实际上可能发生的事情是 JIT 发现您忽略了返回值,因此它通过不以某种方式执行这些测试来优化这些测试。至于为什么它只针对某些测试,你的猜测可能和我的一样好,但它表明一个严重的问题。如果 JIT 正在做这种极端优化,我们不知道它在其他地方做了什么。

这可以通过使用返回值做一些事情(基本上是任何事情)来解决:

// somewhere
static long result;

// for each op
result += idDivider.divide(n, d);

// at the end of the test
System.out.println(result);

这可能会稍微减慢测试速度,但会挫败 JIT 的聪明才智。

在所有这些更改之后,我基本上每次都得到这个:

Casting numerator:                                         0.007088 ms
Multiplying numerator:                                     0.007135 ms
Making double variable out of numerator:                   0.007162 ms
Casting denominator:                                       0.007180 ms
Multiplying denominator:                                   0.007206 ms
Making double variable out of denominator:                 0.007173 ms
Passing double numerator:                                  0.003650 ms
Passing double denominator:                                0.003663 ms

Casting numerator, dividing twice:                         0.007554 ms
Multiplying numerator, dividing twice:                     0.007574 ms
Making double variable out of numerator, dividing twice:   0.007538 ms
Casting denominator, dividing twice:                       0.007550 ms
Multiplying denominator, dividing twice:                   0.007503 ms
Making double variable out of denominator, dividing twice: 0.007577 ms
Passing double numerator, dividing twice:                  0.003765 ms
Passing double denominator, dividing twice:                0.003798 ms

现在,如果我不得不猜测为什么这表明转换为参数更快,我会说 HotSpot 正在编译它,因此循环变量始终是双精度值。

关于一般基准测试,我建议您进一步阅读以下内容:

Robust Java benchmarking, Part 1: Issues Java theory and practice: Anatomy of a flawed microbenchmark Zed Shaw's crazy and convincing rant on poor statistics(他很粗鲁,所以不适合任何容易被冒犯的人。)

【讨论】:

当我开始阅读时:“运行次数越多的测试速度越快”实际上,情况恰恰相反。更快的测试我跑了更多次,因此总时间至少为 1 秒(以提供最低准确度)但不超过 10 秒(因为时间很宝贵)。一般来说,运行更多次并不会显着提高性能。 这样做会改变结果。它们需要运行相同的次数。你说问题是发生了什么,这使得它成为一个不正确的微观基准,这就是其中之一。当您在 Java 中对某些东西进行基准测试时,您所测试的部分内容就是 HotSpot,您需要考虑到这一点。 我注意到的一点是,这在某种程度上仍然依赖于顺序:只有先传递双分子,传递双分母才会更快。 无论订购如何,我都会得到相同的结果。如果您已经实施了我推荐的更改并且仍然看到这种行为,那么除了 HotSpot 正在做的任何事情之外,我不知道它可能是什么。 HotSpot应该能够判断这些都是相同的,并将它们编译成相同的代码片段。您可以尝试的一件事是将您用作输入的数字随机化,因为 HotSpot 可能正在寻找其他方法来作弊,将输入固定为仅 0-19。 如果您想测试纯 Java 代码,您可以做的另一件事是使用 -Xint 标志关闭 HotSpot。 (虽然 HotSpot 基本上是 Java 的一个特性,所以关闭它作为真正的基准测试是不典型的。)关闭 HotSpot 后,测试之间的相对差异应该相当小。

以上是关于为啥以不同的顺序运行我的测试会导致性能截然不同? [复制]的主要内容,如果未能解决你的问题,请参考以下文章

不同的执行顺序导致Pthread程序的性能差异

为啥 cProfile 会导致函数返回不同的值?

为啥导致 ***Error 的递归方法的调用次数在程序运行之间会有所不同? [复制]

为啥递归调用会导致不同堆栈深度的 ***?

为啥过滤时使用相同的字段会导致不同的执行时间? (不同的索引用法)

为啥使用不同的 ArrayList 构造函数会导致内部数组的增长率不同?