Does inheritance really not affect performance?

Posted 2023-02-23


Question (posted 2019-11-16 16:34:36):

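I read on the internet (here and here) that inheritance does not affect the performance of a class. I have always been curious about this, because I have been writing a matrix module for a rendering engine, and the speed of this module matters a lot to me.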

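After I had written:

- Base: a general matrix class
- Derived from the base: a square-matrix implementation
- Derived from that: 3- and 4-dimensional implementations of square matrices

I decided to test them and ran into performance problems with instantiation.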

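So the main questions are:

What causes these performance problems in my case, and why do they happen in general?

Should I forget about inheritance in this case?

This is roughly what the classes look like: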

template <class t>
class Matrix {
protected:
    union {
        struct {
            unsigned int w, h;
        };
        struct {
            unsigned int n, m;
        };
    };

    /** Changes flow of accessing `v` array members */
    bool transposed;

    /** Matrix values array */
    t* v;

public:
    ~Matrix() {
        delete[] v;
    }
    Matrix() : transposed(false), v{} {}

    // Copy
    Matrix(const Matrix<t>& m) : w(m.w), h(m.h), transposed(m.transposed) {
        v = new t[m.w * m.h];
        for (unsigned i = 0; i < m.g_length(); i++)
            v[i] = m.g_v()[i];
    }

    // Constructor from array
    Matrix(unsigned _w, unsigned _h, t _v[], bool _transposed = false) : w(_w), h(_h), transposed(_transposed) {
        v = new t[_w * _h];
        for (unsigned i = 0; i < _w * _h; i++)
            v[i] = _v[i];
    }

    /** Gets matrix array */
    inline t* g_v() const { return v; }
    /** Gets matrix values array size */
    inline unsigned g_length() const { return w * h; }

    // Other constructors, operators, and methods.
};

template<class t>
class SquareMatrix : public Matrix<t> {
public:
    SquareMatrix() : Matrix<t>() {}
    SquareMatrix(const Matrix<t>& m) : Matrix<t>(m) {}

    SquareMatrix(unsigned _s, t _v[], bool _transpose = false) : Matrix<t>(_s, _s, _v, _transpose) {}
    // Others...
};

template<class t>
class Matrix4 : public SquareMatrix<t> {
public:
    Matrix4() : SquareMatrix<t>() {}
    Matrix4(const Matrix<t>& m) : SquareMatrix<t>(m) {}

    Matrix4(t _v[16], bool _transpose = false) : SquareMatrix<t>(4, _v, _transpose) {}
    // Others...
};
For testing, I used this:

#include <chrono>
#include <fstream>
#include <functional>

void test(std::ofstream& f, char delim, std::function<void(void)> callback) {
    auto t1 = std::chrono::high_resolution_clock::now();
    callback();
    auto t2 = std::chrono::high_resolution_clock::now();
    f << std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count() << delim;
    //std::cout << "test took " << std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count() << " microseconds\n";
}

Performance problems

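With a single instance there was no problem: each class took under 5 microseconds almost every time. But then I decided to increase the number of initializations, and trouble showed up.

I ran each test 100 times, with arrays of length 500.

1. Class initialization with the default constructor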

Raw results

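I only tested the initialization of the arrays.

Results (average time in microseconds):

Matrix: 25.19
SquareMatrix: 40.37 (37.60% loss)
Matrix4: 58.06 (30.47% loss relative to SquareMatrix)

Here we can already see a huge difference.

Here is the code: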
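The "loss" percentages appear to be measured relative to the slower class rather than the faster one. That reading reproduces every percentage in all three result sets; for the first test:

```latex
\text{loss} = \frac{t_{\text{derived}} - t_{\text{base}}}{t_{\text{derived}}}
\qquad\text{e.g.}\qquad
\frac{40.37 - 25.19}{40.37} \approx 37.60\,\%,\qquad
\frac{58.06 - 40.37}{58.06} \approx 30.47\,\%
```

Measured against the faster class instead, the same Matrix/SquareMatrix gap would read as a slowdown of (40.37 - 25.19) / 25.19 ≈ 60.26%.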

int main(int argc, char** argv)
{
    std::ofstream f("test.csv");
    f << "Matrix\t" << "SquareMatrix\t" << "Matrix4\n";

    for (int k = 0; k < 100; k++) {
        test(f, '\t', []() {
            Matrix<long double>* a = new Matrix<long double>[500];
        });

        test(f, '\t', []() {
            SquareMatrix<long double>* a = new SquareMatrix<long double>[500];
        });

        test(f, '\n', []() {
            Matrix4<long double>* a = new Matrix4<long double>[500];
        });
    }

    f.close();

    return 0;
}
2. Class initialization with the default constructor, followed by filling

Raw results

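I tested the initialization of an array of class instances and then filled them with custom matrices.

Results (average time in microseconds):

Matrix: 402.8
SquareMatrix: 475 (15.20% loss)
Matrix4: 593.86 (20.01% loss relative to SquareMatrix)

Code: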

int main(int argc, char** argv)
{
    long double arr[16] = {
       1, 2, 3, 4,
       5, 6, 7, 8,
       9, 10, 11, 12,
       13, 14, 15, 16
    };

    std::ofstream f("test.csv");
    f << "Matrix\t" << "SquareMatrix\t" << "Matrix4\n";

    for (int k = 0; k < 100; k++) {
        test(f, '\t', [&arr]() {
            Matrix<long double>* a = new Matrix<long double>[500];
            for (int i = 0; i < 500; i++)
                a[i] = Matrix<long double>(4, 4, arr);
        });

        test(f, '\t', [&arr]() {
            SquareMatrix<long double>* a = new SquareMatrix<long double>[500];
            for (int i = 0; i < 500; i++)
                a[i] = SquareMatrix<long double>(4, arr);
        });

        test(f, '\n', [&arr]() {
            Matrix4<long double>* a = new Matrix4<long double>[500];
            for (int i = 0; i < 500; i++)
                a[i] = Matrix4<long double>(arr);
        });
    }

    f.close();

    return 0;
}
3. Filling a vector with class instances

Raw results

Pushing custom matrices back into a vector.

Results (average time in microseconds):

Matrix: 4498.1
SquareMatrix: 4693.93 (4.17% loss)
Matrix4: 4960.12 (5.37% loss relative to SquareMatrix)

Code:

int main(int argc, char** argv)
{
    long double arr[16] = {
       1, 2, 3, 4,
       5, 6, 7, 8,
       9, 10, 11, 12,
       13, 14, 15, 16
    };

    std::ofstream f("test.csv");
    f << "Matrix\t" << "SquareMatrix\t" << "Matrix4\n";

    for (int k = 0; k < 100; k++) {
        test(f, '\t', [&arr]() {
            std::vector<Matrix<long double>> a = std::vector<Matrix<long double>>();
            for (int i = 0; i < 500; i++)
                a.push_back(Matrix<long double>(4, 4, arr));
        });

        test(f, '\t', [&arr]() {
            std::vector<SquareMatrix<long double>> a = std::vector<SquareMatrix<long double>>();
            for (int i = 0; i < 500; i++)
                a.push_back(SquareMatrix<long double>(4, arr));
        });

        test(f, '\n', [&arr]() {
            std::vector<Matrix4<long double>> a = std::vector<Matrix4<long double>>();
            for (int i = 0; i < 500; i++)
                a.push_back(Matrix4<long double>(arr));
        });
    }

    f.close();

    return 0;
}
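Independently of inheritance, test 3 also measures heap traffic. The snippet's Matrix declares a copy constructor and a destructor but, as far as the shown code goes, no move constructor, so the compiler does not generate one: every push_back of a temporary, and every element relocation when the vector grows, deep-copies the t* buffer. (The snippet also shows no assignment operator; if the real class relied on the implicitly generated one, test 2's a[i] = ... would leave two objects owning the same buffer.) A minimal sketch of the usual remedy, assuming C++17 and using a simplified hypothetical class Buf in place of Matrix<t>, not the author's code:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for Matrix<t>: owns a heap array, like the question's class.
template <class T>
class Buf {
public:
    Buf(std::size_t n, const T* src) : n_(n), v_(new T[n]) {
        for (std::size_t i = 0; i < n_; ++i) v_[i] = src[i];
    }
    // Deep copy, as in the question's copy constructor.
    Buf(const Buf& o) : n_(o.n_), v_(new T[o.n_]) {
        for (std::size_t i = 0; i < n_; ++i) v_[i] = o.v_[i];
        ++copies;
    }
    // Move: take over the pointer. noexcept lets std::vector move (not copy) on growth.
    Buf(Buf&& o) noexcept : n_(o.n_), v_(o.v_) {
        o.n_ = 0;
        o.v_ = nullptr;
    }
    // Copy-and-swap: copies from lvalues, moves from rvalues, never shares a buffer.
    Buf& operator=(Buf o) noexcept {
        std::swap(n_, o.n_);
        std::swap(v_, o.v_);
        return *this;
    }
    ~Buf() { delete[] v_; }

    std::size_t size() const { return n_; }
    const T& operator[](std::size_t i) const { return v_[i]; }

    static inline int copies = 0;  // counts deep copies, for illustration only

private:
    std::size_t n_;
    T* v_;
};

// Same shape as test 3, but the vector reserves once and temporaries are moved in.
std::vector<Buf<long double>> fill_vector(const long double (&arr)[16], std::size_t count) {
    std::vector<Buf<long double>> a;
    a.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        a.push_back(Buf<long double>(16, arr));  // rvalue: uses the move constructor
    return a;
}
```

With the move constructor, filling 500 elements performs no deep copies; without it, each push_back would perform one, plus more on every reallocation if reserve is omitted.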
If you need the full source code, you can find matrix.h and matrix.cpp here.

Comments on the question:

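I assume you tested a release build with optimizations enabled, and not an unoptimized debug build?

The inline keyword is completely redundant for functions defined inside the class definition. I suggest you don't use it, because it does not mean "inline this function".

Oh god... I forgot about this! Should I delete the question, since everything is solved in a release build?

@nt4f04uNd You may want to write a proper benchmark (see e.g. google benchmark). In any case, your callback calls are most likely all optimized away completely in release mode, because you never use the results.

@nt4f04uNd I suggest leaving the question up, so that others who are looking for "why?" and also forgot about the release/debug build difference can facepalm too :)

Answer 1: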
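The comments point at two measurement problems: a debug build, and results that are never used, which lets an optimizing compiler drop the work. A minimal hand-rolled sketch, assuming C++14 or later; best_of_us, PlainMat4 and fill_and_sum are hypothetical names, not from the question. It folds each run's result into a sink the caller must use, frees what it allocates, and uses std::chrono::steady_clock, which is guaranteed to be monotonic (high_resolution_clock is not guaranteed to be):

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Runs `body` `reps` times and returns the fastest run in microseconds.
// Each run's result is added to `sink`; as long as the caller uses `sink`
// (e.g. prints it), the work cannot be discarded as dead code.
template <class F>
double best_of_us(int reps, F body, std::uint64_t& sink) {
    using clock = std::chrono::steady_clock;
    double best = -1.0;
    for (int r = 0; r < reps; ++r) {
        const auto t0 = clock::now();
        sink += body();
        const auto t1 = clock::now();
        const double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
        if (best < 0.0 || us < best) best = us;
    }
    return best;
}

// Hypothetical workload: build n 4x4 matrices, fill them, and return a value derived from them.
struct PlainMat4 { long double v[16]; };

std::uint64_t fill_and_sum(std::size_t n) {
    std::vector<PlainMat4> a(n);  // released automatically, unlike new[] without delete[]
    for (auto& m : a)
        for (int j = 0; j < 16; ++j) m.v[j] = static_cast<long double>(j + 1);
    long double s = 0;
    for (const auto& m : a) s += m.v[15];  // read back so the stores are observable
    return static_cast<std::uint64_t>(s);
}
```

Compile it with optimizations enabled and print sink at the end so it stays observable. This only makes dead-code elimination less likely; for serious measurements, a framework such as Google Benchmark (whose benchmark::DoNotOptimize exists for exactly this purpose) is more reliable.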

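Does inheritance really not affect performance?

Right, it doesn't, as long as no virtual methods are involved. Only virtual methods require the object's dynamic type to be determined at runtime so that the matching override can be called. In fact, if you dig into the low-level details, you will see that C++ inheritance is mostly a static construct, resolved at compile time.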
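To make "resolved at compile time" concrete, here is a sketch with simplified hypothetical stand-ins for the question's classes (MatBase, SquareMat and Mat4Sketch are not the real Matrix code). Derived classes that add only non-virtual member functions add no per-object data, so on mainstream compilers they are exactly as large as the base, and their calls are ordinary direct calls that can be inlined. The size equality is what common ABIs do rather than something the standard guarantees; the static_asserts make the snippet fail to compile if it does not hold.

```cpp
#include <type_traits>

// Simplified, hypothetical stand-ins for Matrix / SquareMatrix / Matrix4.
template <class T>
struct MatBase {
    unsigned w, h;
    bool transposed;
    T* v;
    unsigned length() const { return w * h; }  // non-virtual: a direct, inlinable call
};

template <class T>
struct SquareMat : MatBase<T> {
    bool is_square() const { return this->w == this->h; }  // adds behaviour, no data
};

template <class T>
struct Mat4Sketch : SquareMat<T> {};

// No data members and no virtual functions were added: same size as the base, no vtable.
static_assert(sizeof(SquareMat<float>) == sizeof(MatBase<float>), "same size as base");
static_assert(sizeof(Mat4Sketch<float>) == sizeof(MatBase<float>), "same size as base");
static_assert(!std::is_polymorphic<Mat4Sketch<float>>::value, "no vtable");

// By contrast, one virtual function makes a class polymorphic: each object
// then carries a vtable pointer, and virtual calls are generally indirect.
struct VirtualBase {
    unsigned w, h;
    virtual ~VirtualBase() = default;
    virtual unsigned length() const { return w * h; }
};
static_assert(std::is_polymorphic<VirtualBase>::value, "has a vtable");
```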

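What causes these performance problems in my case, and why do they happen in general?

These seem to work fine with optimizations enabled?

Should I forget about inheritance in this case?

In a performance-sensitive case like this, the only thing you need to do is avoid virtual methods.

Unrelated to this question: I have read through your code. Maybe it would be better to implement your templates in the header file?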
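If some behaviour genuinely has to vary per derived class, one common way to get it without virtual methods is static polymorphism through the Curiously Recurring Template Pattern (CRTP). This is a swapped-in technique that neither the question nor the answer mentions; SquareOps, CrtpMat4, at and trace are hypothetical names, not the question's API. A minimal sketch:

```cpp
#include <cstddef>

// CRTP: the base is parameterised on the derived type and reaches it with a
// static_cast, so there is no vtable and the call to at() can be inlined.
template <class Derived, class T>
struct SquareOps {
    T trace() const {
        const Derived& self = static_cast<const Derived&>(*this);
        T sum{};
        for (std::size_t i = 0; i < Derived::N; ++i)
            sum += self.at(i, i);  // supplied by Derived, resolved at compile time
        return sum;
    }
};

template <class T>
struct CrtpMat4 : SquareOps<CrtpMat4<T>, T> {
    static constexpr std::size_t N = 4;
    T v[N * N];  // row-major storage
    T at(std::size_t r, std::size_t c) const { return v[r * N + c]; }
};
```

The trade-off is that there is no common runtime base type: SquareOps instantiated for two different matrix classes are unrelated types, so different matrix kinds cannot be stored together behind one base pointer.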

Comments on the answer:

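I like keeping implementation and declaration separate; it looks a bit cleaner to me, because when everything is in the header file it looks like a mess, to me at least.

For non-template classes/functions, you should separate implementation and declaration. But the common practice for template definitions is to put them in the header file. Maybe you already know this: why-can-templates-only-be-implemented-in-the-header-file. The alternative solution is not a good practice, at least in this case.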
