C++ dynamic memory allocation - matrix multiplication

Posted 2023-02-17

【Title】: c++ dynamic memory allocation - matrix multiplication 【Asked】: 2021-10-25 18:41:09 【Question】:

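I am trying to multiply large matrices, for example 1000x1000. Unfortunately, it only works for very small matrices. For large ones, the program just opens and that is all: no result. Here is the code: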

#include <iostream>

using namespace std;

int main()
{
    int matrix_1_row;
    int matrix_1_column;
    matrix_1_row = 10;
    matrix_1_column = 10;

    // dynamically create array of pointers of size matrix_1_row
    int** array_1 = new int* [matrix_1_row];
    // dynamically allocate memory of size matrix_1_column for each row
    for (int i = 0; i < matrix_1_row; i++)
    {
        array_1[i] = new int[matrix_1_column];
    }
    // assign values to allocated memory
    for (int i = 0; i < matrix_1_row; i++)
    {
        for (int j = 0; j < matrix_1_column; j++)
        {
            array_1[i][j] = 3;
        }
    }

    int matrix_2_row;
    int matrix_2_column;
    matrix_2_row = 10;
    matrix_2_column = 10;
    // dynamically create array of pointers of size matrix_2_row
    int** array_2 = new int* [matrix_2_row];
    // dynamically allocate memory of size matrix_2_column for each row
    for (int i = 0; i < matrix_2_row; i++)
    {
        array_2[i] = new int[matrix_2_column];
    }
    // assign values to allocated memory
    for (int i = 0; i < matrix_2_row; i++)
    {
        for (int j = 0; j < matrix_2_column; j++)
        {
            array_2[i][j] = 2;
        }
    }

    // Result
    int result_row = matrix_1_row;
    int result_column = matrix_2_column;
    // dynamically create array of pointers of size result_row
    int** array_3 = new int* [result_row];
    // dynamically allocate memory of size result_column for each row
    for (int i = 0; i < result_row; i++)
    {
        array_3[i] = new int[result_column];
    }

    // Matrix multiplication
    for (int i = 0; i < matrix_1_row; i++)
    {
        for (int j = 0; j < matrix_2_column; j++)
        {
            array_3[i][j] = 0;
            for (int k = 0; k < matrix_1_column; k++)
            {
                array_3[i][j] += array_1[i][k] * array_2[k][j];
            }
        }
    }

    // RESULTS
    for (int i = 0; i < result_row; i++)
    {
        for (int j = 0; j < result_column; j++)
        {
            std::cout << array_3[i][j] << "\t";
        }
    }

    // deallocate memory using delete[] operator 1st matrix
    for (int i = 0; i < matrix_1_row; i++)
    {
        delete[] array_1[i];
    }
    delete[] array_1;
    // deallocate memory using delete[] operator 2nd matrix
    for (int i = 0; i < matrix_2_row; i++)
    {
        delete[] array_2[i];
    }
    delete[] array_2;
    // deallocate memory using delete[] operator result
    for (int i = 0; i < result_row; i++)
    {
        delete[] array_3[i];
    }
    delete[] array_3;

    return 0;
}
Does anyone know how to solve this? Where did I go wrong? I used pointers and dynamic memory allocation.

【Comments】:

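You can store the data in std::vector<std::vector<double>> instead of allocating and deleting memory by hand.

First: wrap all of this in a class so that you never have to call delete[] manually. Second, a matrix should use a single allocation for all of its data and compute the index arithmetically (again, hidden inside the class). Third, for the matrix multiplication itself, pay attention to the order in which you access memory, because CPU caches matter.

Regarding my third point, see e.g. ***.com/a/7395643/1405588

@Eugene Nah, a single std::vector<double> is much better, possibly wrapped in a Matrix class.

Please post a variant that does not work rather than one that does. (As far as I can tell, this should finish within a few seconds.)

【Answer 1】: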

Instead of using raw arrays directly as matrices, try something simple and scalable first, then optimize. Something like this:

#include <memory>
#include <vector>

class matrix
{
    private:
    // sub-matrices
    std::shared_ptr<matrix> c11;
    std::shared_ptr<matrix> c12;
    std::shared_ptr<matrix> c21;
    std::shared_ptr<matrix> c22;

    // properties
    const int n;
    const int depth;
    const int maxDepth;

    // this should be shared-ptr too. Too lazy.
    int data[16]; // lowest level matrix = 4x4 without sub matrix

    // multiplication memory
    std::shared_ptr<std::vector<matrix>> m;

    public:
    matrix(const int nP=4,const int depthP=0,const int maxDepthP=1):
        n(nP),depth(depthP),maxDepth(maxDepthP)
    {
        if(depth<maxDepth)
        {
            // allocate c11,c12,c21,c22
            // allocate m1,m2,m3,...m7
        }
    }

    // matrix-matrix multiplication
    matrix operator * (const matrix & mat)
    {
        // allocate result
        matrix result(n,depth,maxDepth);

        // multiply
        if(depth!=maxDepth)
        {
            // Strassen's multiplication algorithm
            (*m)[0] = (*c11 + *c22) * (*mat.c11 + *mat.c22);
            ...
            (*m)[6] = (*c12 - *c22) * (*mat.c21 + *mat.c22);

            *result.c11 = (*m)[0] + (*m)[3] - (*m)[4] + (*m)[6];
            ..
            *result.c22 = ..
        }
        else
        {
            // innermost submatrices (4x4) multiplied normally
            result.data[0] = data[0]*mat.data[0] + ....
            ...
            result.data[15]= ...
        }
        return result;
    }

    // matrix-matrix adder
    matrix operator + (const matrix & mat)
    {
        // allocate result
        matrix result(n,depth,maxDepth);

        // add
        if(depth!=maxDepth)
        {
            *result.c11 = *c11 + *mat.c11;
            *result.c12 = *c12 + *mat.c12;
            *result.c21 = *c21 + *mat.c21;
            *result.c22 = *c22 + *mat.c22;
        }
        else
        {
            // innermost matrix
            result.data[0] = ...
        }
        return result;
    }
};

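This way it has lower time complexity and still looks easy to read. Once it works, you can optimize for speed by using a single contiguous array inside the class, preferably allocated only once at the root matrix, and accessing it from the sub-matrices through std::span (available in newer C++ versions). It can even be parallelized easily, because each matrix can hand its work to at least 4 threads, which can in turn hand it to 16 threads, 64 threads, and so on. Of course, too many threads are just as bad as too many allocations, so this should be tuned more carefully.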
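The lower time complexity comes from needing seven block multiplications per level instead of eight. As a minimal sketch of that idea, the snippet below applies the textbook Strassen products to plain 2x2 integer matrices and compares them with the schoolbook product. The names `Mat2`, `naive2x2`, `strassen2x2`, and `strassen_agrees` are hypothetical and not part of the answer's class; in the recursive class each scalar here would be a sub-matrix.

```cpp
#include <array>
#include <cassert>

// Hypothetical 2x2 integer matrix, row-major: {a11, a12, a21, a22}.
using Mat2 = std::array<long long, 4>;

// Schoolbook product: eight multiplications.
Mat2 naive2x2(const Mat2& a, const Mat2& b) {
    return {a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
            a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3]};
}

// Strassen's scheme: seven multiplications plus extra additions.
Mat2 strassen2x2(const Mat2& a, const Mat2& b) {
    const long long m0 = (a[0] + a[3]) * (b[0] + b[3]);
    const long long m1 = (a[2] + a[3]) * b[0];
    const long long m2 = a[0] * (b[1] - b[3]);
    const long long m3 = a[3] * (b[2] - b[0]);
    const long long m4 = (a[0] + a[1]) * b[3];
    const long long m5 = (a[2] - a[0]) * (b[0] + b[1]);
    const long long m6 = (a[1] - a[3]) * (b[2] + b[3]);
    return {m0 + m3 - m4 + m6,   // c11
            m2 + m4,             // c12
            m1 + m3,             // c21
            m0 - m1 + m2 + m5};  // c22
}

// Returns true when both routes give the same product for the given inputs.
bool strassen_agrees(const Mat2& a, const Mat2& b) {
    return strassen2x2(a, b) == naive2x2(a, b);
}
```

For example, `strassen_agrees({1, 2, 3, 4}, {5, 6, 7, 8})` compares the two routes on one sample input. The saving only matters once the seven products are themselves large sub-matrix multiplications, which is why the class above recurses down to a small base case.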

【Comments】:

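shared_ptrs seem a bit wasteful here. Is anything actually shared?

In case of multithreading (at large sizes). Also, a single memory block with array views (std::span) from the newer C++ features would be better. Otherwise unique_ptr is fine.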
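The single-allocation idea raised in the comments can be sketched without any manual new/delete. This is only an illustration under assumptions: the `Matrix` class and `multiply` function are hypothetical names, the element type is int to match the question, and the loop order i-k-j is one common cache-friendly choice rather than the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: one contiguous buffer, row-major index arithmetic.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols, int fill = 0)
        : rows_(rows), cols_(cols), data_(rows * cols, fill) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Element (r, c) lives at offset r * cols_ + c in the flat buffer.
    int& operator()(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }
    int operator()(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> data_;  // freed automatically, no delete[] needed
};

// i-k-j loop order: the innermost loop walks a row of b and a row of the
// result sequentially, instead of striding down a column of b.
Matrix multiply(const Matrix& a, const Matrix& b) {
    assert(a.cols() == b.rows());
    Matrix c(a.rows(), b.cols(), 0);
    for (std::size_t i = 0; i < a.rows(); ++i) {
        for (std::size_t k = 0; k < a.cols(); ++k) {
            const int aik = a(i, k);
            for (std::size_t j = 0; j < b.cols(); ++j) {
                c(i, j) += aik * b(k, j);
            }
        }
    }
    return c;
}
```

With the question's values, `multiply(Matrix(10, 10, 3), Matrix(10, 10, 2))` yields 60 in every cell, since each entry sums ten products of 3 and 2. Changing the sizes to 1000 only changes the constructor arguments.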
