Why is MEX code much slower than the MATLAB code?

Posted 2023-03-06


【Posted】2017-03-16 11:54:14 【Question】:

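First of all, please do not mark this question as a duplicate; the reason will become clear once it is read in detail.

I am trying to implement the Orthogonal Matching Pursuit algorithm. For this I need to find the dot product of two matrices of size 144*14596 and 144*1, as shown below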

        clc,clear;

        load('E');
        load('R');
        load('P');


        sparse=zeros(14596,2209);

            dictionary=tem2;

            atoms=zeros(size(dictionary,1),size(dictionary,2));
            coefs=zeros(size(dictionary,2),1);
        tic
            %Normalize the dictionary 
            for index=1:size(dictionary,2)
               dictionary(:,index)=dictionary(:,index)./norm(dictionary(:,index)); 
            end
            D=dictionary;

        % NOTE: I tried for ii=1:5 to check the difference in computational time

         for ii=1:2209

            r=tem4(:,ii);
            dictionary=D;
            index=[];
            count=0;
            t=5;
            while(t>1e-15 && count~=144)
                % *************** Problem lies here ***************
                % inner_product=dictionary'*r; %Dot Product (Should be slow but is fast)
                inner_product=dotProduct(dictionary',r); %(Should be fast but is very slow)
                % *************************************************
                [m,ind]=max(abs(inner_product));

                index=[index ind];
                atoms(:,ind)=dictionary(:,ind); %Select atom which has maximum inner product
                dictionary(:,ind)=0;
                at=atoms(:,index);
                x=(at'*at)\(at'*r);
                coefs(index)=x;
                r=r-at*x;
                t=norm(r);
                count=count+1;
            end
                sparse(:,ii)=coefs;

         end

        sig=D*sparse;
        final=uint8((repmat((((max(tem4))-min(tem4))./((max(sig)-min(sig)))),size(tem4,1),1).*(sig-repmat(min(sig),size(tem4,1),1)))+repmat(min(tem4),size(tem4,1),1));  

        toc

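The problem I am facing is that finding the dot product in MATLAB with the following code takes a lot of time (as seen in the profiler report).

inner_product=dictionary'*r;

To reduce the computation time, I wrote the MEX code shown below to find the dot product: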

/***********************************************************************
 *Program to create a MEX-file to find the dot product of matrices     *
 *Created by: Navdeep Singh                                            * 
 *@Copyright Reserved                                                  * 
 ***********************************************************************/

#include "mex.h"

/* Computes t = m1*m2 for column-major matrices:
   m1 is M x N, m2 is M2 x N2 (N must equal M2), t is M x N2. */
void dot_prod(double *m1,double *m2, double *t,size_t M,size_t N, size_t M2,size_t N2 )
{
    size_t i,j,k;
    double s;

    for(i=0;i<M;i++)
    {
        for(k=0;k<N2;k++)
        {
            s=0;
            for(j=0;j<N;j++)
            {
                s=s+*((m1+i)+(M*j))*(*(m2+(j+M2*k)));
            }
            *((t+i)+(M*k))=s;
        }
    }
}

/* Gateway routine: prhs[0] and prhs[1] are the two input matrices. */
void mexFunction(int nlhs,mxArray *plhs[],int nrhs, const mxArray *prhs[])
{
    double *mat1,*mat2,*out;
    size_t rows_mat1,cols_mat1,rows_mat2,cols_mat2;
    mat1=mxGetPr(prhs[0]);
    mat2=mxGetPr(prhs[1]);
    rows_mat1=mxGetM(prhs[0]);
    cols_mat1=mxGetN(prhs[0]);
    rows_mat2=mxGetM(prhs[1]);
    cols_mat2=mxGetN(prhs[1]);
    plhs[0]=mxCreateDoubleMatrix(rows_mat1,cols_mat2,mxREAL);
    out=mxGetPr(plhs[0]);
    dot_prod(mat1,mat2,out,rows_mat1,cols_mat1,rows_mat2,cols_mat2);
}

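To my surprise, however, the MEX solution turned out to be much slower than the plain MATLAB one, which defeats the whole purpose of MEX. To find out why, I searched the web extensively and came across some interesting facts, for example: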

Matlab: Does calling the same mex function repeatedly from a loop incur too much overhead?

Matlab mex-file with mexCallMATLAB is almost 300 times slower than the corresponding m-file

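These links suggest that the overhead should not be large, and that whatever overhead exists is mostly incurred on the first call, since loading the symbol table and so on takes time. Contrary to this, I found that my code incurs a large overhead.

I also found that the size of the arguments does not matter; the number of arguments can affect the computation time, but again the effect is minimal. One of the links also suggests that dynamically allocated memory should be freed (apart from memory allocated by MATLAB itself), but I do not have any such allocation either.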

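So please let me know the reason behind this:

Why does the MEX version take so much time?

What can be done to fix it?

Your help is much appreciated.

The various files can be found here: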

dictionary.m

dotProduct.c

Report MEX

E.mat

R.mat

P.mat

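【Comments】:

It is almost impossible for you to write dot-product code that is faster than MATLAB's. Is the line you mention slow because a single call is slow, or because you call it so many times that it makes up a large share of the total? Also, MEX files are fast, but they can incur overhead when creating and passing variables.

I think you have not read the post completely, and your answer is not clear.

You should focus on making your MATLAB code faster rather than writing MEX code. Matrix multiplication and dot products in MATLAB are much faster than anything you would write yourself. I can see several places in your MATLAB code that could be vectorized.

@Navdeep You have not shown the profiler report without the MEX file. MATLAB probably has the fastest matrix operations known to humankind in 2017, and my comment still stands: what makes you think you can make them faster?

@rayryeng Thanks. You are always helpful. It is fine that MATLAB is faster at matrix multiplication and dot products, but then why is the code so slow? Running this code in MATLAB takes about 12 minutes. Also, please give me some hints about which parts can be vectorized, since most of the time is spent computing the dot product. My last question: what is MEX needed for, and where is it used, if MATLAB is faster at matrix operations? Thanks.

【Answer 1】: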

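MATLAB has highly optimized code for computing the dot product of matrices.

You have just written a nested for loop to compute the dot product, so the fair comparison is between this MEX code and a similar nested for loop written in MATLAB; only then can you decide whether the MEX code or the MATLAB code is faster.

In fact, MATLAB does not use a plain nested for loop to compute the dot product of matrices.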
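To make the point about naive loops concrete, here is a minimal sketch in plain C (no mex.h, so it can be compiled on its own). It is not what MATLAB does internally; MATLAB hands such products to an optimized BLAS library. The sketch only shows one simple improvement over the loop order in the question. The function names `matmul_naive` and `matmul_colwise` are hypothetical. Both assume MATLAB's column-major layout, with A of size M x N, B of size N x K and C of size M x K.

```c
#include <stddef.h>

/* Loop order used in the question: for each output element, walk along a
   row of A. In column-major storage consecutive j are M doubles apart,
   so the reads from A are strided. */
void matmul_naive(const double *A, const double *B, double *C,
                  size_t M, size_t N, size_t K)
{
    for (size_t i = 0; i < M; i++) {
        for (size_t k = 0; k < K; k++) {
            double s = 0.0;
            for (size_t j = 0; j < N; j++)
                s += A[i + M * j] * B[j + N * k];
            C[i + M * k] = s;
        }
    }
}

/* Same arithmetic with the loops interchanged: the innermost loop runs
   down one column of A and one column of C, both contiguous in memory. */
void matmul_colwise(const double *A, const double *B, double *C,
                    size_t M, size_t N, size_t K)
{
    for (size_t k = 0; k < K; k++) {
        for (size_t i = 0; i < M; i++)
            C[i + M * k] = 0.0;
        for (size_t j = 0; j < N; j++) {
            const double b = B[j + N * k];
            for (size_t i = 0; i < M; i++)
                C[i + M * k] += A[i + M * j] * b;
        }
    }
}
```

Both functions produce the same result; only the memory access pattern differs. Whether the reordered version is noticeably faster depends on the matrix sizes and the compiler, and a tuned BLAS will typically outperform both because it also uses blocking, SIMD instructions and multiple threads.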

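From the MATLAB documentation:

MEX files have several applications:

Calling large pre-existing C/C++ and FORTRAN programs from MATLAB without rewriting them as MATLAB functions

Replacing performance-critical routines with C/C++ implementations

MEX files are not appropriate for all applications. MATLAB is a high-productivity environment whose specialty is eliminating time-consuming, low-level programming in compiled languages like C or C++. In general, do your programming in MATLAB. Do not use MEX files unless your application requires it.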
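If a routine really is moved to C, it pays to tailor the kernel to the actual call. In the question, `dotProduct(dictionary',r)` passes an explicitly transposed matrix, so, as far as I understand, MATLAB has to build the 14596 x 144 transposed copy before the MEX function even starts. The sketch below, in plain C with a hypothetical function name `at_times_x`, computes y = A'*x directly from the untransposed column-major A, so no copy is needed and A is read contiguously. It illustrates the idea under these assumptions and is not expected to beat MATLAB's built-in `dictionary'*r`.

```c
#include <stddef.h>

/* y = A' * x for a column-major A of size M x N, x of length M and
   y of length N. Each y[j] is the dot product of column j of A with x,
   so A is traversed contiguously and never transposed. */
void at_times_x(const double *A, size_t M, size_t N,
                const double *x, double *y)
{
    for (size_t j = 0; j < N; j++) {
        const double *col = A + M * j;  /* start of column j */
        double s = 0.0;
        for (size_t i = 0; i < M; i++)
            s += col[i] * x[i];
        y[j] = s;
    }
}
```

In a MEX gateway this would be called with the pointer and dimensions of the untransposed `dictionary` (M = 144, N = 14596 in the question) and the residual `r` as `x`.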

EXAMPLE1

【Comments】:

Thanks. Can you tell us where MEX can be used? I am quite confused now.
