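Multithreaded generic matrices addition very slow when using custom classes
Posted: 2018-10-22 19:24:44
Question: I implemented the addition of two generic matrices using both a multithreaded and a sequential algorithm. I tested my program with two large matrices (2000x2000) containing real numbers (doubles) and the results were very good: the operation finished quickly. Then I implemented a class that represents complex numbers and repeated the same scenario, and found that even two 50x50 matrices take a very long time to process. What should I do to improve the execution time?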
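This is the method that creates the threads (I first build two one-dimensional arrays so that it is easier to give each thread its start and end index):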
#include <chrono>
#include <fstream>
#include <thread>
#include <vector>

template<typename T, typename Func>
Matrix<T> *calculateLinearDistribution(Matrix<T> *matrix1,
                                       Matrix<T> *matrix2,
                                       Func operation,
                                       int nThreads)
{
    const int n = matrix1->getN(), m = matrix2->getM(), totalNumbers = n * m;
    Matrix<T> *result = new Matrix<T>(n, m);
    // Flatten both matrices so that each thread can be handed an index range [start, end).
    T *matrix1Unidim = new T[totalNumbers];
    T *matrix2Unidim = new T[totalNumbers];
    convertMatrixToUnidimensionalArray(matrix1, matrix1Unidim);
    // The second array must be filled from matrix2, not matrix1.
    convertMatrixToUnidimensionalArray(matrix2, matrix2Unidim);
    if (totalNumbers < nThreads)
        nThreads = totalNumbers;
    const int quantityPerThread = totalNumbers / nThreads;
    int rest = totalNumbers % nThreads;
    int start = 0, end = 0;
    std::vector<std::thread> threads;
    std::chrono::milliseconds startTime = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::system_clock::now().time_since_epoch());
    for (int i = 0; i < nThreads; i++)
    {
        end += quantityPerThread;
        // The first `rest` threads take one extra element each.
        if (rest > 0)
        {
            end++;
            rest--;
        }
        threads.push_back(std::thread(MultithreadedMethods<T, Func>::linearElementsDistribution, &matrix1Unidim[0],
                                      &matrix2Unidim[0], result, start, end, operation));
        start = end;
    }
    for (int i = 0; i < nThreads; i++)
        threads[i].join();
    std::chrono::milliseconds endTime = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::system_clock::now().time_since_epoch());
    // Append the measurement to the statistics file.
    std::ofstream out(linearElemensStatisticsFile, std::ios_base::app);
    std::chrono::milliseconds time = endTime - startTime;
    out << "Dimensiune matrice: " << matrix1->getN() << "x" << matrix1->getM()
        << " | Nr. threads: " << nThreads << " | Timp de executie: " << time.count() << std::endl;
    out.close();
    delete[] matrix1Unidim;
    delete[] matrix2Unidim;
    return result;
}
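The range arithmetic in the loop above can be isolated into a small helper, which makes it easier to check on its own. The following is only a sketch: `splitRange` is a hypothetical name that does not appear in the question, and it assumes half-open `[start, end)` ranges like the ones passed to the threads.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the original post): split [0, total) into
// at most `parts` contiguous half-open ranges whose sizes differ by at most one.
std::vector<std::pair<std::size_t, std::size_t>> splitRange(std::size_t total, std::size_t parts)
{
    assert(parts > 0);
    if (total < parts)
        parts = total;                      // never create empty ranges
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (parts == 0)
        return ranges;                      // nothing to distribute
    const std::size_t base = total / parts; // minimum number of elements per range
    std::size_t rest = total % parts;       // the first `rest` ranges get one extra element
    std::size_t start = 0;
    for (std::size_t i = 0; i < parts; ++i)
    {
        std::size_t end = start + base;
        if (rest > 0)
        {
            ++end;
            --rest;
        }
        ranges.emplace_back(start, end);
        start = end;
    }
    return ranges;
}
```

For example, `splitRange(10, 3)` yields the ranges `[0, 4)`, `[4, 7)` and `[7, 10)`, so ten elements are shared between three threads without gaps or overlap.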
This is the function that is handed to each thread:
template<typename T, typename Func>
void MultithreadedMethods<T, Func>::linearElementsDistribution(T *matrix1,
                                                               T *matrix2,
                                                               Matrix<T> *result,
                                                               int start,
                                                               int end,
                                                               Func operation)
{
    const int m = result->getM();
    // Map the flat index i back to (row, column) = (i / m, i % m).
    for (int i = start; i < end; i++)
        result->getElements()[i / m][i % m] = operation(matrix1[i], matrix2[i]);
}
This is where I run the procedure with real numbers (very fast):
Matrix<double> *linearDistributionResult = calculateLinearDistribution(matrix1,
                                                                       matrix2,
                                                                       [](double a, double b) {
                                                                           return a + b;
                                                                       },
                                                                       nThreads);
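Finally, this is the bad part, where I try to use complex numbers. Compared with the sequential result it takes a lot of time, or even fails...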
Matrix<ComplexNumber> *linearDistributionResult = calculateLinearDistribution(matrix1,
                                                                              matrix2,
                                                                              [](ComplexNumber a,
                                                                                 ComplexNumber b) {
                                                                                  return ComplexNumber(
                                                                                          a.getRealComponent() +
                                                                                          b.getRealComponent(),
                                                                                          a.getImaginaryComponent() +
                                                                                          b.getImaginaryComponent());
                                                                              },
                                                                              nThreads);
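And of course this is the sequential implementation (I want to point out that it is also slow with complex numbers compared with real numbers):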
template<typename T, typename Func>
Matrix<T> *calculateSequentialResult(Matrix<T> *matrix1,
                                     Matrix<T> *matrix2,
                                     Func operation)
{
    const int n = matrix1->getN(), m = matrix1->getM();
    Matrix<T> *result = new Matrix<T>(n, m);
    std::chrono::milliseconds startTime = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::system_clock::now().time_since_epoch());
    // Apply the operation element by element, row by row.
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            result->getElements()[i][j] = operation(matrix1->getElements()[i][j], matrix2->getElements()[i][j]);
    std::chrono::milliseconds endTime = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::system_clock::now().time_since_epoch());
    // Append the measurement to the statistics file.
    std::ofstream out(sequentialElementsStatistics, std::ios_base::app);
    std::chrono::milliseconds time = endTime - startTime;
    out << "Dimensiune matrice: " << matrix1->getN() << "x" << matrix1->getM()
        << " | Nr. threads: 1 | Timp de executie: " << time.count() << std::endl;
    out.close();
    return result;
}
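Both implementations time only the addition itself, using `std::chrono::system_clock`. A possible alternative, sketched here under the assumption that only elapsed time matters, is a small wrapper around `std::chrono::steady_clock`, which is monotonic and therefore not affected by wall-clock adjustments. `measureMicroseconds` is a hypothetical name, not something from the question.

```cpp
#include <chrono>

// Hypothetical helper (not part of the original post): run `work` once and
// return the elapsed time in microseconds, measured with a monotonic clock.
template <typename Work>
long long measureMicroseconds(Work work)
{
    const auto startTime = std::chrono::steady_clock::now();
    work();
    const auto endTime = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(endTime - startTime).count();
}
```

Wrapping each stage separately (reading the input, the addition, writing the output) shows which stage dominates the total run time, which the timers above cannot reveal because they cover only the addition.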
Update: this is the result of profiling the execution with Very Sleepy:
[Very Sleepy profiler screenshot]
The ComplexNumber class:
class ComplexNumber
{
private:
    double realComponent;
    double imaginaryComponent;
public:
    ComplexNumber();
    ComplexNumber(const ComplexNumber &complexNumber);
    double getRealComponent() const;
    ComplexNumber(double realComponent, double imaginaryComponent);
    void setRealComponent(double realComponent);
    double getImaginaryComponent() const;
    void setImaginaryComponent(double imaginaryComponent);
    friend std::ostream &operator<<(std::ostream &os, const ComplexNumber &complexNumber);
};
And the definitions:
double ComplexNumber::getRealComponent() const
{
    return realComponent;
}
void ComplexNumber::setRealComponent(double realComponent)
{
    ComplexNumber::realComponent = realComponent;
}
double ComplexNumber::getImaginaryComponent() const
{
    return imaginaryComponent;
}
void ComplexNumber::setImaginaryComponent(double imaginaryComponent)
{
    ComplexNumber::imaginaryComponent = imaginaryComponent;
}
ComplexNumber::ComplexNumber(double realComponent, double imaginaryComponent) : realComponent(realComponent),
                                                                                imaginaryComponent(imaginaryComponent)
{
}
ComplexNumber::ComplexNumber(const ComplexNumber &complexNumber)
{
    this->imaginaryComponent = complexNumber.imaginaryComponent;
    this->realComponent = complexNumber.realComponent;
}
std::ostream &operator<<(std::ostream &os, const ComplexNumber &complexNumber)
{
    if (complexNumber.imaginaryComponent == 0)
        os << std::to_string(complexNumber.realComponent);
    else if (complexNumber.realComponent == 0)
        os << std::to_string(complexNumber.imaginaryComponent) + "i";
    else
        // std::to_string already emits the '-' of a negative imaginary part,
        // so only a '+' has to be added explicitly.
        os << std::to_string(complexNumber.realComponent) +
              ((complexNumber.imaginaryComponent < 0) ? "" : "+") +
              std::to_string(complexNumber.imaginaryComponent) + "i";
    return os;
}
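Solved
The problem was that I was using regular expressions to parse the complex numbers from the input file, and that was very slow. After replacing them, I got the expected behavior.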
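The post does not show the parsing code or the file format, so the following is only an illustrative sketch of parsing without regular expressions. It assumes each value is written as `<real><sign><imaginary>i`, for example `1.5-2i`; both `ParsedComplex` and `parseComplex` are hypothetical names. It relies on `std::strtod`, which reports where it stopped reading, so the two components can be read back to back.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical result type (not part of the original post).
struct ParsedComplex
{
    double real;
    double imaginary;
    bool ok; // false when the text does not match the assumed format
};

// Assumed input format: "<real><sign><imaginary>i", e.g. "1.5-2i" or "3+4i".
ParsedComplex parseComplex(const std::string &text)
{
    const char *begin = text.c_str();
    char *end = nullptr;
    // Read the real part; strtod stops at the sign of the imaginary part.
    const double real = std::strtod(begin, &end);
    if (end == begin)
        return {0.0, 0.0, false};
    // Read the imaginary part, including its leading '+' or '-'.
    const char *imaginaryBegin = end;
    const double imaginary = std::strtod(imaginaryBegin, &end);
    // The number must be followed by exactly one 'i' and nothing else.
    if (end == imaginaryBegin || *end != 'i' || *(end + 1) != '\0')
        return {0.0, 0.0, false};
    return {real, imaginary, true};
}
```

Note that `std::strtod` honors the current C locale, so this sketch assumes the decimal separator is a period.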
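Comments:
This question is a better fit for the CodeReview site.
Have you tried running the code through a profiler to find any hot spots?
Have you tried using OpenMP instead of hand-written threads?
This is homework; I can't use OpenMP.
I suggest using const references wherever possible, for example in your lambda: [](const ComplexNumber &a, const ComplexNumber &b) ...
Answer 1:
Rewrite it like this: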
struct ComplexNumber
{
    double real; // *maybe* = 0
    double imaginary; // *maybe* = 0
    ComplexNumber( double r, double i ):real(r), imaginary(i) {}
    ComplexNumber() = default;
    ComplexNumber(const ComplexNumber &complexNumber) = default;
    ComplexNumber& operator=(const ComplexNumber &complexNumber) = default;
};
std::ostream &operator<<(std::ostream &os, const ComplexNumber &complexNumber);
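`<<` may be slow, and it does not need to be a friend. Stop using accessors (especially ones that cannot be inlined) to reach your fields.
If you really need accessors, at least make them inline and put them in the header. But here they are pointless.
I would write `operator+` and the like even if I did not need them, because why not.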
struct ComplexNumber
{
    double real; // *maybe* = 0
    double imaginary; // *maybe* = 0
    ComplexNumber( double r, double i ):real(r), imaginary(i) {}
    ComplexNumber() = default;
    ComplexNumber(const ComplexNumber &complexNumber) = default;
    ComplexNumber& operator=(const ComplexNumber &complexNumber) = default;
    ComplexNumber& operator+=( ComplexNumber const& o )&
    {
        real += o.real;
        imaginary += o.imaginary;
        return *this;
    }
    ComplexNumber& operator-=( ComplexNumber const& o )&
    {
        real -= o.real;
        imaginary -= o.imaginary;
        return *this;
    }
    ComplexNumber& operator*=( ComplexNumber const& o )&
    {
        // Compute into a temporary so both components use the old values.
        ComplexNumber r{ real*o.real - imaginary*o.imaginary, real*o.imaginary + imaginary*o.real };
        *this = r;
        return *this;
    }
    friend ComplexNumber operator+( ComplexNumber lhs, ComplexNumber const& rhs )
    {
        lhs += rhs;
        return lhs;
    }
    friend ComplexNumber operator-( ComplexNumber lhs, ComplexNumber const& rhs )
    {
        lhs -= rhs;
        return lhs;
    }
    friend ComplexNumber operator*( ComplexNumber lhs, ComplexNumber const& rhs )
    {
        lhs *= rhs;
        return lhs;
    }
};
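This is brain-dead boilerplate, but I cannot justify a `ComplexNumber` type that lacks at least these. (I left out `/` because there are non-trivial decisions to make about how to handle division by zero.)
In any case, once we stop hiding how the data is accessed from the code doing the work, the optimizer has a chance to actually optimize.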
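To illustrate how a type with public fields and an inline `operator+` combines with the const-reference lambda suggested in the comments, here is a minimal sketch. It uses a stripped-down `ComplexNumber` and a hypothetical `addElementwise` helper over `std::vector` instead of the question's `Matrix<T>` class, whose definition is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stripped-down stand-in for the answer's struct: public fields, no accessors.
struct ComplexNumber
{
    double real;
    double imaginary;
};

// Defined inline in the header so the compiler can see and inline it.
inline ComplexNumber operator+(const ComplexNumber &a, const ComplexNumber &b)
{
    return ComplexNumber{a.real + b.real, a.imaginary + b.imaginary};
}

// Hypothetical helper (not part of the original post): apply `operation`
// to corresponding elements of two equally sized flat arrays.
template <typename T, typename Func>
std::vector<T> addElementwise(const std::vector<T> &a, const std::vector<T> &b, Func operation)
{
    assert(a.size() == b.size());
    std::vector<T> result(a.size());
    std::transform(a.begin(), a.end(), b.begin(), result.begin(), operation);
    return result;
}
```

A call then looks like `addElementwise(a, b, [](const ComplexNumber &x, const ComplexNumber &y) { return x + y; })`. Taking the arguments by const reference avoids invoking a copy constructor for every element.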
Comments:
The problem was that I was using regular expressions to parse the complex numbers from the input file, and that was very slow. After replacing them, I got the expected behavior.