A faster way to go through a CSV file in C++

Posted 2023-02-22

[Original title]: Faster way to go through csv file in c++ [Asked]: 2013-11-07 06:13:45 [Question]:

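I have millions of records in a file and need to run some calculations on them. I have the same program written in both Java and C++, but the Java version runs faster. The main reason I switched to C++ was to use multithreading to make the program faster. But when I compare Java and C++ running on a single thread, Java finishes the job in half the time.

I need to sort this out. C++ should be faster, but it is performing badly.

Some good pointers would be great, so I can research the problem and try to fix it.

Thanks

This is the class that builds objects from the comma-separated data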

//Parser.cpp
#include "Parser.h"
#include "PriceBar.h"
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <sstream>
#include <stdlib.h>

using namespace std;

vector<PriceBar> Parser :: parseFile(string file)
{
    string STRING;
    vector<PriceBar> bars;
    ifstream infile;
    infile.open (file.c_str());
    int a=0;
    string token;

    while(getline(infile,STRING)) // To get you all the lines.
    {
        // Split the current line on commas.
        vector<string> data;
        istringstream ss(STRING);
        while(getline(ss, token, ','))
        {
            data.push_back(token);
        }
        //cout<<data[4]<<endl;

        // Columns 1 to 4 hold open, high, low and close.
        // The size check skips short lines instead of indexing out of range.
        if(data.size() >= 5 && !data[1].empty())
        {
            //cout << "if is working" << endl;
            double open = atof(data[1].c_str());
            double high = atof(data[2].c_str());
            double low = atof(data[3].c_str());
            double close = atof(data[4].c_str());
            bars.push_back(PriceBar(open, high, low, close));
        } //end of if
    } //end of while
    infile.close();
    //cout << "parser is done " << bars[2].getOpen() <<endl;
    //cout << bars.size() << endl;
    return bars;
}

The PriceBar class

/*
 * PriceBar.cpp
 *
 *  Created on: Nov 5, 2013
 *      Author: hansaka
 */

#include <iostream>
#include <string>
#include <vector>
#include "PriceBar.h"

using namespace std;

PriceBar :: PriceBar(double open, double high, double low, double close)
{
    this -> open = open;
    this -> high = high;
    this -> low = low;
    this -> close = close;
}

double PriceBar :: getOpen()
{
    return open;
}

void PriceBar :: setOpen(double open)
{
    this -> open = open;
}

double PriceBar :: getHigh()
{
    return high;
}

void PriceBar :: setHigh(double high)
{
    this -> high = high;
}

double PriceBar :: getLow()
{
    return low;
}

void PriceBar :: setLow(double low)
{
    this -> low = low;
}

double PriceBar :: getClose()
{
    return close;
}

void PriceBar :: setClose(double close)
{
    this -> close = close;
}

The main file

#include <iostream>
#include <vector>
#include <string>
#include "PriceBar.h"
#include "Parser.h"
#include <ctime>

using namespace std;

int main()
{
    Parser p;

    //getting the counter ready
    time_t tstart, tend;

    //Starting the time
    tstart = time(0);

    vector<string> path;
    path.push_back("file.csv");

    for( vector<string>::const_iterator it = path.begin(); it != path.end(); ++it )
    {
        //    cout << *it << endl;
        vector<PriceBar> priceBars = p.parseFile(*it);
        //priceBars = p.parseFile(*it);

        //      cout << "done" << endl;

        double maxHigh = 0.0;
        double maxLow = 0.0;
        double maxOpen = 0.0;
        double maxClose = 0.0;
        double maxVolume = 0.0;
        double current = 0.0;

        //     cout << "hippy " << priceBars[2].getOpen() <<endl;
        int size = priceBars.size();
        //      cout << "size = " << size << endl;

        // One pass per field; each loop reads the getter matching its maximum.
        for (int j=0;j<size;j++)
        {
            current = priceBars[j].getOpen();
            if (current > maxOpen)
            {
                maxOpen = current;
            }
        } //end of pricebar for

        current = 0.0;
        for (int j=0;j<size;j++)
        {
            current = priceBars[j].getHigh();
            if (current > maxHigh)
            {
                maxHigh = current;
            }
        }

        current = 0.0;
        for (int j=0;j<size;j++)
        {
            current = priceBars[j].getLow();
            if (current > maxLow)
            {
                maxLow = current;
            }
        }

        current = 0.0;
        for (int j=0;j<size;j++)
        {
            current = priceBars[j].getClose();
            if (current > maxClose)
            {
                maxClose = current;
            }
        }

        cout << "MaxOpen = " << maxOpen << " MaxHigh = " << maxHigh
             << " MaxLow = " << maxLow << " MaxClose = " << maxClose << endl;
    } //end of it for
    cout << "DONE" << endl;

    //Ending the time count
    tend = time(0);

    cout << " It took " << difftime(tend, tstart) << " second(s).";

    return 0;
}

I have been editing this code, so there are not many comments; I only commented out some parts of the code for reference. I apologize for that.
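A side note on the measurement itself: time(0) only has one-second resolution, which is coarse for comparing two runs. Below is a minimal sketch of finer-grained timing with std::chrono::steady_clock (C++11). The helper name elapsedMilliseconds is made up for illustration and is not part of the program above.

```cpp
#include <chrono>

// Hypothetical helper (not from the original post): runs a callable once
// and returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double elapsedMilliseconds(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    // duration<double, std::milli> converts the tick count to fractional ms.
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Assuming the Parser class from the question, it could be called as `double ms = elapsedMilliseconds([&] { bars = p.parseFile("file.csv"); });` so that only the parsing step is timed.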

[Comments]:

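Directly copying code from Java to C++ will rarely result in better performance. In C++, allocations are generally much more expensive than in GCed languages. Your best bet is to try to reduce them.

I suspect the run time is dominated by IO, so I'm not sure how multithreading would help. Anyway, are you compiling with optimizations enabled?

Did you compile with optimizations turned on?

It is not C++ that performs poorly, it is your C++ program. Just because you translated some Java code into C++ does not mean you should expect better performance. For example, why do you use std::vector for data that only has 5 elements? Why not use a fixed-size sequence such as std::array or even a C-style array?

[Answer 1]: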

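There are a few things I would do:

1. Avoid building a new std::vector<std::string> for every line.

2. Convert each field as it is read and store the values in a std::vector<double> that is reused across lines.

3. At the start of main(), call:

std::ios_base::sync_with_stdio(false);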

Here is how I would write the function:

std::vector<PriceBar> Parser::parseFile(std::string const& file)
{
    std::vector<PriceBar> bars;
    std::ifstream         infile(file.c_str());
    std::istringstream    lin;      // reused for every line
    std::vector<double>   columns;  // reused for every line

    for (std::string line, topic, value; std::getline(infile, line); )
    {
        lin.clear();    // reset the eof/fail state left by the previous line
        lin.str(line);
        columns.clear();
        // Skip the first column, then convert the remaining fields.
        for (std::getline(lin, topic, ','); std::getline(lin, value, ','); )
        {
            columns.push_back(value.empty()? 0.0: std::atof(value.c_str()));
        }
        if (columns.size() == 4)
        {
            bars.push_back(PriceBar(columns[0], columns[1], columns[2], columns[3]));
        }
    }
    return bars;
}
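To try the loop above without the Parser and PriceBar classes, here is a self-contained sketch of the same approach. It makes two assumptions that are not in the original post: the plain struct Bar stands in for PriceBar, and the hypothetical function parseBars reads from any std::istream so it can be fed an in-memory string instead of a file.

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the PriceBar class from the question (assumption).
struct Bar {
    double open, high, low, close;
};

// Same loop structure as the function above, but reading from any
// std::istream so it can be exercised without a file on disk.
std::vector<Bar> parseBars(std::istream& in) {
    std::vector<Bar> bars;
    std::istringstream lin;       // reused for every line
    std::vector<double> columns;  // reused for every line
    std::string line, topic, value;

    while (std::getline(in, line)) {
        lin.clear();   // reset the eof/fail flags from the previous line
        lin.str(line);
        columns.clear();
        std::getline(lin, topic, ',');  // skip the first column
        while (std::getline(lin, value, ',')) {
            columns.push_back(value.empty() ? 0.0 : std::atof(value.c_str()));
        }
        // Only lines with exactly four numeric columns produce a bar.
        if (columns.size() == 4) {
            bars.push_back(Bar{columns[0], columns[1], columns[2], columns[3]});
        }
    }
    return bars;
}
```

Passing a std::istringstream holding a few lines is enough to check the behavior; a std::ifstream works the same way for real data.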

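I don't think using multiple threads would help much. Reading a small file of only a million or so lines does not warrant the corresponding complexity.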
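One of the comments on the question suggests a fixed-size sequence instead of a std::vector for the handful of fields on each line. A rough sketch of that idea follows. It is a different technique from the function above: it uses std::strtod directly on the line and a std::array, with no string stream. The function name parsePriceFields is hypothetical, and unlike the function above it rejects empty fields and ignores any columns after the fourth number.

```cpp
#include <array>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper (assumption, not from the original post): skips the
// first column of `line`, then parses the next four comma-separated numbers
// into `out`. Returns false if a field is missing, empty or non-numeric.
bool parsePriceFields(const std::string& line, std::array<double, 4>& out) {
    std::size_t pos = line.find(',');  // comma that ends the skipped column
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (pos == std::string::npos) {
            return false;  // fewer than four numeric columns
        }
        const char* begin = line.c_str() + pos + 1;
        char* end = nullptr;
        out[i] = std::strtod(begin, &end);
        if (end == begin) {
            return false;  // nothing was converted
        }
        pos = line.find(',', pos + 1);  // move on to the next separator
    }
    return true;
}
```

Because no per-line containers or temporary strings are created, this version performs no heap allocation of its own per line, which is the cost the comments point at.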

[Discussion]:

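Thanks, this is just the start. The large program written in Java takes about 9 days to finish processing. So I want it to be threaded, so that I can at least halve the processing time by using a node with multiple processors or multiple nodes. All of the processing depends on millions of records.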
