Extracting Uneven Data From Text File

Posted 2023-02-22

[Title]: Extracting Uneven Data From Text File [Posted]: 2017-02-14 19:45:00 [Question]:

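I have a data file in which each line contains values for nine different variables: x, y, z, v_x, v_y, v_z, m, ID, V. I am writing a program to extract the x, y, and z values from the data file. I am relatively new to this kind of task, and I am running into trouble because the values are not always the same length. A sample of part of the data file (x, y, z columns only) is here: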

2501.773926 1701.783081 211.1383057

1140.961426 4583.300781 322.4959412 

1194.471313 5605.764648 1377.315552 

506.1424866 6037.965332 1119.67041  

213.5106354 5788.785156 2340.610352 

59.43727493 5914.666016 2357.921143 

1223.028564 4292.818848 3007.292725 

4445.61377  3684.48999  2903.169189 

5649.732422 4596.819824 2661.301025 

5741.396973 5503.06543  2412.082031 

4806.246094 5587.194336 2676.126465 

4855.521973 5482.893066 2743.014648 

5190.890625 5399.349121 1549.1698

Note that in most cases each number is 11 characters long, but this is not always the case. The code I have written is here:

#include <cmath>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

// data created by Gadget2
const string gadget_data("particles_64cubed.txt");

int main()
{
    cout << "GADGET2: Extracting Desired Data From ASCII File." << endl;

    // declaring vectors to store the data
    int bins = 135000000; // 512^3 particles = 134,217,728 particles
    vector<double> x(bins), y(bins), z(bins);

    // read the data file
    ifstream data_file(gadget_data.c_str());
    if (data_file.fail())
    {
        cerr << "Cannot open " << gadget_data << endl;
        exit(EXIT_FAILURE);
    }
    else
        cout << "Reading data file: " << gadget_data << endl;
    string line;
    int particles = 0;
    while (getline(data_file, line))
    {
        // fixed-width slicing: assumes every field is exactly 11 characters wide
        string x_pos = line.substr(0, 11);
        double x_val = atof(x_pos.c_str());    // atof converts string to double
        string y_pos = line.substr(12, 11);
        double y_val = atof(y_pos.c_str());
        string z_pos = line.substr(24, 11);
        double z_val = atof(z_pos.c_str());

        if (particles < bins)
        {
            x[particles] = x_val;
            y[particles] = y_val;
            z[particles] = z_val;
            ++particles;
        }
    }
    data_file.close();
    cout << "Stored " << particles << " particles in positions_64.dat" << endl;

    vector<double> x_values, y_values, z_values;
    for (int i = 0; i < particles; i++)
    {
        x_values.push_back(x[i]);
        y_values.push_back(y[i]);
        z_values.push_back(z[i]);
    }

    // write desired data to file
    ofstream new_file("positions_64.dat");
    for (int i = 0; i < x_values.size(); i++)
        new_file << x_values[i] << '\t' << y_values[i] << '\t' << z_values[i] << endl;
    new_file.close();
    cout << "Wrote desired data to file: " << "positions_64.dat" << endl;
}

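Because the values do not all have the same length, the code obviously fails. Does anyone know another way to do this? Perhaps, instead of taking substrings that span a fixed number of characters, something that grabs each value up to the next space? Any help would be greatly appreciated. Thanks!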

[Comments]:

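Find the delimiter, then take substrings based on it rather than on a fixed width. See this answer, which provides a function for splitting a string on a user-supplied delimiter.

[Answer 1]: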

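I noticed that you are already using ifstream and getline to read the file. Why fall back to cutting each line into N-character chunks and calling atof on them? iostreams can read and write integers, doubles, and so on, as the usual cin and cout examples show.

There is an istringstream class that makes this easy: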

std::istringstream input(line); // line is std::string from getline()
double x,y,z;
if(input >> x >> y >> z) // just this! and it's already a simple error check
    ; // do something with x,y,z
else
    ; // handle the error

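This should work well because you are already reading line by line, and because the data is separated by whitespace, which the >> operator skips by default.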
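To show how this fits into the whole read loop, here is a minimal sketch. The Position struct and the read_positions name are hypothetical, introduced only for illustration, and the sketch assumes that lines which do not begin with three numbers should simply be skipped. Any columns after the first three (v_x, v_y, and so on) are left unread.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type, not part of the original post.
struct Position {
    double x, y, z;
};

// Reads the stream line by line and keeps the first three
// whitespace-separated numbers of each line. Lines that do not
// start with three parsable numbers are skipped (an assumption).
std::vector<Position> read_positions(std::istream& in) {
    std::vector<Position> out;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Position p;
        // operator>> skips any run of spaces or tabs, so the width
        // of each field does not matter.
        if (fields >> p.x >> p.y >> p.z) {
            out.push_back(p);
        }
    }
    return out;
}
```

Because the function takes a std::istream&, it can be called with the ifstream from the question, for example read_positions(data_file), or with a std::istringstream when testing. Growing the vector with push_back also avoids having to guess the particle count up front.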

FYI: istringstream

[Discussion]:

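That did the job, thank you very much! The only issue is that I am losing some significant digits from the original data. The resulting data file has at most two decimal places instead of up to seven. Any ideas?

@LeighK: I don't recall any such problem with operator>>. It should read each number in turn and try to store it in a double, so the only precision loss you should see is that of double itself. I think the cause is probably the default precision setting of operator<<, so the loss happens at the moment you write to the file. Try new_file << setprecision(16) << x_values[i] or something similar. See this example I just did and notice how an unconfigured cout trims doubles to 6 digits. I expect the same is happening with your ofstream new_file.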
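The effect of the output precision can be checked in isolation with a small sketch like the one below. The format_position helper is hypothetical and writes to a std::ostringstream rather than a file so the result is easy to inspect; the same manipulator applies to an ofstream. Note that setprecision is declared in <iomanip>, which the question's code does not include.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats one position as a tab-separated line,
// using the given number of significant digits.
std::string format_position(double x, double y, double z, int digits) {
    std::ostringstream out;
    // setprecision stays in effect for every later insertion on the
    // same stream, so it only needs to be applied once.
    out << std::setprecision(digits) << x << '\t' << y << '\t' << z;
    return out.str();
}
```

With 6 digits, which is the stream default, the value 2501.773926 comes out as 2501.77. That matches the "at most two decimal places" seen in the output file. With 10 digits, all three sample values from the first data line are written back unchanged.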
