What is the best way to save numpy arrays of different length to the same csv file?
Posted: 2014-11-09 17:09:13

Question: I'm working with one-dimensional numpy arrays. I first do some math on them and then save everything to a single csv file. The datasets usually have different lengths, so I can't simply stack them together. This is the best approach I could come up with, but there must be a more elegant way.
import numpy as np
import pandas as pd
import os
array1 = np.linspace(1,20,10)
array2 = np.linspace(12,230,10)
array3 = np.linspace(7,82,20)
array4 = np.linspace(6,55,20)
output1 = np.column_stack((array1.flatten(),array2.flatten())) #saving first array set to file
np.savetxt("tempfile1.csv", output1, delimiter=',')
output2 = np.column_stack((array3.flatten(),array4.flatten())) # doing it again second array
np.savetxt("tempfile2.csv", output2, delimiter=',')
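a = pd.read_csv('tempfile1.csv', header=None) # use pandas to read both files; header=None keeps the first data row from becoming column names
b = pd.read_csv("tempfile2.csv", header=None)
merged = b.join(a, rsuffix='*') # join with pandas into a single table (shorter columns are padded with NaN)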
os.remove('tempfile1.csv')
os.remove("tempfile2.csv") # delete temp files
merged.to_csv('savefile.csv', index=False) # save merged file
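For comparison, the temporary files can be avoided while still using np.savetxt. The idea is to pad the shorter arrays with NaN so every column has the same length, then write one table. This is a minimal sketch that assumes NaN is an acceptable placeholder for the missing values. pad_columns is a hypothetical helper, not a NumPy function:

```python
import numpy as np

array1 = np.linspace(1, 20, 10)
array2 = np.linspace(12, 230, 10)
array3 = np.linspace(7, 82, 20)
array4 = np.linspace(6, 55, 20)

def pad_columns(arrays, fill=np.nan):
    """Hypothetical helper: place 1-D arrays side by side as columns,
    padding the shorter ones at the bottom with `fill`."""
    n = max(len(a) for a in arrays)
    out = np.full((n, len(arrays)), fill, dtype=float)
    for j, a in enumerate(arrays):
        out[:len(a), j] = a
    return out

# Same column order as the question's merged file: array3, array4, array1, array2
table = pad_columns([array3, array4, array1, array2])

# One call, no temporary files; NaN cells are written as the text "nan"
np.savetxt("savefile.csv", table, delimiter=",", fmt="%.6f")
```

By default, pandas.read_csv treats the text nan as missing, so the padded cells read back as NaN.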
Answer 1: You can just use concat and pass axis=1 to append the arrays as columns:
In [49]:
import numpy as np
import pandas as pd
array1 = np.linspace(1,20,10)
array2 = np.linspace(12,230,10)
array3 = np.linspace(7,82,20)
array4 = np.linspace(6,55,20)
pd.concat([pd.DataFrame(array1), pd.DataFrame(array2), pd.DataFrame(array3), pd.DataFrame(array4)], axis=1)
Out[49]:
            0           0          0          0
0    1.000000   12.000000   7.000000   6.000000
1    3.111111   36.222222  10.947368   8.578947
2    5.222222   60.444444  14.894737  11.157895
3    7.333333   84.666667  18.842105  13.736842
4    9.444444  108.888889  22.789474  16.315789
5   11.555556  133.111111  26.736842  18.894737
6   13.666667  157.333333  30.684211  21.473684
7   15.777778  181.555556  34.631579  24.052632
8   17.888889  205.777778  38.578947  26.631579
9   20.000000  230.000000  42.526316  29.210526
10        NaN         NaN  46.473684  31.789474
11        NaN         NaN  50.421053  34.368421
12        NaN         NaN  54.368421  36.947368
13        NaN         NaN  58.315789  39.526316
14        NaN         NaN  62.263158  42.105263
15        NaN         NaN  66.210526  44.684211
16        NaN         NaN  70.157895  47.263158
17        NaN         NaN  74.105263  49.842105
18        NaN         NaN  78.052632  52.421053
19        NaN         NaN  82.000000  55.000000
Then you can write it to csv as usual:
pd.concat([pd.DataFrame(array1), pd.DataFrame(array2), pd.DataFrame(array3), pd.DataFrame(array4)], axis=1).to_csv('savefile.csv', index=False)
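You may want meaningful column headers instead of the repeated 0 labels. One option is to build the DataFrame from a dict of Series. pandas aligns the Series on their index and pads the shorter ones with NaN, as concat does above. This is a sketch; the column names x1, y1, x2, y2 are placeholders, not anything from the original post:

```python
import numpy as np
import pandas as pd

array1 = np.linspace(1, 20, 10)
array2 = np.linspace(12, 230, 10)
array3 = np.linspace(7, 82, 20)
array4 = np.linspace(6, 55, 20)

# Placeholder column names; pick whatever describes the data
columns = {"x1": array1, "y1": array2, "x2": array3, "y2": array4}

# Wrapping each array in a Series lets pandas align them on the index
# and fill the missing tail of the shorter columns with NaN
df = pd.DataFrame({name: pd.Series(arr) for name, arr in columns.items()})

# NaN is written as an empty field by default
df.to_csv("savefile.csv", index=False)
```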
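Answer 2: You might find a nice solution using numpy.savetxt, and there is probably a simpler pandas solution than yours. In this case, though, a solution using the standard library modules csv and itertools is quite concise: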
In [45]: import csv
In [46]: from itertools import izip_longest # Use zip_longest in Python 3.
In [47]: rows = izip_longest(array3, array4, array1, array2, fillvalue='')
In [48]: with open("out.csv", "w") as f:
....: csv.writer(f).writerows(rows)
....:
In [49]: !cat out.csv
7.0,6.0,1.0,12.0
10.947368421052632,8.5789473684210531,3.1111111111111112,36.222222222222221
14.894736842105264,11.157894736842106,5.2222222222222223,60.444444444444443
18.842105263157894,13.736842105263158,7.3333333333333339,84.666666666666657
22.789473684210527,16.315789473684212,9.4444444444444446,108.88888888888889
26.736842105263158,18.894736842105264,11.555555555555555,133.11111111111111
30.684210526315788,21.473684210526315,13.666666666666668,157.33333333333331
34.631578947368425,24.05263157894737,15.777777777777779,181.55555555555554
38.578947368421055,26.631578947368421,17.888888888888889,205.77777777777777
42.526315789473685,29.210526315789473,20.0,230.0
46.473684210526315,31.789473684210527,,
50.421052631578945,34.368421052631575,,
54.368421052631575,36.94736842105263,,
58.315789473684205,39.526315789473685,,
62.263157894736842,42.10526315789474,,
66.21052631578948,44.684210526315788,,
70.15789473684211,47.263157894736842,,
74.10526315789474,49.842105263157897,,
78.05263157894737,52.421052631578945,,
82.0,55.0,,
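izip_longest exists only in Python 2. Below is a Python 3 sketch of the same approach. It uses itertools.zip_longest and opens the file with newline='', as the csv module documentation recommends for files passed to csv.writer. The .tolist() calls are an addition not in the answer above. They hand csv.writer plain Python floats instead of NumPy scalars:

```python
import csv
from itertools import zip_longest

import numpy as np

array1 = np.linspace(1, 20, 10)
array2 = np.linspace(12, 230, 10)
array3 = np.linspace(7, 82, 20)
array4 = np.linspace(6, 55, 20)

# zip_longest pads the shorter sequences with fillvalue, giving empty csv fields
rows = zip_longest(array3.tolist(), array4.tolist(),
                   array1.tolist(), array2.tolist(), fillvalue="")

# newline="" prevents extra blank lines between rows on Windows
with open("out.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```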