numpy.genfromtxt imports tuples instead of arrays
【Title】: numpy.genfromtxt imports tuples instead of arrays 【Posted】: 2014-07-29 08:26:03 【Question】: I'm trying to learn Python and Numpy, so please bear with me. I'm using numpy.genfromtxt to import a CSV file into a matrix. The CSV looks like this:
Time(min),Nm,Speed,Power,Distance,Rpm,Bpm,interval,Altitude,Rate,Incline,Temp,PowerBalance,LeftTorqueEffectiveness,RightTorqueEffectiveness,getLeftPedalSmoothness,getRightPedalSmoothness,getCombinedPedalSmoothness,THb,SmO2,km
0.016666668,,4.3555064,0,0.002,0,118,1,684.3,0.0,0.0,14.71,50,-1.0,-1.0,-1.0,-1.0,-1.0,311.72,311.72
0.033333335,,4.3555064,20,0.002,0,119,1,684.3,0.0,0.0,14.71,50,-1.0,-1.0,-1.0,-1.0,-1.0,311.72,311.72
0.05,,4.444291,13,0.004,0,119,1,684.3,0.0,0.0,14.71,50,-1.0,-1.0,-1.0,-1.0,-1.0,311.72,311.72
Now I run:
matrixCsv = np.genfromtxt(open(csvFile, "rb"), delimiter=',', \
missing_values=0,skip_header=1,dtype=float,\
usecols=(0,2,3,4,5,6,7,8,9,10,11,17),names=True)
And I get:
[ (0.033333335, 4.3555064, 20.0, 0.002, 0.0, 119.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0)
(0.05, 4.444291, 13.0, 0.004, 0.0, 119.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0)
(0.06666667, 4.4781966, 16.0, 0.006, 0.0, 120.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0)
...,
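To me this looks like tuples wrapped in an array. But why tuples? I understand that numpy arrays/matrices need to be homogeneous, and that numpy produces tuples from non-homogeneous data. But why is my data not homogeneous? I don't get it...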
【Answer 1】: You are confusing how skip_header and names work together. The correct way to read the data and use the first row as the field names is:
In [185]:
np.genfromtxt('temp.csv', delimiter=',', \
missing_values=0,skip_header=0,dtype=float,\
usecols=(0,2,3,4,5,6,7,8,9,10,11,17),names=True)
Out[185]:
array([ (0.016666668, 4.3555064, 0.0, 0.002, 0.0, 118.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0),
(0.033333335, 4.3555064, 20.0, 0.002, 0.0, 119.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0),
(0.05, 4.444291, 13.0, 0.004, 0.0, 119.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0)],
dtype=[('Timemin', '<f8'), ('Speed', '<f8'), ('Power', '<f8'), ('Distance', '<f8'), ('Rpm', '<f8'), ('Bpm', '<f8'), ('interval', '<f8'), ('Altitude', '<f8'), ('Rate', '<f8'), ('Incline', '<f8'), ('Temp', '<f8'), ('getCombinedPedalSmoothness', '<f8')])
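This is not an array of tuples but a structured array. With names=True, skip_header=1 skips the header line and then uses the first row of data as the field names, which is probably not what you want (notice how your first row of data is missing?).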
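The interaction between skip_header and names can be checked on a small in-memory sample. The sketch below is only an illustration: the three-column CSV text and the helper name load are made up for this example and are not part of the question's data, and it assumes a reasonably recent NumPy on Python 3, where genfromtxt accepts a text-mode file-like object.

```python
import io
import numpy as np

# Hypothetical three-column sample standing in for the CSV in the question.
CSV_TEXT = "Time(min),Speed,Power\n0.1,4.3,0\n0.2,4.4,20\n0.3,4.5,13\n"

def load(skip_header):
    # names=True takes the field names from the first line that remains
    # after skip_header lines have been skipped.
    return np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",",
                         skip_header=skip_header, dtype=float, names=True)

good = load(0)  # the header line becomes the field names
bad = load(1)   # the header is skipped, so the first data row is consumed as names

print(good.dtype.names)  # field names, with characters such as "(" and ")" stripped
print(good.shape)        # one-dimensional: one record per data row
print(good["Speed"])     # a whole column, selected by field name
print(bad.shape)         # one record fewer than good
```

Each record prints like a tuple, but the result is a one-dimensional array with a compound dtype, which is why columns are selected by field name rather than by a second index.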
You can also drop names and read the data into a plain numpy array.
In [186]:
np.genfromtxt('temp.csv', delimiter=',', \
missing_values=0,skip_header=1,dtype=float,\
usecols=(0,2,3,4,5,6,7,8,9,10,11,17))
Out[186]:
array([[ 1.66666680e-02, 4.35550640e+00, 0.00000000e+00,
2.00000000e-03, 0.00000000e+00, 1.18000000e+02,
1.00000000e+00, 6.84300000e+02, 0.00000000e+00,
0.00000000e+00, 1.47100000e+01, -1.00000000e+00],
[ 3.33333350e-02, 4.35550640e+00, 2.00000000e+01,
2.00000000e-03, 0.00000000e+00, 1.19000000e+02,
1.00000000e+00, 6.84300000e+02, 0.00000000e+00,
0.00000000e+00, 1.47100000e+01, -1.00000000e+00],
[ 5.00000000e-02, 4.44429100e+00, 1.30000000e+01,
4.00000000e-03, 0.00000000e+00, 1.19000000e+02,
1.00000000e+00, 6.84300000e+02, 0.00000000e+00,
0.00000000e+00, 1.47100000e+01, -1.00000000e+00]])
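If the field names are useful but a plain 2-D array is also needed, a structured array whose fields all share one type can be converted afterwards. The following is a minimal sketch, not part of the original answer: it reuses a made-up sample CSV and assumes NumPy 1.16 or later, where numpy.lib.recfunctions.structured_to_unstructured is available.

```python
import io
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical sample data; the real file has many more columns.
CSV_TEXT = "Time(min),Speed,Power\n0.1,4.3,0\n0.2,4.4,20\n"

# names=True yields a one-dimensional structured array (one record per row).
structured = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",",
                           dtype=float, names=True)

# All fields are float64, so they can be packed into an ordinary 2-D array.
plain = rfn.structured_to_unstructured(structured)

print(structured.shape)  # records only, no column axis
print(plain.shape)       # rows x columns
print(plain[:, 1])       # the Speed column, now selected by position
```

Which form to keep depends on how the data is used afterwards: named fields make column access self-documenting, while a plain 2-D array is what most linear-algebra routines expect.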