Preventing pandas data frame header row from repeating in for statement
Posted: 2016-10-28 13:09:21
Question: I am looping through a pipeline to print out the 20 most informative features for a class called safety.
import numpy as np
import pandas as pd

classnum_saf = 3
inds = np.argsort(clf_3.named_steps['clf'].coef_[classnum_saf, :])[-20:]
for i in inds:
    f = feature_names[i]
    c = clf_3.named_steps['clf'].coef_[classnum_saf, [i]]
    print(f,c)
    output = {'features':f, 'coefficients':c}
    df = pd.DataFrame(output, columns = ['features', 'coefficients'])
    print(df)
I want a data frame that prints only one header, but the output I get repeats the header over and over as it iterates over [i]:
1800 [-8.73800344]
features coefficients
0 1800 -8.738003
hr [-8.73656027]
features coefficients
0 hr -8.73656
wa [-8.7336777]
features coefficients
0 wa -8.733678
1400 [-8.72197545]
features coefficients
0 1400 -8.721975
hrwa [-8.71952656]
features coefficients
0 hrwa -8.719527
perimeter [-8.71173264]
features coefficients
0 perimeter -8.711733
response [-8.67388885]
features coefficients
0 response -8.673889
analysis [-8.65460329]
features coefficients
0 analysis -8.654603
00 [-8.58386785]
features coefficients
0 00 -8.583868
raw [-8.56148006]
features coefficients
0 raw -8.56148
run [-8.51374794]
features coefficients
0 run -8.513748
factor [-8.50725691]
features coefficients
0 factor -8.507257
200 [-8.50334896]
features coefficients
0 200 -8.503349
file [-8.39990841]
features coefficients
0 file -8.399908
pb [-8.38173753]
features coefficients
0 pb -8.381738
mar [-8.21304343]
features coefficients
0 mar -8.213043
1998 [-8.21239836]
features coefficients
0 1998 -8.212398
signal [-8.02426499]
features coefficients
0 signal -8.024265
area [-8.01782987]
features coefficients
0 area -8.01783
98 [-7.3166918]
features coefficients
0 98 -7.316692
How can I return a single data frame like this:
features coefficients
0 1800 -8.738003
.. ... ...
18 area -8.01783
19 98 -7.316692
Right now, when I call print(f,c), it shows these top values:
1800 [-8.73800344]
hr [-8.73656027]
wa [-8.7336777]
1400 [-8.72197545]
hrwa [-8.71952656]
perimeter [-8.71173264]
response [-8.67388885]
analysis [-8.65460329]
00 [-8.58386785]
raw [-8.56148006]
run [-8.51374794]
factor [-8.50725691]
200 [-8.50334896]
file [-8.39990841]
pb [-8.38173753]
mar [-8.21304343]
1998 [-8.21239836]
signal [-8.02426499]
area [-8.01782987]
98 [-7.3166918]
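I have looked at several similar questions (here, here and here), but none of them seems to address my problem directly.
Thanks in advance, still learning.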
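A loop-free way to get the single-header frame is to index the feature names and the coefficient row with the whole inds array at once, so pd.DataFrame is called exactly once. This is a minimal, self-contained sketch under assumptions: coef and feature_names are randomly generated, hypothetical stand-ins for clf_3.named_steps['clf'].coef_ and the question's feature_names, and feature_names is passed through np.asarray in case it is a plain Python list.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: `coef` plays the role of clf_3.named_steps['clf'].coef_
# (one row per class) and `feature_names` the list of feature names.
rng = np.random.default_rng(0)
feature_names = [f"feat{j}" for j in range(50)]
coef = rng.normal(size=(4, 50))

classnum_saf = 3
n_top = 20

# argsort sorts ascending, so the last n_top indices point at the largest coefficients
inds = np.argsort(coef[classnum_saf, :])[-n_top:]

# Index with the whole `inds` array at once and call pd.DataFrame a single time:
# one frame, one header, rows 0..n_top-1 in ascending coefficient order.
df = pd.DataFrame({
    'features': np.asarray(feature_names)[inds],
    'coefficients': coef[classnum_saf, inds],
})
print(df)
```

With the real pipeline, coef would presumably be replaced by clf_3.named_steps['clf'].coef_; the rest should carry over unchanged.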
Answer 1: I tried mocking some data. You can append a list to L at each step of the loop, and then create df from L once, at the end:
import numpy as np
import pandas as pd

L = []
classnum_saf = 3
inds = np.argsort(clf_3.named_steps['clf'].coef_[classnum_saf, :])[-20:]
for i in inds:
    f = feature_names[i]
    c = clf_3.named_steps['clf'].coef_[classnum_saf, [i]]
    print(f,c)
    # f is already one feature name; c is a 1-element array, so c[0] is the coefficient
    L.append([f, c[0]])
# build the DataFrame once, after the loop, so the header is printed only once
df = pd.DataFrame(L, columns = ['features', 'coefficients'])
print(df)
Example:
import pandas as pd
f = [[1],[2],[3]]
c = ['a','b','c']
L = []
for i in range(3):
    # print(f[i],c[i])
    # swap c and f
    L.append([c[i], f[i][0]])
print(L)
[['a', 1], ['b', 2], ['c', 3]]
df = pd.DataFrame(L, columns = ['features', 'coefficients'])
print(df)
  features  coefficients
0        a             1
1        b             2
2        c             3
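If you would rather keep building a small DataFrame on each pass, as the question's loop does, one alternative is to collect those one-row frames in a list and combine them with pd.concat once after the loop. This is a swapped-in technique, not the answer's method; the sketch reuses the example's mock data, and for many rows the list-of-lists approach above is usually simpler and cheaper than concatenating many tiny frames.

```python
import pandas as pd

# Same mock data as the answer's example (hypothetical values)
f = [[1], [2], [3]]
c = ['a', 'b', 'c']

frames = []
for i in range(3):
    # build a one-row frame per iteration, but do not print it inside the loop
    frames.append(pd.DataFrame({'features': [c[i]], 'coefficients': [f[i][0]]}))

# concatenate once after the loop; ignore_index renumbers the rows 0..n-1
df = pd.concat(frames, ignore_index=True)
print(df)
```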
Comments:
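Thanks for your help! Your c is a list, whereas mine is a numpy.ndarray. That could explain the error I get when running it: "index 1169 is out of bounds for axis 0 with size 1". I guess I need to turn c into a list?
You could try that, but I think it should work with an ndarray as well. The best thing is to change f and c in the example to ndarrays and test it.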
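On the question in the comments: converting c to a list should not be necessary. Assuming c is built as in the question, coef_[classnum_saf, [i]] indexes with the list [i] and therefore returns a 1-element array, so indexing it again with the feature index i (for example 1169) runs past its only position, 0. The sketch below uses a small hypothetical matrix, coef, as a stand-in for clf_3.named_steps['clf'].coef_.

```python
import numpy as np

# Hypothetical coefficient matrix standing in for clf_3.named_steps['clf'].coef_
coef = np.arange(12, dtype=float).reshape(3, 4)
i = 2

c_array = coef[1, [i]]   # indexing with a list [i] keeps a dimension -> shape (1,)
c_scalar = coef[1, i]    # indexing with a plain integer i -> a single float

print(c_array.shape)     # (1,)
print(c_array[0])        # 6.0, the only valid position in c_array
print(c_scalar)          # 6.0

try:
    c_array[i]           # position 2 does not exist in a 1-element array
except IndexError as err:
    print("IndexError:", err)
```

Either c[0] on the 1-element array or indexing coef_ with the plain integer i gives the scalar coefficient that belongs in each row.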