Preventing pandas data frame header row from repeating in for statement

【Title】: Preventing pandas data frame header row from repeating in for statement 【Posted】: 2016-10-28 13:09:21 【Question】:

I am iterating over a pipeline to print the 20 most informative features for a class called safety.

classnum_saf = 3
inds = np.argsort(clf_3.named_steps['clf'].coef_[classnum_saf, :])[-20:]
for i in inds: 
   f = feature_names[i]
   c = clf_3.named_steps['clf'].coef_[classnum_saf, [i]]
   print(f,c)
   output = {'features': f, 'coefficients': c}
   df = pd.DataFrame(output, columns = ['features', 'coefficients'])
   print(df)

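I want a data frame that prints a single header, but the output I get repeats the header over and over, because a new data frame is built and printed on every iteration over inds.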

   1800 [-8.73800344]
   features  coefficients
   0     1800     -8.738003
   hr [-8.73656027]
   features  coefficients
   0       hr      -8.73656
   wa [-8.7336777]
   features  coefficients
   0       wa     -8.733678
   1400 [-8.72197545]
   features  coefficients
   0     1400     -8.721975
   hrwa [-8.71952656]
   features  coefficients
   0     hrwa     -8.719527
   perimeter [-8.71173264]
   features  coefficients
   0  perimeter     -8.711733
   response [-8.67388885]
   features  coefficients
   0  response     -8.673889
   analysis [-8.65460329]
   features  coefficients
   0  analysis     -8.654603
   00 [-8.58386785]
   features  coefficients
   0       00     -8.583868
   raw [-8.56148006]
   features  coefficients
   0      raw      -8.56148
   run [-8.51374794]
   features  coefficients
   0      run     -8.513748
   factor [-8.50725691]
   features  coefficients
   0   factor     -8.507257
   200 [-8.50334896]
   features  coefficients
   0      200     -8.503349
   file [-8.39990841]
   features  coefficients
   0     file     -8.399908
   pb [-8.38173753]
   features  coefficients
   0       pb     -8.381738
   mar [-8.21304343]
   features  coefficients
   0      mar     -8.213043
   1998 [-8.21239836]
   features  coefficients
   0     1998     -8.212398
   signal [-8.02426499]
   features  coefficients
   0   signal     -8.024265
   area [-8.01782987]
   features  coefficients
   0     area      -8.01783
   98 [-7.3166918]
   features  coefficients
   0       98     -7.316692

How can I return a data frame like this:

          features     coefficients
   0      1800          -8.738003
   ..     ...           ...
   18     area          -8.01783
   19     98            -7.316692

Right now, print(f, c) shows the following top values:

   1800 [-8.73800344]
   hr [-8.73656027]
   wa [-8.7336777]
   1400 [-8.72197545]
   hrwa [-8.71952656]
   perimeter [-8.71173264]
   response [-8.67388885]
   analysis [-8.65460329]
   00 [-8.58386785]
   raw [-8.56148006]
   run [-8.51374794]
   factor [-8.50725691]
   200 [-8.50334896]
   file [-8.39990841]
   pb [-8.38173753]
   mar [-8.21304343]
   1998 [-8.21239836]
   signal [-8.02426499]
   area [-8.01782987]
   98 [-7.3166918]

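I have looked at several similar questions here, here and here, but none of them seem to address my problem directly.

Thank you in advance, I am still learning.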

【Answer 1】:

I tried to simulate some data. In each iteration of the loop you can append a list to L, and create df from L once the loop has finished.

L = []
classnum_saf = 3
inds = np.argsort(clf_3.named_steps['clf'].coef_[classnum_saf, :])[-20:]
for i in inds: 
   f = feature_names[i]
   c = clf_3.named_steps['clf'].coef_[classnum_saf, [i]]
   print(f,c)
   # f is the feature name and c is a length-1 array, so c[0] extracts the scalar coefficient
   L.append([f, c[0]])

df = pd.DataFrame(L, columns = ['features', 'coefficients'])
print(df) 

Example:

import pandas as pd

f = [[1],[2],[3]]
c = ['a','b','c']

L = []
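for i in range(3):
   # print(f[i], c[i])
   # in this simulated data c holds the names and f holds one-element lists,
   # so the two are swapped relative to the question; [0] unwraps the inner list
   L.append([c[i], f[i][0]])

print(L)
# [['a', 1], ['b', 2], ['c', 3]]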

df = pd.DataFrame(L, columns = ['features', 'coefficients'])
print(df)  

  features  coefficients
0        a             1
1        b             2
2        c             3
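The loop is not strictly needed, since NumPy arrays can be indexed with the whole inds array at once. Below is a minimal sketch of that idea, not the answerer's method. It assumes the feature names can be held in a NumPy array; feature_names, coef, class_idx and top_n are hypothetical stand-ins for the question's feature_names, clf_3.named_steps['clf'].coef_, classnum_saf and 20.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's feature names and coefficient matrix
feature_names = np.array(['area', 'signal', 'mar', 'pb', 'file', '98'])
coef = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [-8.02, -8.03, -8.21, -8.38, -8.40, -7.32],
])
class_idx = 1  # row of the class of interest (assumption)
top_n = 3      # the question uses 20

# argsort is ascending, so the last top_n indices point at the largest coefficients
inds = np.argsort(coef[class_idx, :])[-top_n:]

# Index both arrays with inds at once and build the frame a single time
df = pd.DataFrame({
    'features': feature_names[inds],
    'coefficients': coef[class_idx, inds],
})
print(df)
```

Because the data frame is created and printed once, the header appears only once.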

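【Comments】:

Thanks for your help! Your c is a list, whereas mine is a numpy.ndarray. That may explain the error I get at runtime: "index 1169 is out of bounds for axis 0 with size 1". I guess I need to turn c into a list?

You can try, but I think it should work with an ndarray. The best approach is to change f and c to ndarrays and test.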
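The IndexError quoted in the comment is consistent with indexing c by the feature index i: in the question, c = coef_[classnum_saf, [i]] is an array of shape (1,), so position 0 is the only valid one. Converting c to a list would not change that. Here is a minimal sketch of the loop with an ndarray, where feature_names, coef and class_idx are hypothetical stand-ins for the question's objects:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: 2 classes x 4 features
feature_names = ['hr', 'wa', 'area', '98']
coef = np.array([
    [0.1, 0.2, 0.3, 0.4],
    [-8.74, -8.73, -8.02, -7.32],
])
class_idx = 1

rows = []
for i in np.argsort(coef[class_idx, :])[-3:]:
    f = feature_names[i]       # a plain string
    c = coef[class_idx, [i]]   # indexing with a list gives an ndarray of shape (1,)
    # c[i] would raise IndexError for any i >= 1; c[0] is the only valid position
    rows.append([f, c[0]])

# Build the data frame once, after the loop, so the header is printed once
df = pd.DataFrame(rows, columns=['features', 'coefficients'])
print(df)
```

Writing c = coef[class_idx, i] (without the inner list) would give the scalar directly and make the [0] unnecessary.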
