python 3, pandas and creating new columns fail with keyerror

【Title】python 3, pandas and creating new columns fail with keyerror 【Posted】2018-04-07 04:54:15 【Question】:

I have been using the apply method on a dataframe to create new columns. So, if I have a df that looks like this:

stdf.columns
Index(['Username', 'First Name', 'Last Name', 'Class', 'Screens Typed','Time Spent', 'Avg Speed', 'Avg Acc'],  dtype='object')

I have been using syntax like this to create new columns

stdf['uid'] = stdf['Username'].apply(lambda x: x[0:6]) + "-" + stdf['First Name'] + "-" + stdf['Last Name']

Today, while using the same approach to create a new column, I got a keyerror on the new column name

stdf['truSpeed'] = stdf['nSpeed'].apply(lambda x: x * .1 * stdf["truAcc"])

Yes, 'nSpeed' and 'truAcc' do exist as columns.

Index(['Username', 'First Name', 'Last Name', 'Class', 'Screens Typed', 'Time Spent', 'Avg Speed', 'Avg Acc', 'truTime', 'uid', 'truAcc', 'nSpeed'], dtype='object')

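The keyerror points to the 'truSpeed' identifier. So my question is: why is pandas now telling me I have a key error when I try to create a new column, when it has always created new columns in the past?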

There must be some other error that I am not seeing.

Here is the almost complete traceback

KeyError                                  Traceback (most recent call last)
/home/david/dev/msc/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 'truSpeed'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/internals.py in set(self, item, value, check)
   3667         try:
-> 3668             loc = self.items.get_loc(item)
   3669         except KeyError:

/home/david/dev/msc/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 'truSpeed'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-18-35d20ff4edf0> in <module>()
      4 stdf['nSpeed'] = stdf['Avg Speed'].apply(lambda x: int(x.split(" ")[0]))
      5 print(stdf.columns)
----> 6 stdf['truSpeed'] = stdf['nSpeed'].apply(lambda x: x * .1 * stdf["truAcc"])
      7 # stdf['truSpeed']
      8 # print(stdf.columns)

/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   2417         else:
   2418             # set column
-> 2419             self._set_item(key, value)
   2420 
   2421     def _setitem_slice(self, key, value):

/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   2484         self._ensure_valid_index(value)
   2485         value = self._sanitize_column(key, value)
-> 2486         NDFrame._set_item(self, key, value)
   2487 
   2488         # check if we are modifying a copy

/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/generic.py in _set_item(self, key, value)
   1498 
   1499     def _set_item(self, key, value):
-> 1500         self._data.set(key, value)
   1501         self._clear_item_cache()
   1502 

/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/internals.py in set(self, item, value, check)
   3669         except KeyError:
   3670             # This item wasn't present, just insert at end
-> 3671             self.insert(len(self.items), item, value)
   3672             return
   3673 

/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/internals.py in insert(self, loc, item, value, allow_duplicates)
   3770 
   3771         block = make_block(values=value, ndim=self.ndim,
-> 3772                            placement=slice(loc, loc + 1))
   3773 
   3774         for blkno, count in _fast_count_smallints(self._blknos[loc:]):

/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/internals.py in make_block(values, placement, klass, ndim, dtype, fastpath)
   2683                      placement=placement, dtype=dtype)
   2684 
-> 2685     return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
   2686 
   2687 # TODO: flexible with index=None and/or items=None

/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/internals.py in __init__(self, values, placement, ndim, fastpath)
    107             raise ValueError('Wrong number of items passed %d, placement '
    108                              'implies %d' % (len(self.values),
--> 109                                              len(self.mgr_locs)))
    110 
    111     @property

ValueError: Wrong number of items passed 58, placement implies 1

【Comments on the question】:

I get this error when I try to assign a subset of the frame rather than a Series: df['new_col'] = df[df['some_col']=='SomeValue']

【Answer 1】:
stdf['truSpeed'] = stdf['nSpeed'].apply(lambda x: x * .1 * stdf["truAcc"])

should be

stdf['truSpeed'] = stdf.eval('nSpeed * truAcc * .1')

or

stdf['truSpeed'] = stdf['nSpeed'] * stdf['truAcc'] * .1

or, the slow way

stdf['truSpeed'] = stdf.apply(lambda x: x['nSpeed'] * x['truAcc'] * .1, axis=1)
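As a rough check, the three alternatives above can be run side by side on a small frame. This is only a sketch: the sample values are made up, and only the column names `nSpeed`, `truAcc` and `truSpeed` come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
stdf = pd.DataFrame({"nSpeed": [40, 55, 62], "truAcc": [0.90, 0.80, 0.95]})

# 1. Expression evaluated against the frame's columns.
via_eval = stdf.eval("nSpeed * truAcc * .1")

# 2. Plain column arithmetic; pandas aligns the rows and multiplies element-wise.
via_columns = stdf["nSpeed"] * stdf["truAcc"] * .1

# 3. Row-wise apply: the lambda receives one row at a time (slow on large frames).
via_row_apply = stdf.apply(lambda row: row["nSpeed"] * row["truAcc"] * .1, axis=1)

# Each result has one value per row, so any of them can become a new column.
stdf["truSpeed"] = via_columns
print(stdf)
```

Each expression yields one value per row, which is what a single new column needs.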

【Comments】:

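piRSquared: Thanks. The df.eval syntax is new to me. I think I may have been looking at old pandas documentation. The old slow way at the end of the list still throws a keyerror. My guess now is that a different exception is being raised and the keyerror is masking it. After reviewing the apply lambda syntax, I realized I don't need the lambda. I think my brain was stuck in an R data-frame-like paradigm; the third syntax you gave above does exactly what I need.

【Answer 2】: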

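Thanks to piRSquared, the syntax is simpler. As mentioned in the comments, the df.eval syntax is new and works. However, the syntax that seems to "fit" the paradigm used by Excel spreadsheets is the third one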

stdf['truSpeed'] = stdf['nSpeed'] * stdf['truAcc'] * .1

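I think the keyerror generated initially must have been caused by some other error, because using the identifier 'truSpeed' is just fine for creating a new column in the dataframe.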
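The traceback in the question supports this: the two KeyError frames are caught inside pandas ("This item wasn't present, just insert at end"), and the exception that actually propagates is the final ValueError. A minimal sketch of the likely cause follows, assuming made-up sample data. Inside the original lambda, `x` is a single `nSpeed` value but `stdf["truAcc"]` is the whole column, so every element produces an entire Series rather than a scalar. The 58 items reported in the error message presumably correspond to the number of rows in the original frame.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
stdf = pd.DataFrame({"nSpeed": [40, 55, 62], "truAcc": [0.90, 0.80, 0.95]})

# What the original lambda computes for each single nSpeed value:
# a scalar times the *entire* truAcc column.
per_element = [x * .1 * stdf["truAcc"] for x in stdf["nSpeed"]]

n_rows = len(stdf)
lengths = [len(result) for result in per_element]
print(n_rows, lengths)  # one full-length Series per element, not one scalar

# Element-wise arithmetic produces exactly one scalar per row instead.
tru_speed = stdf["nSpeed"] * stdf["truAcc"] * .1
print(tru_speed.shape)
```

A rows-by-rows result like this cannot be stored in a single column, which is consistent with "Wrong number of items passed 58, placement implies 1".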
