How do I resolve a "KeyError" when using the groupby function in Pandas?
Posted: 2020-04-07 08:41:27
Question: I'm trying to group my dataset by "drive-wheels", "body-style" and "price", but I get a KeyError. My code is below (I have already imported pandas):
url="https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
df=pd.read_csv(url)
df_test=df['drive-wheels:','body-style:','price:']
df_grp=df_test.groupby(['drive-wheels:','body-style:'], as_index= False).mean()
df_pivot=df_grp.pivot(index='drive-wheels:',columns='body-style')
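I get the error below. I've tried various things, such as removing the spaces between the column names. I'm new to pandas, so I'd be glad if someone could help.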
D:\SOFTWARE\IllustratorPortable\anc\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2601 try:
-> 2602 return self._engine.get_loc(key)
2603 except KeyError:
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: ('drive-wheels:', 'body-style:', 'price:')
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-8-a14bda9f1cf1> in <module>
1 url="https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
2 df=pd.read_csv(url)
----> 3 df_test=df['drive-wheels:','body-style:','price:']
4 df_grp=df_test.groupby(['drive-wheels:','body-style:'], as_index= False).mean()
5 df_pivot=df_grp.pivot(index='drive-wheels:',columns='body-style')
D:\SOFTWARE\IllustratorPortable\anc\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2915 if self.columns.nlevels > 1:
2916 return self._getitem_multilevel(key)
-> 2917 indexer = self.columns.get_loc(key)
2918 if is_integer(indexer):
2919 indexer = [indexer]
D:\SOFTWARE\IllustratorPortable\anc\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2602 return self._engine.get_loc(key)
2603 except KeyError:
-> 2604 return self._engine.get_loc(self._maybe_cast_indexer(key))
2605 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2606 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: ('drive-wheels:', 'body-style:', 'price:')
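The traceback reports the key as a tuple, ('drive-wheels:', 'body-style:', 'price:'). That is because df['a', 'b', 'c'] is the same as df[('a', 'b', 'c')], so pandas looks for a single column whose label is that tuple. The sketch below reproduces this on a tiny made-up frame (the values are illustrative only) and shows that a list of labels, written with double brackets, selects several columns instead.

```python
import pandas as pd

# Tiny made-up frame using the column names from the question (without colons).
df = pd.DataFrame({
    "drive-wheels": ["fwd", "rwd"],
    "body-style": ["sedan", "wagon"],
    "price": [1, 2],
})

# Single brackets with commas build a tuple key: pandas searches for ONE column
# labelled ("drive-wheels", "body-style", "price"), which does not exist.
try:
    df["drive-wheels", "body-style", "price"]
    tuple_lookup_failed = False
except KeyError as exc:
    tuple_lookup_failed = True
    print("KeyError:", exc)

# Double brackets pass a list of labels, which selects several columns.
subset = df[["drive-wheels", "body-style", "price"]]
print(subset.columns.tolist())
```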
Comments:
df['drive-wheels:','body-style:','price:']
-> df[['drive-wheels:','body-style:','price:']]
After your second line, can you print(df.head())?
No, I can't. @user1558604
Answer 1:
The file does not contain a header row. According to the dataset description, the columns are:
Attribute: Attribute Range:
------------------ -----------------------------------------------
1. symboling: -3, -2, -1, 0, 1, 2, 3.
2. normalized-losses: continuous from 65 to 256.
3. make: alfa-romero, audi, bmw, chevrolet, dodge, honda,
isuzu, jaguar, mazda, mercedes-benz, mercury,
mitsubishi, nissan, peugot, plymouth, porsche,
renault, saab, subaru, toyota, volkswagen, volvo
4. fuel-type: diesel, gas.
5. aspiration: std, turbo.
6. num-of-doors: four, two.
7. body-style: hardtop, wagon, sedan, hatchback, convertible.
8. drive-wheels: 4wd, fwd, rwd.
9. engine-location: front, rear.
10. wheel-base: continuous from 86.6 to 120.9.
11. length: continuous from 141.1 to 208.1.
12. width: continuous from 60.3 to 72.3.
13. height: continuous from 47.8 to 59.8.
14. curb-weight: continuous from 1488 to 4066.
15. engine-type: dohc, dohcv, l, ohc, ohcf, ohcv, rotor.
16. num-of-cylinders: eight, five, four, six, three, twelve, two.
17. engine-size: continuous from 61 to 326.
18. fuel-system: 1bbl, 2bbl, 4bbl, idi, mfi, mpfi, spdi, spfi.
19. bore: continuous from 2.54 to 3.94.
20. stroke: continuous from 2.07 to 4.17.
21. compression-ratio: continuous from 7 to 23.
22. horsepower: continuous from 48 to 288.
23. peak-rpm: continuous from 4150 to 6600.
24. city-mpg: continuous from 13 to 49.
25. highway-mpg: continuous from 16 to 54.
26. price: continuous from 5118 to 45400.
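To see what a missing header row does to pd.read_csv, here is a minimal sketch on two made-up rows in the same comma-separated, headerless style (an inline string stands in for the downloaded file). With the default header="infer", the first data row is consumed as the column names; header=None keeps every row and numbers the columns, and names=... supplies labels directly.

```python
import io

import pandas as pd

# Two made-up rows, headerless, like imports-85.data (reduced to three fields).
raw = "4wd,sedan,17450\nfwd,hatchback,6575\n"

# Default header="infer": the first row becomes the column labels.
df_default = pd.read_csv(io.StringIO(raw))
print(df_default.columns.tolist())

# header=None: every row is data, columns are numbered 0, 1, 2.
df_none = pd.read_csv(io.StringIO(raw), header=None)
print(df_none.columns.tolist())

# names=...: supply the labels yourself (no row is consumed as a header).
df_named = pd.read_csv(io.StringIO(raw), names=["drive-wheels", "body-style", "price"])
print(df_named.columns.tolist())
```

This is why the question's column names can never match: with the default settings, the labels are whatever values happen to be in the first line of the file.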
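You can use iloc to select the columns by position (counting from 0: drive-wheels is 7, body-style is 6, price is 25):
df_test = df.iloc[:, [7, 6, 25]]
or set the column names:
df.columns = ['one', 'two', 'three']  # illustrative only: the list must contain one name per column (26 for this file)
Comments: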
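Note that iloc takes rows first: df.iloc[[7, 6, 25]] returns rows 7, 6 and 25, while df.iloc[:, [7, 6, 25]] returns those columns. Assigning df.columns also requires exactly one label per existing column. The sketch below checks both points on a hypothetical 30 x 26 frame of numbers that stands in for the headerless file.

```python
import pandas as pd

# Hypothetical stand-in for the headerless file: 30 rows, columns 0..25.
df = pd.DataFrame([[r * 100 + c for c in range(26)] for r in range(30)])

rows = df.iloc[[7, 6, 25]]        # rows 7, 6, 25 with all 26 columns
cols = df.iloc[:, [7, 6, 25]]     # all rows, only columns 7, 6, 25
print(rows.shape, cols.shape)

# A label list shorter than the number of columns is rejected.
try:
    df.columns = ["one", "two", "three"]
    short_rename_failed = False
except ValueError as exc:
    short_rename_failed = True
    print(exc)

# Naming just the selected columns works, since the lengths match.
named = cols.copy()
named.columns = ["drive-wheels", "body-style", "price"]
print(named.head(2))
```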
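I added it, but I still get the same error: headers = ["symboling:", "normalized-losses:", "make:", "fuel-type:", "aspiration: ", "num-of-doors:", "body-style:", "drive-wheels:", "engine-location:", "wheel-base:", "length:", "width:", "height:", "curb-weight:", "engine-type:", "num-of-cylinders:", "engine-size:", "fuel-system:", "bore:", "stroke:", "compression-ratio:", "horsepower:", "peak-rpm:", "city-mpg:", "highway-mpg:", "price:"] df.columns=headers
Answer 2:
The data you are loading does not contain a header row.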
So the line
df_test = df['drive-wheels:', 'body-style:', 'price:']
fails.
UPDATE: to select multiple columns, use:
df_test = df[['drive-wheels:', 'body-style:', 'price:']]
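Double brackets fix the tuple problem, but every label in the list must still match a column name exactly. A trailing colon or a stray space, as in the colon-suffixed header lists posted in the comments, is part of the label. The sketch below (a made-up one-row frame) shows the mismatch and one way to normalise the names with the Index string methods str.strip and str.rstrip.

```python
import pandas as pd

# Made-up frame whose labels carry trailing colons and one trailing space.
df = pd.DataFrame([[1, 2, 3]], columns=["drive-wheels:", "body-style:", "price: "])

# Labels without the colons do not match, so this raises KeyError.
try:
    df[["drive-wheels", "body-style", "price"]]
    plain_names_failed = False
except KeyError:
    plain_names_failed = True

# Normalise: strip surrounding whitespace, then any trailing colon.
df.columns = df.columns.str.strip().str.rstrip(":")
subset = df[["drive-wheels", "body-style", "price"]]
print(subset.columns.tolist())
```

Choosing plain names without colons from the start (as in the next answer) avoids this step entirely.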
Comments:
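I added the headers, but I still get the same error. headers = ["symboling:", "normalized-losses:", "make:", "fuel-type:", "aspiration:", "num-of-doors:", "body-style:", "drive-wheels:", "engine-location:", "wheel-base:", "length:", "width:", "height:", "curb-weight:", "engine-type:", "num-of-cylinders:", "engine-size:", "fuel-system:", "bore:", "stroke:", "compression-ratio:", "horsepower:", "peak-rpm:", "city-mpg:", "highway-mpg:", "price:"] df.columns=headers
Answer 3:
I have also been working on the same dataset. I added the headers: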
import pandas as pd

path = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
# The file has no header row, so supply the column names explicitly.
headers = ["symboling", "normalized-losses", "make", "fuel-type", "aspiration",
           "num-of-doors", "body-style",
           "drive-wheels", "engine-location", "wheel-base",
           "length", "width", "height", "curb-weight", "engine-type",
           "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke", "compression-ratio", "horsepower",
           "peak-rpm", "city-mpg", "highway-mpg", "price"]
path_read = pd.read_csv(path, names=headers)
automobile_df = pd.DataFrame(path_read)
automobile_df
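After this, you first have to deal with the missing data in the dataset (missing values are marked with "?"). After that it should work; just add another pair of square brackets when selecting the columns: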
temp_df = automobile_df[["body-style","drive-wheels","price"]]
That should no longer be a problem.
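Putting the answers together, a minimal end-to-end sketch of the question's groupby and pivot is shown below. It assumes an inline, made-up sample reduced to the three relevant columns (the real file has 26), uses na_values="?" so that price is parsed as numeric, and drops rows with a missing price as one simple way of handling the gaps; filling them (for example with the column mean) would be an alternative.

```python
import io

import pandas as pd

# Made-up headerless rows, reduced to drive-wheels, body-style, price.
# "?" marks a missing value, as in the UCI file.
raw = """fwd,sedan,10000
fwd,sedan,12000
fwd,hatchback,8000
rwd,sedan,20000
rwd,wagon,?
rwd,wagon,18000
"""

df = pd.read_csv(
    io.StringIO(raw),
    names=["drive-wheels", "body-style", "price"],
    na_values="?",  # parse "?" as NaN so price becomes a numeric column
)
df = df.dropna(subset=["price"])  # drop rows whose price is missing

# Double brackets select the three columns; groupby averages price per group.
df_test = df[["drive-wheels", "body-style", "price"]]
df_grp = df_test.groupby(["drive-wheels", "body-style"], as_index=False).mean()

# values="price" gives flat body-style columns instead of a ("price", ...) MultiIndex.
df_pivot = df_grp.pivot(index="drive-wheels", columns="body-style", values="price")
print(df_pivot)
```

Combinations that never occur in the data (here fwd/wagon and rwd/hatchback) show up as NaN in the pivot table.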