Why is the shape of my pandas DataFrame selection wrong?

Posted 2023-02-23

【Asked】: 2018-12-17 09:54:39 【Question】:

I have a pandas DataFrame named df whose df.shape is (53, 80); both the index and the columns are integers.

If I select the first row like this, I get:

df.loc[0].shape
(80,)

instead of:

(1, 80)

By contrast, df.loc[0:0].shape and df[0:1].shape both show the shape I expect.


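【Answer 1】:

df.loc[0] returns a one-dimensional pd.Series object holding the data of a single row, extracted by indexing with a scalar label.

df.loc[0:0] returns a two-dimensional pd.DataFrame object: a one-row dataframe extracted by slicing.

You can see this more clearly if you print the results of these operations (the dtype in the output may be int64 rather than int32, depending on your platform):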

import pandas as pd, numpy as np

df = pd.DataFrame(np.arange(9).reshape(3, 3))

res1 = df.loc[0]
res2 = df.loc[0:0]

print(type(res1), res1, sep='\n')

<class 'pandas.core.series.Series'>
0    0
1    1
2    2
Name: 0, dtype: int32

print(type(res2), res2, sep='\n')

<class 'pandas.core.frame.DataFrame'>
   0  1  2
0  0  1  2
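To tie this back to the shapes in the question, here is a minimal sketch. The zero-filled frame is only a hypothetical stand-in for the asker's df, chosen to have the same (53, 80) shape and default integer labels.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df: 53 rows, 80 columns,
# with default integer labels on both axes.
df = pd.DataFrame(np.zeros((53, 80)))

row_as_series = df.loc[0]    # scalar label: the row axis is dropped
row_as_frame = df.loc[0:0]   # label slice: the row axis is kept

print(row_as_series.ndim, row_as_series.shape)  # 1 (80,)
print(row_as_frame.ndim, row_as_frame.shape)    # 2 (1, 80)
```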

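The convention follows NumPy indexing/slicing: indexing with a scalar drops a dimension, while slicing keeps it. This is natural, since pandas is built on NumPy arrays. NumPy slices exclude the end position, whereas .loc slices include the end label, so the NumPy counterpart of df.loc[0:0] is arr[0:1]:

arr = np.arange(9).reshape(3, 3)

print(arr[0].shape)    # (3,), i.e. 1-dimensional
print(arr[0:1].shape)  # (1, 3), i.e. 2-dimensional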
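One detail that is easy to trip over: .loc slices are label-based and include the end label, while .iloc and NumPy slices are positional and exclude the end position. A small sketch comparing them on a 3x3 frame (the `shapes` dictionary is just an illustrative helper):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3))
arr = df.to_numpy()

# Collect the shape produced by each slicing style.
shapes = {
    "df.loc[0:0]": df.loc[0:0].shape,    # label slice, end label included
    "df.iloc[0:0]": df.iloc[0:0].shape,  # positional slice, end excluded
    "df.iloc[0:1]": df.iloc[0:1].shape,  # positional slice covering row 0
    "arr[0:1]": arr[0:1].shape,          # NumPy slice covering row 0
}

for expr, shape in shapes.items():
    print(expr, shape)
# df.loc[0:0] (1, 3)
# df.iloc[0:0] (0, 3)
# df.iloc[0:1] (1, 3)
# arr[0:1] (1, 3)
```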


【Answer 2】:

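When you call df.iloc[0], it selects the first row and the result is a Series, whereas df.iloc[0:1] slices the rows and the result is a DataFrame. (.iloc slices exclude the end position, so df.iloc[0:0] is also a DataFrame, but an empty one with zero rows.) According to the pandas Series documentation, a Series is a:

One-dimensional ndarray with axis labels

whereas a DataFrame is two-dimensional (pandas DataFrame documentation).

Try running the following lines to see the difference:

print(type(df.iloc[0]))
# <class 'pandas.core.series.Series'>

print(type(df.iloc[0:1]))
# <class 'pandas.core.frame.DataFrame'>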
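If a single row is needed as a two-dimensional object, passing a one-element list instead of a scalar is one common way to keep the row axis. The sketch below uses a small 3x3 frame as a stand-in for df; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3))

# A list selector keeps the result two-dimensional.
by_label_list = df.loc[[0]]
by_position_list = df.iloc[[0]]

# Alternatively, turn the Series back into a one-row frame.
from_series = df.loc[0].to_frame().T

print(by_label_list.shape)     # (1, 3)
print(by_position_list.shape)  # (1, 3)
print(from_series.shape)       # (1, 3)
```

The list form tends to be the least surprising, since it behaves the same way for .loc and .iloc and does not depend on slice endpoint rules.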

