How to avoid SettingWithCopyWarning in pandas?

Posted 2023-03-11


Originally posted: 2017-04-25 20:56:07

Question:

I want to convert the type of a column to int using pandas. Here is the source code:

# CustomerID is missing on several rows. Drop these rows and encode customer IDs as Integers.
cleaned_data = retail_data.loc[pd.isnull(retail_data.CustomerID) == False]
cleaned_data['CustomerID'] = cleaned_data.CustomerID.astype(int)

This raises the following warning:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

How can I avoid this warning? Is there a better way to convert CustomerID to int? I'm on Python 3.5.

Comments on the question:

Possible duplicate: ***.com/q/38809796/190597

Answer 1:

Combine the filtering and the conversion into a single .loc assignment on the original DataFrame:

retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'] = retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'].astype(int)
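A minimal sketch of the same single-.loc approach, assuming a small hypothetical retail_data frame with only a CustomerID column. Computing the boolean mask once avoids repeating the selection expression; notna() is the negation of isnull(), so has_id selects the same rows as ~retail_data.CustomerID.isnull().

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for retail_data: one row has a missing CustomerID
retail_data = pd.DataFrame({"CustomerID": [9.87, np.nan, 5.64]})

# Build the mask of rows that have a CustomerID once, then reuse it
has_id = retail_data["CustomerID"].notna()

# One .loc assignment directly on retail_data (no chained indexing),
# so pandas does not need to warn about writing into a possible copy
retail_data.loc[has_id, "CustomerID"] = retail_data.loc[has_id, "CustomerID"].astype(int)

print(retail_data)
```

Note that astype(int) truncates toward zero, so 9.87 becomes 9, which is then stored back as 9.0 because the column stays float.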

Example:

import pandas as pd
import numpy as np

retail_data = pd.DataFrame(np.random.rand(4,1)*10, columns=['CustomerID'])
retail_data.iloc[2,0] = np.nan
print(retail_data)

   CustomerID
0    9.872067
1    5.645863
2         NaN
3    9.008643

retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'] = retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'].astype(int)

   CustomerID
0         9.0
1         5.0
2         NaN
3         9.0

You'll notice that the column's dtype is still float, because np.nan cannot be represented in an int column.
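If the goal is an integer column while keeping the rows with missing IDs, one alternative not mentioned in the original answer is pandas' nullable integer dtype "Int64" (capital I), available since pandas 0.24. The sketch below assumes a hypothetical retail_data frame whose non-missing IDs are whole numbers; astype("Int64") raises an error for floats with a fractional part such as 9.87.

```python
import numpy as np
import pandas as pd

# Hypothetical data: IDs are whole numbers stored as float because of the NaN
retail_data = pd.DataFrame({"CustomerID": [17850.0, np.nan, 13047.0]})

# "Int64" is the nullable integer extension dtype: missing values become
# pd.NA instead of forcing the whole column to float
retail_data["CustomerID"] = retail_data["CustomerID"].astype("Int64")

print(retail_data["CustomerID"].dtype)  # Int64
print(retail_data["CustomerID"].tolist())
```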

If you really do want to drop those rows without modifying the underlying retail_data, make an explicit copy():

cleaned_data = retail_data.loc[~retail_data.CustomerID.isnull()].copy()
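A minimal end-to-end sketch of the copy() approach, using a hypothetical retail_data frame (the Quantity column is invented for illustration). Because cleaned_data is an explicit copy, assigning the converted column to it is unambiguous and leaves retail_data untouched. The second variant is an alternative using method chaining: DataFrame.assign returns a new DataFrame, so it likewise never writes into a slice of the original.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the middle row has no CustomerID
retail_data = pd.DataFrame({"CustomerID": [17850.0, np.nan, 13047.0],
                            "Quantity": [6, 2, 8]})

# Option 1: filter, take an explicit copy, then convert the column
cleaned_data = retail_data.loc[retail_data["CustomerID"].notna()].copy()
cleaned_data["CustomerID"] = cleaned_data["CustomerID"].astype(int)

# Option 2: method chaining; assign() builds a new DataFrame
cleaned_chain = (
    retail_data.loc[retail_data["CustomerID"].notna()]
    .assign(CustomerID=lambda d: d["CustomerID"].astype(int))
)

print(cleaned_data)
print(retail_data["CustomerID"].dtype)  # original column is still float64
```

Both options drop the row with the missing ID and give CustomerID an integer dtype; which one to use is mostly a matter of style.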

