How to fill pandas DataFrame columns with random dictionary values

Posted 2023-02-23


【Asked】: 2018-05-07 22:37:34

【Question】:

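I'm new to pandas and want to experiment with random text data. I'm trying to add 2 new columns to a DataFrame df, where each row is filled with a key (first new column) and its value (second new column) from a randomly chosen dictionary entry.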

countries = {'Africa':'Ghana','Europe':'France','Europe':'Greece','Asia':'Vietnam','Europe':'Lithuania'}

My df already has 2 columns, and I'd like something like this:

    Year Approved Continent    Country
0   2016      Yes    Africa      Ghana
1   2016      Yes    Europe  Lithuania
2   2017       No    Europe     Greece

I could of course fill df['Continent'] and df['Country'] with a for or while loop, but I have a feeling that .apply() and np.random.choice might offer a simpler, more idiomatic pandas solution.

【Question comments】:

【Answer 1】:

Yes, you're right. You can use np.random.choice together with map:

df

    Year Approved
0   2016      Yes
1   2016      Yes
2   2017       No

df['Continent'] = np.random.choice(list(countries), len(df))
df['Country'] = df['Continent'].map(countries)

df

    Year Approved Continent    Country
0   2016      Yes    Africa      Ghana
1   2016      Yes      Asia    Vietnam
2   2017       No    Europe  Lithuania

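You randomly pick len(df) keys from the list of keys in countries, then use the countries dictionary as a mapper to look up the country that corresponds to each key picked.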
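For reference, here is a minimal self-contained sketch of the two steps above. The construction of df is an assumption based on the table in the question. One thing worth checking first: a Python dict keeps only one value per key, so the repeated 'Europe' entries in the literal collapse to the last one, and map can only ever return that single country for 'Europe'.

```python
import numpy as np
import pandas as pd

# Repeated keys in a dict literal collapse; the last 'Europe' value wins.
countries = {'Africa': 'Ghana', 'Europe': 'France', 'Europe': 'Greece',
             'Asia': 'Vietnam', 'Europe': 'Lithuania'}
print(countries)  # {'Africa': 'Ghana', 'Europe': 'Lithuania', 'Asia': 'Vietnam'}

# Assumed starting frame, rebuilt from the question's example.
df = pd.DataFrame({'Year': [2016, 2016, 2017],
                   'Approved': ['Yes', 'Yes', 'No']})

# Step 1: draw one key per row, with replacement.
df['Continent'] = np.random.choice(list(countries), len(df))
# Step 2: look up the value for each drawn key.
df['Country'] = df['Continent'].map(countries)
print(df)
```

If France and Greece should also be possible results, the data probably needs to be stored as a list of (continent, country) pairs rather than a dict.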

【Comments】:

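Awesome! I thought I had to do it all in one go; now I understand the various two-step mapping options.

@ozaarm Keep in mind that on any Python version earlier than 3.6 you will need a two-step solution. To do it in one step on Python 3.6, you can use random.choices as the other answer shows, but speed would still be my concern.

【Answer 2】: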

You can also try DataFrame.sample():

df.join(
    pd.DataFrame(list(countries.items()), columns=["continent", "country"])
    .sample(len(df), replace=True)
    .reset_index(drop=True)
)

This can be faster if your continent-to-country map is already a DataFrame.
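As a rough sketch of that case, assume the mapping already lives in a DataFrame (called lookup here, a hypothetical name) and that df looks like the frame in the question. Storing the pairs as rows also keeps all five continent/country combinations, which a dict cannot do.

```python
import pandas as pd

# Assumed starting frame, rebuilt from the question's example.
df = pd.DataFrame({'Year': [2016, 2016, 2017],
                   'Approved': ['Yes', 'Yes', 'No']})

# Hypothetical lookup table, built once and reused for every draw.
lookup = pd.DataFrame(
    [('Africa', 'Ghana'), ('Europe', 'France'), ('Europe', 'Greece'),
     ('Asia', 'Vietnam'), ('Europe', 'Lithuania')],
    columns=['continent', 'country'],
)

# Draw len(df) rows with replacement, then renumber them 0..n-1
# so that join() lines them up with df's index.
sampled = lookup.sample(len(df), replace=True).reset_index(drop=True)
result = df.join(sampled)
print(result)
```

The reset_index(drop=True) call matters: join aligns on the index, and the sampled rows keep their original labels from lookup until they are renumbered.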

If you are on Python 3.6 or later, another way is to use random.choices():

from random import choices

df.join(
    pd.DataFrame(choices([*countries.items()], k=len(df)), columns=["continent", "country"])
)

random.choices() is similar to numpy.random.choice(), except that you can pass it a list of key-value tuple pairs, whereas numpy.random.choice() only accepts a 1-D array.
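The difference can be sketched as follows, assuming the pairs are kept in a plain list (pairs is a hypothetical name) and df matches the question's frame. random.choices() draws whole tuples directly; with NumPy, one workaround is to draw integer positions and index into the list.

```python
import random

import numpy as np
import pandas as pd

# Assumed starting frame, rebuilt from the question's example.
df = pd.DataFrame({'Year': [2016, 2016, 2017],
                   'Approved': ['Yes', 'Yes', 'No']})

# Hypothetical list of (continent, country) pairs.
pairs = [('Africa', 'Ghana'), ('Europe', 'France'), ('Europe', 'Greece'),
         ('Asia', 'Vietnam'), ('Europe', 'Lithuania')]

# random.choices() samples the tuples themselves, with replacement.
picked = random.choices(pairs, k=len(df))
result = df.join(pd.DataFrame(picked, columns=['continent', 'country']))
print(result)

# NumPy workaround: sample 1-D positions, then index into the list.
idx = np.random.choice(len(pairs), size=len(df))
picked_np = [pairs[i] for i in idx]
result_np = df.join(pd.DataFrame(picked_np, columns=['continent', 'country']))
print(result_np)
```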

【Comments】:
