How to create a new row in a pandas DataFrame by dividing values in a specific column between two rows and keeping other columns intact?

Posted 2023-02-23


【Title】: How to create a new row in pandas dataframe by dividing values in a specific column between two rows and keeping other columns intact? 【Posted】: 2022-01-18 20:26:37 【Question】:

I have a pandas DataFrame df that looks like this:

    Country  Continent  Capital City  Year  Indicator   Value  Unit
0   Nepal    Asia       Kathmandu     2015  Population  3      million
1   Nepal    Asia       Kathmandu     2020  Population  5      million
2   Germany  Europe     Berlin        2015  Population  4      million
3   Germany  Europe     Berlin        2020  Population  6      million

df.to_dict() gives:

{'Country': {0: 'Nepal', 1: 'Nepal', 2: 'Germany', 3: 'Germany'},
 'Continent': {0: 'Asia', 1: 'Asia', 2: 'Europe', 3: 'Europe'},
 'Capital City': {0: 'Kathmandu', 1: 'Kathmandu', 2: 'Berlin', 3: 'Berlin'},
 'Year': {0: 2015, 1: 2020, 2: 2015, 3: 2020},
 'Indicator': {0: 'Population',
  1: 'Population',
  2: 'Population',
  3: 'Population'},
 'Value': {0: 3, 1: 5, 2: 4, 3: 6},
 'Unit': {0: 'million', 1: 'million', 2: 'million', 3: 'million'}}

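The DataFrame contains population data for the capital cities of two countries, Nepal and Germany, for 2015 and 2020.

I want to create two new rows that show the population growth rate between 2015 and 2020 (for example, 5/3 = 1.67 for Nepal and 6/4 = 1.5 for Germany). These rows need to be in the same DataFrame. In the new rows, the Country, Continent and Capital City columns should stay the same for each country. The Year value should remain 2020, the Indicator name should be "Population growth rate", and the Unit should be "times 2015 value". It should look like this: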

    Country  Continent  Capital City  Year  Indicator               Value     Unit
0   Nepal    Asia       Kathmandu     2015  Population              3         million
1   Nepal    Asia       Kathmandu     2020  Population              5         million
2   Germany  Europe     Berlin        2015  Population              4         million
3   Germany  Europe     Berlin        2020  Population              6         million
4   Nepal    Asia       Kathmandu     2020  Population growth rate  1.666667  times 2015 value
5   Germany  Europe     Berlin        2020  Population growth rate  1.5       times 2015 value

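How can I create these two new rows with the population growth rate and append them to the original DataFrame?

【Comments】:

【Answer 1】:

Use groupby first, then append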

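# Per country: keep the last Year and divide the last Value by the first Value
out = df.groupby(['Country','Continent','Capital City']).agg({'Year':'last','Value':lambda x : x.iloc[-1]/x.iloc[0]}).reset_index()
out['Indicator'] = 'Population growth rate'
# DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat([df, out])
df = df.append(out)
df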
Out[16]: 
   Country Continent Capital City  ...               Indicator     Value     Unit
0    Nepal      Asia    Kathmandu  ...              Population  3.000000  million
1    Nepal      Asia    Kathmandu  ...              Population  5.000000  million
2  Germany    Europe       Berlin  ...              Population  4.000000  million
3  Germany    Europe       Berlin  ...              Population  6.000000  million
0  Germany    Europe       Berlin  ...  Population growth rate  1.500000      NaN
1    Nepal      Asia    Kathmandu  ...  Population growth rate  1.666667      NaN
[6 rows x 7 columns]
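
On pandas 2.0 and later `DataFrame.append` no longer exists, so the same idea needs `pd.concat`. The following is a minimal sketch rather than part of the original answer. It rebuilds the question's sample data, and it adds three things the answer does not do: it fills in the `Unit` column (left as NaN above), passes `sort=False` so the new rows follow the order in which the countries first appear, and passes `ignore_index=True` so the new rows are numbered 4 and 5 as in the desired output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Country': ['Nepal', 'Nepal', 'Germany', 'Germany'],
    'Continent': ['Asia', 'Asia', 'Europe', 'Europe'],
    'Capital City': ['Kathmandu', 'Kathmandu', 'Berlin', 'Berlin'],
    'Year': [2015, 2020, 2015, 2020],
    'Indicator': ['Population'] * 4,
    'Value': [3, 5, 4, 6],
    'Unit': ['million'] * 4,
})

keys = ['Country', 'Continent', 'Capital City']

# Same aggregation as the answer: last Year, last Value divided by first Value.
# sort=False keeps groups in order of first appearance (Nepal, then Germany).
out = (
    df.groupby(keys, sort=False)
      .agg({'Year': 'last', 'Value': lambda x: x.iloc[-1] / x.iloc[0]})
      .reset_index()
)
out['Indicator'] = 'Population growth rate'
out['Unit'] = 'times 2015 value'  # addition: the answer above leaves Unit as NaN

# pd.concat aligns on column names; ignore_index renumbers the rows 0..5.
result = pd.concat([df, out], ignore_index=True)
print(result)
```

Like the original, this still depends on the rows of each country being ordered by year, because the ratio is taken between the last and the first row of each group.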

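【Comments】:

Thanks! This seems to work. The data I shared above is a snapshot of a larger dataset that also contains values for 2014, 2016, 2017, 2018, 2019, 2021 and so on. Could you also share an alternative solution in which the years 2020 and 2015 are specified for the division in the lambda function, rather than index positions?

@hbstha123

out = df.loc[df['Year'].isin([2015,2020])].groupby(['Country','Continent','Capital City']).agg({'Year':'last','Value':lambda x : x.iloc[-1]/x.iloc[0]}).reset_index()

Thanks! This is a good solution.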
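
The `isin` filter above still divides by position, so it assumes the 2015 row comes before the 2020 row within each country. A sketch of a variant that looks the two years up by label instead, and therefore does not depend on row order, is shown below. It is not from the thread: the names `base_year`, `target_year`, `wide` and `growth` are illustrative, and the sample rows are the question's data deliberately reversed to show that order no longer matters.

```python
import pandas as pd

# The question's data with the rows reversed (2020 before 2015 per country),
# to show that the result does not depend on row order.
df = pd.DataFrame({
    'Country': ['Germany', 'Germany', 'Nepal', 'Nepal'],
    'Continent': ['Europe', 'Europe', 'Asia', 'Asia'],
    'Capital City': ['Berlin', 'Berlin', 'Kathmandu', 'Kathmandu'],
    'Year': [2020, 2015, 2020, 2015],
    'Indicator': ['Population'] * 4,
    'Value': [6, 4, 5, 3],
    'Unit': ['million'] * 4,
})

keys = ['Country', 'Continent', 'Capital City']
base_year, target_year = 2015, 2020  # illustrative names, not from the thread

# One row per country, one column per year.
# pivot raises ValueError if a country has duplicate rows for the same year.
pop = df[df['Indicator'] == 'Population']
wide = pop.pivot(index=keys, columns='Year', values='Value')

# Select the two years by label and divide.
growth = (wide[target_year] / wide[base_year]).rename('Value').reset_index()
growth['Year'] = target_year
growth['Indicator'] = 'Population growth rate'
growth['Unit'] = f'times {base_year} value'

result = pd.concat([df, growth], ignore_index=True)
print(result)
```

Because the years are selected as columns of `wide`, extra years such as 2014 or 2021 in the larger dataset are simply ignored, and a missing year for some country would produce NaN for that country rather than a silently wrong ratio.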
