Python Pandas: if condition is true, put existing column value into new column
【Posted】: 2021-02-15 07:30:42
【Question】: I want to modify my pandas dataframe so that, if the can column value = 'Group Total', the cv1 and cvs1 values of that row are placed in new pv1 and pvs1 columns for that row and for the rows above it in my dataframe. If pty_n = 'Independent', I want the pv1 and pvs1 values to be the same as the 'cv1' and 'cvs1' values in the same row. The original post illustrated the desired result with an image.
However, what I currently get looks like this:
{
 'rg': {0: 'Oceania', 1: 'Oceania', 2: 'Oceania', 3: 'Oceania', 4: 'Oceania', 5: 'Oceania', 6: 'Oceania', 7: 'Oceania', 8: 'Oceania', 9: 'Oceania'},
 'ctr_n': {0: 'Australia', 1: 'Australia', 2: 'Australia', 3: 'Australia', 4: 'Australia', 5: 'Australia', 6: 'Australia', 7: 'Australia', 8: 'Australia', 9: 'Australia'},
 'ctr': {0: '', 1: '', 2: '', 3: '', 4: '', 5: '', 6: '', 7: '', 8: '', 9: ''},
 'yr': {0: '2019', 1: '2019', 2: '2019', 3: '2019', 4: '2019', 5: '2019', 6: '2019', 7: '2019', 8: '2019', 9: '2019'},
 'mn': {0: '06', 1: '06', 2: '06', 3: '06', 4: '06', 5: '06', 6: '06', 7: '06', 8: '06', 9: '06'},
 'sub': {0: '-990', 1: '-990', 2: '-990', 3: '-990', 4: '-990', 5: '-990', 6: '-990', 7: '-990', 8: '-990', 9: '-990'},
 'cst_n': {0: 'Canberra, ACT', 1: 'Canberra, ACT', 2: 'Canberra, ACT', 3: 'Canberra, ACT', 4: 'Canberra, ACT', 5: 'Canberra, ACT', 6: 'Canberra, ACT', 7: 'Canberra, ACT', 8: 'Canberra, ACT', 9: 'Canberra, ACT'},
 'cst': {0: '', 1: '', 2: '', 3: '', 4: '', 5: '', 6: '', 7: '', 8: '', 9: ''},
 'can': {0: 'Ticket Votes', 1: 'SESELJA, Zed', 2: 'GUNNING, Robert', 3: 'Group Total', 4: 'Ticket Votes', 5: 'KYBURZ, Penny', 6: 'DAVIDSON, Emma', 7: 'Group Total', 8: 'Ticket Votes', 9: 'PESEC, Anthony'},
 'pty_n': {0: 'Liberal', 1: 'Liberal', 2: 'Liberal', 3: 'Liberal', 4: 'The Greens', 5: 'The Greens', 6: 'The Greens', 7: 'The Greens', 8: '\xa0', 9: '\xa0'},
 'cv1': {0: '21,209', 1: '2,142', 2: '1,001', 3: '24,352', 4: '14,637', 5: '5,719', 6: '875', 7: '21,231', 8: '1,404', 9: '3,225'},
 'cvs1': {0: '24.15', 1: '2.44', 2: '1.14', 3: '27.73', 4: '16.67', 5: '6.51', 6: '1.00', 7: '24.17', 8: '1.60', 9: '3.67'},
 'vv1': {0: '87,828', 1: '87,828', 2: '87,828', 3: '87,828', 4: '87,828', 5: '87,828', 6: '87,828', 7: '87,828', 8: '87,828', 9: '87,828'},
 'pv1': {0: '24,352', 1: '24,352', 2: '24,352', 3: '24,352', 4: '24,352', 5: '24,352', 6: '24,352', 7: '24,352', 8: '24,352', 9: '24,352'},
 'pvs1': {0: '27.73', 1: '27.73', 2: '27.73', 3: '27.73', 4: '27.73', 5: '27.73', 6: '27.73', 7: '27.73', 8: '27.73', 9: '27.73'}
}
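How can I modify my code so that the result looks like the first picture rather than the second? For context, this will be applied to more than 20,000 rows in a pandas dataframe where the 'pty_n' value changes irregularly (e.g., 4 rows of Liberal, 4 rows of Green, 7 rows of Labor, 2 rows of Citizen Elected, etc.). Thanks!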
aust19 = pd.DataFrame({
    'rg': region,
    'ctr_n': ctrname,
    'ctr': ctrcode,
    'yr': year,
    'mn': month,
    'sub': sub,
    'cst_n': constituencies,
    'cst': cstcode,
    'can': candidates,
    'pty_n': partynames,
    'cv1': canvotes,
    'cvs1': canshare,
    'vv1': totalvotes
})
real_pv1 = None
real_pvs1 = None
for idx, row in aust19.iloc[::-1].iterrows():
    if row.can == "Group Total":
        real_pv1 = row.cv1
        real_pvs1 = row.cvs1
    else:
        aust19.loc[idx].pv1 = real_pv1
        aust19.loc[idx].pvs1 = real_pvs1
aust19['pv1'] = real_pv1
aust19['pvs1'] = real_pvs1
aust19.to_csv("austtbd.csv")
【Comments】:
Can you provide your dataframe as text? Just run print(df.head(10).to_dict()), paste the output into the body of your question, and format it as code.
@Manakin Updated!
【Answer 1】:
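The general pattern I use for this kind of thing is:
dataframe.loc[condition, destination columns] = dataframe.loc[condition, source columns].to_numpy()
This takes advantage of vectorized pandas operations. The .to_numpy() call matters when the source and destination columns have different names: when a DataFrame is assigned through .loc, pandas aligns it on column labels, so without the conversion the destination columns would receive NaN.
More specifically, for your use case this can be done in two steps, for example:
aust19.loc[aust19["can"] == "Group Total", ["pv1", "pvs1"]] = aust19.loc[aust19["can"] == "Group Total", ["cv1", "cvs1"]].to_numpy()
aust19.loc[aust19["pty_n"] == "Independent", ["pv1", "pvs1"]] = aust19.loc[aust19["pty_n"] == "Independent", ["cv1", "cvs1"]].to_numpy()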
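As a minimal sketch of the masked-assignment pattern above, the following uses a small made-up frame (the rows and numbers are hypothetical, not taken from the question's data). It assumes the destination columns are created up front so that unmatched rows hold NaN, and it combines both conditions into one mask:

```python
import pandas as pd

# Hypothetical toy data, not the asker's dataset
df = pd.DataFrame({
    "can": ["A", "Group Total", "B", "C"],
    "pty_n": ["X", "X", "Independent", "Y"],
    "cv1": [10, 30, 5, 7],
    "cvs1": [1.0, 3.0, 0.5, 0.7],
})

# Create the destination columns first so unmatched rows hold NaN
df["pv1"] = float("nan")
df["pvs1"] = float("nan")

# Rows whose own values should be copied across
mask = (df["can"] == "Group Total") | (df["pty_n"] == "Independent")

# .to_numpy() drops the column labels; otherwise pandas would try to
# align "cv1"/"cvs1" with "pv1"/"pvs1" and fill the target with NaN
df.loc[mask, ["pv1", "pvs1"]] = df.loc[mask, ["cv1", "cvs1"]].to_numpy()

print(df)
```

Only the rows selected by the mask are written; every other row keeps its NaN placeholder, which is what makes a later fill step possible.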
Edit:
I was able to do this with multiple passes over the dataframe to satisfy the conditions:
import pandas as pd

# Sample data from the question; the default index is 0..9
aust19 = pd.DataFrame({
    'rg': ['Oceania'] * 10,
    'ctr_n': ['Australia'] * 10,
    'ctr': [''] * 10,
    'yr': ['2019'] * 10,
    'mn': ['06'] * 10,
    'sub': ['-990'] * 10,
    'cst_n': ['Canberra, ACT'] * 10,
    'cst': [''] * 10,
    'can': ['Ticket Votes', 'SESELJA, Zed', 'GUNNING, Robert', 'Group Total', 'Ticket Votes', 'KYBURZ, Penny', 'DAVIDSON, Emma', 'Group Total', 'Ticket Votes', 'PESEC, Anthony'],
    'pty_n': ['Liberal'] * 4 + ['The Greens'] * 4 + ['\xa0'] * 2,
    'cv1': ['21,209', '2,142', '1,001', '24,352', '14,637', '5,719', '875', '21,231', '1,404', '3,225'],
    'cvs1': ['24.15', '2.44', '1.14', '27.73', '16.67', '6.51', '1.00', '24.17', '1.60', '3.67'],
    'vv1': ['87,828'] * 10,
    'pv1': ['24,352'] * 10,
    'pvs1': ['27.73'] * 10,
})
# convert from string
aust19['cv1'] = aust19['cv1'].str.replace(",","").astype(int)
aust19['cvs1'] = aust19['cvs1'].str.replace(",","").astype(float)
aust19['pv1'] = aust19['pv1'].str.replace(",","").astype(int)
aust19['pvs1'] = aust19['pvs1'].str.replace(",","").astype(float)
# set cache to zero
cv1_sum = 0.
cvs1_sum = 0.
group = 0
# -1 marks rows that belong to no group (Independent rows are skipped below)
aust19['group_n'] = -1
for i in aust19.index:
    # Ignore rows that are Independent
    if aust19.loc[i, "pty_n"] != "Independent":
        # For group totals, write the cache and reset to zero
        if aust19.loc[i, "can"] == "Group Total":
            aust19.loc[i, "pv1"] = cv1_sum
            aust19.loc[i, "pvs1"] = cvs1_sum
            aust19.loc[i, "group_n"] = group
            cv1_sum = 0
            cvs1_sum = 0
            group += 1  # increment group
        # For non group totals, add the current row to the cache
        else:
            cv1_sum += aust19.loc[i, "cv1"]
            cvs1_sum += aust19.loc[i, "cvs1"]
            aust19.loc[i, "group_n"] = group
## Second pass
aust19['group_n'] = aust19['group_n'].astype(int)
is_total = aust19["can"] == "Group Total"
for g in range(group):
    in_group = aust19["group_n"] == g
    # Each completed group has exactly one "Group Total" row; copy its values to the other rows
    aust19.loc[in_group & ~is_total, "pv1"] = aust19.loc[in_group & is_total, "pv1"].iloc[0]
    aust19.loc[in_group & ~is_total, "pvs1"] = aust19.loc[in_group & is_total, "pvs1"].iloc[0]
# handle independents (.to_numpy() avoids alignment on the differing column names)
aust19.loc[aust19["pty_n"] == "Independent", ["pv1", "pvs1"]] = aust19.loc[aust19["pty_n"] == "Independent", ["cv1", "cvs1"]].to_numpy()
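A loop-free alternative may also be worth considering for 20,000+ rows. The sketch below is not from the answer above; it swaps the two explicit passes for `Series.where` followed by `Series.bfill`. It uses a hypothetical miniature of the data with the numbers already converted from strings, and it assumes every non-Independent row is eventually followed by the "Group Total" row of its own group (trailing rows with no total would stay NaN, and ungrouped rows sitting directly above an Independent row would wrongly inherit that row's value).

```python
import pandas as pd

# Hypothetical miniature of the question's data (values already numeric)
df = pd.DataFrame({
    "can": ["Ticket Votes", "SESELJA, Zed", "Group Total",
            "Ticket Votes", "Group Total", "SMITH, Jo"],
    "pty_n": ["Liberal", "Liberal", "Liberal",
              "The Greens", "The Greens", "Independent"],
    "cv1": [21209, 3143, 24352, 14637, 14637, 500],
    "cvs1": [24.15, 3.58, 27.73, 16.67, 16.67, 0.57],
})

is_total = df["can"] == "Group Total"
is_indep = df["pty_n"] == "Independent"

for src, dst in [("cv1", "pv1"), ("cvs1", "pvs1")]:
    # Keep the source value only on rows that define a party value
    # (group totals and independents); every other row becomes NaN
    anchors = df[src].where(is_total | is_indep)
    # Back-fill: each NaN row takes the value of the next anchor below it
    df[dst] = anchors.bfill()

print(df[["can", "pty_n", "pv1", "pvs1"]])
```

Because `where` introduces NaN, `pv1` comes out as a float column; it can be cast back with `astype(int)` once no NaN values remain.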
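【Comments】:
This is helpful! For the first step, is there a way to apply the corresponding 'pv1' and 'pvs1' values to all the rows above the 'Group Total' row, and not just to the 'Group Total' row itself? For instance, how would I apply '666' and '0.7' to all the rows above the 'Rise Up Australia' group total in the image above?
Just to clarify: do you mean all rows above the group total (with the same party?), or all rows above it up to the previous 'Group Total'?
Also, as the previous commenter @Manakin mentioned, please provide a sample snippet of your dataset, not just an image. You will find that you get many more responses when people can recreate your dataset instead of transcribing it from an image.
All the rows above, up to the previous 'Group Total', since multiple instances of the same party have different group totals. I'm running the code now to create a sample snippet, thanks for the tip!
Updated! @alex_danielssen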