Iterating through a Pandas DataFrame [duplicate]
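【Title】: Iterating through a Pandas DataFrame [duplicate] 【Posted】: 2021-04-05 05:29:26 【Question】: I have a CSV file, "players.csv", that holds player attributes: "Name, Age, Nationality, Overall, Potential, Club, Value ...".
My task is to compute the total value of each club by adding up the values of all of its players.
The solution below gives me the expected result. My problem is that it takes a very long time to run, because it uses two nested for loops.
Is there a more efficient way to solve this? (The DataFrame has 14,700 players.)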
import pandas as pd
pd.options.mode.chained_assignment = None # default='warn'
# Load the data with only the Club Name and Value of the player
df = pd.read_csv('./players.csv',usecols=['Club','Value'])
# Create new List where the Club Value will be shown
# Drop all duplicate clubs, leaving a DataFrame with one row per available club
# Furthermore, drop the column 'Value' and add 'Club_Value'
df_Clubs = df.drop_duplicates('Club').drop('Value',axis=1)
df_Clubs['Club_Value']=0
df = df.sort_values(by=["Club"])
# Iterate through the players DataFrame, getting each row's index and contents
for rowdf, valuedf in df.iterrows():
    # Iterate through the new DataFrame that holds only the unique clubs
    for row, value in df_Clubs.iterrows():
        if valuedf["Club"] == value["Club"]:
            # When the player's club matches a club in the unique-clubs DataFrame,
            # add the player's value to that club's running total
            ValueClub_old = df_Clubs.loc[row, "Club_Value"]
            ValuePlayer = df.loc[rowdf, "Value"]
            ValueClub_new = ValueClub_old + ValuePlayer
            df_Clubs.loc[row, "Club_Value"] = ValueClub_new
# save the new dataframe
df_Clubs.to_csv(r'Players_Value.csv', index = False)
df.head()
print(df_Clubs)
【Answer 1】: Use groupby on Club and take the sum of Value.
df_new=df.groupby(['Club'])['Value'].sum().reset_index()
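To illustrate what the groupby call does, here is a minimal sketch on a small in-memory DataFrame. The sample clubs and values are made up for illustration and stand in for players.csv, and the sketch assumes the Value column is already numeric. Renaming the summed column to Club_Value is only there to match the column name used in the question.

```python
import pandas as pd

# Hypothetical sample data standing in for players.csv (not the real file)
df = pd.DataFrame({
    "Club": ["A", "B", "A", "C", "B"],
    "Value": [10, 20, 30, 40, 50],
})

# Group the rows by club and sum each group's values in a single pass,
# instead of comparing every player against every club in nested loops
df_clubs = (
    df.groupby("Club", as_index=False)["Value"]
      .sum()
      .rename(columns={"Value": "Club_Value"})
)

print(df_clubs)
```

The result has one row per club, so it can be written out with to_csv in the same way as in the question. Note that groupby drops rows whose Club is missing by default.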