Iterating over rows: how to increase speed

Posted 2023-03-12

【Title】: iterate over rows how to increase speed 【Published】: 2021-11-07 23:02:44 【Question】:

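I have a very large data frame (>250,000 rows, 150 columns), and I need to create country and continent codes for each row. I am using the code below to update the data frame, but it is not efficient. I know iterrows is not the best option, but I am struggling to set up faster iteration code like that described in other posts. Can you help me improve my code? Thanks.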

for index, row in dfSPSSstudent.iterrows():
    print(row['Country_ID'])
    col = row['Country_ID']
    cn_a2_code =  country_name_to_country_alpha2(col)
    cn_continent = country_alpha2_to_continent_code(cn_a2_code)
    dfSPSSstudent['CN']=cn_a2_code
    print(col, cn_a2_code, cn_continent)

【Comments】:

Maybe

dfSPSSstudent["CN"] = dfSPSSstudent["Country_ID"].apply(lambda x: country_alpha2_to_continent_code(country_name_to_country_alpha2(x)))

? Remove the print statements; they are slow. Printing 500k rows is useless. Print the column afterwards if you need to.

【Answer 1】:

Try using apply:

dfSPSSstudent["CN"] = dfSPSSstudent["Country_ID"].apply(lambda x: country_alpha2_to_continent_code(country_name_to_country_alpha2(x)))

Or a list comprehension:

dfSPSSstudent["CN"] = [country_alpha2_to_continent_code(country_name_to_country_alpha2(x)) for x in dfSPSSstudent["Country_ID"]]
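If Country_ID holds only a small number of distinct values compared with the number of rows, caching the conversion may help further, since each distinct name is then converted only once. A minimal sketch, assuming the two lookup tables and `name_to_continent` below as hypothetical stand-ins for `country_name_to_country_alpha2` and `country_alpha2_to_continent_code`, and a plain list in place of the data frame column:

```python
from functools import lru_cache

# Hypothetical stand-in tables (assumption, not from the original post).
# In the real code these lookups are done by country_name_to_country_alpha2
# and country_alpha2_to_continent_code.
_NAME_TO_ALPHA2 = {"Germany": "DE", "Japan": "JP", "Brazil": "BR"}
_ALPHA2_TO_CONTINENT = {"DE": "EU", "JP": "AS", "BR": "SA"}

calls = 0  # counts how often the uncached conversion actually runs


@lru_cache(maxsize=None)
def name_to_continent(name):
    """Convert a country name to a continent code, or None if unknown."""
    global calls
    calls += 1
    alpha2 = _NAME_TO_ALPHA2.get(name)
    return _ALPHA2_TO_CONTINENT.get(alpha2)


# Stand-in for dfSPSSstudent["Country_ID"]: many rows, few distinct values.
country_ids = ["Germany", "Japan", "Germany", "Brazil", "Japan", "Atlantis"] * 1000

# Same list comprehension as above, but repeated names hit the cache.
continents = [name_to_continent(name) for name in country_ids]
```

With the data frame, the same comprehension would be assigned to `dfSPSSstudent["CN"]`. Whether this beats plain `apply` depends on how many distinct countries the column contains.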
