How to write a for loop that iterates over a DataFrame and subsets it to include only the columns retrieved at each iteration?

Posted 2023-03-12


【Asked】2021-11-21 02:44:59 【Question】:

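The goal is to obtain a subset DataFrame after each iteration, for example with these three columns:

'id', 'reference', 'sample 1' where sample 1 is 0 (and the same for every other sample)
'id', 'reference', 'sample 1' where sample 1 is 1 (and the same for every other sample)

For example, when sample 1 = 0, the resulting subset DataFrame would be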


【Answer 1】:

Try:

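import pandas as pd

# Assumes `df` already exists, with 'id' and 'Reference' as its first two
# columns and one column per sample after them.
sample_cols = df.columns[2:]
dfs = []
for col in sample_cols:
    print('='*50, col, '='*50)
    for condition in [0, 1]:
        print('='*20, condition, '='*20)
        # Keep only the rows where this sample equals the condition
        df_subset = df[df[col]==condition].reset_index(drop=True)
        # Keep only the id, reference, and current sample columns
        df_subset = df_subset[['id', 'Reference', col]]
        print(df_subset)
        # Optionally write one file per sample and condition:
        #df_subset.to_csv(f'./{col}_{condition}.csv', index=False)
        dfs.append(df_subset)

# Combine all subsets into a single DataFrame and save it
df_final = pd.concat(dfs, ignore_index=True)
df_final.to_csv('./file_name.csv', index=False)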
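
One thing to keep in mind with the loop above: each subset carries a different sample column, so `pd.concat` produces the union of all columns and fills the missing ones with NaN. The following is a minimal sketch of two alternatives, using made-up data (the question's real DataFrame is not shown, so the values and the names `subsets` and `long_df` are assumptions for illustration). It keeps each subset separately in a dict keyed by `(column, condition)`, and also reshapes the data into one long table with `melt`.

```python
import pandas as pd

# Hypothetical data standing in for the question's DataFrame
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Reference": ["A", "C", "G", "T"],
    "sample 1": [0, 1, 0, 1],
    "sample 2": [1, 1, 0, 0],
})

# Option 1: keep every subset on its own, keyed by (column, condition)
subsets = {}
for col in df.columns[2:]:
    for condition in (0, 1):
        mask = df[col] == condition
        subset = df.loc[mask, ["id", "Reference", col]]
        subsets[(col, condition)] = subset.reset_index(drop=True)

print(subsets[("sample 1", 0)])

# Option 2: one long table with a 'sample' and a 'value' column,
# which can then be filtered without producing NaN-filled columns
long_df = df.melt(id_vars=["id", "Reference"],
                  var_name="sample", value_name="value")
print(long_df[long_df["value"] == 0])
```

The dict form is convenient when each subset is processed or saved on its own; the long form is usually easier if everything ends up in a single CSV.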

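【Comments】:

Thank you. It worked... you saved my life. The code is clean and simple. May I ask one more thing: how can I store the results iteratively into a DataFrame or a CSV? If it is not too much trouble... thanks again.

I have edited the answer, check it now. If it helped, please upvote and accept the answer. :)

When I tried it with more samples, it stopped after 5 samples and threw an error: [Errno 2] No such file or directory: './6|中亚/南西伯利亚俄罗斯哥萨克降低唐|_0 .csv'

Thank you, sir. It solved my problem. I am very happy, thanks again. I want to become a better programmer; can you suggest some ideas? I find your code simple yet powerful, and I find this very interesting.

Check this: kaggle.com/learn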
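
The `[Errno 2]` error reported in the comments is consistent with a column name containing `/`: when the name is used directly in the file path, the slash is treated as a directory separator, so the write targets a directory that does not exist. A minimal sketch of one workaround is to sanitize the column name before building the path. The helper `safe_filename` and the example column name are hypothetical and not part of the original answer.

```python
import re

def safe_filename(name: str) -> str:
    """Replace characters that are risky in file names with underscores.

    Hypothetical helper: keeps word characters, dots, hyphens, and spaces,
    and collapses every other run of characters into a single underscore.
    """
    cleaned = re.sub(r"[^\w.\- ]+", "_", name)
    return cleaned.strip("_ ")

# Made-up column name containing '|' and '/'
col = "6|group A/group B|"
condition = 0

path = f"./{safe_filename(col)}_{condition}.csv"
print(path)
```

Inside the loop from the answer, the per-subset write would then use this path, for example `df_subset.to_csv(path, index=False)`.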
