Split txt file by Year and ID and rename each new txt file as "Year_ID.txt"

Posted: 2021-09-29 08:24:36

Question:

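I have a bunch of txt files (comma-separated), and I want to split each file into separate text files using the common group identifiers in column 1 (year) and column 3 (ID). I also want to save each new file as "Column1_Column3.txt". I do not want to keep any header in these files.

I have tried scripts and suggestions from many other questions, but nothing seems to work. I am new to Python, so any advice would be very helpful. Thank you very much.

File format: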

1.0,9.0,0.0,0.0,5.0,13.2,143.2,993.8529934630001,18.005554199200002,92.5999984741,0.0,0.0,159.882055791
1.0,9.0,0.0,1.0,5.0,13.3,142.8,992.4,19.0,91.5013544438,0.0,0.0,202.645072402
1.0,9.0,0.0,2.0,5.0,13.4,142.5,989.0,21.2,90.4027104135,0.0,0.0,235.39787781
1.0,9.0,0.0,3.0,5.0,13.5,142.2,986.5,22.7,89.3040663832,0.0,0.0,268.74681081200004
1.0,11.0,1.0,1.0,5.0,11.5,175.6,995.6,18.7,18.5200004578,0.0,0.0,680.61138846
1.0,11.0,1.0,5.0,5.0,12.2,174.1,988.9,23.4,18.5200004578,0.0,0.0,645.040646961
1.0,11.0,1.0,6.0,5.0,12.4,173.9,986.5,24.9,18.5200004578,0.0,0.0,654.7981628169999
1.0,9.0,2.0,4.0,5.0,10.7,146.8,986.0,23.2,68.3182237413,0.0,0.0,364.724300756
1.0,9.0,2.0,5.0,5.0,10.8,146.2,982.9,25.0,66.8777792189,0.0,0.0,317.156397048

So my output should be:

File 1:

1.0,9.0,0.0,0.0,5.0,13.2,143.2,993.8529934630001,18.005554199200002,92.5999984741,0.0,0.0,159.882055791
1.0,9.0,0.0,1.0,5.0,13.3,142.8,992.4,19.0,91.5013544438,0.0,0.0,202.645072402
1.0,9.0,0.0,2.0,5.0,13.4,142.5,989.0,21.2,90.4027104135,0.0,0.0,235.39787781

File 2:

1.0,11.0,1.0,1.0,5.0,11.5,175.6,995.6,18.7,18.5200004578,0.0,0.0,680.61138846
1.0,11.0,1.0,5.0,5.0,12.2,174.1,988.9,23.4,18.5200004578,0.0,0.0,645.040646961
1.0,11.0,1.0,6.0,5.0,12.4,173.9,986.5,24.9,18.5200004578,0.0,0.0,654.7981628169999

File 3:

1.0,9.0,2.0,4.0,5.0,10.7,146.8,986.0,23.2,68.3182237413,0.0,0.0,364.724300756
1.0,9.0,2.0,5.0,5.0,10.8,146.2,982.9,25.0,66.8777792189,0.0,0.0,317.156397048


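Answer 1:

Assumptions:

    All entries are uniform
    The entries are held in a 2D list
    Every entry has a length of at least 3 (so that it includes both grouping fields)

One small concern:

In File 1, is the second entry supposed to have "2055791" in front of it? That would mean the list entries are not as uniform as you need. If that is the case, I suggest either cleaning the data beforehand or extending this code so that it ignores such entries.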
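The "clean the data beforehand" step mentioned above could look something like the following. This is only a minimal sketch: the function name `clean_rows` and the `min_fields` parameter are made up for illustration, and it assumes a valid row is one where every comma-separated field parses as a number.

```python
def clean_rows(lines, min_fields=3):
    """Return rows whose fields all parse as numbers; drop everything else.

    `clean_rows` and `min_fields` are hypothetical names used for this sketch.
    """
    cleaned = []
    for line in lines:
        fields = [field.strip() for field in line.strip().split(",")]
        # Too short to contain both grouping columns (also catches blank lines)
        if len(fields) < min_fields:
            continue
        try:
            for field in fields:
                float(field)  # raises ValueError for stray, non-numeric text
        except ValueError:
            continue
        cleaned.append(fields)
    return cleaned
```

A stray fragment such as "2055791" on its own line, or stuck in front of a row, would be dropped under these assumptions, while well-formed rows pass through as lists of strings.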
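import csv

# Grab the full list: one row per line, each row a list of string fields.
# "input.txt" is a placeholder; replace it with your own file name.
with open("input.txt", newline="") as infile:
    full_list = [row for row in csv.reader(infile) if row]

# Collect every distinct (column 1, column 3) pair exactly once,
# so no file is written for a combination that never occurs.
keys = sorted({(a[0], a[2]) for a in full_list})

# Write one file per pair, named "<column 1>_<column 3>.txt"
for i, j in keys:
    separate_list = [entry for entry in full_list
                     if entry[0] == i and entry[2] == j]
    with open(str(i) + "_" + str(j) + ".txt", "w") as file:
        for item in separate_list:
            # Join the fields back into a comma-separated line
            file.write(",".join(item) + "\n")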

That should be enough.
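Since the question mentions a whole bunch of input files, here is a hedged sketch of a single-pass variant that groups rows in a dictionary keyed by (column 1, column 3). The function name `split_by_year_and_id` and the separate output directory are assumptions made for illustration, not part of the original answer. File names use the column values exactly as they appear in the data, so a row starting with 1.0 and having 0.0 in column 3 ends up in "1.0_0.0.txt".

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_by_year_and_id(input_paths, output_dir):
    """Group rows by (column 1, column 3) and write one file per group.

    Hypothetical helper for this sketch; returns the list of written paths.
    """
    groups = defaultdict(list)
    for path in input_paths:
        with open(path, newline="") as infile:
            for row in csv.reader(infile):
                if len(row) < 3:  # skip blank or too-short lines
                    continue
                groups[(row[0].strip(), row[2].strip())].append(row)

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for (year, group_id), rows in groups.items():
        out_path = output_dir / f"{year}_{group_id}.txt"
        with open(out_path, "w", newline="") as outfile:
            # No header row is written; lines end with "\n" only
            csv.writer(outfile, lineterminator="\n").writerows(rows)
        written.append(out_path)
    return written
```

It could be called as `split_by_year_and_id(sorted(Path("data").glob("*.txt")), "split_output")`, where both directory names are placeholders. Writing to a separate output directory keeps the generated files from being picked up as inputs on a later run.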

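Comments:

Hi dperry5910, thank you very much for the feedback. I will try this script. The 2055791 value is just a copy-paste error in this post; the value actually belongs to row 1 in the file, so the format is uniform.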
