Modify some rows in a csv file using python


Posted: 2021-04-18 03:31:36

Question:

I have a CSV file like this:

name,row,column,length_of_field
AB000M,8,12,1
AB000M,9,12,1
AB000M,10,0,80
AB000M,10,12,1
AB000M,11,1,1
AB000M,21,0,80
AB000M,22,0,80

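What I want to do: whenever the column field is 0, put that row's length_of_field value into column and decrease row by 1. If one or more immediately following rows also have column equal to 0, only the first row of that run is kept: its column gets the first length_of_field, its row is decreased by 1, and the length_of_field values of all rows in the run are added together. The other rows of the run are then deleted.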

So in the CSV above, "AB000M,10,0,80" should become "AB000M,9,80,80", and the two rows "AB000M,21,0,80" and "AB000M,22,0,80" should be replaced by the single row "AB000M,20,80,160".

I am trying to do this with the following snippet, but it does not work:

import pandas as pd

df = pd.read_csv("file.csv")
for ind in df.index:
    if ind >= len(df)-1:
        break
    if df['column'][ind] == 0 and df['column'][ind + 1] != 0:
        df['row'][ind] -=  1
        df['column'][ind] = 80
    elif df['column'][ind] == 0 and df['column'][ind + 1] == 0:
        df['row'][ind] -=  1
        df['column'][ind] = 80
        df['length_of_field'][ind] += df['length_of_field'][ind + 1]
        df.drop([df.index[ind + 1]], axis=0)
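A few things in the snippet above would explain why it does not work. `df['row'][ind] -= 1` is chained indexing: depending on the pandas version and dtypes it may modify a temporary object rather than `df` (with Copy-on-Write enabled it never writes back), whereas `df.loc[ind, 'row']` writes to `df` directly. `df.drop(...)` returns a new DataFrame, so its result is discarded unless assigned. The column value is hard-coded to 80 instead of taking length_of_field, each zero-column row absorbs at most the single row after it (so a run of three or more is not collapsed into one), and the `break` on the last index means a trailing zero-column row is never updated. The following is a minimal sketch of the same loop idea with those points addressed; it assumes the sample data from the question (embedded as a string instead of reading file.csv) and collects the labels to drop instead of deleting rows during the loop.

```python
import io

import pandas as pd

# Sample data from the question, used here in place of file.csv.
csv_text = """name,row,column,length_of_field
AB000M,8,12,1
AB000M,9,12,1
AB000M,10,0,80
AB000M,10,12,1
AB000M,11,1,1
AB000M,21,0,80
AB000M,22,0,80
"""
df = pd.read_csv(io.StringIO(csv_text))

labels = list(df.index)
to_drop = []  # labels of rows folded into an earlier row
i = 0
while i < len(labels):
    lead = labels[i]
    if df.loc[lead, "column"] == 0:
        # First row of a zero run: move up one row, column = its own length.
        df.loc[lead, "row"] -= 1
        df.loc[lead, "column"] = df.loc[lead, "length_of_field"]
        j = i + 1
        # Fold every immediately following zero-column row into the lead row.
        while j < len(labels) and df.loc[labels[j], "column"] == 0:
            df.loc[lead, "length_of_field"] += df.loc[labels[j], "length_of_field"]
            to_drop.append(labels[j])
            j += 1
        i = j
    else:
        i += 1

# Drop the merged rows once, after the loop, and keep the returned frame.
df = df.drop(index=to_drop).reset_index(drop=True)
print(df)
```

Using `.loc` for every read and write keeps the updates on `df` itself, and deferring the drop means the loop never looks up a label that has already been removed.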


Answer 1:

Here is an example that may work for you.

import pandas as pd

df = pd.read_csv('test.csv')
newRows = []              # output rows, as pandas Series
last_val_is_zero = False  # True while inside a run of column == 0 rows
tempRow = None            # merged row for the current zero run
for _, vals in df.iterrows():
    if vals['column'] == 0:
        if not last_val_is_zero:
            # first row of a run: move up one row, column = its length_of_field
            vals['row'] = vals['row'] - 1
            vals['column'] = vals['length_of_field']
            tempRow = vals
            last_val_is_zero = True
        else:
            # later row of the same run: add its length to the first row
            tempRow['length_of_field'] = tempRow['length_of_field'] + vals['length_of_field']
    else:
        # a non-zero row ends any pending run
        if tempRow is not None:
            newRows.append(tempRow)
        newRows.append(vals)
        tempRow = None
        last_val_is_zero = False

# flush a run that reaches the end of the file
if tempRow is not None:
    newRows.append(tempRow)

newData = [[val for val in row] for row in newRows]
newDf = pd.DataFrame(newData, columns=[x for x in newRows[0].keys()])
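If pandas is not otherwise needed, the same single pass as the loop above can be written with the standard-library `csv` module, which also makes writing the modified rows back out straightforward. This is a sketch, not part of the original answer: `merge_zero_runs` and `rewrite_csv` are hypothetical helper names, and the demo uses in-memory `io.StringIO` buffers in place of real files.

```python
import csv
import io


def merge_zero_runs(records):
    """Yield records with each run of consecutive column == 0 rows merged.

    The first row of a run gets row - 1 and column = its own
    length_of_field; length_of_field becomes the sum over the run.
    """
    pending = None  # merged record for the current zero run, if any
    for rec in records:
        row = int(rec["row"])
        col = int(rec["column"])
        length = int(rec["length_of_field"])
        if col == 0:
            if pending is None:
                pending = {"name": rec["name"], "row": row - 1,
                           "column": length, "length_of_field": length}
            else:
                pending["length_of_field"] += length
        else:
            if pending is not None:  # a non-zero row closes the run
                yield pending
                pending = None
            yield {"name": rec["name"], "row": row,
                   "column": col, "length_of_field": length}
    if pending is not None:  # run reaching the end of the input
        yield pending


def rewrite_csv(src, dst):
    """Read CSV rows from src and write the merged rows to dst."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(merge_zero_runs(reader))


src = io.StringIO(
    "name,row,column,length_of_field\n"
    "AB000M,8,12,1\nAB000M,9,12,1\nAB000M,10,0,80\nAB000M,10,12,1\n"
    "AB000M,11,1,1\nAB000M,21,0,80\nAB000M,22,0,80\n"
)
dst = io.StringIO()
rewrite_csv(src, dst)
print(dst.getvalue())
```

With real files, both should be opened with `newline=""`, as the `csv` module documentation recommends, e.g. `with open("file.csv", newline="") as f_in, open("file_out.csv", "w", newline="") as f_out: rewrite_csv(f_in, f_out)`, where file_out.csv is a placeholder name. Because rows are streamed, memory use does not grow with the file size.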


Answer 2:

Sample data:

import io
import numpy as np
import pandas as pd

df_str = '''
name,row,column,length_of_field
AB000M,8,12,1
AB000M,9,12,1
AB000M,10,0,80
AB000M,10,12,1
AB000M,11,1,1
AB000M,21,0,80
AB000M,22,0,80
AB000M,23,11,1
AB000M,24,11,1
AB000M,25,0,80
AB000M,26,0,80
AB000M,27,0,80
AB000M,28,11,1
AB000M,29,0,80
'''
df = pd.read_csv(io.StringIO(df_str.strip()), sep=',', index_col=False)

Solution:

# split the rows into those to update (column == 0) and those left unchanged
cond = df['column'] == 0
df_to_update = df[cond].copy()
df_left = df[~cond].copy()

# update the zero-column rows: column = length_of_field, row = row - 1
df_to_update['column'] = df_to_update['length_of_field']
df_to_update['row'] -= 1

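# tag = 1 where a row's (decremented) row value is not exactly 1 more than
# the previous zero-column row's, i.e. where a new run starts
cond = df_to_update['row'].diff() != 1
df_to_update['tag'] = np.where(cond, 1, 0)

# the cumulative sum of tag gives every run its own group label
df_to_update['label'] = df_to_update['tag'].cumsum()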

print(df_to_update)

#       name  row  column  length_of_field  tag  label
# 2   AB000M    9      80               80    1      1
# 5   AB000M   20      80               80    1      2
# 6   AB000M   21      80               80    0      2
# 9   AB000M   24      80               80    1      3
# 10  AB000M   25      80               80    0      3
# 11  AB000M   26      80               80    0      3
# 13  AB000M   28      80               80    1      4


# for each group keep the first row, with length_of_field summed over the group
obj_list = []
for _, group in df_to_update.groupby('label'):
    obj = group.iloc[0].copy()
    obj['length_of_field'] = group['length_of_field'].sum()
    obj_list.append(obj)
dfn_to_update = pd.concat(obj_list, axis=1).T[df.columns]

# merge with the unchanged rows and restore the original order
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
dfn = pd.concat([df_left, dfn_to_update]).sort_index()

Result:

print(dfn)

      name row column length_of_field
0   AB000M   8     12               1
1   AB000M   9     12               1
2   AB000M   9     80              80
3   AB000M  10     12               1
4   AB000M  11      1               1
5   AB000M  20     80             160
7   AB000M  23     11               1
8   AB000M  24     11               1
9   AB000M  24     80             240
12  AB000M  28     11               1
13  AB000M  28     80              80

print(df)

      name  row  column  length_of_field
0   AB000M    8      12                1
1   AB000M    9      12                1
2   AB000M   10       0               80
3   AB000M   10      12                1
4   AB000M   11       1                1
5   AB000M   21       0               80
6   AB000M   22       0               80
7   AB000M   23      11                1
8   AB000M   24      11                1
9   AB000M   25       0               80
10  AB000M   26       0               80
11  AB000M   27       0               80
12  AB000M   28      11                1
13  AB000M   29       0               80
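The groupby solution above decides which zero-column rows belong together by checking whether their row values differ by exactly 1. That matches this sample, but it is not quite the rule in the question ("following rows"): if a file contained "AB000M,10,0,80", "AB000M,10,12,1", "AB000M,11,0,80", the two zero rows would become rows 9 and 10 and be merged even though a non-zero row sits between them. A minimal sketch that instead detects runs by position in the file and collapses them with one `groupby(...).agg` call using named aggregation (available since pandas 0.25) is shown below; `run` and the helper column `first_len` are illustrative names, not part of either answer, and the sample is the question's first CSV.

```python
import io

import pandas as pd

csv_text = """name,row,column,length_of_field
AB000M,8,12,1
AB000M,9,12,1
AB000M,10,0,80
AB000M,10,12,1
AB000M,11,1,1
AB000M,21,0,80
AB000M,22,0,80
"""
df = pd.read_csv(io.StringIO(csv_text))

is_zero = df["column"].eq(0)
# A row continues the current run only if it and the row directly before it
# are both zero-column rows; every other row starts a new group.
continues_run = is_zero & is_zero.shift(fill_value=False)
run = (~continues_run).cumsum().rename("run")

out = df.groupby(run, sort=False).agg(
    name=("name", "first"),
    row=("row", "first"),
    column=("column", "first"),
    length_of_field=("length_of_field", "sum"),
    first_len=("length_of_field", "first"),
)

# Zero-column groups: move up one row and use the first row's length as column.
zero = out["column"].eq(0)
out.loc[zero, "row"] -= 1
out.loc[zero, "column"] = out.loc[zero, "first_len"]
out = out.drop(columns="first_len").reset_index(drop=True)

print(out)
```

Non-zero rows always form single-row groups, so "first" and "sum" leave them unchanged. To write the result back, `out.to_csv("file.csv", index=False)` would overwrite the input file; writing to a different path first is the safer choice.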
