使用python修改csv文件中的一些行
Posted
技术标签:
【中文标题】使用python修改csv文件中的一些行【英文标题】:Modify some rows in a csv file using python 【发布时间】:2021-04-18 03:31:36 【问题描述】:我有一个csv文件如下:
name,row,column,length_of_field
AB000M,8,12,1
AB000M,9,12,1
AB000M,10,0,80
AB000M,10,12,1
AB000M,11,1,1
AB000M,21,0,80
AB000M,22,0,80
我想要做的是每当列字段为 0 时,将字段长度添加到列,行应减 1。此外,如果后续行的列再次为 0,则添加第一个字段长度在列中,第一次出现时将行减 1,然后将字段长度加在一起。然后删除旧行。
所以在上面的 csv - "AB000M,10,0,80" 将变成 "AB000M,9,80,80" 并且 "AB000M,21,0,80 AB000M,22,0,80" ---这两行应替换为"AB000M,20,80,160"。
我正在尝试使用这个 sn-p 来实现这一点,但它不起作用:
df = pd.read_csv("file.csv")
for ind in df.index:
if ind >= len(df)-1:
break
if df['column'][ind] == 0 and df['column'][ind + 1] != 0:
df['row'][ind] -= 1
df['column'][ind] = 80
elif df['column'][ind] == 0 and df['column'][ind + 1] == 0:
df['row'][ind] -= 1
df['column'][ind] = 80
df['length_of_field'][ind] += df['length_of_field'][ind + 1]
df.drop([df.index[ind + 1]], axis=0)
【问题讨论】:
【参考方案1】:这是一个可能对您有用的示例。
import pandas as pd
df = pd.read_csv('test.csv')
newRows = []
last_val_is_zero = False
tempRow = None
for row in df.iterrows():
vals = row[1]
if vals['column'] == 0:
if not last_val_is_zero:
vals['row'] = vals['row'] - 1
vals['column'] = vals['length_of_field']
tempRow = vals
last_val_is_zero = True
else:
tempRow['length_of_field'] = tempRow['length_of_field'] + vals['length_of_field']
else:
if tempRow is not None:
newRows.append(tempRow)
newRows.append(vals)
tempRow = None
last_val_is_zero = False
if tempRow is not None:
newRows.append(tempRow)
newData = [[val for val in row] for row in newRows]
newDf = pd.DataFrame(newData, columns=[x for x in newRows[0].keys()])
【讨论】:
【参考方案2】:样本数据:
df_str = '''
name,row,column,length_of_field
AB000M,8,12,1
AB000M,9,12,1
AB000M,10,0,80
AB000M,10,12,1
AB000M,11,1,1
AB000M,21,0,80
AB000M,22,0,80
AB000M,23,11,1
AB000M,24,11,1
AB000M,25,0,80
AB000M,26,0,80
AB000M,27,0,80
AB000M,28,11,1
AB000M,29,0,80
'''
df = pd.read_csv(io.StringIO(df_str.strip()), sep=',', index_col=False)
解决方案:
# split the row which to update or left
cond = df['column'] == 0
df_to_update = df[cond].copy()
df_left = df[~cond].copy()
# modify the update rows
df_to_update['column'] = df_to_update['length_of_field']
df_to_update['row'] -= 1
# create tag for which is diff 1 with the previous row
cond = df_to_update['row'].diff() != 1
df_to_update['tag'] = np.where(cond, 1, 0)
# cumsum tag to creat group
df_to_update['label'] = df_to_update['tag'].cumsum()
print(df_to_update)
# name row column length_of_field tag label
# 2 AB000M 9 80 80 1 1
# 5 AB000M 20 80 80 1 2
# 6 AB000M 21 80 80 0 2
# 9 AB000M 24 80 80 1 3
# 10 AB000M 25 80 80 0 3
# 11 AB000M 26 80 80 0 3
# 13 AB000M 28 80 80 1 4
# agg groupy left first row, and sum(length_of_field)
obj_list = []
for tag, group in df_to_update.groupby('label'):
obj = group.iloc[0].copy()
obj['length_of_field'] = group['length_of_field'].sum()
obj_list.append(obj)
dfn_to_update = pd.concat(obj_list,axis=1).T[df.columns]
# merge final result
dfn = df_left.append(dfn_to_update).sort_index()
结果:
print(dfn)
name row column length_of_field
0 AB000M 8 12 1
1 AB000M 9 12 1
2 AB000M 9 80 80
3 AB000M 10 12 1
4 AB000M 11 1 1
5 AB000M 20 80 160
7 AB000M 23 11 1
8 AB000M 24 11 1
9 AB000M 24 80 240
12 AB000M 28 11 1
13 AB000M 28 80 80
print(df)
name row column length_of_field
0 AB000M 8 12 1
1 AB000M 9 12 1
2 AB000M 10 0 80
3 AB000M 10 12 1
4 AB000M 11 1 1
5 AB000M 21 0 80
6 AB000M 22 0 80
7 AB000M 23 11 1
8 AB000M 24 11 1
9 AB000M 25 0 80
10 AB000M 26 0 80
11 AB000M 27 0 80
12 AB000M 28 11 1
13 AB000M 29 0 80
【讨论】:
以上是关于使用python修改csv文件中的一些行的主要内容,如果未能解决你的问题,请参考以下文章