Replacing blanks at specific positions in a text file with NA
I have a text file like this:
sfdfd
kgfkhgjk
fsdfs
sgsgggggfsdf
Node: RBS6301 CXP102051/26_R30F L17A.4-6 (C17.0_LSV198_PA24)
=================================
col1 clo2 clo3
=================================
1 avb wer21g2
---------------------------------
=================================
empcode Emnname Date DESC
12d sf 2018-02-06 dghsjf hfhgf jfjh
asf2 asdfw2 2018-02-16 fsfsfg jhjhhjghk
dsf21 sdf2 2016-02-06 sdgfsgf
sdgg dsds dkfd-sffddfdf aaaa
dfd gfg dfsdffd aaaa
df dfdf efefkhgvkjgjk kgkjjk
4fr freff klhlkkl
-----------------------------------
hfjh
vkgjlbljkbkjbk/n/l jhfjhfhj kutiugjm iugiuk
hfhj
fggggggggggggggggggggggg
From the file above, I extracted the following part using this code:
import pandas as pd
import re

findStr = 'empcode Emnname'
EndStr = '-----------------------------------'
tmp = []
with open('test123.txt') as f:
    out = []
    for line in f:
        # Start collecting at the header line.
        if line.startswith(findStr):
            tmp.append(re.findall(r'\w+', line.strip()))
            # Keep reading from the same file iterator until the end marker.
            for line in f:
                if line.rstrip() == EndStr:
                    out.append(tmp)
                    break
                tmp.append(re.sub(r'\s', ' ', line.strip()))
Output of tmp:
[['empcode', 'Emnname', 'Date', 'DESC'],
'12d sf 2018-02-06 dghsjf hfhgf jfjh',
'asf2 asdfw2 2018-02-16 fsfsfg jhjhhjghk',
'dsf21 sdf2 2016-02-06 sdgfsgf',
'sdgg dsds dkfd-sffddfdf aaaa',
'dfd gfg dfsdffd aaaa',
'df dfdf efefkhgvkjgjk kgkjjk',
'4fr freff klhlkkl']
However, I want NA wherever a field is blank, for example below gfg or after 4fr. Can anyone help? It should look like this:
[['empcode', 'Emnname', 'Date', 'DESC'],
'12d sf 2018-02-06 dghsjf hfhgf jfjh',
'asf2 asdfw2 2018-02-16 fsfsfg jhjhhjghk',
'dsf21 sdf2 2016-02-06 sdgfsgf',
'sdgg dsds dkfd-sffddfdf aaaa',
'dfd gfg dfsdffd aaaa',
'df NA dfdf efefkhgvkjgjk kgkjjk',
'4fr NA NA freff klhlkkl']
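Answer
Use re to extract the section you are looking for, then parse it with the pandas read_fwf fixed-width reader. Because read_fwf splits on column positions rather than on whitespace, a blank field comes back as NaN instead of disappearing.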
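import io
import re
import pandas as pd

# Capture everything from the header line up to the closing dashed line.
pat = '(empcode Emnname(.|\n)*)-----------------------------------'
with open('test123.txt') as f:
    txt = re.findall(pat, f.read())[0][0]
# Split the header line from the body.
h, b = txt.split('\n', 1)
# read_fwf infers the column boundaries from the aligned text.
df = pd.read_fwf(io.StringIO(b), header=None, names=h.split())
df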
   empcode  Emnname           Date                  DESC
0      12d       sf     2018-02-06     dghsjf hfhgf jfjh
1     asf2   asdfw2     2018-02-16      fsfsfg jhjhhjghk
2    dsf21     sdf2     2016-02-06               sdgfsgf
3     sdgg     dsds  dkfd-sffddfdf                  aaaa
4      dfd      gfg        dfsdffd                  aaaa
5       df      NaN           dfdf  efefkhgvkjgjk kgkjjk
6      4fr      NaN            NaN         freff klhlkkl
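To see why the fixed-width reader helps, here is a minimal, self-contained sketch. It assumes the columns in the real file are padded to fixed positions; the sample rows and the widths 9, 9, 12, 20 are made up for illustration and are not taken from test123.txt. It contrasts plain whitespace splitting, which loses track of where the blanks were, with read_fwf, which infers the column boundaries from character positions.

```python
import io

import pandas as pd

# Hypothetical sample rows; empty strings stand for the blank cells.
names = ["empcode", "Emnname", "Date", "DESC"]
rows = [
    ("12d", "sf", "2018-02-06", "dghsjf hfhgf jfjh"),
    ("dfd", "gfg", "dfsdffd", "aaaa"),
    ("df", "", "dfdf", "efefkhgvkjgjk kgkjjk"),
    ("4fr", "", "", "freff klhlkkl"),
]
# Assumed column widths, used only to build aligned text for this demo.
widths = (9, 9, 12, 20)
body = "\n".join(
    "".join(value.ljust(width) for value, width in zip(row, widths)).rstrip()
    for row in rows
)

# Whitespace splitting cannot tell which columns were blank ...
tokens = body.splitlines()[3].split()

# ... while read_fwf infers column boundaries from the character positions.
df = pd.read_fwf(io.StringIO(body), header=None, names=names)

print(tokens)
print(df)
```

If the inferred boundaries are wrong for a given file (for example, when a blank column is never filled in the sampled rows), passing explicit `colspecs` or `widths` to read_fwf is the more predictable option.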
If for some reason the OP actually wants list output:
[df.columns.tolist()] + df.to_string(header=False, index=False).splitlines()
[['empcode', 'Emnname', 'Date', 'DESC'],
'12d sf 2018-02-06 dghsjf hfhgf jfjh',
' asf2 asdfw2 2018-02-16 fsfsfg jhjhhjghk',
'dsf21 sdf2 2016-02-06 sdgfsgf',
' sdgg dsds dkfd-sffddfdf aaaa',
' dfd gfg dfsdffd aaaa',
' df NaN dfdf efefkhgvkjgjk kgkjjk',
' 4fr NaN NaN freff klhlkkl']
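The list above still shows NaN, whereas the question asks for the literal text NA. One possible way to get there, sketched under the assumption that every column holds strings, is to fill the missing values before building the list. The small frame below is a hypothetical stand-in for the read_fwf result, not the real data.

```python
import pandas as pd

# Hypothetical frame standing in for the read_fwf result; None marks blanks.
df = pd.DataFrame(
    {
        "empcode": ["dfd", "df", "4fr"],
        "Emnname": ["gfg", None, None],
        "Date": ["dfsdffd", "dfdf", None],
        "DESC": ["aaaa", "efefkhgvkjgjk kgkjjk", "freff klhlkkl"],
    }
)

# Replace missing values with the literal string "NA".
filled = df.fillna("NA")

# Header as a list, then one space-joined string per row.
result = [filled.columns.tolist()] + [
    " ".join(row) for row in filled.itertuples(index=False, name=None)
]
print(result)
```

Joining with a single space drops the column padding that to_string adds, which matches the shape of the output requested in the question.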