Can't display text column when splitting text into an associated table
Posted: 2019-08-30 10:59:45
Question:
This is my dataset (only one column):
Apr 1 09:14:55 i have apple
Apr 2 08:10:10 i have mango
This is the result I need:
month date time message
Apr 1 09:14:55 i have apple
Apr 2 09:10:10 i have mango
This is what I did:
import pandas as pd

month = []
date = []
time = []
message = []
# dns_data is the iterable of raw log lines
for line in dns_data:
    month.append(line.split()[0])
    date.append(line.split()[1])
    time.append(line.split()[2])
df = pd.DataFrame(data={'month': month, 'date': date, 'time': time})
This is the output I get:
month date time
0 Apr 1 09:14:55
1 Apr 2 09:10:10
How can I display the message column?
Comments:
df1 = df['data'].str.extract(r'^(?P<month>\S+)\s+(?P<date>\d+)\s+(?P<time>\S+)\s+(?P<message>.*)')
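The one-line comment above can be expanded into a small runnable sketch. It assumes the raw lines sit in a column named data (the name used in the comment); the sample frame is built here only for illustration. Each named group in the pattern becomes a column of the result.

```python
import pandas as pd

# Assumed sample frame; the column name 'data' follows the comment above.
df = pd.DataFrame({'data': ['Apr 1 09:14:55 i have apple',
                            'Apr 2 08:10:10 i have mango']})

# Three whitespace-separated fields, then everything else as the message.
pattern = r'^(?P<month>\S+)\s+(?P<date>\d+)\s+(?P<time>\S+)\s+(?P<message>.*)'
df1 = df['data'].str.extract(pattern)
print(df1)
```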
Answer 1:
Use the n parameter of Series.str.split to split on the first 3 runs of whitespace, and expand=True to return a DataFrame:
print (df)
col
0 Apr 1 09:14:55 i have apple
1 Apr 2 08:10:10 i have mango
df1 = df['col'].str.split(n=3, expand=True)
df1.columns=['month','date','time','message']
print (df1)
month date time message
0 Apr 1 09:14:55 i have apple
1 Apr 2 08:10:10 i have mango
Another solution, using a list comprehension:
c = ['month','date','time','message']
df1 = pd.DataFrame([x.split(maxsplit=3) for x in df['col']], columns=c)
print (df1)
month date time message
0 Apr 1 09:14:55 i have apple
1 Apr 2 08:10:10 i have mango
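If the loop from the question is preferred, the same maxsplit idea can be applied there. The following is only a sketch: dns_data is a hypothetical stand-in list, since the question does not show how that variable is built.

```python
import pandas as pd

# Hypothetical stand-in for the question's dns_data iterable.
dns_data = ['Apr 1 09:14:55 i have apple',
            'Apr 2 08:10:10 i have mango']

rows = []
for line in dns_data:
    # maxsplit=3 keeps everything after the third field together as the message
    month, date, time, message = line.split(maxsplit=3)
    rows.append({'month': month, 'date': date, 'time': time, 'message': message})

df = pd.DataFrame(rows)
print(df)
```

Unpacking into four names raises a ValueError on a line with fewer than four fields, so malformed lines would need their own handling.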
Answer 2: You can use Series.str.extractall with a regex pattern:
df = pd.DataFrame({'text': {0: 'Apr 1 09:14:55 i have apple', 1: 'Apr 2 08:10:10 i have mango'}})
df_new = (df.text.str
          .extractall(r'^(?P<month>\w{3})\s?(?P<date>\d{1,2})\s?(?P<time>\d{2}:\d{2}:\d{2})\s?(?P<message>.*)$')
          .reset_index(drop=True))
print(df_new)
month date time message
0 Apr 1 09:14:55 i have apple
1 Apr 2 08:10:10 i have mango
Answer 3: This may help you.
(?<Month>\w+)\s(?<Date>\d+)\s(?<Time>[\w:]+)\s(?<Message>.*)
Match 1
Month Apr
Date 1
Time 09:14:55
Message i have apple
Match 2
Month Apr
Date 2
Time 08:10:10
Message i have mango
https://rubular.com/r/1S4BcbDxPtlVxE
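The pattern above is written in the named-group style that Rubular (a Ruby regex tester) accepts. As a rough sketch of how it might be carried over to Python, the groups can be respelled as (?P<name>...), which is the form documented for Python's re module. The lowercase group names and the lines list are assumptions made for this illustration.

```python
import re

# Same structure as the pattern above, with Python-style named groups.
pattern = re.compile(r'(?P<month>\w+)\s(?P<date>\d+)\s(?P<time>[\w:]+)\s(?P<message>.*)')

# Assumed sample input, copied from the question's dataset.
lines = ['Apr 1 09:14:55 i have apple',
         'Apr 2 08:10:10 i have mango']

# One dict per line that matches; lines that do not match are skipped.
records = [m.groupdict() for m in map(pattern.match, lines) if m]
print(records)
```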