Parsing dates from a txt file as integers

Posted 2023-03-12

【Title】: Parsing dates from txt file as integers 【Posted】: 2018-02-17 16:47:17 【Question】:

My source is a txt file in the following format:

cpu95-20000117-04004,134.perl,42.6,44.4
cpu95-20000117-04004,147.vortex,44.7,44.7

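I want to use Python to parse the dates into a form that can be plotted with matplotlib.pyplot (i.e., no strings or Timestamp objects). I will plot the last item (i.e., 44.4) against the date (i.e., 2000/01/17). I will also use this data later as input to a scikit-learn linear regression model, so I think it should be an int or a float. Many thanks.

PS - I checked similar questions, but the trend is to use the .date() method, pandas' pd.to_datetime and its variants, or other approaches that produce objects unsuitable for a scikit-learn model or matplotlib.

Edit: I should have been clearer. I want to plot real dates (so no toordinal), and therefore cannot use the datetime options (they do not work with pyplot and scikit-learn when trying to convert the resulting object to int); therefore, I probably need to find a way to treat something like 2000/01/17 or 2000.01.17 as an integer.

【Comments】:

Have you checked here? Why would you fit a model on dates like this? The usual practice is to use an index. Suppose 2000:01:17 is the initial period, equal to 1. Then the next period equals 2, and so on. You cannot treat 2000/01/17 or 2000.01.17 as an int object.

【Answer 1】: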

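Assuming that you can use an integer representation of the date and a float for the last item in each line as inputs to scikit-learn, this should do what you want.

toordinal returns the proleptic Gregorian ordinal of the date. This means that January 1 of year 1 is represented by 1, January 2 becomes 2, and so on. This is fine for ordinary regression.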
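
To make the ordinal idea concrete, here is a minimal sketch using only the standard library. The variable names are illustrative and not part of the answer; the point is that the integer can always be turned back into a real date for axis labels.

```python
from datetime import date

# January 1 of year 1 is ordinal 1 in the proleptic Gregorian calendar.
first_ordinal = date(1, 1, 1).toordinal()

# The date from the sample rows, converted to an integer and back again.
sample = date(2000, 1, 17)
ordinal = sample.toordinal()
restored = date.fromordinal(ordinal)

# Consecutive days differ by exactly 1, so the spacing is uniform.
next_day_gap = date(2000, 1, 18).toordinal() - ordinal
```

Because the ordinals are large, subtracting the ordinal of the earliest date gives small period numbers without changing the spacing.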

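re.search extracts the two parts of each input line that need further processing.

Three lists are built up as the for loop proceeds. Y ends up containing the final item of each input line, dates_for_plotting the dates that matplotlib needs, and dates_for_regression the integer values needed for regression.

The last part of the script shows how to create a plot using the dates collected from the input.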

>>> txt = '''\
... cpu95-20000117-04004,134.perl,42.6,44.4
... cpu95-20000117-04004,147.vortex,44.7,44.7
... '''
>>> import re
>>> from datetime import datetime
>>> Y = []
>>> dates_for_plotting = []
>>> dates_for_regression = []
>>> for line in txt.split('\n'):
...     if line:
...         # group 1: the date between the hyphens; group 2: the fourth field
...         r = re.search(r'-([^-]+)-(?:[^,]+,){3}([0-9.]+)', line).groups()
...         the_date = datetime.strptime(r[0], '%Y%m%d')
...         dates_for_plotting.append(the_date.date())
...         dates_for_regression.append(the_date.toordinal())
...         # convert to float so the value is plotted numerically
...         Y.append(float(r[1]))
...
>>> import matplotlib.pyplot as plt
>>> import matplotlib.dates as mdates
>>> plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
>>> plt.gca().xaxis.set_major_locator(mdates.DayLocator())
>>> plt.plot(dates_for_plotting, Y)
>>> plt.gcf().autofmt_xdate()
>>> plt.show()
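
One caveat with calling .groups() directly on the result of re.search: a line that does not match returns None and raises AttributeError. The following is a hedged sketch of a guard around the regex step; parse_line and LINE_RE are hypothetical names, and the pattern assumes the four-field line format shown in the question.

```python
import re
from datetime import datetime

# Capture the date between the first two hyphens, skip three
# comma-terminated fields, then capture the final number.
LINE_RE = re.compile(r'-([^-]+)-(?:[^,]+,){3}([0-9.]+)')

def parse_line(line):
    """Return (date, ordinal, value) for one line, or None if it is malformed."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    try:
        the_date = datetime.strptime(match.group(1), '%Y%m%d').date()
        value = float(match.group(2))
    except ValueError:
        # Not a real calendar date, or not a valid number.
        return None
    return the_date, the_date.toordinal(), value
```

Returning None keeps the caller's loop simple: skip the line when the result is None, otherwise append the three items to their lists.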

【Discussion】:

【Answer 2】:

Not the best answer, but you could try it this way:

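import csv

dates = []
values = []
with open('file.txt', 'r', newline='') as file:
    dt = csv.reader(file, delimiter=',')
    for row in dt:
        if not row:
            continue  # skip blank lines
        # row[0] looks like 'cpu95-20000117-04004'; characters 6..13 hold the date
        dates.append(int(row[0][6:14]))
        values.append(float(row[3]))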
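
A point worth keeping in mind before feeding integers such as 20000117 to a regression: they are ordered correctly but not evenly spaced, because the digits jump at month and year boundaries. The sketch below illustrates this with the standard library; days_between is an illustrative helper, not part of the answer.

```python
from datetime import datetime

def days_between(a, b):
    """Real number of calendar days between two YYYYMMDD integers."""
    fmt = '%Y%m%d'
    return (datetime.strptime(str(b), fmt) - datetime.strptime(str(a), fmt)).days

jan31, feb01 = 20000131, 20000201

raw_gap = feb01 - jan31                 # difference of the plain integers
real_gap = days_between(jan31, feb01)   # difference in calendar days
```

Here the two dates are one day apart, yet the integers differ by 70, so a model would see a gap that does not exist in time.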

【Discussion】:

【Answer 3】:

If I understood your question correctly, maybe this is what you are looking for :)

with open("YourFileName.txt", 'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # line = "cpu95-20000117-04004,134.perl,42.6,44.4"
        items = line.split(',')  # ['cpu95-20000117-04004', '134.perl', '42.6', '44.4']

        date = int(items[0].split('-')[1])  # 20000117
        lastItem = float(items[-1])         # 44.4
        # rest of your code
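
If the goal is a single number that still reads like a date on the axis (the question mentions treating something like 2000.01.17 as a number), one possible encoding is a fractional year. This is an assumption on my part rather than something the answers propose, and to_year_fraction is a hypothetical helper.

```python
import calendar
from datetime import date

def to_year_fraction(yyyymmdd):
    """Convert a YYYYMMDD integer to year + fraction of the year elapsed."""
    d = date(yyyymmdd // 10000, yyyymmdd // 100 % 100, yyyymmdd % 100)
    days_in_year = 366 if calendar.isleap(d.year) else 365
    day_index = d.timetuple().tm_yday - 1   # 0 for January 1
    return d.year + day_index / days_in_year
```

With this encoding 2000-01-17 becomes roughly 2000.04, values stay in chronological order across year boundaries, and the tick labels remain readable as years.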

【Discussion】:

【Answer 4】:

Wrap the number in int().

Example:

myString = "20000117"
try:
    myVar = int(myString)
except ValueError:
    pass # or take some action here

Python parse int from string

Wrap it in a try block to be safe.
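
Note that int() only checks that the text is numeric; a string such as "20001340" converts without error even though it is not a real date. A minimal sketch of a stricter conversion, assuming the YYYYMMDD layout from the question (date_string_to_int is a hypothetical name):

```python
from datetime import datetime

def date_string_to_int(text):
    """Return a YYYYMMDD string as an int, or None if it is not a real date."""
    if len(text) != 8 or not text.isdigit():
        return None
    try:
        # strptime raises ValueError for impossible dates such as month 13.
        datetime.strptime(text, '%Y%m%d')
    except ValueError:
        return None
    return int(text)
```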

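【Discussion】:

【Answer 5】:

For this, you may have to write your own small parser.

You could use a regular expression, or call line.split(',') on each line of the file.

【Discussion】: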
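
As a rough illustration of the split-based route, the sketch below yields one record per line, holding both the real date and its integer form. Record and parse_records are hypothetical names, and the field layout is assumed to match the sample lines in the question.

```python
from datetime import date, datetime
from typing import Iterable, Iterator, NamedTuple

class Record(NamedTuple):
    day: date        # real date, usable for plot labels
    ordinal: int     # integer form, usable as a regression feature
    value: float     # last field of the line

def parse_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one Record per well-formed line; skip blank or malformed lines."""
    for line in lines:
        fields = line.strip().split(',')
        id_parts = fields[0].split('-')
        if len(fields) < 4 or len(id_parts) < 2:
            continue
        try:
            day = datetime.strptime(id_parts[1], '%Y%m%d').date()
            value = float(fields[-1])
        except ValueError:
            continue
        yield Record(day, day.toordinal(), value)
```

Since an open file object is an iterable of lines, it can be passed to parse_records directly, and the resulting records can be split into separate lists for plotting and regression.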
