How to convert date/time strings that have 24 as the hour value to datetimes in Pandas?
Posted: 2019-07-03 16:24:18
Question:
I'm importing emails as text files from the generic Mail app (Mac OS X). Unfortunately, many of the dates in the emails have times like "24:01:01", which is not a valid time (it should be "00:01:01").
Is there a simple way to convert these?
Ordinary date/time strings work fine:
>>> pd.to_datetime("March 23, 2011 at 23:42:46 PDT")
Timestamp('2011-03-23 23:42:46-0700', tz='pytz.FixedOffset(-420)')
A date string with hour 24 raises an exception:
>>> pd.to_datetime("March 23, 2011 at 24:42:46 PDT")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/anaconda/envs/pyqt/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
1860 try:
-> 1861 values, tz_parsed = conversion.datetime_to_datetime64(data)
1862 # If tzaware, these values represent unix timestamps, so we
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-38-4cb009b21802> in <module>
----> 1 pd.to_datetime("March 23, 2011 at 24:42:46 PDT")
~/anaconda/envs/pyqt/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, unit, infer_datetime_format, origin, cache)
609 result = convert_listlike(arg, box, format)
610 else:
--> 611 result = convert_listlike(np.array([arg]), box, format)[0]
612
613 return result
~/anaconda/envs/pyqt/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in _convert_listlike_datetimes(arg, box, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
300 arg, dayfirst=dayfirst, yearfirst=yearfirst,
301 utc=utc, errors=errors, require_iso8601=require_iso8601,
--> 302 allow_object=True)
303
304 if tz_parsed is not None:
~/anaconda/envs/pyqt/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
1864 return values.view('i8'), tz_parsed
1865 except (ValueError, TypeError):
-> 1866 raise e
1867
1868 if tz_parsed is not None:
~/anaconda/envs/pyqt/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
1855 dayfirst=dayfirst,
1856 yearfirst=yearfirst,
-> 1857 require_iso8601=require_iso8601
1858 )
1859 except ValueError as e:
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime_object()
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime_object()
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string()
~/anaconda/envs/pyqt/lib/python3.6/site-packages/dateutil/parser/_parser.py in parse(timestr, parserinfo, **kwargs)
1354 return parser(parserinfo).parse(timestr, **kwargs)
1355 else:
-> 1356 return DEFAULTPARSER.parse(timestr, **kwargs)
1357
1358
~/anaconda/envs/pyqt/lib/python3.6/site-packages/dateutil/parser/_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
651 raise ValueError("String does not contain a date:", timestr)
652
--> 653 ret = self._build_naive(res, default)
654
655 if not ignoretz:
~/anaconda/envs/pyqt/lib/python3.6/site-packages/dateutil/parser/_parser.py in _build_naive(self, res, default)
1225 repl['day'] = monthrange(cyear, cmonth)[1]
1226
-> 1227 naive = default.replace(**repl)
1228
1229 if res.weekday is not None and not res.day:
ValueError: hour must be in 0..23
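The last frames show dateutil building the result with datetime.replace, so the limit comes from Python's own datetime type rather than from pandas. A minimal standard-library check (the exact wording of the message may differ slightly between Python versions):

```python
from datetime import datetime

# An hour of 23 is accepted.
ok = datetime(2011, 3, 23, 23, 42, 46)
print(ok)  # 2011-03-23 23:42:46

# An hour of 24 is rejected by the constructor itself, so any parser that
# builds a datetime from the parsed fields (as dateutil does) fails the same way.
try:
    datetime(2011, 3, 23, 24, 42, 46)
except ValueError as exc:
    print("ValueError:", exc)
```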
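Comments:
What value do you expect from this? 0:42:46 on the next day?
@Paul Yes, I expect 00:42:46 on the next day.
I'm not sure where these time strings come from. It could be Apple Mail on the Mac, or the mail service used to create the original emails. This is how they appear in the email text files: Date: February 24, 2011 at 24:48:03 PST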
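The behaviour asked for in the comments (hour 24 rolls over to 00:MM:SS on the next day) can be pinned down with the standard library alone. This is a minimal sketch, assuming every string has the "Month DD, YYYY at HH:MM:SS ZONE" layout shown above and that the zone abbreviation can simply be dropped; parse_hour24 is a hypothetical name, not part of any library:

```python
import re
from datetime import datetime, timedelta

# Assumed layout once the trailing zone abbreviation is removed.
FMT = "%B %d, %Y at %H:%M:%S"

def parse_hour24(text: str) -> datetime:
    """Hypothetical helper: parse a mail date, treating hour 24 as 00 on the next day."""
    # Drop a trailing zone abbreviation such as "PDT" or "PST"; it is ignored here.
    stamp = re.sub(r"\s+[A-Z]{2,5}$", "", text.strip())
    # Match " at 24:" rather than "24" so a day of month like "February 24" is untouched.
    if " at 24:" in stamp:
        fixed = stamp.replace(" at 24:", " at 00:")
        return datetime.strptime(fixed, FMT) + timedelta(days=1)
    return datetime.strptime(stamp, FMT)

print(parse_hour24("February 24, 2011 at 24:48:03 PST"))  # 2011-02-25 00:48:03
```

Note that the example string from the comment contains "24" twice, once as the day and once as the hour, which is why the replacement is anchored on " at 24:".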
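Answer 1:
First, convert the valid datetimes with to_datetime and errors='coerce', which turns the bad values into NaT. Then select those rows, replace the hour 24 with 00 and add one day. Finally, use fillna to fill the missing values with the corrected ones: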
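import pandas as pd

d = ["March 23, 2011 at 24:42:46 PDT",
     "March 23, 2011 at 23:42:46 PDT"]
s = pd.Series(d)
# Parse everything; strings with hour 24 become NaT
s1 = pd.to_datetime(s, errors='coerce')
m = s1.isna()
# Re-parse only the failed strings with "at 24:" -> "at 00:", then add one day
s2 = (pd.to_datetime(s[m].replace('at 24:', 'at 00:', regex=True), errors='coerce') +
      pd.Timedelta(1, unit='d'))
# Fill the NaT gaps with the corrected values
s = s1.fillna(s2)
print(s)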
0 2011-03-24 00:42:46
1 2011-03-23 23:42:46
dtype: datetime64[ns]
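The same replace-and-add-a-day idea can also be applied to all rows in a single pass with an explicit format instead of format inference, which avoids the second parse and the fillna step. This is a sketch under the assumptions that the zone abbreviation can be stripped and that every string matches "%B %d, %Y at %H:%M:%S"; to_datetime_hour24 is a hypothetical helper name:

```python
import pandas as pd

# Assumed layout of the strings once the zone abbreviation is removed.
FMT = "%B %d, %Y at %H:%M:%S"

def to_datetime_hour24(strings: pd.Series) -> pd.Series:
    """Hypothetical helper: vectorised parse that rolls 'at 24:' into the next day."""
    # Remove a trailing zone abbreviation such as PDT/PST (assumption of this sketch).
    stripped = strings.str.replace(r"\s+[A-Z]{2,5}$", "", regex=True)
    # Flag rows whose hour is 24; matching " at 24:" leaves a day-of-month "24" alone.
    is_24 = stripped.str.contains(" at 24:", regex=False)
    fixed = stripped.str.replace(" at 24:", " at 00:", regex=False)
    parsed = pd.to_datetime(fixed, format=FMT)
    # Shift only the flagged rows forward by one day.
    return parsed.mask(is_24, parsed + pd.Timedelta(days=1))

mail_dates = pd.Series([
    "March 23, 2011 at 24:42:46 PDT",
    "March 23, 2011 at 23:42:46 PDT",
])
print(to_datetime_hour24(mail_dates))
```

Because errors='coerce' is not used here, a string that does not match the format raises instead of silently becoming NaT, which surfaces unexpected layouts early.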
Another idea: split the date and the time into separate columns and add the time as a timedelta:
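# Start again from the original strings (s was overwritten above)
s = pd.Series(d)
s1 = pd.to_datetime(s, errors='coerce')
m = s1.isna()
# Split the failed strings into date and time parts
df2 = s[m].str.split(' at ', expand=True)
df2.columns = ['date', 'time']
df2['date'] = pd.to_datetime(df2['date'], errors='coerce')
# "24:42:46" becomes Timedelta('1 days 00:42:46'); the zone suffix is dropped
df2['time'] = pd.to_timedelta(df2['time'].str.extract(r'(\d+:\d+:\d+)', expand=False))
df2['date1'] = df2['date'] + df2['time']
print(df2)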
date time date1
0 2011-03-23 1 days 00:42:46 2011-03-24 00:42:46
s = s1.fillna(df2['date1'])
print(s)
0 2011-03-24 00:42:46
1 2011-03-23 23:42:46
dtype: datetime64[ns]
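Outside pandas, the same split-and-add idea works with the standard library alone, because a timedelta has no 0..23 limit on its hours. A minimal sketch, assuming every string has the "date at H:M:S zone" layout and that the zone abbreviation can be ignored; parse_split is a hypothetical name:

```python
from datetime import datetime, timedelta

def parse_split(text: str) -> datetime:
    """Hypothetical helper mirroring the split idea: date part plus time-as-duration."""
    date_part, time_part = text.split(" at ", 1)
    day = datetime.strptime(date_part.strip(), "%B %d, %Y")
    # Keep only the H:M:S token, dropping a trailing zone abbreviation such as "PDT".
    hh, mm, ss = (int(p) for p in time_part.split()[0].split(":"))
    # A duration is not limited to 0..23 hours, so 24:42:46 overflows into the next day.
    return day + timedelta(hours=hh, minutes=mm, seconds=ss)

for raw in ["March 23, 2011 at 24:42:46 PDT", "March 23, 2011 at 23:42:46 PDT"]:
    print(parse_split(raw))
```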