Using Pandas query() to filter a dataframe on a timestamp column
【Posted】: 2020-06-02 06:12:17
【Question】: I am trying to filter a Pandas dataframe on a timestamp column using a string and the query() function:
df.query('Timestamp < "2020-02-01"')
However, I get the following error:
Traceback (most recent call last):
File "C:\ENERCON\Python 3.7.2\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-3-7bb40e9c631a>", line 1, in <module>
df.query('Timestamp < "2020-02-01"')
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\frame.py", line 3199, in query
res = self.eval(expr, **kwargs)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\frame.py", line 3315, in eval
return _eval(expr, inplace=inplace, **kwargs)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\eval.py", line 327, in eval
ret = eng_inst.evaluate()
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\engines.py", line 142, in evaluate
return self.expr()
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 837, in __call__
return self.terms(self.env)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\ops.py", line 380, in __call__
return self.func(left, right)
TypeError: '<' not supported between instances of 'type' and 'str'
I also tried converting the string to a datetime, but the error is similar:
df.query('Timestamp < @pd.to_datetime("2020-02-01")')
Traceback (most recent call last):
File "C:\ENERCON\Python 3.7.2\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-23540526aad9>", line 1, in <module>
df.query('Timestamp < @pd.to_datetime("2020-02-01")')
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\frame.py", line 3199, in query
res = self.eval(expr, **kwargs)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\frame.py", line 3315, in eval
return _eval(expr, inplace=inplace, **kwargs)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\eval.py", line 322, in eval
parsed_expr = Expr(expr, engine=engine, parser=parser, env=env, truediv=truediv)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 830, in __init__
self.terms = self.parse()
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 847, in parse
return self._visitor.visit(self.expr)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 441, in visit
return visitor(node, **kwargs)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 447, in visit_Module
return self.visit(expr, **kwargs)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 441, in visit
return visitor(node, **kwargs)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 450, in visit_Expr
return self.visit(node.value, **kwargs)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 441, in visit
return visitor(node, **kwargs)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 747, in visit_Compare
return self.visit(binop)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 441, in visit
return visitor(node, **kwargs)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 565, in visit_BinOp
return self._maybe_evaluate_binop(op, op_class, left, right)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 547, in _maybe_evaluate_binop
return self._maybe_eval(res, self.binary_ops)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 519, in _maybe_eval
self.env, self.engine, self.parser, self.term_type, eval_in_python
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\ops.py", line 399, in evaluate
res = self(env)
File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\ops.py", line 380, in __call__
return self.func(left, right)
TypeError: '<' not supported between instances of 'type' and 'Timestamp'
If I run the equivalent with .loc, I get the desired result (but then I cannot use a user-input string):
df.loc[df['Timestamp'] < "2020-02-01"]
Out[4]:
Timestamp Error ... ToD Day_Night
0 2020-01-17 00:00:00 0 ... 0 Night
1 2020-01-17 00:10:00 0 ... 0 Night
2 2020-01-17 00:20:00 0 ... 0 Night
3 2020-01-17 00:30:00 0 ... 0 Night
4 2020-01-17 00:40:00 0 ... 0 Night
2154 2020-01-31 23:10:00 0 ... 23 Night
2155 2020-01-31 23:20:00 0 ... 23 Night
2156 2020-01-31 23:30:00 0 ... 23 Night
2157 2020-01-31 23:40:00 0 ... 23 Night
2158 2020-01-31 23:50:00 0 ... 23 Night
[2159 rows x 37 columns]
Does anyone know how to use query() with a datetime column?
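【Comments】:
I think the error message gives a clue: Timestamp is a type, and it cannot be compared with a str or a datetime. As a test, rename the Timestamp column to something else and see whether the code works. df['Timestamp'] is allowed by pandas, which is why it works: there it refers to a column, not a type. Read the warning box for more information: pandas.pydata.org/pandas-docs/stable/user_guide/…
Thanks, that was the problem. After renaming the column, it works.
【Answer 1】:
Inside query(), the column name Timestamp clashes with the Timestamp type, so the expression picks up the type rather than the column (hence the 'type' in the error message). As a first step, you can use rename() to give the column another name: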
df = df.rename(columns={"Timestamp": "MyTimestamp"})
After that, the following should work on the datetime column:
df.query('MyTimestamp < 20200201')
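To make the rename-then-query step concrete, here is a minimal sketch on made-up data. The frame contents, the Error values, and the names renamed and before_feb are assumptions for illustration only, not the asker's real data:

```python
import pandas as pd

# Synthetic stand-in for the asker's data: 10-minute readings around the cutoff.
df = pd.DataFrame(
    {
        "Timestamp": pd.date_range("2020-01-31 23:30", periods=6, freq="10min"),
        "Error": [0, 0, 0, 1, 0, 1],
    }
)

# rename() returns a new frame, so keep the result.
renamed = df.rename(columns={"Timestamp": "MyTimestamp"})

# With the column renamed, the name in the expression refers to the column,
# and the quoted date string is compared against its datetime values.
before_feb = renamed.query('MyTimestamp < "2020-02-01"')
print(before_feb)
```

Only the rows stamped before midnight on 2020-02-01 should remain in before_feb.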
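Alternatively, if you want to query the dataframe with a timestamp (ts is not defined in the answer; it is assumed here to be a callable in local scope, for example ts = pd.Timestamp):
df.query('MyTimestamp < @ts("20200201T071320")')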
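Since the question mentions a user-supplied string, one option is to parse it once into a local variable and reference that variable with @ inside the expression. This is a rough sketch rather than the answerer's exact method: user_input, cutoff, via_query and via_mask are hypothetical names, and the column is assumed to be renamed already:

```python
import pandas as pd

# Small synthetic frame; the column already carries the non-clashing name.
df = pd.DataFrame(
    {"MyTimestamp": pd.date_range("2020-01-31 23:30", periods=6, freq="10min")}
)

user_input = "2020-02-01"            # hypothetical value, e.g. from input() or a form
cutoff = pd.to_datetime(user_input)  # parse once, outside the query expression

# @cutoff pulls the local variable into the query expression.
via_query = df.query("MyTimestamp < @cutoff")

# Equivalent boolean mask that does not go through query() at all.
via_mask = df.loc[df["MyTimestamp"] < cutoff]

print(via_query.equals(via_mask))
```

Parsing outside the expression also means a malformed date fails in pd.to_datetime, before query() is ever called, which tends to give a clearer error.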