Python Datatable/Pydatatable: How to filter rows in datatable by regex and assign value to new variable according to filter

Posted 2023-03-12

Asked: 2020-10-04 20:38:52

Question:

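I want to assign values to a new column based on a regex match in another column, using python-datatable syntax:

DT[select rows by regex, assign value to new column, ]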

import re

import datatable as dt
from datatable import f, Frame

# Build the frame from a dict of column name -> values
DT = dt.Frame({'a': [1, 2, 3, 4], 'b': ['hi', 'foo', 'fat', 'cat']})
DT['new_col'] = DT[:, f.b]
# Pull the column out as a Python list, substitute, and wrap the result in a Frame
DT['new_col'] = Frame([re.sub('f.*', 'words starting with f', s) for s in DT[:, "new_col"].to_list()[0]])
DT.head()
DT['new_col'] = Frame([re.sub('c.*', 'words starting with c', s) for s in DT[:, "new_col"].to_list()[0]])
DT.head()

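Is there another solution within the datatable package that avoids conversions such as to_list() (that is, no loops)?

The regex approach in this question does not allow operating on a whole column: Python data.table row filter by regex. This one works for pandas but not for datatable: How to filter rows in pandas by regex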

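Answer 1:

I think for now you can go with the following workaround. As datatable matures, the tools needed for this will likely be reviewed and added to it.

Import the libraries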

import re

import datatable as dt
from datatable import f

Create a DT

DT_X = dt.Frame({'a': [1, 2, 3, 4], 'b': ['hi', 'foo', 'fat', 'cat']})

and perform the required operation

DT_X[:, f[:].extend({'new_col': dt.Frame([re.sub('f.*', 'words starting with f', s) for s in DT_X[:, f.b].to_list()[0]])})]

Output:

  |  a  b    new_col              
-- + --  ---  ---------------------
 0 |  1  hi   hi                   
 1 |  2  foo  words starting with f
 2 |  3  fat  words starting with f
 3 |  4  cat  cat
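
The question applies two patterns in two separate passes. As a minimal sketch, assuming the goal is to label each string by the first pattern that matches at its start, both passes can be folded into one helper. The name `label_by_regex` and the `rules` list are hypothetical and not part of datatable. Unlike the `re.sub` calls above, this version uses `re.match`, so it only fires when the pattern matches from the beginning of the string. It still loops in Python, so it does not remove the `to_list()` round trip.

```python
import re


def label_by_regex(values, rules):
    """Return a new list where each string is replaced by the label of the
    first rule whose pattern matches at the start of the string.
    Values that match no rule (or are None) are passed through unchanged."""
    compiled = [(re.compile(pattern), label) for pattern, label in rules]
    result = []
    for value in values:
        replacement = value
        if value is not None:
            for pattern, label in compiled:
                if pattern.match(value):  # anchored at the start of the string
                    replacement = label
                    break
        result.append(replacement)
    return result


# Hypothetical rule list mirroring the two patterns from the question
rules = [
    (r"f.*", "words starting with f"),
    (r"c.*", "words starting with c"),
]

labels = label_by_regex(["hi", "foo", "fat", "cat"], rules)
print(labels)
```

The resulting list can be assigned back the same way as in the question, for example `DT['new_col'] = dt.Frame(labels)` with `DT[:, f.b].to_list()[0]` as the input values.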
