Get the rows of a dataframe based on the consecutive values of one column

Posted 2023-03-23


【Title】: Get the rows of dataframe based on the consecutive values of one column 【Posted】: 2021-11-08 04:25:39 【Question】:

Is there a way to get consecutive rows based on the values of a specific column? For example:

	column1	column2	View
row1	1	2	c
row2	3	4	a
row3	5	6	p
row4	7	8	p
row5	9	10	n

I need to get the rows whose View values spell out the letters of the word 'app', so in this example I need to save row2, row3, and row4 in a list.
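For reference, the example table can be rebuilt roughly as follows. This is only a sketch: the variable names `df` and `expected_rows` are assumptions made for illustration, and the row labels row1 to row5 are assumed to be the index.

```python
import pandas as pd

# Rebuild the example table; the labels row1..row5 are used as the index.
df = pd.DataFrame(
    {
        "column1": [1, 3, 5, 7, 9],
        "column2": [2, 4, 6, 8, 10],
        "View": ["c", "a", "p", "p", "n"],
    },
    index=["row1", "row2", "row3", "row4", "row5"],
)

# The desired result: the rows whose View values spell "app" in order.
expected_rows = ["row2", "row3", "row4"]
print(df.loc[expected_rows])
```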


【Answer 1】:

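Here is a general approach. I use index_slice_by_substring() to produce a tuple of integers representing the start row and the end row. The function rows_by_consecutive_letters() takes your dataframe, the name of the column to check, and the string you are looking for, and uses .iloc to return the part of the table selected by those integer positions.

The key to getting the slice indices is to join the "View" column values into a single string with ''.join(df[column]), then check substrings of the same length as the condition string, from left to right, until one matches.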

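import pandas as pd

def index_slice_by_substring(full_string, substring) -> tuple:
    # Slide a window of len(substring) across full_string, left to right
    len_substring = len(substring)
    len_full_string = len(full_string)
    # +1 so that a match ending at the last character is also checked
    for x0, x1 in enumerate(range(len_substring, len_full_string + 1)):
        if full_string[x0:x1] == substring:
            return (x0, x1)
    # No match: return an empty slice instead of None
    return (0, 0)

def rows_by_consecutive_letters(df, column, condition) -> pd.DataFrame:
    # Join the column into one string; assumes each cell holds a single character
    row_begin, row_end = index_slice_by_substring(''.join(df[column]), condition)
    return df.iloc[row_begin:row_end, :]

print(rows_by_consecutive_letters(your_df, "View", "app"))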

   column1  column2 View
1        3        4    a
2        5        6    p
3        7        8    p
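Joining the column into one string only lines up with row positions when every cell holds exactly one character. As a hedged alternative sketch (not part of the original answer), the same sliding-window idea can compare whole cells instead of characters; the function name `rows_by_consecutive_values` and the `demo` frame are assumptions made for illustration.

```python
import pandas as pd


def rows_by_consecutive_values(df, column, values):
    """Return the first run of rows whose `column` values equal `values` in order."""
    cells = df[column].tolist()
    wanted = list(values)  # a string such as "app" becomes ['a', 'p', 'p']
    width = len(wanted)
    # Slide a window of `width` rows over the column and compare whole cells.
    for start in range(len(cells) - width + 1):
        if cells[start:start + width] == wanted:
            return df.iloc[start:start + width]
    # No match: return an empty frame with the same columns.
    return df.iloc[0:0]


demo = pd.DataFrame({"View": ["c", "a", "p", "p", "n"]})
print(rows_by_consecutive_values(demo, "View", "app"))
```

Because whole cells are compared, a cell such as "ap" is not mistaken for two separate rows, and an empty frame comes back when nothing matches.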


【Answer 2】:

Not a Pythonic way, but it gets the job done:

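keep = []
for i in range(len(df) - 2):
    # .iloc makes the lookup positional, whatever the index labels are
    if df.View.iloc[i] == 'a' and df.View.iloc[i+1] == 'p' and df.View.iloc[i+2] == 'p':
        # df.iloc[i] selects row i; df[i] would look up a column named i
        keep.append(df.iloc[i])
        keep.append(df.iloc[i+1])
        keep.append(df.iloc[i+2])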

Result:

【Comments】:

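This line, for i,_ in enumerate(range(len(df))), is the strangest way I have seen of writing for i in range(len(df)). Also, why do you need if i+3==len(df): break? Just use for i in range(len(df) - 2). Third, why keep the indices of the rows instead of the rows themselves? Instead of keep.append(i), use keep.append(df[i]); then you don't need df.iloc at the end, because keep already holds the rows.

【Answer 3】: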

You can use str.find, but this only finds the first occurrence of your search term.

search = 'app'
i = ''.join(df.View).find(search)
if i>-1:
    print(df.iloc[i: i+len(search)])

Output

      column1  column2 View                         
row2        3        4    a
row3        5        6    p
row4        7        8    p

To find every (non-overlapping) occurrence, without error checking, you can use re.finditer. The result is a list of dataframe slices.

import re
search='p'   # searched for 'p' to find more than one
[df.iloc[x.start():x.end()] for x in re.finditer(search, ''.join(df.View))]

Output

[      column1  column2 View                        
 row3        5        6    p,
       column1  column2 View                         
 row4        7        8    p]
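To illustrate the multiple-occurrence case with the original search term, here is a small sketch on hypothetical data that contains two "app" runs. The frame `df2` and the names `joined` and `matches` are assumptions for illustration; `re.escape` is added so that a search term containing regex metacharacters is still matched literally.

```python
import re

import pandas as pd

# Hypothetical data with two separate "app" runs in the View column.
df2 = pd.DataFrame({"View": list("cappnappx")})

search = "app"
joined = "".join(df2["View"])  # assumes every cell holds exactly one character

# re.escape keeps characters such as "." or "+" in the search term literal.
matches = [
    df2.iloc[m.start():m.end()]
    for m in re.finditer(re.escape(search), joined)
]

# Print the positional index of each matching slice.
for part in matches:
    print(part.index.tolist())
```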

【Comments】:

Nice, but it doesn't work if there are multiple triples in the df.

@NivDudovitch - added a solution for multiple occurrences.
