Slicing multiple column ranges from a dataframe using iloc
Posted: 2018-02-09 16:06:21
Question: I have a df with 32 columns:
df.shape
(568285, 32)
I'm trying to rearrange the columns in a specific order and drop the first column using iloc:
df = df.iloc[:,[31,[1:23],24,25,26,28,27,29,30]]
^
SyntaxError: invalid syntax
Is this the right way to do it?
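Why this fails: `1:23` is slice notation, which Python only accepts directly inside a subscript `[...]`. It cannot appear as an element of a list literal such as `[31, [1:23], ...]`. A minimal numpy-free sketch of a fix, assuming the intended positions are exactly the ones written in the question, unpacks a `range` into the list instead (the `range` end is exclusive, just like the slice end). The 32-column sample frame is a stand-in, not the asker's data.

```python
import numpy as np
import pandas as pd

# Stand-in frame with 32 columns labelled 0..31 (not the asker's real data)
df = pd.DataFrame(np.arange(64).reshape(2, 32))

# range(1, 23) yields 1..22, matching the slice 1:23; * unpacks it into the list
cols = [31, *range(1, 23), 24, 25, 26, 28, 27, 29, 30]

# iloc accepts a plain list of integer positions, in any order
out = df.iloc[:, cols]
print(out.columns.tolist())
```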
Comments:
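Dupe: ***.com/a/41540037/2137255
@JohnGalt I need to improve my search skills, apologies.
Give it time, we've all been there. :)
@novicebioinforesearcher Also, it's important to note that dupes aren't bad. Sometimes we search differently, or you just don't know what to search for. It shouldn't reflect a negative judgment on you or your question. In fact, a lot of reputation has been earned from this question and its answers. All that marking a dupe accomplishes is redirecting search results to the dupe target. So you've actually helped by adding other search terms that will now route to the other answer (-:
Ah, I see, the concept of dupes makes sense.
Answer 1: You can use the np.r_ indexer.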
class RClass(AxisConcatenator)
 |  Translates slice objects to concatenation along the first axis.
 |
 |  This is a simple way to build up arrays quickly. There are two use cases.
import numpy as np
df = df.iloc[:, np.r_[31, 1:23, 24, 25, 26, 28, 27, 29, 30]]
Demo, on a sample df with 50 columns:
df
0 1 2 3 4 5 6 7 8 9 ... 40 \
A 33.0 44.0 68.0 31.0 NaN 87.0 66.0 NaN 72.0 33.0 ... 71.0
B NaN NaN 77.0 98.0 NaN 48.0 91.0 43.0 NaN 89.0 ... 38.0
C 45.0 55.0 NaN 72.0 61.0 87.0 NaN 99.0 96.0 75.0 ... 83.0
D NaN NaN NaN 58.0 NaN 97.0 64.0 49.0 52.0 45.0 ... 63.0
41 42 43 44 45 46 47 48 49
A NaN 87.0 31.0 50.0 48.0 73.0 NaN NaN 81.0
B 79.0 47.0 51.0 99.0 59.0 NaN 72.0 48.0 NaN
C 93.0 NaN 95.0 97.0 52.0 99.0 71.0 53.0 69.0
D NaN 41.0 NaN NaN 55.0 90.0 NaN NaN 92.0
out = df.iloc[:, np.r_[31, 1:23, 24, 25, 26, 28, 27, 29, 30]]
out
31 1 2 3 4 5 6 7 8 9 ... 20 \
A 99.0 44.0 68.0 31.0 NaN 87.0 66.0 NaN 72.0 33.0 ... 66.0
B 42.0 NaN 77.0 98.0 NaN 48.0 91.0 43.0 NaN 89.0 ... NaN
C 77.0 55.0 NaN 72.0 61.0 87.0 NaN 99.0 96.0 75.0 ... 76.0
D 95.0 NaN NaN 58.0 NaN 97.0 64.0 49.0 52.0 45.0 ... 71.0
21 22 24 25 26 28 27 29 30
A NaN 40.0 66.0 87.0 97.0 68.0 NaN 68.0 NaN
B 95.0 NaN 47.0 79.0 47.0 NaN 83.0 81.0 57.0
C NaN 75.0 46.0 84.0 NaN 50.0 41.0 38.0 52.0
D NaN 74.0 41.0 55.0 60.0 NaN NaN 84.0 NaN
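To see exactly which positions `np.r_` produces, you can inspect the array it builds. Note that `1:23` follows normal slice semantics and stops at 22, which is why column 23 is absent from `out` above. If column 23 was meant to be kept, `1:24` would include it (that reading of the asker's intent is an assumption).

```python
import numpy as np

# np.r_ expands each slice and concatenates everything into one 1-D integer array
idx = np.r_[31, 1:23, 24, 25, 26, 28, 27, 29, 30]

print(idx[:5])       # first few positions: 31, then 1, 2, 3, 4
print(23 in idx)     # → False (the slice end is exclusive)
print(len(idx))      # → 30
```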
Answer 2: Here is a custom solution using explicit indexing.
Side note: np.r_ didn't work for me, which is why I built this solution.
import numpy as np
import pandas as pd
# Make a sample df of 1_000 rows & 100 cols
data = np.zeros(shape=(1_000,100))
df = pd.DataFrame(data)
# Create a custom function for indexing
def all_nums_in_range(*tuple_pairs, len_df):
    """
    Build a boolean mask from (start, end) tuples for use with iloc.
    `end` is exclusive, as in a slice.
    `len_df` must equal the number of columns being indexed (df.shape[1]),
    so the mask length matches the indexed axis.
    """
    # Start with an all-False mask, one entry per column
    num_range = np.zeros(shape=(len_df,), dtype=bool)
    # Mark every position covered by each (start, end) range
    for (start, end) in tuple_pairs:
        num_range[start:end] = True
    return num_range
# Now apply
num_range = all_nums_in_range((0,50), (75, 80), len_df=100)
df.iloc[:, num_range]
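One limitation worth knowing: a boolean mask always returns the selected columns in their original order, so it cannot put column 31 first or swap 27 and 28 as the question requires. A minimal sketch of an order-preserving variant follows; `positions_from_ranges` is a hypothetical helper (not part of pandas or numpy) that expands single ints and `(start, end)` tuples into an integer position list, which `iloc` then uses as-is.

```python
import numpy as np
import pandas as pd

def positions_from_ranges(*parts):
    """Hypothetical helper: expand ints and (start, end) tuples into a list of
    column positions, keeping the order given. `end` is exclusive, like a slice."""
    positions = []
    for part in parts:
        if isinstance(part, tuple):
            start, end = part
            positions.extend(range(start, end))  # expand the half-open range
        else:
            positions.append(part)  # single position, appended in place
    return positions

# Sample df of 1_000 rows & 100 cols, as in the answer above
df = pd.DataFrame(np.zeros(shape=(1_000, 100)))

# Unlike a boolean mask, an integer list can reorder columns
cols = positions_from_ranges(31, (1, 23), 24, 25, 26, 28, 27, 29, 30)
out = df.iloc[:, cols]
print(out.columns[:3].tolist())  # → [31, 1, 2]
```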