总结--DataFrame根据部分列的异常值删除多行
Posted oliveQ
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了总结--DataFrame根据部分列的异常值删除多行相关的知识,希望对你有一定的参考价值。
删除3sigma的异常值
import pandas as pd
import numpy as np
from collections import Counter
def Detect_3Outliers(df,n,features):
outlier_indices = []
for col in features:
left=df[col].mean()-3*df[col].std()
right=df[col].mean()+3*df[col].std()
outlier_list_col = df[(df[col]<=left)|(df[col]>=right)].index
outlier_indices.extend(outlier_list_col)
outlier_indices = Counter(outlier_indices)
multiple_outliers = list(k for k, v in outlier_indices.items() if v > n)
# select observations containing more than n outliers
return multiple_outliers
删除箱式图的异常值
import pandas as pd
import numpy as np
from collections import Counter
def Detect_Box_Outliers(df, n, features):
outlier_indices = []
for col in features:
Q1 = np.percentile(df[col], 25)
# 1st quartile (25%)
Q3 = np.percentile(df[col], 75)
# 3rd quartile (75%)
IQR = Q3 - Q1
# Interquartile range (IQR)
outlier_step = 1.5 * IQR
# outlier step
outlier_list_col = df[(df[col]<Q1-outlier_step)|(df[col]>Q3+outlier_step)].index
# Determine a list of indices of outliers for feature col
outlier_indices.extend(outlier_list_col)
# append the found outlier indices for col to the list of outlier indices
outlier_indices = Counter(outlier_indices)
multiple_outliers = list(k for k, v in outlier_indices.items() if v > n)
# select observations containing more than n outliers
return multiple_outliers
调用
axis = 0是删除行
Outliers_to_drop是获得的行标签
Outliers_to_drop = Detect_3Outliers(Data,1,Flylabel)
# 重置索引
Data = Data.drop(Outliers_to_drop, axis = 0).reset_index(drop=True)
以上是关于总结--DataFrame根据部分列的异常值删除多行的主要内容,如果未能解决你的问题,请参考以下文章
在大熊猫DataFrame中按组删除异常值的更快方法[重复]
R语言使用isna函数查看列表和dataframe中是否包含缺失值将dataframe中数据列中的异常值标注为缺失值NA使用na.omit函数删除dataframe中包含缺失值NA的数据行
sql server 2005多表连接查询,去除部分列不查询