总结--DataFrame根据部分列的异常值删除多行

Posted oliveQ

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了总结--DataFrame根据部分列的异常值删除多行相关的知识,希望对你有一定的参考价值。

删除3sigma的异常值

转载来源

import pandas as pd
import numpy as np
from collections import Counter
def Detect_3Outliers(df,n,features):
    outlier_indices = []
    for col in features:
        left=df[col].mean()-3*df[col].std()
        right=df[col].mean()+3*df[col].std()
        outlier_list_col = df[(df[col]<=left)|(df[col]>=right)].index
        outlier_indices.extend(outlier_list_col)
    outlier_indices = Counter(outlier_indices)
    multiple_outliers = list(k for k, v in outlier_indices.items() if v > n)  
    # select observations containing more than n outliers
    return multiple_outliers

删除箱式图的异常值

转载来源

import pandas as pd
import numpy as np
from collections import Counter
def Detect_Box_Outliers(df, n, features):
    outlier_indices = []
    for col in features:
        Q1 = np.percentile(df[col], 25)
        # 1st quartile (25%)
        Q3 = np.percentile(df[col], 75)
        # 3rd quartile (75%)
        IQR = Q3 - Q1
        # Interquartile range (IQR)
        outlier_step = 1.5 * IQR
        # outlier step
        outlier_list_col = df[(df[col]<Q1-outlier_step)|(df[col]>Q3+outlier_step)].index 
        # Determine a list of indices of outliers for feature col
        outlier_indices.extend(outlier_list_col)
        # append the found outlier indices for col to the list of outlier indices
    outlier_indices = Counter(outlier_indices)
    multiple_outliers = list(k for k, v in outlier_indices.items() if v > n)
    # select observations containing more than n outliers
    return multiple_outliers

调用

axis = 0是删除行
Outliers_to_drop是获得的行标签

Outliers_to_drop = Detect_3Outliers(Data,1,Flylabel)
# 重置索引
Data = Data.drop(Outliers_to_drop, axis = 0).reset_index(drop=True)

以上是关于总结--DataFrame根据部分列的异常值删除多行的主要内容,如果未能解决你的问题,请参考以下文章

pandas去重

在大熊猫DataFrame中按组删除异常值的更快方法[重复]

R语言使用isna函数查看列表和dataframe中是否包含缺失值将dataframe中数据列中的异常值标注为缺失值NA使用na.omit函数删除dataframe中包含缺失值NA的数据行

sql server 2005多表连接查询,去除部分列不查询

根据列值删除Python Pandas中的DataFrame行[重复]

根据列值有效地从宽 Spark Dataframe 中删除列