ROC Curve Evaluation and Anomaly Filtering

Posted 0211ji

1. For details, see https://www.cnblogs.com/mdevelopment/p/9456486.html

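Reviewing the ROC curve:

The ROC curve shows how well an ADS (anomaly detection system) separates normal points from anomalous ones. It plots the TPR (true positive rate, i.e. recall) as a function of the FPR (false positive rate).

The larger the area under the curve (AUC), the closer the curve runs to the horizontal asymptote TPR = 1, and the better the ADS performs.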
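
To make the definition concrete, here is a minimal pure-Python sketch that sweeps a threshold over the scores, collects the (FPR, TPR) points, and integrates them with the trapezoidal rule. The helper names `roc_points` and `trapezoid_auc` and the toy data are assumptions made for illustration; they are not part of CellPAD or scikit-learn.

```python
def roc_points(scores, labels):
    """Sweep a threshold from high to low and return the (fpr, tpr) lists.

    Assumes labels are 0/1 and that both classes are present
    (otherwise a division by zero occurs).
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    # Highest score first: lowering the threshold admits one point at a time.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all tied scores have been counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            tpr.append(tp / positives)
            fpr.append(fp / negatives)
    return fpr, tpr


def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i])."""
    return sum((x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2 for i in range(1, len(x)))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_points(scores, labels)
print(trapezoid_auc(fpr, tpr))  # 8/9, about 0.889
```

The value 8/9 matches the ranking interpretation of AUC: of the 9 (anomaly, normal) pairs, 8 have the anomaly scored higher.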

 

def evaluate(scores, labels):
    """
    It returns the AUC and PR-AUC scores.
    :param scores: list<float> | the anomaly scores predicted by CellPAD.
    :param labels: list<float> | the true labels.
    :return: the auc, prauc.
    """
    # Metric functions are called as metrics.<function_name>(parameters).
    from sklearn import metrics

    # Compute the coordinates of the ROC curve: FPR on the x-axis, TPR on the y-axis.
    # TPR = TP / (TP + FN) = recall (true positive rate, sensitivity)
    # FPR = FP / (FP + TN) (false positive rate, 1 - specificity)
    fpr, tpr, thresholds = metrics.roc_curve(labels, scores, pos_label=1)

    # Compute the points of the precision-recall curve.
    precision, recall, thresholds = metrics.precision_recall_curve(labels, scores, pos_label=1)

    # metrics.auc(x, y) is the area under the curve through the points (x, y);
    # a larger AUC means better performance.
    auc = metrics.auc(fpr, tpr)
    prauc = metrics.auc(recall, precision)
    return auc, prauc
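
The precision-recall curve used for the second score is built from one (precision, recall) pair per threshold. As a rough illustration, the sketch below computes a single such pair in plain Python. The helper name `precision_recall_at`, the toy data, and the convention of returning precision 1.0 when nothing is flagged are assumptions of this sketch.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every point with score >= threshold is flagged."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y != 1)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    # Assumed convention: precision is 1.0 when nothing is flagged.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(precision_recall_at(scores, labels, 0.6))  # (0.75, 1.0)
print(precision_recall_at(scores, labels, 0.8))  # precision 1.0, recall 2/3
```

Lowering the threshold from 0.8 to 0.6 recovers the third anomaly (recall rises to 1.0) at the cost of one false alarm (precision falls to 0.75); the PR-AUC summarizes this trade-off over all thresholds.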

 

2. detect_anomaly()

def detect_anomaly(self, predicted_series, practical_series):
    """
    It calculates the drop ratio of each point by comparing the predicted value and practical value.
    Then it runs filter_anomaly() function to filter out the anomalies by the parameter "rule".
    :param predicted_series: the predicted values of a KPI series
    :param practical_series: the practical values of a KPI series
    :return: drop_ratios, drop_labels and drop_scores
    """
    drop_ratios = []
    for i in range(len(practical_series)):
        # dp = (practical - predicted) / (predicted + 1e-7)
        # The 1e-7 term (10 to the power -7) keeps the denominator away from zero
        # when the predicted value is 0.
        dp = (practical_series[i] - predicted_series[i]) / (predicted_series[i] + 1e-7)
        drop_ratios.append(dp)

    # Only drops are scored: a negative ratio becomes a positive score,
    # and every non-negative ratio gets a score of 0.0.
    drop_scores = []
    for r in drop_ratios:
        if r < 0:
            drop_scores.append(-r)
        else:
            drop_scores.append(0.0)

    drop_labels = self.filter_anomaly(drop_ratios)
    return drop_ratios, drop_labels, drop_scores
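
A small worked example may help here. The sketch below repeats only the ratio and score arithmetic as a standalone function; the name `drop_ratios_and_scores`, the `eps` parameter, and the sample values are hypothetical and chosen for illustration.

```python
def drop_ratios_and_scores(predicted, practical, eps=1e-7):
    """Standalone version of the drop-ratio and drop-score arithmetic."""
    # ratio = (practical - predicted) / (predicted + eps)
    ratios = [(a - p) / (p + eps) for p, a in zip(predicted, practical)]
    # Negative ratios (drops) become positive scores; everything else scores 0.0.
    scores = [-r if r < 0 else 0.0 for r in ratios]
    return ratios, scores


predicted = [100.0, 100.0, 100.0]
practical = [100.0, 120.0, 40.0]
ratios, scores = drop_ratios_and_scores(predicted, practical)
print(ratios)  # approximately [0.0, 0.2, -0.6]
print(scores)  # approximately [0.0, 0.0, 0.6]
```

The rise from 100 to 120 gives a positive ratio and therefore a score of 0.0, while the fall from 100 to 40 gives a ratio near -0.6 and a score near 0.6.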

3. filter_anomaly(), called from step 2

def filter_anomaly(self, drop_ratios):
    """
    It calculates the threshold for each approach (rule) and then calls filter_by_threshold().
    - gauss: threshold = mean - self.sigma * std
    - threshold: the given threshold variable
    - proportion: threshold = sort_scores[threshold_index]
    :param drop_ratios: list<float> | a measure of predicted drop anomaly degree
    :return: list<bool> | the drop labels
    """
    # Requires `import numpy as np` at module level.
    if self.rule == 'gauss':
        mean = np.mean(drop_ratios)
        # np.std returns the population standard deviation (ddof=0), not the variance.
        std = np.std(drop_ratios)
        # threshold = mean - sigma * standard deviation
        threshold = mean - self.sigma * std
        drop_labels = self.filter_by_threshold(drop_ratios, threshold)
        return drop_labels

    if self.rule == "threshold":
        threshold = self.threshold
        drop_labels = self.filter_by_threshold(drop_ratios, threshold)
        return drop_labels

    if self.rule == "proportion":
        # Sort in ascending order, so the largest drops come first.
        sort_scores = sorted(np.array(drop_ratios))
        threshold_index = int(len(drop_ratios) * self.proportion)
        threshold = sort_scores[threshold_index]
        drop_labels = self.filter_by_threshold(drop_ratios, threshold)
        return drop_labels
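
To see how the "gauss" and "proportion" rules behave on actual numbers, here is a standalone sketch using only the standard library. The function names `gauss_threshold` and `proportion_threshold` and the sample ratios are assumptions; the index clamp is also an addition of this sketch (the method above does not clamp, so a proportion of 1.0 would index past the end of the list).

```python
import statistics


def gauss_threshold(drop_ratios, sigma):
    """threshold = mean - sigma * population standard deviation."""
    mean = statistics.mean(drop_ratios)
    std = statistics.pstdev(drop_ratios)  # population std, like np.std with ddof=0
    return mean - sigma * std


def proportion_threshold(drop_ratios, proportion):
    """Value at position int(n * proportion) of the ascending-sorted ratios."""
    ordered = sorted(drop_ratios)
    index = int(len(ordered) * proportion)
    index = min(index, len(ordered) - 1)  # clamp added in this sketch only
    return ordered[index]


ratios = [0.0, 0.1, -0.1, 0.05, -0.05, 0.0, 0.02, -0.02, 0.0, -0.9]

gauss = gauss_threshold(ratios, sigma=2)
print(gauss)                               # about -0.64
print([r < gauss for r in ratios])         # only the last point (-0.9) is True

print(proportion_threshold(ratios, 0.1))   # -0.1
```

With these ten points both rules flag only the -0.9 drop: the gauss threshold lands near -0.64, and the proportion rule picks the second-smallest ratio (-0.1) as the cut-off, so only values strictly below it are labelled anomalous.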

4. filter_by_threshold(), called from step 3

def filter_by_threshold(self, drop_scores, threshold):
    """
    It judges whether a point is an anomaly by comparing its drop score and the threshold.
    :param drop_scores: list<float> | a measure of predicted drop anomaly degree.
    :param threshold: float | the threshold to filter out anomalies.
    :return: list<bool> | a list of labels where a point with a "true" label is an anomaly.
    """
    drop_labels = []
    for r in drop_scores:
        # A point is an anomaly only if its value is strictly below the threshold.
        if r < threshold:
            drop_labels.append(True)
        else:
            drop_labels.append(False)
    return drop_labels
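
For a quick check of the boundary behaviour, the comparison can be written as a standalone function. This is an equivalent sketch without `self`, with made-up input values, not the class method itself.

```python
def filter_by_threshold(drop_scores, threshold):
    """Label a point True (anomaly) when its value is strictly below the threshold."""
    return [r < threshold for r in drop_scores]


labels = filter_by_threshold([0.05, -0.3, -0.1, -0.6], threshold=-0.1)
print(labels)  # [False, True, False, True]
```

Because the comparison is strict, the point equal to the threshold (-0.1) is not flagged.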
