ROC Curve Evaluation and Anomaly Filtering
Posted 0211ji
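1. For the full write-up, see https://www.cnblogs.com/mdevelopment/p/9456486.html
Reviewing the ROC curve:
The ROC curve shows how well an anomaly detection system (ADS) separates normal points from anomalies. It plots the true positive rate (TPR, i.e. recall) as a function of the false positive rate (FPR).
The larger the area under the curve (AUC), the closer the curve hugs the horizontal line TPR = 1 and the better the ADS performs.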
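To make the ROC/AUC description concrete, here is a minimal pure-Python sketch. It is not the sklearn implementation used in evaluate(); the helper names roc_points and trapezoid_area and the toy data are assumptions made for illustration, and the code assumes both classes are present in the labels.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) lists by sweeping a threshold over the distinct scores.

    A point is predicted anomalous when its score >= threshold.
    Assumes at least one positive (label 1) and one negative label.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y != 1)
        tpr.append(tp / positives)
        fpr.append(fp / negatives)
    return fpr, tpr


def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i])."""
    return sum((x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2.0 for i in range(1, len(x)))


# Hypothetical toy data: higher score means "more anomalous", label 1 is an anomaly.
toy_scores = [0.9, 0.8, 0.35, 0.1]
toy_labels = [1, 0, 1, 0]
toy_fpr, toy_tpr = roc_points(toy_scores, toy_labels)
toy_auc = trapezoid_area(toy_fpr, toy_tpr)
```

On this toy data three of the four (anomaly, normal) pairs are ranked correctly, which is why the area comes out at 0.75 rather than 1.0.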
def evaluate(scores, labels):
    """
    It returns the AUC and PR-AUC scores.
    :param scores: list<float> | the anomaly scores predicted by CellPAD.
    :param labels: list<float> | the true labels.
    :return: the auc, prauc.
    """
    # Metric functions are called as metrics.<metric_function_name>(parameters).
    from sklearn import metrics
    # Compute the ROC curve coordinates (FPR on the x-axis, TPR on the y-axis).
    # TPR = TP / (TP + FN) = recall (true positive rate, sensitivity)
    # FPR = FP / (FP + TN) (false positive rate, 1 - specificity)
    fpr, tpr, thresholds = metrics.roc_curve(labels, scores, pos_label=1)
    # Compute the points of the precision-recall curve.
    precision, recall, thresholds = metrics.precision_recall_curve(labels, scores, pos_label=1)
    # metrics.auc(x, y) is the area under the curve;
    # a larger ROC AUC indicates better performance.
    auc = metrics.auc(fpr, tpr)
    # Area under the precision-recall curve.
    pruc = metrics.auc(recall, precision)
    return auc, pruc
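The TPR and FPR definitions used in evaluate() can be checked by hand at a single threshold. The sketch below is an illustration only: rates_at_threshold and the example data are hypothetical, and returning a precision of 1.0 when nothing is flagged is a convention chosen here, not something the source specifies.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (tpr, fpr, precision) when points with score >= threshold are flagged.

    Assumes the labels contain at least one positive (1) and one negative.
    """
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn)  # recall: share of true anomalies that were caught
    fpr = fp / (fp + tn)  # share of normal points raised as false alarms
    # Convention chosen for this sketch: precision is 1.0 when nothing is flagged.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return tpr, fpr, precision


# Hypothetical data: two anomalies (label 1) among five points.
example_scores = [0.9, 0.7, 0.6, 0.2, 0.1]
example_labels = [1, 0, 1, 0, 0]
example_rates = rates_at_threshold(example_scores, example_labels, 0.5)
```

Sweeping the threshold and collecting (fpr, tpr) pairs gives the ROC curve, while collecting (recall, precision) pairs gives the precision-recall curve.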
2. detect_anomaly()
def detect_anomaly(self, predicted_series, practical_series):
    """
    It calculates the drop ratio of each point by comparing the predicted value and practical value.
    Then it runs filter_anomaly() function to filter out the anomalies by the parameter "rule".
    :param predicted_series: the predicted values of a KPI series
    :param practical_series: the practical values of a KPI series
    :return: drop_ratios, drop_labels and drop_scores
    """
    drop_ratios = []
    for i in range(len(practical_series)):
        # dp = (practical - predicted) / (predicted + 1e-7);
        # the 1e-7 (10 to the power of -7) guards against division by zero.
        dp = (practical_series[i] - predicted_series[i]) / (predicted_series[i] + 1e-7)
        drop_ratios.append(dp)
    drop_scores = []
    # A negative ratio (a drop) becomes a positive score; anything else scores 0.0.
    for r in drop_ratios:
        if r < 0:
            drop_scores.append(-r)
        else:
            drop_scores.append(0.0)
    drop_labels = self.filter_anomaly(drop_ratios)
    return drop_ratios, drop_labels, drop_scores
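A small worked example may help show what the drop ratio and drop score look like. This is a standalone sketch rather than the class method itself; the helper names drop_ratio and drop_score and the sample values are assumptions made for illustration.

```python
EPSILON = 1e-7  # same small constant as in detect_anomaly(); avoids division by zero


def drop_ratio(practical, predicted):
    """Relative deviation of the practical value from the predicted value."""
    return (practical - predicted) / (predicted + EPSILON)


def drop_score(ratio):
    """Magnitude of a drop: positive for negative ratios, 0.0 otherwise."""
    return -ratio if ratio < 0 else 0.0


# Hypothetical KPI values: the second point drops by 40%, the third rises by 30%.
predicted = [100.0, 100.0, 100.0]
practical = [100.0, 60.0, 130.0]
ratios = [drop_ratio(a, p) for a, p in zip(practical, predicted)]
scores = [drop_score(r) for r in ratios]
```

Only the drop produces a non-zero score (about 0.4); the rise is ignored because the score keeps only negative ratios.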
3. filter_anomaly(), called from step 2
def filter_anomaly(self, drop_ratios):
    """
    It calculates the threshold for each approach (rule) and then calls filter_by_threshold().
    - gauss: threshold = mean - self.sigma * std
    - threshold: the given threshold variable
    - proportion: threshold = sort_scores[threshold_index]
    :param drop_ratios: list<float> | a measure of predicted drop anomaly degree
    :return: list<bool> | the drop labels
    """
    # np is NumPy, assumed to be imported at module level (import numpy as np).
    if self.rule == "gauss":
        mean = np.mean(drop_ratios)
        std = np.std(drop_ratios)  # population standard deviation (not the variance)
        threshold = mean - self.sigma * std  # threshold = mean - sigma * std
        drop_labels = self.filter_by_threshold(drop_ratios, threshold)
        return drop_labels
    if self.rule == "threshold":
        threshold = self.threshold
        drop_labels = self.filter_by_threshold(drop_ratios, threshold)
        return drop_labels
    if self.rule == "proportion":
        sort_scores = sorted(np.array(drop_ratios))  # ascending order
        # Note: a proportion of 1.0 would index one past the end of sort_scores.
        threshold_index = int(len(drop_ratios) * self.proportion)
        threshold = sort_scores[threshold_index]
        drop_labels = self.filter_by_threshold(drop_ratios, threshold)
        return drop_labels
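The "gauss" and "proportion" rules can be sketched outside the class as follows. This is an illustration under stated assumptions: the function names and sample ratios are hypothetical, and statistics.pstdev stands in for np.std because both compute the population standard deviation by default.

```python
import statistics


def gauss_threshold(drop_ratios, sigma):
    """threshold = mean - sigma * std, using the population standard deviation."""
    mean = statistics.mean(drop_ratios)
    std = statistics.pstdev(drop_ratios)
    return mean - sigma * std


def proportion_threshold(drop_ratios, proportion):
    """Value at position int(n * proportion) of the ascending-sorted ratios."""
    ordered = sorted(drop_ratios)
    index = int(len(drop_ratios) * proportion)
    return ordered[index]


# Hypothetical drop ratios with mean 0 and population std 0.2.
sample_ratios = [-0.3, -0.1, -0.1, -0.1, 0.0, 0.0, 0.2, 0.4]
gauss_value = gauss_threshold(sample_ratios, sigma=1.0)
proportion_value = proportion_threshold(sample_ratios, proportion=0.25)
# filter_by_threshold() uses a strict "<", so ratios equal to the threshold are not flagged.
flagged_by_proportion = [r for r in sample_ratios if r < proportion_value]
```

With these values the proportion rule picks -0.1 as the threshold, yet only the single ratio of -0.3 is strictly below it. Ties at the threshold can therefore make the flagged share smaller than the requested proportion.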
4. filter_by_threshold(), called from step 3
def filter_by_threshold(self, drop_scores, threshold):
    """
    It judges whether a point is an anomaly by comparing its drop score and the threshold.
    :param drop_scores: list<float> | a measure of predicted drop anomaly degree.
    :param threshold: float | the threshold to filter out anomalies.
    :return: list<bool> | a list of labels where a point with a "true" label is an anomaly.
    """
    drop_labels = []
    for r in drop_scores:
        # A value strictly below the threshold is labeled as an anomaly.
        if r < threshold:
            drop_labels.append(True)
        else:
            drop_labels.append(False)
    return drop_labels
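The labeling loop is equivalent to a single list comprehension, sketched below with a hypothetical function name and made-up values. Although the method's parameter is named drop_scores, filter_anomaly() passes it the signed drop_ratios, so a more negative value means a larger drop.

```python
def label_drops(drop_ratios, threshold):
    """Label a point True (anomaly) when its drop ratio is strictly below threshold."""
    return [ratio < threshold for ratio in drop_ratios]


# Hypothetical ratios; -0.10 appears twice to show that ties are not flagged.
demo_ratios = [0.02, -0.45, -0.10, 0.30, -0.10]
demo_labels = label_drops(demo_ratios, -0.10)
```

Only the -0.45 point is labeled as an anomaly here, because the comparison is strict.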