Common evaluation metrics for causal models: SHD and FDR
Posted ViviranZ
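Structural Hamming Distance (SHD)
The Structural Hamming Distance (SHD) is a standard distance for comparing graphs through their adjacency matrices. It counts the entries in which two binary adjacency matrices differ: every edge that is missing from the prediction, or present in the prediction but absent from the target graph, counts as one mistake. Note that for directed graphs a wrongly oriented edge can count as two mistakes, because the edge in the wrong direction is spurious and the edge in the correct direction is missing. The double_for_anticausal parameter controls this: setting it to False counts a reversed edge as a single mistake.
The Python call for SHD is: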
cdt.metrics.SHD(target, pred, double_for_anticausal=True)
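The parameters are: target, the ground-truth graph as a 0/1 adjacency matrix; pred, the predicted graph to evaluate; and double_for_anticausal, which decides whether a wrongly oriented edge counts as two mistakes (True, the default shown above) or as one (False).
An example: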
from cdt.metrics import SHD
from numpy.random import randint
tar, pred = randint(2, size=(10, 10)), randint(2, size=(10, 10))
SHD(tar, pred, double_for_anticausal=False)
Reference:
https://fentechsolutions.github.io/CausalDiscoveryToolbox/html/metrics.html#:~:text=The%20Structural%20Hamming%20Distance%20%28SHD%29%20is%20a%20standard,the%20target%20graph%20is%20counted%20as%20a%20mistake.
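To make the counting rule concrete, here is a minimal NumPy sketch of the behaviour described above. It is not the CDT source code: the function name shd_sketch is made up for illustration, and the sketch assumes square 0/1 adjacency matrices with no self-loops.

```python
import numpy as np


def shd_sketch(target, pred, double_for_anticausal=True):
    """Illustrative SHD on 0/1 adjacency matrices (assumes a zero diagonal)."""
    target = np.asarray(target, dtype=int)
    pred = np.asarray(pred, dtype=int)
    # 1 wherever the two matrices disagree about a directed edge
    diff = np.abs(target - pred)
    if double_for_anticausal:
        # a reversed edge disagrees at both (i, j) and (j, i): two mistakes
        return int(diff.sum())
    # merge (i, j) and (j, i) so that each unordered pair counts at most once
    pair_diff = np.clip(diff + diff.T, 0, 1)
    return int(pair_diff.sum() // 2)


# target: 0 -> 1, 1 -> 2
target = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 0]])
# prediction: 1 -> 0 (reversed), 1 -> 2 (correct), 0 -> 2 (extra)
pred = np.array([[0, 0, 1],
                 [1, 0, 1],
                 [0, 0, 0]])

print(shd_sketch(target, pred, double_for_anticausal=True))   # 3
print(shd_sketch(target, pred, double_for_anticausal=False))  # 2
```

With double counting, the reversed edge contributes two mistakes and the extra edge one. Without it, the reversed edge contributes only one.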
False Discovery Rate (FDR)
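The false discovery rate is the proportion of all discoveries (predicted edges) that are either wrong or reversed, i.e.:
$$\mathrm{FDR} = \frac{\text{reverse} + \text{false positive}}{\text{prediction positive}}$$
where reverse is the number of predicted edges that exist in the true graph in the opposite direction, false positive is the number of predicted edges that do not exist in the true graph in either direction, and prediction positive is the total number of predicted edges.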
This is fairly involved, so I will simply keep a copy of the NOTEARS implementation of several common metrics here:
import numpy as np

# NOTE: is_dag is not defined in this snippet. It is assumed to be a helper
# that returns True when a 0/1 adjacency matrix contains no directed cycle.


def count_accuracy(B_true, B_est):
    """Compute various accuracy metrics for B_est.

    true positive = predicted association exists in condition in correct direction
    reverse = predicted association exists in condition in opposite direction
    false positive = predicted association does not exist in condition

    Args:
        B_true (np.ndarray): [d, d] ground truth graph, {0, 1}
        B_est (np.ndarray): [d, d] estimate, {0, 1, -1}, -1 is undirected edge in CPDAG

    Returns:
        fdr: (reverse + false positive) / prediction positive
        tpr: (true positive) / condition positive
        fpr: (reverse + false positive) / condition negative
        shd: undirected extra + undirected missing + reverse
        nnz: prediction positive
    """
    if (B_est == -1).any():  # cpdag
        if not ((B_est == 0) | (B_est == 1) | (B_est == -1)).all():
            raise ValueError('B_est should take value in {0,1,-1}')
        if ((B_est == -1) & (B_est.T == -1)).any():
            raise ValueError('undirected edge should only appear once')
    else:  # dag
        if not ((B_est == 0) | (B_est == 1)).all():
            raise ValueError('B_est should take value in {0,1}')
        if not is_dag(B_est):
            raise ValueError('B_est should be a DAG')
    d = B_true.shape[0]
    # linear index of nonzeros: np.flatnonzero returns the positions of the
    # nonzero entries in the flattened matrix
    pred_und = np.flatnonzero(B_est == -1)  # undirected edges in the estimate
    pred = np.flatnonzero(B_est == 1)  # directed edges in the estimate
    cond = np.flatnonzero(B_true)  # edges in the true adjacency matrix
    cond_reversed = np.flatnonzero(B_true.T)  # true edges with direction flipped
    cond_skeleton = np.concatenate([cond, cond_reversed])  # true edges, either direction
    # true pos (intersect1d returns the sorted, unique values found in both arrays)
    true_pos = np.intersect1d(pred, cond, assume_unique=True)
    # treat undirected edge favorably
    true_pos_und = np.intersect1d(pred_und, cond_skeleton, assume_unique=True)
    true_pos = np.concatenate([true_pos, true_pos_und])
    # false pos
    false_pos = np.setdiff1d(pred, cond_skeleton, assume_unique=True)
    false_pos_und = np.setdiff1d(pred_und, cond_skeleton, assume_unique=True)
    false_pos = np.concatenate([false_pos, false_pos_und])
    # reverse
    extra = np.setdiff1d(pred, cond, assume_unique=True)
    reverse = np.intersect1d(extra, cond_reversed, assume_unique=True)
    # compute ratio
    pred_size = len(pred) + len(pred_und)
    cond_neg_size = 0.5 * d * (d - 1) - len(cond)
    fdr = float(len(reverse) + len(false_pos)) / max(pred_size, 1)
    tpr = float(len(true_pos)) / max(len(cond), 1)
    fpr = float(len(reverse) + len(false_pos)) / max(cond_neg_size, 1)
    # structural hamming distance
    pred_lower = np.flatnonzero(np.tril(B_est + B_est.T))
    cond_lower = np.flatnonzero(np.tril(B_true + B_true.T))
    extra_lower = np.setdiff1d(pred_lower, cond_lower, assume_unique=True)
    missing_lower = np.setdiff1d(cond_lower, pred_lower, assume_unique=True)
    shd = len(extra_lower) + len(missing_lower) + len(reverse)
    return {'fdr': fdr, 'tpr': tpr, 'fpr': fpr, 'shd': shd, 'nnz': pred_size}
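As a sanity check on these definitions, the following self-contained sketch recomputes the same quantities with boolean masks for the DAG-only case (no -1 entries) on a four-node toy graph. The names is_dag_sketch and dag_metrics_sketch are hypothetical and are not part of NOTEARS. In particular, is_dag_sketch is only an assumed stand-in for the is_dag helper, based on the fact that a graph with d nodes is acyclic exactly when it has no walk of length d.

```python
import numpy as np


def is_dag_sketch(B):
    """Hypothetical stand-in for an is_dag helper (not the NOTEARS code)."""
    A = (np.asarray(B) != 0).astype(int)
    d = A.shape[0]
    reach = np.eye(d, dtype=int)
    for _ in range(d):
        # after k steps, reach[i, j] == 1 iff a walk of length k goes from i to j
        reach = ((reach @ A) > 0).astype(int)
    # a walk of length d must revisit a node, so it exists only if there is a cycle
    return not reach.any()


def dag_metrics_sketch(B_true, B_est):
    """FDR, TPR, FPR and SHD for a 0/1 DAG estimate (no undirected edges)."""
    T = np.asarray(B_true) != 0
    E = np.asarray(B_est) == 1
    if not is_dag_sketch(E):
        raise ValueError('B_est should be a DAG')
    d = T.shape[0]
    true_pos = int(np.sum(E & T))            # predicted, correct direction
    reverse = int(np.sum(E & ~T & T.T))      # predicted, true edge is flipped
    false_pos = int(np.sum(E & ~T & ~T.T))   # predicted, no true edge at all
    pred_size = int(E.sum())
    cond_size = int(T.sum())
    cond_neg_size = 0.5 * d * (d - 1) - cond_size
    # compare skeletons on the lower triangle so each pair is counted once
    skel_true = np.tril(T | T.T)
    skel_est = np.tril(E | E.T)
    extra = int(np.sum(skel_est & ~skel_true))
    missing = int(np.sum(skel_true & ~skel_est))
    return {
        'fdr': (reverse + false_pos) / max(pred_size, 1),
        'tpr': true_pos / max(cond_size, 1),
        'fpr': (reverse + false_pos) / max(cond_neg_size, 1),
        'shd': extra + missing + reverse,
        'nnz': pred_size,
    }


# true graph: 0 -> 1, 1 -> 2, 2 -> 3
B_true = np.zeros((4, 4), dtype=int)
B_true[0, 1] = B_true[1, 2] = B_true[2, 3] = 1
# estimate: 0 -> 1 (correct), 2 -> 1 (reversed), 0 -> 3 (spurious); 2 -> 3 is missed
B_est = np.zeros((4, 4), dtype=int)
B_est[0, 1] = B_est[2, 1] = B_est[0, 3] = 1

print(dag_metrics_sketch(B_true, B_est))
```

For this toy graph there is one true positive, one reversed edge and one false positive among three predictions, so FDR = 2/3, TPR = 1/3 and FPR = 2/3. One extra skeleton edge, one missing skeleton edge and one reversal give SHD = 3.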