Common Evaluation Metrics for Causal Models: SHD and FDR

Posted by ViviranZ


Preface: This article was compiled by the editors of 小常识网 (cha138.com). It introduces two common evaluation metrics for causal models, SHD and FDR.

Structural Hamming Distance (SHD)


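The Structural Hamming Distance (SHD) is a standard distance for comparing graphs through their adjacency matrices. It counts the differences between the two (binary) adjacency matrices: every edge that is missing from the prediction, or present in the prediction but absent from the target graph, counts as a mistake. Note that for directed graphs a reversed edge can count as two mistakes: the edge in the wrong direction is a false edge, and the edge in the correct direction is missing. The double_for_anticausal argument controls this: with True (the default) a reversed edge counts as two mistakes, and setting it to False counts it as a single mistake.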
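To make the double_for_anticausal behavior concrete, here is a minimal numpy sketch of the computation described above. It is our own reimplementation of the described semantics, not cdt's source code; the function name shd and the convention that A[i, j] == 1 means an edge i -> j are assumptions for illustration.

```python
import numpy as np


def shd(target, pred, double_for_anticausal=True):
    """Structural Hamming distance between two 0/1 adjacency matrices.

    Hypothetical helper sketching the semantics described above.
    Assumed convention: A[i, j] == 1 means an edge i -> j.
    """
    target = np.asarray(target, dtype=int)
    pred = np.asarray(pred, dtype=int)
    # 1 wherever the two matrices disagree on an ordered pair (i, j)
    diff = np.abs(target - pred)
    if double_for_anticausal:
        # a reversed edge disagrees at both (i, j) and (j, i): 2 mistakes
        return int(diff.sum())
    # merge (i, j) and (j, i) into one unordered pair: 1 mistake per pair
    pair_diff = np.clip(diff + diff.T, 0, 1)
    return int(np.triu(pair_diff).sum())


# true graph 0 -> 1 -> 2; the prediction reverses 0 -> 1 and keeps 1 -> 2
target = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 0]])
pred = np.array([[0, 0, 0],
                 [1, 0, 1],
                 [0, 0, 0]])
print(shd(target, pred))                               # 2
print(shd(target, pred, double_for_anticausal=False))  # 1
```

With double_for_anticausal=True the reversed edge 0 -> 1 contributes 2; with False it contributes 1.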

In Python, SHD is called as:

cdt.metrics.SHD(target, pred, double_for_anticausal=True)

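where the parameters are:

target: the ground-truth graph, a binary (0/1) adjacency matrix (cdt also accepts a networkx.DiGraph)
pred: the predicted graph, in the same form
double_for_anticausal: whether a reversed edge counts as two mistakes (True, the default) or one (False)

The function returns the SHD; a value of 0 means the two graphs are identical.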

An example:

from cdt.metrics import SHD
from numpy.random import randint
tar, pred = randint(2, size=(10, 10)), randint(2, size=(10, 10))
SHD(tar, pred, double_for_anticausal=False)

Reference:

https://fentechsolutions.github.io/CausalDiscoveryToolbox/html/metrics.html#:~:text=The%20Structural%20Hamming%20Distance%20%28SHD%29%20is%20a%20standard,the%20target%20graph%20is%20counted%20as%20a%20mistake.

False Discovery Rate (FDR)

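The false discovery rate is the fraction of all discoveries (predicted edges) that are either false or reversed:

$$\mathrm{FDR} = \frac{R + \mathrm{FP}}{P}$$

where R is the number of reversed edges (the pair is adjacent in the true graph, but the predicted direction is wrong), FP is the number of false positives (predicted edges between pairs that are not adjacent in the true graph), and P is the total number of predicted edges.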
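A small worked example of the formula above, written as a sketch that assumes 0/1 adjacency matrices of DAGs with B[i, j] == 1 meaning i -> j. The helper name edge_counts is hypothetical.

```python
import numpy as np


def edge_counts(B_true, B_est):
    """Count true positives, reversed edges and false positives.

    Hypothetical helper; assumes 0/1 DAG adjacency matrices where
    B[i, j] == 1 means an edge i -> j.
    """
    B_true = np.asarray(B_true, dtype=bool)
    B_est = np.asarray(B_est, dtype=bool)
    tp = int((B_est & B_true).sum())               # right pair, right direction
    rev = int((B_est & ~B_true & B_true.T).sum())  # right pair, flipped direction
    fp = int((B_est & ~B_true & ~B_true.T).sum())  # pair not adjacent in truth
    return tp, rev, fp


B_true = np.array([[0, 1, 1, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 1],
                   [0, 0, 0, 0]])  # 0->1, 0->2, 1->2, 2->3
B_est = np.array([[0, 1, 0, 1],
                  [0, 0, 0, 0],
                  [1, 0, 0, 1],
                  [0, 0, 0, 0]])   # 0->1, 0->3, 2->0, 2->3

tp, rev, fp = edge_counts(B_true, B_est)
n_pred = int(np.count_nonzero(B_est))  # P: number of predicted edges
fdr = (rev + fp) / max(n_pred, 1)
print(tp, rev, fp, fdr)  # 2 1 1 0.5
```

Of the four predicted edges, 0 -> 1 and 2 -> 3 are correct, 2 -> 0 reverses the true edge 0 -> 2, and 0 -> 3 joins a pair that is not adjacent in the true graph, so FDR = (1 + 1) / 4 = 0.5.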

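This gets fairly involved, so here I simply keep a copy of how NOTEARS implements several common metrics (FDR, TPR, FPR, SHD and the number of predicted edges):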

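import numpy as np


def is_dag(B):
    """Return True if the adjacency matrix B contains no directed cycle.

    NOTEARS defines its own is_dag helper; this is a numpy stand-in that
    repeatedly removes nodes with no incoming edges (Kahn's algorithm).
    """
    B = np.asarray(B) != 0
    remaining = np.ones(B.shape[0], dtype=bool)
    while remaining.any():
        # remaining nodes with no incoming edge from another remaining node
        sources = remaining & ~B[remaining].any(axis=0)
        if not sources.any():
            return False  # every remaining node lies on or below a cycle
        remaining &= ~sources
    return True


def count_accuracy(B_true, B_est):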
    """Compute various accuracy metrics for B_est.

    true positive = predicted association exists in condition in correct direction
    reverse = predicted association exists in condition in opposite direction
    false positive = predicted association does not exist in condition

    Args:
        B_true (np.ndarray): [d, d] ground truth graph, {0, 1}
        B_est (np.ndarray): [d, d] estimate, {0, 1, -1}, -1 is undirected edge in CPDAG

    Returns:
        fdr: (reverse + false positive) / prediction positive
        tpr: (true positive) / condition positive
        fpr: (reverse + false positive) / condition negative
        shd: undirected extra + undirected missing + reverse
        nnz: prediction positive
    """
    if (B_est == -1).any():  # cpdag
        if not ((B_est == 0) | (B_est == 1) | (B_est == -1)).all():
            raise ValueError('B_est should take value in {0,1,-1}')
        if ((B_est == -1) & (B_est.T == -1)).any():
            raise ValueError('undirected edge should only appear once')
    else:  # dag
        if not ((B_est == 0) | (B_est == 1)).all():
            raise ValueError('B_est should take value in {0,1}')
        if not is_dag(B_est):
            raise ValueError('B_est should be a DAG')
    d = B_true.shape[0]
    # linear index of nonzeros
    pred_und = np.flatnonzero(B_est == -1)
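    pred = np.flatnonzero(B_est == 1)  # flat indices of directed edges i -> j in the estimate (row i is the parent)
    cond = np.flatnonzero(B_true)  # flat indices of edges in the true graph
    cond_reversed = np.flatnonzero(B_true.T)  # true edges with direction flipped; np.flatnonzero returns indices of nonzero entries of the flattened array
    cond_skeleton = np.concatenate([cond, cond_reversed])  # true skeleton: a true edge in either direction
    # true pos
    true_pos = np.intersect1d(pred, cond, assume_unique=True)  # np.intersect1d returns the sorted, unique values present in both arrays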
    # treat undirected edge favorably
    true_pos_und = np.intersect1d(pred_und, cond_skeleton, assume_unique=True)
    true_pos = np.concatenate([true_pos, true_pos_und])
    # false pos
    false_pos = np.setdiff1d(pred, cond_skeleton, assume_unique=True)
    false_pos_und = np.setdiff1d(pred_und, cond_skeleton, assume_unique=True)
    false_pos = np.concatenate([false_pos, false_pos_und])
    # reverse
    extra = np.setdiff1d(pred, cond, assume_unique=True)
    reverse = np.intersect1d(extra, cond_reversed, assume_unique=True)
    # compute ratio
    pred_size = len(pred) + len(pred_und)
    cond_neg_size = 0.5 * d * (d - 1) - len(cond)
    fdr = float(len(reverse) + len(false_pos)) / max(pred_size, 1)
    tpr = float(len(true_pos)) / max(len(cond), 1)
    fpr = float(len(reverse) + len(false_pos)) / max(cond_neg_size, 1)
    # structural hamming distance
    pred_lower = np.flatnonzero(np.tril(B_est + B_est.T))
    cond_lower = np.flatnonzero(np.tril(B_true + B_true.T))
    extra_lower = np.setdiff1d(pred_lower, cond_lower, assume_unique=True)
    missing_lower = np.setdiff1d(cond_lower, pred_lower, assume_unique=True)
    shd = len(extra_lower) + len(missing_lower) + len(reverse)
    return {'fdr': fdr, 'tpr': tpr, 'fpr': fpr, 'shd': shd, 'nnz': pred_size}
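The docstring of count_accuracy defines shd as undirected extra + undirected missing + reverse. The sketch below isolates that formula for plain 0/1 DAG inputs (no -1 entries), so it is a simplification for illustration rather than a replacement for count_accuracy; the function name skeleton_shd is hypothetical, and B[i, j] == 1 is assumed to mean i -> j.

```python
import numpy as np


def skeleton_shd(B_true, B_est):
    """SHD = skeleton extra + skeleton missing + reversed edges.

    Hypothetical helper for 0/1 DAG adjacency matrices only
    (B[i, j] == 1 means i -> j; no 2-cycles, no CPDAG -1 entries).
    """
    B_true = np.asarray(B_true, dtype=bool)
    B_est = np.asarray(B_est, dtype=bool)
    # undirected skeletons, strictly lower triangle so each pair counts once
    skel_true = np.tril(B_true | B_true.T, k=-1)
    skel_est = np.tril(B_est | B_est.T, k=-1)
    extra = int((skel_est & ~skel_true).sum())         # adjacent only in estimate
    missing = int((skel_true & ~skel_est).sum())       # adjacent only in truth
    reverse = int((B_est & ~B_true & B_true.T).sum())  # right pair, flipped
    return extra + missing + reverse


B_true = np.array([[0, 1, 1, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 1],
                   [0, 0, 0, 0]])  # 0->1, 0->2, 1->2, 2->3
B_est = np.array([[0, 1, 0, 1],
                  [0, 0, 0, 0],
                  [1, 0, 0, 1],
                  [0, 0, 0, 0]])   # 0->1, 0->3, 2->0, 2->3
print(skeleton_shd(B_true, B_est))  # 3
```

Here the estimate has one extra adjacent pair ({0, 3}), one missing pair ({1, 2}) and one reversed edge (2 -> 0), giving 3. A reversed edge adds 1 rather than 2, so for two 0/1 DAGs this should agree with cdt's SHD when double_for_anticausal=False.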
