Softmax function in neural network (Python)

Posted 2023-02-23

【Title】: Softmax function in neural network (Python) 【Posted】: 2018-05-02 12:45:01 【Question】:

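I am learning about neural networks and implementing one in Python. I started by defining a softmax function, following the solution given in the question "Softmax function - python". Here is my code: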

import numpy as np

def softmax(A):
    """
    Computes a softmax function. 
    Input: A (N, k) ndarray.
    Returns: (N, k) ndarray.
    """
    s = 0
    e = np.exp(A)
    s = e / np.sum(e, axis =0)
    return s

I was given some test code to check whether the softmax function is correct. test_array is the test data, and test_output is the correct output of softmax(test_array). Here is the test code:

# Test if your function works correctly.
test_array = np.array([[0.101,0.202,0.303],
                       [0.404,0.505,0.606]]) 
test_output = [[ 0.30028906,  0.33220277,  0.36750817],
               [ 0.30028906,  0.33220277,  0.36750817]]
print(np.allclose(softmax(test_array),test_output))

However, with the softmax function I defined, softmax(test_array) returns:

print (softmax(test_array))

[[ 0.42482427  0.42482427  0.42482427]
 [ 0.57517573  0.57517573  0.57517573]]

Can anyone tell me what is wrong with the softmax function I defined?

【Discussion】:

【Answer 1】:

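The problem is in your sum. You are summing along axis 0, which collapses the rows together, when axis 0 should be left intact.

To sum over all the entries of the same example, i.e. within the same row, you have to use axis 1 instead.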

def softmax(A):
    """
    Computes a softmax function. 
    Input: A (N, k) ndarray.
    Returns: (N, k) ndarray.
    """
    e = np.exp(A)
    return e / np.sum(e, axis=1, keepdims=True)

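Use keepdims to preserve the number of dimensions: the sum then has shape (N, 1) instead of (N,), so e can be divided by it row by row.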

In your example, e evaluates to:

[[ 1.10627664  1.22384801  1.35391446]
 [ 1.49780395  1.65698552  1.83308438]]

The sum for each example (the denominator in the return line) is then:

[[ 3.68403911]
 [ 4.98787384]]

The function then divides each row by its sum, which gives the result in test_output.
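The intermediate values quoted in this answer can be reproduced with a short sketch. It assumes NumPy is imported as np and reuses test_array from the question; the names row_sums and result are introduced here only for illustration.

```python
import numpy as np

test_array = np.array([[0.101, 0.202, 0.303],
                       [0.404, 0.505, 0.606]])

e = np.exp(test_array)                       # shape (2, 3)
row_sums = np.sum(e, axis=1, keepdims=True)  # shape (2, 1): one sum per row
result = e / row_sums                        # (2, 3) / (2, 1) divides row by row

print(e)
print(row_sums)
print(result)
```

Both rows of result come out identical because the second row of test_array is the first row shifted by a constant (0.303), and softmax is unchanged by adding a constant to every entry of a row.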

As MaxU pointed out, it is good practice to subtract each row's maximum before exponentiating, to avoid overflow:

e = np.exp(A - np.max(A, axis=1, keepdims=True))

【Discussion】:

【Answer 2】:

Try this:

In [327]: def softmax(A):
     ...:     e = np.exp(A)
     ...:     return  e / e.sum(axis=1).reshape((-1,1))

In [328]: softmax(test_array)
Out[328]:
array([[ 0.30028906,  0.33220277,  0.36750817],
       [ 0.30028906,  0.33220277,  0.36750817]])

Or a better version, which prevents overflow when exponentiating large values:

def softmax(A):
    e = np.exp(A - np.max(A, axis=1).reshape((-1, 1)))
    return  e / e.sum(axis=1).reshape((-1,1))
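To see why subtracting the maximum matters, here is a small sketch comparing the two variants on deliberately large inputs. The function names softmax_naive and softmax_stable and the array big are made up for this illustration and do not come from the answers.

```python
import numpy as np

def softmax_naive(A):
    # Exponentiates the raw values; overflows for large inputs.
    e = np.exp(A)
    return e / e.sum(axis=1, keepdims=True)

def softmax_stable(A):
    # Subtracting each row's maximum leaves the result unchanged,
    # because the common factor exp(-max) cancels in the ratio.
    e = np.exp(A - A.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

big = np.array([[1000.0, 1001.0, 1002.0]])

# Silence the overflow/invalid warnings that the naive version triggers.
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(big)   # exp(1000) is inf in float64, and inf / inf is nan

stable = softmax_stable(big)     # same as softmax of [-2, -1, 0]

print(naive)
print(stable)
```

After the shift, the largest exponent in each row is 0, so np.exp never receives a positive argument and cannot overflow.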

【Discussion】:

【Answer 3】:

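You can print np.sum(e, axis=0) yourself. You will see that it is an array with 3 elements, [ 2.60408059 2.88083353 3.18699884]. So e / np.sum(e, axis=0) divides each row of e (which is itself a 3-element array) element by element by that 3-element array, which normalizes the columns rather than the rows. Obviously that is not what you want.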

You should change np.sum(e, axis=0) to np.sum(e, axis=1, keepdims=True), so that you get

[[ 3.68403911]
 [ 4.98787384]]

instead, which is what you actually want. You will then get the correct result.

I recommend reading the rules of broadcasting in numpy. They describe how addition, subtraction, multiplication, and division work on two arrays of different shapes.
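As a rough illustration of how broadcasting produces the two different results, the sketch below divides the same array by a per-column sum and by a per-row sum. The names col_sums, row_sums, by_column, and by_row are chosen here for illustration only.

```python
import numpy as np

e = np.exp(np.array([[0.101, 0.202, 0.303],
                     [0.404, 0.505, 0.606]]))  # shape (2, 3)

col_sums = np.sum(e, axis=0)                 # shape (3,): one sum per column
row_sums = np.sum(e, axis=1, keepdims=True)  # shape (2, 1): one sum per row

# (2, 3) / (3,): the 3-element array is lined up with the columns,
# so every column is normalized (the output shown in the question).
by_column = e / col_sums

# (2, 3) / (2, 1): the size-1 axis is stretched across the columns,
# so every row is normalized (the intended softmax).
by_row = e / row_sums

print(by_column.sum(axis=0))  # each column sums to 1
print(by_row.sum(axis=1))     # each row sums to 1
```

Without keepdims=True, np.sum(e, axis=1) would have shape (2,), which cannot be broadcast against shape (2, 3), so the division would raise a ValueError instead of silently producing the wrong answer.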

【Discussion】:

【Answer 4】:

Perhaps this is enlightening:

>>> np.sum(test_output, axis=1)
array([ 1.,  1.])

Notice that each row is normalized, i.e. sums to 1. In other words, they want you to compute the softmax of each row independently.
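One way to make "each row independently" concrete is to apply a 1-D softmax to one row at a time and stack the results. This is only a sketch for checking the idea; the helper name softmax_row is invented here, and the vectorized versions in the other answers are the more practical choice.

```python
import numpy as np

def softmax_row(x):
    # Softmax of a single 1-D vector (shifted by its max for stability).
    e = np.exp(x - np.max(x))
    return e / e.sum()

test_array = np.array([[0.101, 0.202, 0.303],
                       [0.404, 0.505, 0.606]])

# Treat every row as its own example and stack the results back into (N, k).
row_by_row = np.stack([softmax_row(row) for row in test_array])

print(row_by_row)
print(row_by_row.sum(axis=1))  # every row sums to 1
```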

【Discussion】:

Thanks @Mateen Ulhaq
