Multilayer perceptron implementation: weights go crazy
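Posted: 2013-08-02 05:14:36

Question:
I am writing a simple implementation of an MLP with a single output unit (binary classification). I need it for teaching purposes, so I can't use an existing implementation :(
I managed to create a working dummy model and implemented the training function, but the MLP does not converge. In fact, the gradient for the output unit stays high over many epochs, so its weights approach infinity.
My implementation: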
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

X = np.loadtxt('synthetic.txt')
t = X[:, 2].astype(int)
X = X[:, 0:2]

# Sigmoid activation function for output unit
def logistic(x):
    return 1/(1 + np.exp(-x))

# derivative of the tanh activation function for hidden units
def tanh_deriv(x):
    return 1 - np.tanh(x)*np.tanh(x)

input_num = 2   # number of units in the input layer
hidden_num = 2  # number of units in the hidden layer

# initialize weights with random values:
weights_hidden = np.array((2 * np.random.random( (input_num + 1, hidden_num + 1) ) - 1 ) * 0.25)
weights_out = np.array((2 * np.random.random( hidden_num + 1 ) - 1 ) * 0.25)

def predict(x):
    global input_num
    global hidden_num
    global weights_hidden
    global weights_out
    x = np.append(x.astype(float), 1.0)  # input to the hidden layer: features + bias term
    a = x.dot(weights_hidden)            # activations of the hidden layer
    z = np.tanh(a)                       # output of the hidden layer
    q = logistic(z.dot(weights_out))     # output of the output (decision) layer
    if q >= 0.5:
        return 1
    return 0

def train(X, t, learning_rate=0.2, epochs=50):
    global input_num
    global hidden_num
    global weights_hidden
    global weights_out
    weights_hidden = np.array((2 * np.random.random( (input_num + 1, hidden_num + 1) ) - 1 ) * 0.25)
    weights_out = np.array((2 * np.random.random( hidden_num + 1 ) - 1 ) * 0.25)
    for epoch in range(epochs):
        gradient_out = 0.0  # gradients for output and hidden layers
        gradient_hidden = []
        for i in range(X.shape[0]):
            # forward propagation
            x = np.array(X[i])
            x = np.append(x.astype(float), 1.0)  # input to the hidden layer: features + bias term
            a = x.dot(weights_hidden)            # activations of the hidden layer
            z = np.tanh(a)                       # output of the hidden layer
            q = z.dot(weights_out)               # activations to the output (decision) layer
            y = logistic(q)                      # output of the decision layer
            # backpropagation
            delta_hidden_s = []  # delta and gradient for a single training sample (hidden layer)
            gradient_hidden_s = []
            delta_out_s = t[i] - y  # delta and gradient for a single training sample (output layer)
            gradient_out_s = delta_out_s * z
            for j in range(hidden_num + 1):
                delta_hidden_s.append(tanh_deriv(a[j]) * (weights_out[j] * delta_out_s))
                gradient_hidden_s.append(delta_hidden_s[j] * x)
            gradient_out = gradient_out + gradient_out_s  # accumulate gradients over training set
            gradient_hidden = gradient_hidden + gradient_hidden_s
        print("\n#", epoch, "Gradient out: ", gradient_out, end=" ")
        print("\n Weights out: ", weights_out)
        # Now updating weights
        weights_out = weights_out - learning_rate * gradient_out
        for j in range(hidden_num + 1):
            weights_hidden.T[j] = weights_hidden.T[j] - learning_rate * gradient_hidden[j]

train(X, t, 0.2, 50)

And the evolution of the output unit's gradient and weights over the epochs:
0 Gradient out: [ 11.07640724 -7.20309009 0.24776626]
Weights out: [-0.15397237 0.22232593 0.03162811]
1 Gradient out: [ 23.68791197 -19.6688382 -1.75324703]
Weights out: [-2.36925382 1.66294395 -0.01792515]
2 Gradient out: [ 79.08612305 -65.76066015 -7.70115262]
Weights out: [-7.10683621 5.59671159 0.33272426]
3 Gradient out: [ 99.59798656 -93.90973727 -21.45674943]
Weights out: [-22.92406082 18.74884362 1.87295478]
...
49 Gradient out: [ 107.89975864 -105.8654327 -104.69591522]
Weights out: [-1003.67912726 976.87213404 922.38862049]
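I tried different datasets and different numbers of hidden units. I tried updating the weights with addition instead of subtraction... Nothing helps...
Can anyone tell me what might be wrong? Thanks in advance.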

Comments:
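Hi, could you attach 'synthetic.txt'? I will debug it and hopefully find the corrections that are needed. I have already spotted some missing pieces, such as a bias term that needs to be added to the output layer, and the mechanism for updating the biases, which is quite different from updating the other weights. Thanks.

Hi, I have actually solved the problem already. You are right, I was missing the bias for the hidden layer. I also rewrote the backpropagation for a sum-of-squares error function. Thanks for your attention.

Answer 1:
I don't think you should use a sum-of-squares error function for binary classification. Instead, you should use the cross-entropy error function, which is essentially a likelihood function. That way, the further your prediction is from the correct answer, the more costly the error becomes. Please read the "Network Training" section on page 235 of Christopher Bishop's "Pattern Recognition and Machine Learning"; it will give you a proper overview of how to do supervised learning on a FFNN.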
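To illustrate the difference between the two error functions, here is a minimal sketch that evaluates both for a target of 1 and increasingly wrong predictions. The function names `sum_of_squares` and `cross_entropy` and the sample predictions are my own choices for illustration; they are not taken from the question's code.

```python
import numpy as np

def sum_of_squares(t, y):
    # Squared error for one sample; it can never exceed 0.5 when t and y lie in [0, 1]
    return 0.5 * (t - y) ** 2

def cross_entropy(t, y):
    # Negative log-likelihood of a Bernoulli target t under predicted probability y
    return -(t * np.log(y) + (1 - t) * np.log(1 - y))

t = 1.0  # true class
for y in (0.9, 0.5, 0.1, 0.01):  # predictions getting further from the target
    print(f"y={y:<5} squared={sum_of_squares(t, y):.3f} "
          f"cross-entropy={cross_entropy(t, y):.3f}")
```

The squared error saturates below 0.5 for a confidently wrong prediction, while the cross-entropy keeps growing as the prediction approaches the wrong class. With a logistic output unit, the derivative of the cross-entropy error with respect to the output pre-activation also simplifies to `y - t`, so a gradient descent step subtracts `learning_rate * (y - t) * z` from the output weights.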
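Bias units are very important because they make it possible to shift the transfer function along the x-axis, while the weights change the steepness of the transfer function's curve. Keep this difference between biases and weights in mind, because it explains why both need to be present in a FFNN.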
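As a rough sketch of how explicit bias terms in both layers and the cross-entropy error fit together, the following trains a one-hidden-layer network with batch gradient descent. Everything here is an assumption made for illustration: the function `train_mlp`, its hyperparameters, and the two Gaussian blobs that stand in for the unavailable 'synthetic.txt'. It is not the asker's final solution.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_mlp(X, t, hidden_num=2, learning_rate=0.5, epochs=200, seed=0):
    """Batch gradient descent for a tanh hidden layer and a logistic output unit."""
    rng = np.random.default_rng(seed)
    n, input_num = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # inputs plus a constant bias input
    W_hidden = rng.uniform(-0.25, 0.25, (input_num + 1, hidden_num))
    w_out = rng.uniform(-0.25, 0.25, hidden_num + 1)  # last entry is the output bias
    losses = []
    for _ in range(epochs):
        # forward pass over the whole training set
        a = Xb @ W_hidden                               # hidden pre-activations
        z = np.hstack([np.tanh(a), np.ones((n, 1))])    # hidden outputs plus constant bias unit
        y = logistic(z @ w_out)                         # predicted probabilities
        y_safe = np.clip(y, 1e-12, 1 - 1e-12)           # avoid log(0)
        losses.append(-np.mean(t * np.log(y_safe) + (1 - t) * np.log(1 - y_safe)))
        # backward pass: gradients of the mean cross-entropy error
        delta_out = y - t                               # dE/dq for logistic + cross-entropy
        grad_out = z.T @ delta_out / n
        delta_hidden = (1 - np.tanh(a) ** 2) * np.outer(delta_out, w_out[:-1])
        grad_hidden = Xb.T @ delta_hidden / n
        # gradient descent: step against the gradient
        w_out -= learning_rate * grad_out
        W_hidden -= learning_rate * grad_hidden
    return W_hidden, w_out, losses

# Hypothetical stand-in data: two Gaussian blobs, one per class
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 0.5, (50, 2)), rng.normal(1.0, 0.5, (50, 2))])
t = np.concatenate([np.zeros(50), np.ones(50)])

W_hidden, w_out, losses = train_mlp(X, t)
print("first loss:", losses[0], "last loss:", losses[-1])
```

Two details matter for convergence in this sketch: the delta is defined as `y - t` so that subtracting the gradient lowers the error, and the per-sample gradients are summed as arrays (via the matrix products) rather than collected in Python lists, where `+` would concatenate instead of add.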