1-hidden layer neural network with rectified linear units

Posted 2023-03-13

Asked: 2018-05-30 20:49:32

Question:

My goal is to implement a 1-hidden-layer neural network with rectified linear units (nn.relu()) and 1024 hidden nodes.

# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import matplotlib.pyplot as plt
import numpy as np
import os
import sys
import tarfile
from IPython.display import display, Image
from scipy import ndimage
from sklearn.linear_model import LogisticRegression
from six.moves.urllib.request import urlretrieve
from six.moves import cPickle as pickle
from six.moves import range
import tensorflow as tf

url = 'https://commondatastorage.googleapis.com/books1000/'
last_percent_reported = None
data_root = '.' # Change me to store data elsewhere

def download_progress_hook(count, blockSize, totalSize):
  """A hook to report the progress of a download. This is mostly intended for users with
  slow internet connections. Reports every 5% change in download progress.
  """
  global last_percent_reported
  percent = int(count * blockSize * 100 / totalSize)

  if last_percent_reported != percent:
    if percent % 5 == 0:
      sys.stdout.write("%s%%" % percent)
      sys.stdout.flush()
    else:
      sys.stdout.write(".")
      sys.stdout.flush()

    last_percent_reported = percent

def maybe_download(filename, expected_bytes, force=False):
  """Download a file if not present, and make sure it's the right size."""
  dest_filename = os.path.join(data_root, filename)
  if force or not os.path.exists(dest_filename):
    print('Attempting to download:', filename) 
    filename, _ = urlretrieve(url + filename, dest_filename, reporthook=download_progress_hook)
    print('\nDownload Complete!')
  statinfo = os.stat(dest_filename)
  if statinfo.st_size == expected_bytes:
    print('Found and verified', dest_filename)
  else:
    raise Exception(
      'Failed to verify ' + dest_filename + '. Can you get to it with a browser?')
  return dest_filename

# If error in download get it here: http://yaroslavvb.com/upload/notMNIST/
train_filename = maybe_download('notMNIST_large.tar.gz', 247336696)
test_filename = maybe_download('notMNIST_small.tar.gz', 8458043)

num_classes = 10
np.random.seed(133)

def maybe_extract(filename, force=False):
  root = os.path.splitext(os.path.splitext(filename)[0])[0]  # remove .tar.gz
  if os.path.isdir(root) and not force:
    # You may override by setting force=True.
    print('%s already present - Skipping extraction of %s.' % (root, filename))
  else:
    print('Extracting data for %s. This may take a while. Please wait.' % root)
    tar = tarfile.open(filename)
    sys.stdout.flush()
    tar.extractall(data_root)
    tar.close()
  data_folders = [
    os.path.join(root, d) for d in sorted(os.listdir(root))
    if os.path.isdir(os.path.join(root, d))]
  if len(data_folders) != num_classes:
    raise Exception(
      'Expected %d folders, one per class. Found %d instead.' % (
        num_classes, len(data_folders)))
  print(data_folders)
  return data_folders

train_folders = maybe_extract(train_filename)
test_folders = maybe_extract(test_filename)

pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f,encoding='latin1')
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)

image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

batch_size = 128
hidden_nodes = 1024

graph = tf.Graph()
with graph.as_default():

    x_train = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size)) 
    y_ =  tf.placeholder(tf.float32, shape=(batch_size, num_labels)) 
    x_valid = tf.constant(valid_dataset)
    x_test = tf.constant(test_dataset)

    hidden_layer = tf.contrib.layers.fully_connected(x_train,hidden_nodes)

    logits = tf.contrib.layers.fully_connected(hidden_layer, num_labels, activation_fn=None)
    loss = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits=logits,labels=y_ ) )

    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    train_prediction = tf.nn.softmax(logits)
    valid_relu = tf.contrib.layers.fully_connected(x_valid,hidden_nodes)
    valid_prediction = tf.nn.softmax(tf.contrib.layers.fully_connected(valid_relu,num_labels))

    test_relu = tf.contrib.layers.fully_connected(x_test,hidden_nodes, activation_fn=None)
    test_prediction = tf.nn.softmax(tf.contrib.layers.fully_connected(test_relu,num_labels, activation_fn=None))

steps = 3001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()

    for step in range(steps):
        # Selecting some portion within training data 
        # Note: Better to randomize dataset for Minibatch SGD
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        # Generate the Minibatch
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        # Map each placeholder to the current minibatch
        feed_dict = {x_train: batch_data, y_: batch_labels}
        _, l, prediction = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if(step % 500 == 0):
            print("Minibatch Loss at step %d: %f"% (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(prediction,batch_labels))
            print("Validation accuracy :%.1f%% "% accuracy(valid_prediction.eval(),valid_labels))



    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))
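
The training loop above calls accuracy(), which is not defined anywhere in the posted code. A minimal sketch of such a helper follows, assuming that predictions and labels are both arrays of shape (N, num_labels) and that labels are one-hot encoded as produced by reformat(). The body is an assumption based on how the function is called, not code taken from the question.

```python
import numpy as np

def accuracy(predictions, labels):
    """Percentage of rows whose arg-max class matches the one-hot label."""
    correct = np.argmax(predictions, axis=1) == np.argmax(labels, axis=1)
    return 100.0 * np.sum(correct) / predictions.shape[0]

# Tiny check with 3 classes; one-hot labels are built the same way as in reformat().
raw_labels = np.array([0, 2, 1, 1])
one_hot = (np.arange(3) == raw_labels[:, None]).astype(np.float32)
predictions = np.array([[0.8, 0.1, 0.1],   # predicts class 0 (correct)
                        [0.2, 0.2, 0.6],   # predicts class 2 (correct)
                        [0.5, 0.3, 0.2],   # predicts class 0 (wrong, label is 1)
                        [0.1, 0.7, 0.2]])  # predicts class 1 (correct)
print("accuracy: %.1f%%" % accuracy(predictions, one_hot))  # accuracy: 75.0%
```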

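I am following this tutorial, which reaches better accuracy than my code.

I would like to get similar results using tf.contrib.layers.fully_connected for the hidden layer. Am I doing this right?

Edit:

Changed the input of logits to hidden_layer.

Reworked valid_relu, valid_prediction, test_relu and test_prediction.

Results: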

Minibatch Loss at step 0: 2.389448
Minibatch accuracy: 5.5%
Validation accuracy :8.2% 
Minibatch Loss at step 500: 0.342108
Minibatch accuracy: 92.2%
Validation accuracy :8.2% 
Minibatch Loss at step 1000: 0.543803
Minibatch accuracy: 84.4%
Validation accuracy :8.2% 
Minibatch Loss at step 1500: 0.299978
Minibatch accuracy: 93.8%
Validation accuracy :8.2% 
Minibatch Loss at step 2000: 0.294090
Minibatch accuracy: 93.8%
Validation accuracy :8.2% 
Minibatch Loss at step 2500: 0.333070
Minibatch accuracy: 90.6%
Validation accuracy :8.2% 
Minibatch Loss at step 3000: 0.365324
Minibatch accuracy: 89.1%
Validation accuracy :8.2% 
Test accuracy: 6.8%

Comments:

Use an Estimator instead ...

Answer 1:

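You are off to a good start. A few additions:

Since you are dropping the hand-written FC layers in favor of tf.contrib.layers.fully_connected, drop w and b as well. That saves you from having to pick a suitable initialization for those weights: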

hidden_layer = tf.contrib.layers.fully_connected(x_train, hidden_nodes)
logits = tf.contrib.layers.fully_connected(hidden_layer, num_labels, 
                                           activation_fn=None)

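Even though the tutorial does it, putting the datasets directly into the graph as constants and duplicating the inference nodes is bad practice. Instead, just pass valid_dataset and test_dataset through feed_dict and evaluate train_prediction. For that to work, the input placeholder must accept any batch size, i.e. be declared with shape=(None, image_size * image_size) rather than a fixed batch_size.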

# BAD idea: this potentially large value is stored in the graph, can lead to OOM
x_valid = tf.constant(valid_dataset)
x_test = tf.constant(test_dataset)
...
# BAD idea: model duplication
valid_relu = tf.contrib.layers.fully_connected(x_valid, hidden_nodes)
valid_prediction = tf.nn.softmax(tf.matmul(valid_relu, w) + b)
test_relu = tf.contrib.layers.fully_connected(x_test, hidden_nodes)
test_prediction = tf.nn.softmax(tf.matmul(test_relu, w) + b)
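
Each extra fully_connected call in the question most likely creates its own, never-trained variables, which would explain why the validation accuracy stays at 8.2% while the minibatch accuracy improves. The sketch below illustrates the alternative of one set of parameters shared by every evaluation. It is written in plain NumPy rather than TensorFlow so that it runs on its own; init_params and predict are hypothetical names, the random arrays are stand-ins for the notMNIST data, and the initialization scale is arbitrary.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_params(n_in, n_hidden, n_out, seed=0):
    """Create ONE set of weights; every prediction below reuses it."""
    rng = np.random.RandomState(seed)
    return {
        "w1": rng.normal(0.0, 0.05, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, 0.05, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def predict(params, x):
    """Forward pass of the 1-hidden-layer ReLU network; x may have any number of rows."""
    hidden = relu(np.dot(x, params["w1"]) + params["b1"])
    logits = np.dot(hidden, params["w2"]) + params["b2"]
    return softmax(logits)

params = init_params(n_in=28 * 28, n_hidden=1024, n_out=10)
rng = np.random.RandomState(1)
batch = rng.rand(128, 28 * 28)   # stand-in for a training minibatch
valid = rng.rand(500, 28 * 28)   # stand-in for the validation set

train_prediction = predict(params, batch)   # shape (128, 10)
valid_prediction = predict(params, valid)   # same params, shape (500, 10)
print(train_prediction.shape, valid_prediction.shape)
```

In the TensorFlow graph the same idea amounts to keeping a single hidden_layer/logits pair and feeding different data into the one input placeholder, instead of building separate valid_* and test_* branches.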

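Also note that tensorflow.contrib is an experimental package. In particular, the fully_connected layer has "graduated" to tf.layers.dense. It does the same job, but its API is guaranteed to be stable, whereas fully_connected may be deprecated in a future release. One difference to keep in mind: fully_connected applies ReLU by default, while tf.layers.dense applies no activation unless you pass one (e.g. activation=tf.nn.relu).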

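Comments:

I get a new ValueError: Input 0 of layer fully_connected_1 is incompatible with the layer: : expected min_ndim=2, found ndim=0. Full shape received: []

Correction: the input should be the hidden layer output. If any new error comes up, please update the question with a reproducible example.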
