Case Study: Solving an Image Recognition Problem with PyTorch

Posted 2021-04-07 计算材料学与大数据

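To deepen our understanding of PyTorch, we will use it to solve a classic deep learning problem: handwritten digit recognition. The task is as follows: given a data set of images, each 28x28 pixels, train a model that predicts the label of each image.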

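First, download the data set (available at https://datahack.analyticsvidhya.com/contest/practice-problem-identify-the-digits/, or on request from the editor). It contains a training set and a test set, and all images are in ".png" format. Let us explore the data set: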

Notes:

The working directory of this script is: /home/geniusshaoming/Downloads

The data set folder /data is located at: /home/geniusshaoming/Downloads/data

The /data folder contains these files and directories: Sample_Submission.csv test/ test.csv train/ train.csv
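Before running the script, it can help to confirm that the data folder really has this layout. The following is a minimal sketch and not part of the original script; the helper name `missing_entries` is hypothetical, and the expected names are taken from the listing above.

```python
from pathlib import Path

# Entries the article lists inside the data folder.
EXPECTED = ["Sample_Submission.csv", "test", "test.csv", "train", "train.csv"]

def missing_entries(data_dir):
    """Return the expected entries that are absent from data_dir (hypothetical helper)."""
    root = Path(data_dir)
    return [name for name in EXPECTED if not (root / name).exists()]

if __name__ == "__main__":
    # Assumes the script is started from the folder that contains data/.
    missing = missing_entries(Path(".") / "data")
    print("missing:", missing if missing else "nothing")
```

An empty result means every file and folder the script reads is in place.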

The complete script for the model is as follows:

# Import the required modules
# %matplotlib inline  # uncomment when running in a Jupyter notebook
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image
from sklearn.metrics import accuracy_score
# Fix the seed so that the model's randomness is reproducible
seed = 128
rng = np.random.RandomState(seed)
root_dir = os.path.abspath('.')
print(root_dir)
data_dir = os.path.join(root_dir,'data')
print(data_dir)
# Check that the paths exist
#os.path.exists(data_dir)
#os.path.exists(root_dir)
# Read an image as a float32 grayscale array with Pillow
# (scipy.misc.imread is unavailable in SciPy >= 1.2)
def read_gray(path):
  return np.asarray(Image.open(path).convert('L'), dtype='float32')
#Step 1: data loading and preprocessing
#a) Read "train.csv" and "test.csv". "train.csv" lists each filename with its label; "test.csv" has no labels, so the model has to predict them.
# Load the data sets
train = pd.read_csv(os.path.join(data_dir, 'train.csv'))
test = pd.read_csv(os.path.join(data_dir, 'test.csv'))
sample_submission = pd.read_csv(os.path.join(data_dir, 'Sample_Submission.csv'))
train.head()
#b) Look at one randomly chosen image
img_name = rng.choice(train.filename)
filepath = os.path.join(data_dir, 'train', img_name)
img = read_gray(filepath)
plt.imshow(img, cmap='gray')
plt.axis('off')
plt.show()
#d) To make the data easier to work with, convert the training and test sets to NumPy arrays.
# Load the images to build the training and test sets.
# temp is a list with one 28x28 image per entry; it has to become a 2-D matrix of shape (number of images, features per image).
temp = []
for img_name in train.filename:
  image_path = os.path.join(data_dir,'train',img_name)
  temp.append(read_gray(image_path))
train_x = np.stack(temp)
train_x.shape
# Scale the pixel values from 0-255 to 0-1
train_x /= 255.0
# Reshape the training set into a 2-D matrix with 784 columns (the number of features per image) and one row per image.
train_x = train_x.reshape(-1, 784).astype('float32')
train_x.shape
# Convert the test set in the same way
temp = []
for img_name in test.filename:
  image_path = os.path.join(data_dir,'test', img_name)
  temp.append(read_gray(image_path))
test_x = np.stack(temp)
test_x /= 255.0
test_x = test_x.reshape(-1, 784).astype('float32')
train_y = train.label.values
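#e) As in any typical machine learning problem, we hold out a validation set to measure the model's performance, splitting training : validation = 70 : 30.
# Create the validation set
split_size = int(train_x.shape[0]*0.7)
train_x, val_x = train_x[:split_size], train_x[split_size:]
train_y, val_y = train_y[:split_size], train_y[split_size:]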
#Step 2: build the model
# The model has three layers: input, hidden, and output. The input and output sizes are fixed at 28x28 = 784 and 10 units; the 10 output units correspond to the 10 digit classes 0-9.
# The hidden layer has 500 units (you can experiment to see which size gives the best performance). The model is trained with the Adam optimizer.
# Import torch. Plain tensors track gradients themselves (PyTorch 0.4 and later), so no Variable wrapper is needed.
import torch
# Number of units in each layer
input_num_units = 28*28
hidden_num_units = 500
output_num_units = 10
# Other hyperparameters: epochs, batch_size, learning_rate
# One epoch is one complete pass of the data set through the network.
# For example, with 10000 training samples split into batches of 500, one epoch takes 20 iterations.
epochs = 11
batch_size = 128
learning_rate = 0.001
#b) Train the model
# Define the model
model = torch.nn.Sequential(
  torch.nn.Linear(input_num_units, hidden_num_units),
  torch.nn.ReLU(),
  torch.nn.Linear(hidden_num_units, output_num_units),
)
loss_fn = torch.nn.CrossEntropyLoss()
# Use the Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
# Helper functions
# Preprocess one batch of data
def preproc(unclean_batch_x):
  # Scale the values into the range 0-1 (the arrays were already divided by 255, so this normally leaves them unchanged)
  temp_batch = unclean_batch_x / unclean_batch_x.max()
  return temp_batch
# Create one batch from the training split
def batch_creator(batch_size):
  dataset_length = train_x.shape[0]
  # Draw batch_size random indices from [0, dataset_length)
  batch_mask = rng.choice(dataset_length, batch_size)
  # Pixel matrix of the randomly selected images
  batch_x = preproc(train_x[batch_mask])
  # Matching labels
  batch_y = train_y[batch_mask]
  # Return the image matrix and the labels of the batch
  return batch_x, batch_y
# Train the network; one pass over the training split takes total_batch batches
total_batch = train_x.shape[0] // batch_size
for epoch in range(epochs):
  avg_cost = 0
  for i in range(total_batch):
    # Create a batch
    batch_x, batch_y = batch_creator(batch_size)
    # Feed the batch to the network
    x, y = torch.from_numpy(batch_x), torch.from_numpy(batch_y)
    pred = model(x)
    # Compute the loss
    loss = loss_fn(pred, y)
    # Backward pass: clear stale gradients, backpropagate, then update the weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    avg_cost += loss.item()/total_batch
  print(epoch, avg_cost)
# Training-set accuracy (no gradients are needed for evaluation)
with torch.no_grad():
  pred = model(torch.from_numpy(preproc(train_x)))
final_pred = np.argmax(pred.numpy(), axis=1)
print("Acc at train: ",accuracy_score(train_y, final_pred))
# Validation-set accuracy
with torch.no_grad():
  pred = model(torch.from_numpy(preproc(val_x)))
final_pred = np.argmax(pred.numpy(), axis=1)
print("Acc at val: ",accuracy_score(val_y, final_pred))
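The hand-written `batch_creator` in the script samples indices with replacement, so within one epoch some images may be seen several times and others not at all. As an alternative sketch (not the author's method), `torch.utils.data.DataLoader` can shuffle and batch the data so that each sample is visited once per epoch. The random tensors below are stand-ins for `train_x` and `train_y`, and the small sizes are chosen only so that the example runs quickly.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(128)

# Synthetic stand-ins for train_x / train_y: 256 "images" of 784 features, 10 classes.
features = torch.rand(256, 784)
labels = torch.randint(0, 10, (256,))

# shuffle=True reorders the samples at the start of every epoch.
loader = DataLoader(TensorDataset(features, labels), batch_size=128, shuffle=True)

model = torch.nn.Sequential(
    torch.nn.Linear(784, 500),
    torch.nn.ReLU(),
    torch.nn.Linear(500, 10),
)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

epoch_losses = []
for epoch in range(3):
    running = 0.0
    for batch_x, batch_y in loader:
        optimizer.zero_grad()                  # clear gradients from the previous step
        loss = loss_fn(model(batch_x), batch_y)
        loss.backward()                        # backpropagate
        optimizer.step()                       # update the weights
        running += loss.item() / len(loader)   # average loss over the epoch
    epoch_losses.append(running)
    print(epoch, running)
```

With the real data, `features` and `labels` would be replaced by `torch.from_numpy(train_x)` and `torch.from_numpy(train_y)`.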

Training-set accuracy:

0.8779008746355685

Validation-set accuracy:

0.867482993197279
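Both scores are the fraction of images whose highest-scoring output unit matches the true label, which is the quantity `accuracy_score` reports. As a small illustration with made-up numbers (three samples and three classes instead of ten), the same computation in NumPy looks like this:

```python
import numpy as np

# Made-up network outputs (logits): one row per sample, one column per class.
logits = np.array([
    [2.0, 0.1, -1.0],   # highest score in column 0
    [0.3, 0.2, 1.5],    # highest score in column 2
    [0.0, 0.9, 0.4],    # highest score in column 1
])
true_labels = np.array([0, 2, 2])

# The predicted class is the index of the largest score in each row.
predicted = np.argmax(logits, axis=1)

# Accuracy is the share of predictions that equal the true label.
accuracy = float(np.mean(predicted == true_labels))
print(predicted, accuracy)
```

Here two of the three predictions are correct, so the accuracy is 2/3.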

That such a simple network reaches this accuracy after only a handful of epochs is impressive.
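The relationship between epochs, batches, and iterations mentioned in the script's comments can be checked with a few lines of arithmetic. This is only a sketch: the function name `iterations_per_epoch` is hypothetical, and by default it counts a final partial batch as one extra iteration, whereas the script rounds down.

```python
import math

def iterations_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches needed to pass over num_samples once (hypothetical helper)."""
    if drop_last:
        return num_samples // batch_size        # ignore a final partial batch
    return math.ceil(num_samples / batch_size)  # count a final partial batch

# The example from the script's comments: 10000 samples in batches of 500.
print(iterations_per_epoch(10000, 500))                  # 20
# With a batch size of 128 the last batch is partial.
print(iterations_per_epoch(10000, 128))                  # 79
print(iterations_per_epoch(10000, 128, drop_last=True))  # 78
```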
