李宏毅 Machine Learning 2021, Convolutional Neural Networks, HW3: Image Classification (work in progress)
Posted by 奇跡の山
Study notes
(1) Recapping [李宏毅机器学习CP21] (task 6) on convolutional neural networks: a CNN's strength lies in the powerful feature-extraction ability of its convolutional layers. Once the CNN has extracted the features, classification can be done with fully connected layers or with other machine learning models such as decision trees or support vector machines.
(2) PyTorch's vision library: https://github.com/pytorch/vision
(3) The basic idea of data loading: wrap the dataset with Dataset, then use DataLoader for parallel batch loading.
I. Assignment objectives and requirements
1. Objectives and requirements
Objectives:
(1) Classify images with a CNN
(2) Use data augmentation to improve performance
(3) Learn to make use of unlabeled data
Requirements:
(1) Do not use extra data (other image datasets and pre-trained models are forbidden)
(2) Do not search online for the labels
Difficulty tiers:
Easy: Build a simple convolutional neural network as the baseline. (2 pts)
Medium: Design a better architecture or adopt different data augmentations to improve the performance. (2 pts)
Hard: Utilize provided unlabeled data to obtain better results. (2 pts)
2. Dataset description
The task is to classify food images with a convolutional neural network (CNN).
The food images were collected from the web and fall into 11 classes: Bread, Dairy product, Dessert, Egg, Fried food, Meat, Noodles/Pasta, Rice, Seafood, Soup, Vegetable/Fruit. Each class is denoted by a number, e.g. 0 stands for Bread.
● Training set: 280 * 11 labeled images + 6786 unlabeled images
● Validation set: 30 * 11 labeled images
● Testing set: 3347 images
Opening the training set shows the following naming scheme: images in the unlabeled part of the training set and in the testing set are named [id].jpg, while images in the validation set and in the labeled part of the training set are named [class]_[id].jpg.
The file directory tree is:
│ hw3_CNN.ipynb
│
└─food-11
├─testing
├─training
└─validation
II. Original code
1. Import the required packages
# Import necessary packages.
import numpy as np
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from PIL import Image
# "ConcatDataset" and "Subset" are possibly useful when doing semi-supervised learning.
from torch.utils.data import ConcatDataset, DataLoader, Subset
from torchvision.datasets import DatasetFolder
# This is for the progress bar.
from tqdm.auto import tqdm
torchvision implements loading for common image datasets such as ImageNet, CIFAR10, and MNIST, along with common data-transformation operations, which greatly simplifies data loading.
torch.utils.data.DataLoader handles iterating over a dataset.
2. Define the Dataset, DataLoader, and Transforms
(1)Dataset
In PyTorch, the Dataset and DataLoader classes in torch.utils.data can be used to wrap data so that subsequent training and testing are more convenient. A Dataset must override two functions, __len__ and __getitem__:
1) __len__ must return the size of the dataset
2) __getitem__ defines what the dataset returns when indexed with []
In practice we rarely call these two functions directly, but DataLoader uses them when enumerating the Dataset; if they are not implemented, an error is raised at run time.
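To make this concrete, here is a minimal, self-contained sketch using random tensors in place of real images (ToyDataset is an illustrative name, not part of the homework code):
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    def __init__(self, n=100):
        self.x = torch.randn(n, 3, 128, 128)  # fake "images"
        self.y = torch.randint(0, 11, (n,))   # fake labels for 11 classes

    def __len__(self):
        # Must return the size of the dataset.
        return len(self.x)

    def __getitem__(self, idx):
        # Defines what dataset[idx] returns.
        return self.x[idx], self.y[idx]

loader = DataLoader(ToyDataset(), batch_size=16, shuffle=True)
imgs, labels = next(iter(loader))
print(imgs.shape, labels.shape)  # torch.Size([16, 3, 128, 128]) torch.Size([16])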
(2)Transforms
Torchvision provides many utilities for image processing, data wrapping, and data augmentation. Because our data are stored in folders according to their class labels, these utilities can be used directly. torchvision has three main parts:
(1) models
Provides the architectures and pre-trained weights of classic deep learning networks, including AlexNet, the VGG series, the ResNet series, the Inception series, and so on.
(2) datasets
Provides loading of common datasets, all designed as subclasses of torch.utils.data.Dataset, mainly including MNIST, CIFAR10/100, ImageNet, COCO, etc. A Dataset object is a dataset that can be indexed by subscript and returns items of the form (data, label). A commonly used dataset class is torchvision.datasets.DatasetFolder, which wraps data on disk. Similarly, ImageFolder assumes all files are stored in per-class folders, with each folder holding images of a single class and the folder name being the class name. Its constructor is:
ImageFolder(root, transform=None, target_transform=None, loader=default_loader)
(3) transforms
Provides common data preprocessing operations, mainly operations on Tensor and PIL Image objects.
Using transforms takes two steps. Step 1: build the transform, e.g. trans = transforms.Normalize(mean=x, std=y); step 2: apply it, e.g. output = trans(input). Multiple operations can also be chained together with Compose to form a single preprocessing pipeline.
PS: Compose connects these operations (much like nn.Sequential). The operations are defined as objects, and applying one actually invokes its __call__ method, similar to nn.Module.
For more details, see the Transforms section of the official PyTorch documentation.
DataLoader is an iterable object that collates the individual samples returned by the dataset into batches, and provides multi-process loading and shuffling. Once the program has iterated over all the data in the dataset, one pass over the DataLoader is complete.
# Doing data augmentation in training is important.
# However, not every augmentation is useful.
# Think about what kind of augmentation helps food recognition.
train_tfm = transforms.Compose([
# Resize the image into a fixed shape (height = width = 128)
transforms.Resize((128, 128)),
# You may add some transforms here.
# ToTensor() should be the last one of the transforms.
transforms.ToTensor(),
# Converts the PIL Image to a Tensor, with values normalized to [0, 1].
])
# We don't need augmentation in testing and validation.
# All we need here is to resize the PIL image and transform it into Tensor.
test_tfm = transforms.Compose([
transforms.Resize((128, 128)),
transforms.ToTensor(),
])
# Batch size for training, validation, and testing.
# A greater batch size usually gives a more stable gradient.
# But the GPU memory is limited, so please adjust it carefully.
batch_size = 128
# Construct datasets.
# The argument "loader" tells how torchvision reads the data.
train_set = DatasetFolder("food-11/training/labeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm)
valid_set = DatasetFolder("food-11/validation", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm)
unlabeled_set = DatasetFolder("food-11/training/unlabeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm)
test_set = DatasetFolder("food-11/testing", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm)
# Construct data loaders.
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)
Torchvision's transforms can also wrap a custom transformation via Lambda; for example, to randomly rotate a PIL Image one could write trans = T.Lambda(lambda img: img.rotate(random() * 360)).
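For the medium baseline, the "You may add some transforms here" TODO in train_tfm above could, for example, be filled in as follows. These particular choices are only my suggestions, not the official solution (this reuses the transforms import from earlier):
train_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.RandomHorizontalFlip(p=0.5),  # food photos are usually mirror-invariant
    transforms.RandomRotation(15),           # small random rotations
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    # ToTensor() should be the last one of the transforms.
    transforms.ToTensor(),
])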
3. Define the model
The most basic model: a convolutional neural network followed by a fully connected feed-forward network.
Each convolutional stage consists of a convolution (Conv2d) + batch normalization (BatchNorm) + a ReLU activation + max pooling (MaxPool).
When tuning, adjust the most important hyperparameters first: the number of convolution kernels per layer, the type of activation function (which applies a nonlinearity to the outputs), and the preprocessing of the input images; the other hyperparameters matter less and only need fine-tuning.
class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        # The arguments for commonly used modules:
        # torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        # torch.nn.MaxPool2d(kernel_size, stride, padding)

        # input image size: [3, 128, 128]
        self.cnn_layers = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),

            nn.Conv2d(64, 128, 3, 1, 1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),

            nn.Conv2d(128, 256, 3, 1, 1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(4, 4, 0),
        )
        self.fc_layers = nn.Sequential(
            nn.Linear(256 * 8 * 8, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, 11)
        )

    def forward(self, x):
        # input (x): [batch_size, 3, 128, 128]
        # output: [batch_size, 11]

        # Extract features by convolutional layers.
        x = self.cnn_layers(x)

        # The extracted feature map must be flattened before going to fully-connected layers.
        x = x.flatten(1)

        # The features are transformed by fully-connected layers to obtain the final logits.
        x = self.fc_layers(x)
        return x
Note: when should a layer come from nn.Module, and when can nn.functional be used?
If the model has learnable parameters, prefer nn.Module; otherwise (no learnable parameters) either works, and there is little performance difference between the two. Since activation functions (ReLU, sigmoid, tanh) and pooling (MaxPool) have no learnable parameters, they can be replaced with the corresponding functional calls, whereas layers with learnable parameters, such as convolutions and fully connected layers, should use nn.Module.
PS: Although dropout has no learnable parameters either, the nn.Dropout module is still generally recommended, because dropout behaves differently in training and evaluation, and using the module lets model.train()/model.eval() switch that behavior automatically.
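To illustrate this note, a minimal sketch (my own, not from the original post): the parameterized layers stay as modules, while the parameter-free ReLU and pooling use nn.functional:
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Layers with learnable parameters stay as nn.Module objects.
        self.conv = nn.Conv2d(3, 8, 3, 1, 1)
        self.fc = nn.Linear(8 * 64 * 64, 11)

    def forward(self, x):
        # Parameter-free ops (activation, pooling) can use nn.functional.
        x = F.max_pool2d(F.relu(self.conv(x)), 2)
        return self.fc(x.flatten(1))

out = TinyNet()(torch.randn(1, 3, 128, 128))  # -> shape [1, 11]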
4. Training
Train with the training set, and use the validation set to choose the best parameters.
def get_pseudo_labels(dataset, model, threshold=0.65):
    # This function generates pseudo-labels for a dataset using the given model.
    # It returns an instance of DatasetFolder containing images whose prediction confidences exceed a given threshold.
    # You are NOT allowed to use any models trained on external data for pseudo-labeling.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Construct a data loader (the loop below iterates over it).
    data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)

    # Make sure the model is in eval mode.
    model.eval()
    # Define softmax function.
    softmax = nn.Softmax(dim=-1)

    # Iterate over the dataset by batches.
    for batch in tqdm(data_loader):
        img, _ = batch

        # Forward the data.
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(img.to(device))

        # Obtain the probability distributions by applying softmax on logits.
        probs = softmax(logits)

        # ---------- TODO ----------
        # Filter the data and construct a new dataset.

    # Turn off the eval mode.
    model.train()
    return dataset
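One way to complete the TODO above is to collect the confident samples into an in-memory dataset and return that instead of the original dataset. The following is only a sketch; the PseudoDataset helper and the filtering logic are my own illustration, not the official solution (it reuses DataLoader, nn, tqdm, and batch_size from earlier):
class PseudoDataset(torch.utils.data.Dataset):
    # Hypothetical helper (my addition): holds image tensors together with
    # the pseudo-labels predicted by the model.
    def __init__(self, images, labels):
        self.images, self.labels = images, labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]

def get_pseudo_labels_sketch(dataset, model, threshold=0.65):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    model.eval()
    softmax = nn.Softmax(dim=-1)

    kept_imgs, kept_labels = [], []
    for img, _ in tqdm(data_loader):
        with torch.no_grad():
            probs = softmax(model(img.to(device)))
        # Keep only samples whose top-class probability exceeds the threshold.
        max_probs, preds = probs.max(dim=-1)
        mask = max_probs > threshold
        kept_imgs.append(img[mask.cpu()])
        kept_labels.append(preds[mask].cpu())

    model.train()
    return PseudoDataset(torch.cat(kept_imgs), torch.cat(kept_labels))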
# "cuda" only when GPUs are available.
device = "cuda" if torch.cuda.is_available() else "cpu"
# Initialize a model, and put it on the device specified.
model = Classifier().to(device)
model.device = device
# For the classification task, we use cross-entropy as the measurement of performance.
criterion = nn.CrossEntropyLoss()
# Initialize optimizer, you may fine-tune some hyperparameters such as learning rate on your own.
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003, weight_decay=1e-5)
# The number of training epochs.
n_epochs = 80
# Whether to do semi-supervised learning.
do_semi = False
for epoch in range(n_epochs):
    # ---------- TODO ----------
    # In each epoch, relabel the unlabeled dataset for semi-supervised learning.
    # Then you can combine the labeled dataset and pseudo-labeled dataset for the training.
    if do_semi:
        # Obtain pseudo-labels for unlabeled data using trained model.
        pseudo_set = get_pseudo_labels(unlabeled_set, model)

        # Construct a new dataset and a data loader for training.
        # This is used in semi-supervised learning only.
        concat_dataset = ConcatDataset([train_set, pseudo_set])
        train_loader = DataLoader(concat_dataset, batch_size=batch_size, shuffle=True, num_workers=8, pin_memory=True)

    # ---------- Training ----------
    # Make sure the model is in train mode before training.
    model.train()

    # These are used to record information in training.
    train_loss = []
    train_accs = []

    # Iterate the training set by batches.
    for batch in tqdm(train_loader):
        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        # Forward the data. (Make sure data and model are on the same device.)
        logits = model(imgs.to(device))

        # Calculate the cross-entropy loss.
        # We don't need to apply softmax before computing cross-entropy as it is done automatically.
        loss = criterion(logits, labels.to(device))

        # Gradients stored in the parameters in the previous step should be cleared out first.
        optimizer.zero_grad()

        # Compute the gradients for parameters.
        loss.backward()

        # Clip the gradient norms for stable training.
        grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)

        # Update the parameters with computed gradients.
        optimizer.step()

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        train_loss.append(loss.item())
        train_accs.append(acc)

    # The average loss and accuracy of the training set is the average of the recorded values.
    train_loss = sum(train_loss) / len(train_loss)
    train_acc = sum(train_accs) / len(train_accs)

    # Print the information.
    print(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")

    # ---------- Validation ----------
    # Make sure the model is in eval mode so that modules like dropout are disabled and work normally.
    model.eval()

    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []

    # Iterate the validation set by batches.
    for batch in tqdm(valid_loader):
        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        # We don't need gradient in validation.
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(imgs.to(device))

        # We can still compute the loss (but not the gradient).
        loss = criterion(logits, labels.to(device))

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        valid_loss.append(loss.item())
        valid_accs.append(acc)

    # The average loss and accuracy for the entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)

    # Print the information.
    print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")
5. Testing
(1) model.train(): puts BatchNorm and Dropout in training mode (Dropout is active; BatchNorm updates its running statistics)
(2) model.eval(): puts BatchNorm and Dropout in evaluation mode (Dropout is disabled; BatchNorm uses its stored running statistics)
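A quick way to see the difference (an illustrative snippet, not from the original post):
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(5)

drop.train()    # training mode: dropout is active
print(drop(x))  # roughly half the entries are zeroed, the rest scaled to 2.0

drop.eval()     # eval mode: dropout is a no-op
print(drop(x))  # tensor([1., 1., 1., 1., 1.])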
# Make sure the model is in eval mode.
# Some modules like Dropout or BatchNorm behave differently depending on whether the model is in training mode.
model.eval()

# Initialize a list to store the predictions.
predictions = []

# Iterate the testing set by batches.
for batch in tqdm(test_loader):
    # A batch consists of image data and corresponding labels.
    # But here the variable "labels" is useless since we do not have the ground-truth.
    # If printing out the labels, you will find that it is always 0.
    # This is because the wrapper (DatasetFolder) returns images and labels for each batch,
    # so we have to create fake labels to make it work normally.
    imgs, labels = batch

    # We don't need gradient in testing, and we don't even have labels to compute loss.
    # Using torch.no_grad() accelerates the forward process.
    with torch.no_grad():
        logits = model(imgs.to(device))

    # Take the class with greatest logit as prediction and record it.
    predictions.extend(logits.argmax(dim=-1).cpu().numpy().tolist())

# Save predictions into the file.
with open("predict.csv", "w") as f:
    # The first row must be "Id, Category"
    f.write("Id,Category\n")

    # For the rest of the rows, each image id corresponds to a predicted class.
    for i, pred in enumerate(predictions):
        f.write(f"{i},{pred}\n")
III. Modified code
1. Residual network
2. Residual network + dropout
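These two subsections are still being written (the post is marked "work in progress"). As a starting point, here is a hedged sketch of a basic residual block built from scratch (the assignment forbids pre-trained models, so torchvision's ResNet weights cannot be used); the class name and layer sizes are my own illustration:
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # A basic residual block: two 3x3 convolutions plus a skip connection.
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()
        # When the shape changes, project the input with a 1x1 convolution.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))

# For the "+ dropout" variant, one might insert nn.Dropout between the
# fully connected layers of the classifier head, for example:
head = nn.Sequential(nn.Linear(256 * 8 * 8, 256), nn.ReLU(), nn.Dropout(0.5), nn.Linear(256, 11))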
Appendix: PyTorch basics
(1) Autograd implements backpropagation.
(2) torch.nn is built on top of Autograd and can be used to define and run networks. nn.Module is the most important class in nn; it can be viewed as a wrapper around a network, containing the definitions of the layers as well as a forward method. Calling the model as model(input) invokes forward and returns the result of the forward pass.
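A minimal sketch tying these two points together (illustrative only):
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

model = TinyModel()
x = torch.randn(3, 4)
loss = model(x).sum()  # calling model(x) invokes forward(x)
loss.backward()        # Autograd backpropagates to every parameter
print(model.fc.weight.grad.shape)  # torch.Size([2, 4])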
References
(1) Hung-yi Lee, 2021 Machine Learning course slides
(2) Chen Yun, 《深度学习框架PyTorch入门与实践》 (Deep Learning Framework PyTorch: Introduction and Practice)
(3) Chris Paul (Shandong University), detailed notes on 李宏毅 HW3