[PyTorch Series 47]: Toolkit - torchvision.transforms.Normalize and ToTensor in Depth
Posted by 文火冰糖的硅基工坊
Author's home page (文火冰糖的硅基工坊): blog of 文火冰糖 (王文兵) on CSDN
Article URL: https://blog.csdn.net/HiWangWenBing/article/details/121300054
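Chapter 1 Clarifying the concept of standardization
1.1 Up front: common misreadings of Normalize found online
(1) Normalization: some say that after Normalize the data is confined to (0, 1).
(2) Some say that after Normalize the data is confined to (-1, 1).
(3) Some say that Normalize takes any input image and converts it into a normally distributed image.
All of these statements are inaccurate, and some are plainly wrong.
In practice, preparing image data with Normalize is not a single operation, and Normalize makes assumptions of its own about its input.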
1.2 Concept clarification
[AI - Deep Learning 43]: Input preprocessing - Normalization, Standardization, normal distribution, arithmetic mean, variance (blog of 文火冰糖 (王文兵), CSDN)
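1.3 How PyTorch processes a raw image
As the figure above shows, an image file goes through several conversions before it can be fed to a neural network.
Where:
(1) transforms.ToTensor(): reorders the channels and scales pixel values to [0, 1]
(2) transforms.Normalize(): performs standardization
The following chapters follow this pipeline step by step to explain how PyTorch implements scaling and standardization.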
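The two steps listed above can be sketched with plain NumPy. This is only a rough illustration of the arithmetic, not torchvision's actual implementation; the 2x2 image and the helper names `to_tensor_like` and `normalize_like` are made up for this sketch.

```python
import numpy as np

def to_tensor_like(img_hwc_uint8):
    """Hypothetical stand-in for ToTensor: HWC uint8 -> CHW float32 in [0, 1]."""
    chw = img_hwc_uint8.transpose(2, 0, 1)
    return chw.astype(np.float32) / 255.0

def normalize_like(img_chw, mean, std):
    """Hypothetical stand-in for Normalize: per-channel (x - mean) / std."""
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return (img_chw - mean) / std

# Made-up 2x2 RGB image with values in [0, 255]
image = np.array([[[0, 128, 255], [20, 11, 1]],
                  [[98, 101, 103], [255, 255, 255]]], dtype=np.uint8)

scaled = to_tensor_like(image)
standardized = normalize_like(scaled, mean=[0.5, 0.5, 0.5], std=[0.2, 0.2, 0.2])

print(scaled.shape, scaled.min(), scaled.max())
print(standardized.min(), standardized.max())
```

The first step leaves the values in [0, 1]; the second step moves them outside that interval, which is why neither (0, 1) nor (-1, 1) describes the output of Normalize in general.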
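Chapter 3 Step 1: Load the image file with PIL
(1) Import the libraries
# Environment setup
import numpy as np # NumPy array library
import matplotlib.pyplot as plt # plotting library
import torch # PyTorch core library
import torchvision.models as models
from torchvision import transforms # provides ToTensor and Normalize, used below
from PIL import Image
print("Hello World")
print(torch.__version__)
print(torch.cuda.is_available())
print(torch.version.cuda)
print(torch.backends.cudnn.version())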
(2) Open the image file as a PIL object
image_dir = '../datasets/imageNet-Mini1000/images/imagenet_2012_000002.png'
image_PIL = Image.open(image_dir)
print("PIL Image data")
#print("image_shape: ", image_PIL.shape)
#print("image_dtype: ", image_PIL.dtype)
print("image_type: ", type(image_PIL))
print(image_PIL)
plt.imshow(image_PIL)
PIL Image data
image_type: <class 'PIL.PngImagePlugin.PngImageFile'>
<PIL.PngImagePlugin.PngImageFile image mode=RGB size=224x224 at 0x1698A694AF0>
Out[94]:
<matplotlib.image.AxesImage at 0x1698a7b9700>
Chapter 4 Step 2: Convert the PIL image to NumPy
# Convert to a NumPy array
image_numpy = np.array(image_PIL)
print("image_shape: ", image_numpy.shape)
print("image_dtype: ", image_numpy.dtype)
print("image_type: ", type(image_numpy))
print(image_numpy[-1][0])
plt.imshow(image_numpy)
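image_shape: (224, 224, 3)
image_dtype: uint8
image_type: <class 'numpy.ndarray'>
[ 98 101 103]
(1) The NumPy image has shape (224, 224, 3), i.e. (height, width, channels).
(2) The printed pixel (last row, first column) is [ 98 101 103]; raw pixel values lie in [0, 255].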
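The indexing used above can be tried on a small array. The 2x3 image below is made up and merely stands in for the 224x224 photo; it shows how the (height, width, channels) layout is addressed.

```python
import numpy as np

# Made-up 2x3 RGB image (H=2, W=3, C=3) standing in for the 224x224 photo
image_numpy = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)

height, width, channels = image_numpy.shape
bottom_left = image_numpy[-1][0]      # last row, first column -> one RGB triple
red_channel = image_numpy[:, :, 0]    # every pixel, first channel only

print(height, width, channels)
print(bottom_left)
print(red_channel.shape)
```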
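Chapter 5 Step 3: Scaling and channel reordering (ToTensor)
5.1 Creating a ToTensor object
to_tensor = transforms.ToTensor()
5.2 What a ToTensor instance does
(1) Channel reordering:
NumPy stores the image with shape=(224, 224, 3), i.e. (H, W, C),
while PyTorch expects shape=(3, 224, 224), i.e. (C, H, W), so the axes must be reordered.
(2) Scaling:
For a PIL image in an 8-bit mode such as RGB, or a NumPy array of dtype uint8, values in [0, 255] are converted to float32 values in [0, 1]. Data in other ranges, such as 16-bit values in (0, 65535), is not rescaled to [0, 1] automatically.
(3) The scaling formula
Yi = (Xi - MinValue) / (MaxValue - MinValue)
Here MinValue = 0 and MaxValue = 255 are fixed constants, so Yi = Xi / 255. They are not measured from the image.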
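The scaling rule can be imitated in NumPy to make one point concrete: the divisor is the fixed constant 255, not the image's own maximum, and only uint8 input is rescaled. This is a hedged sketch; `to_tensor_like` is a hypothetical helper written for illustration, not the torchvision source.

```python
import numpy as np

def to_tensor_like(img_hwc):
    """Hypothetical NumPy imitation of ToTensor's scaling rule."""
    chw = img_hwc.transpose(2, 0, 1)
    if chw.dtype == np.uint8:
        return chw.astype(np.float32) / 255.0   # fixed divisor, not the image maximum
    return chw                                   # other dtypes: layout change only

dark = np.full((2, 2, 3), 51, dtype=np.uint8)           # every pixel is 51
as_float = np.full((2, 2, 3), 51.0, dtype=np.float32)   # same values, float dtype

dark_max = float(to_tensor_like(dark).max())
float_max = float(to_tensor_like(as_float).max())
print(dark_max)    # 51 / 255, about 0.2 rather than 1.0
print(float_max)   # unchanged
```

A data-dependent min-max scaling would have mapped the uniformly dark image to a constant, whereas the fixed divisor keeps it dark.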
5.3 Code example
# Convert image_numpy to a tensor: scales to [0, 1] (the image content is unaffected)
image_tensor = transforms.ToTensor()(image_numpy)
print("image_shape: ", image_tensor.shape)
print("image_dtype: ", image_tensor.dtype)
print("image_type: ", type(image_tensor))
print(image_tensor[0])
# The tensor cannot be displayed directly: its (C, H, W) layout is not what plt expects
#plt.imshow(image_tensor)
image_shape: torch.Size([3, 224, 224])
image_dtype: torch.float32
image_type: <class 'torch.Tensor'>
tensor([[0.0784, 0.0824, 0.0980, ..., 0.2588, 0.2157, 0.1765],
[0.0784, 0.0980, 0.1333, ..., 0.2314, 0.1922, 0.1373],
[0.0980, 0.0980, 0.1216, ..., 0.0784, 0.1216, 0.1412],
...,
[0.4549, 0.3333, 0.3647, ..., 0.3059, 0.2392, 0.3804],
[0.3020, 0.3255, 0.4000, ..., 0.2196, 0.2471, 0.2235],
[0.3843, 0.3922, 0.3255, ..., 0.3804, 0.2314, 0.2157]])
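Note: after scaling, the values lie between 0 and 1.
# Convert back to the NumPy layout (a re-conversion, not a restoration of the uint8 data)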
image_numpy = image_tensor.numpy().transpose(1,2,0)
print("image_shape: ", image_numpy.shape)
print("image_dtype: ", image_numpy.dtype)
print("image_type: ", type(image_numpy))
# After converting back, the values are still in [0, 1]
print(image_numpy[0][0])
plt.imshow(image_numpy)
image_shape: (224, 224, 3)
image_dtype: float32
image_type: <class 'numpy.ndarray'>
[0.07843138 0.04313726 0.00392157]
Out[104]:
<matplotlib.image.AxesImage at 0x1698a9ec1c0>
Note: scaling does not change the shape or the content of the image; plt.imshow accepts float RGB data in [0, 1].
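That the scaling step loses no information can be checked with a round trip. The sketch below uses a made-up 16x16 image containing all 256 gray levels and plain NumPy in place of ToTensor, so it illustrates the arithmetic only.

```python
import numpy as np

# Made-up HWC image containing every 8-bit level once per channel
original = np.arange(256, dtype=np.uint8).reshape(16, 16, 1).repeat(3, axis=2)

chw_float = original.transpose(2, 0, 1).astype(np.float32) / 255.0   # ToTensor-style scaling
hwc_float = chw_float.transpose(1, 2, 0)                              # back to HWC, still in [0, 1]
restored = np.round(hwc_float * 255.0).astype(np.uint8)               # undo the scaling

print(hwc_float.shape, hwc_float.dtype)
print(np.array_equal(restored, original))
```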
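Chapter 6 Step 4: The Normalize class
6.1 About the Normalize class
(1) Overview
torchvision, the vision toolkit developed by the PyTorch team, provides common data preprocessing operations in its transforms module.
The transforms module contains many classes that operate on Tensors and PIL Images, among them transforms.Normalize, which standardizes a tensor.
Its parameters include mean and std, which are passed to the constructor when a Normalize object is created.
(2) Constructor parameters
mean: the per-channel mean to subtract.
std: the per-channel standard deviation to divide by (a standard deviation, not a variance).
Three values: one for each of the three color channels.
Note: Normalize does not measure these statistics from the input image. It applies whatever reference mean and std it is given.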
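6.2 The standardization formula
Yi = (Xi - mean) / std
d = Xi - mean: the signed distance of Xi from the mean
d / std: that distance expressed as a multiple of the standard deviation.
This conversion re-expresses each pixel value relative to the specified mean, in units of the specified std.
If the specified mean and std match the actual statistics of the data, the output has mean 0 and standard deviation 1.
The transform is linear, so it shifts and rescales the distribution of pixel values without changing its shape. Data that was not normally distributed before is not normally distributed afterwards.
The value Yi says where Xi sits relative to the mean, measured in standard deviations.
For example, with Xi = 1, mean = 0.5, std = 0.25: Yi = (1 - 0.5) / 0.25 = 2.
That is, Xi = 1 maps to Yi = 2, which lies two standard deviations above the mean.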
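The worked example can be reproduced in a few lines, first for the single value and then per channel. The small (C, H, W) array is made up for illustration, and the reshape to (3, 1, 1) is one way of letting NumPy broadcast the three per-channel values.

```python
import numpy as np

# The single-value example from the text
xi, mean, std = 1.0, 0.5, 0.25
yi = (xi - mean) / std
print(yi)  # 2.0

# The same formula per channel on a made-up CHW array (3 channels, 1x2 pixels)
x = np.array([[[0.0, 1.0]], [[0.5, 0.75]], [[0.25, 0.5]]])
channel_mean = np.array([0.5, 0.5, 0.5]).reshape(3, 1, 1)
channel_std = np.array([0.25, 0.25, 0.25]).reshape(3, 1, 1)
y = (x - channel_mean) / channel_std
print(y[:, 0, :])
```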
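After scaling with ToTensor, the values are confined to [0, 1] and centered on their own mean rather than on 0.
After standardization, the values are centered on 0 with unit standard deviation, provided the specified mean and std match the data.
Different images have different pixel values, so their standardized values are distributed differently as well.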
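A small experiment makes the difference between rescaling a distribution and reshaping it visible. The data is made up (evenly spread values in [0, 1]), and excess kurtosis is used here as one arbitrary shape statistic; treat this as a sketch, not a formal test.

```python
import numpy as np

def excess_kurtosis(v):
    """Shape statistic: about 0 for normal data, about -1.2 for uniform data."""
    z = (v - v.mean()) / v.std()
    return (z ** 4).mean() - 3.0

pixels = np.linspace(0.0, 1.0, 10001)               # made-up, evenly spread values

matched = (pixels - pixels.mean()) / pixels.std()   # mean/std taken from the data
fixed = (pixels - 0.5) / 0.2                        # fixed reference mean/std

print(matched.mean(), matched.std())   # close to 0 and 1
print(fixed.mean(), fixed.std())       # close to 0, but the std is not 1
print(excess_kurtosis(pixels), excess_kurtosis(fixed))
```

The kurtosis is the same before and after the transform, so uniformly spread input stays uniformly spread; only its center and scale move.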
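6.3 Instantiation examples
(1) Official example: ImageNet statistics
normalize = transforms.Normalize(mean = [0.485, 0.456, 0.406],
std = [0.229, 0.224, 0.225])
(2) Common example
normalize = transforms.Normalize(mean = [0.5, 0.5, 0.5],
std = [0.2, 0.2, 0.2])
(3) Preconditions of the definitions above
The mean and std values above all lie in [0, 1]: both the center of the value distribution and its spread are expressed on a [0, 1] scale.
This implies:
Input pixel values: expected to lie in [0, 1], as produced by ToTensor
Output pixel values: depend on three factors:
- the actual pixel value
- the predefined mean
- the predefined standard deviation std
The output is therefore not confined to (0, 1) or (-1, 1). For an input in [0, 1] it lies between (0 - mean) / std and (1 - mean) / std for each channel.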
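The output range for an input in [0, 1] follows directly from the formula. As a minimal sketch using the official mean and std values quoted above, the per-channel bounds can be computed like this:

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

lowest = (0.0 - mean) / std    # output for an input pixel of 0
highest = (1.0 - mean) / std   # output for an input pixel of 1

for channel, (lo, hi) in enumerate(zip(lowest, highest)):
    print(f"channel {channel}: [{lo:.4f}, {hi:.4f}]")
```

Each channel ends up in roughly [-2.1, 2.6], which matches neither of the intervals named in the common misreadings.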
6.4 Normalize code examples
(1) Source data
# Source data
print(image_tensor)
tensor([[[0.0784, 0.0824, 0.0980, ..., 0.2588, 0.2157, 0.1765],
[0.0784, 0.0980, 0.1333, ..., 0.2314, 0.1922, 0.1373],
[0.0980, 0.0980, 0.1216, ..., 0.0784, 0.1216, 0.1412],
...,
[0.4549, 0.3333, 0.3647, ..., 0.3059, 0.2392, 0.3804],
[0.3020, 0.3255, 0.4000, ..., 0.2196, 0.2471, 0.2235],
[0.3843, 0.3922, 0.3255, ..., 0.3804, 0.2314, 0.2157]],
[[0.0431, 0.0510, 0.0510, ..., 0.1333, 0.1569, 0.1412],
[0.0431, 0.0471, 0.0549, ..., 0.1490, 0.1373, 0.1020],
[0.0627, 0.0510, 0.0667, ..., 0.0627, 0.0863, 0.1098],
...,
[0.4667, 0.3647, 0.3490, ..., 0.2784, 0.2627, 0.3922],
[0.3686, 0.3804, 0.4039, ..., 0.2235, 0.2471, 0.2157],
[0.3961, 0.3843, 0.3451, ..., 0.3843, 0.2627, 0.2431]],
[[0.0039, 0.0118, 0.0196, ..., 0.0902, 0.0941, 0.0902],
[0.0078, 0.0157, 0.0196, ..., 0.1059, 0.0863, 0.0588],
[0.0235, 0.0157, 0.0275, ..., 0.0353, 0.0392, 0.0706],
...,
[0.4235, 0.3451, 0.3647, ..., 0.2627, 0.2745, 0.3725],
[0.3569, 0.3569, 0.3804, ..., 0.2392, 0.2667, 0.2392],
[0.4039, 0.3843, 0.3608, ..., 0.3569, 0.2510, 0.2549]]])
(1) Official mean and std values
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
# Apply the standardization to the tensor
image_normalized = normalize(image_tensor)
print("image_shape: ", image_normalized.shape)
print("image_dtype: ", image_normalized.dtype)
print("image_type: ", type(image_normalized))
print(image_normalized)
# After Normalize the data no longer has a layout or value range suitable for display
#plt.imshow(image_normalized)
image_shape: torch.Size([3, 224, 224])
image_dtype: torch.float32
image_type: <class 'torch.Tensor'>
tensor([[[-1.7754, -1.7583, -1.6898, ..., -0.9877, -1.1760, -1.3473],
[-1.7754, -1.6898, -1.5357, ..., -1.1075, -1.2788, -1.5185],
[-1.6898, -1.6898, -1.5870, ..., -1.7754, -1.5870, -1.5014],
...,
[-0.1314, -0.6623, -0.5253, ..., -0.7822, -1.0733, -0.4568],
[-0.7993, -0.6965, -0.3712, ..., -1.1589, -1.0390, -1.1418],
[-0.4397, -0.4054, -0.6965, ..., -0.4568, -1.1075, -1.1760]],
[[-1.8431, -1.8081, -1.8081, ..., -1.4405, -1.3354, -1.4055],
[-1.8431, -1.8256, -1.7906, ..., -1.3704, -1.4230, -1.5805],
[-1.7556, -1.8081, -1.7381, ..., -1.7556, -1.6506, -1.5455],
...,
[ 0.0476, -0.4076, -0.4776, ..., -0.7927, -0.8627, -0.2850],
[-0.3901, -0.3375, -0.2325, ..., -1.0378, -0.9328, -1.0728],
[-0.2675, -0.3200, -0.4951, ..., -0.3200, -0.8627, -0.9503]],
[[-1.7870, -1.7522, -1.7173, ..., -1.4036, -1.3861, -1.4036],
[-1.7696, -1.7347, -1.7173, ..., -1.3339, -1.4210, -1.5430],
[-1.6999, -1.7347, -1.6824, ..., -1.6476, -1.6302, -1.4907],
...,
[ 0.0779, -0.2707, -0.1835, ..., -0.6367, -0.5844, -0.1487],
[-0.2184, -0.2184, -0.1138, ..., -0.7413, -0.6193, -0.7413],
[-0.0092, -0.0964, -0.2010, ..., -0.2184, -0.6890, -0.6715]]])
image_numpy = image_normalized.numpy().transpose(1,2,0)
print(image_numpy)
plt.imshow(image_numpy)
[[[-1.2156862 -1.5686275 -1.9607843 ] [-1.1764705 -1.4901961 -1.882353 ] [-1.0196078 -1.4901961 -1.8039216 ] ... [ 0.5882354 -0.6666666 -1.0980392 ] [ 0.15686274 -0.43137252 -1.0588235 ] [-0.2352941 -0.58823526 -1.0980392 ]]
......................
Note: this is the standardized image. Its values fall outside [0, 1], so plt.imshow clips them for display.
(2) Common mean and std values
normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.2, 0.2, 0.2])
# Apply the standardization to the tensor
image_normalized = normalize(image_tensor)
print("image_shape: ", image_normalized.shape)
print("image_dtype: ", image_normalized.dtype)
print("image_type: ", type(image_normalized))
print(image_normalized)
# After Normalize the data no longer has a layout or value range suitable for display
#plt.imshow(image_normalized)
image_shape: torch.Size([3, 224, 224])
image_dtype: torch.float32
image_type: <class 'torch.Tensor'>
tensor([[[-2.1078, -2.0882, -2.0098, ..., -1.2059, -1.4216, -1.6176],
[-2.1078, -2.0098, -1.8333, ..., -1.3431, -1.5392, -1.8137],
[-2.0098, -2.0098, -1.8922, ..., -2.1078, -1.8922, -1.7941],
...,
[-0.2255, -0.8333, -0.6765, ..., -0.9706, -1.3039, -0.5980],
[-0.9902, -0.8725, -0.5000, ..., -1.4020, -1.2647, -1.3824],
[-0.5784, -0.5392, -0.8725, ..., -0.5980, -1.3431, -1.4216]],
[[-2.2843, -2.2451, -2.2451, ..., -1.8333, -1.7157, -1.7941],
[-2.2843, -2.2647, -2.2255, ..., -1.7549, -1.8137, -1.9902],
[-2.1863, -2.2451, -2.1667, ..., -2.1863, -2.0686, -1.9510],
...,
[-0.1667, -0.6765, -0.7549, ..., -1.1078, -1.1863, -0.5392],
[-0.6569, -0.5980, -0.4804, ..., -1.3824, -1.2647, -1.4216],
[-0.5196, -0.5784, -0.7745, ..., -0.5784, -1.1863, -1.2843]],
[[-2.4804, -2.4412, -2.4020, ..., -2.0490, -2.0294, -2.0490],
[-2.4608, -2.4216, -2.4020, ..., -1.9706, -2.0686, -2.2059],
[-2.3824, -2.4216, -2.3627, ..., -2.3235, -2.3039, -2.1471],
...,
[-0.3824, -0.7745, -0.6765, ..., -1.1863, -1.1275, -0.6373],
[-0.7157, -0.7157, -0.5980, ..., -1.3039, -1.1667, -1.3039],
[-0.4804, -0.5784, -0.6961, ..., -0.7157, -1.2451, -1.2255]]])
(3) Custom mean and std values
normalize = transforms.Normalize(mean=[0.2, 0.2, 0.2], std=[0.1, 0.1, 0.1])
# Apply the standardization to the tensor
image_normalized = normalize(image_tensor)
print("image_shape: ", image_normalized.shape)
print("image_dtype: ", image_normalized.dtype)
print("image_type: ", type(image_normalized))
print(image_normalized)
# After Normalize the data no longer has a layout or value range suitable for display
#plt.imshow(image_normalized)
image_shape: torch.Size([3, 224, 224])
image_dtype: torch.float32
image_type: <class 'torch.Tensor'>
tensor([[[-1.2157, -1.1765, -1.0196, ..., 0.5882, 0.1569, -0.2353],
[-1.2157, -1.0196, -0.6667, ..., 0.3137, -0.0784, -0.6275],
[-1.0196, -1.0196, -0.7843, ..., -1.2157, -0.7843, -0.5882],
...,
[ 2.5490, 1.3333, 1.6471, ..., 1.0588, 0.3922, 1.8039],
[ 1.0196, 1.2549, 2.0000, ..., 0.1961, 0.4706, 0.2353],
[ 1.8431, 1.9216, 1.2549, ..., 1.8039, 0.3137, 0.1569]],
[[-1.5686, -1.4902, -1.4902, ..., -0.6667, -0.4314, -0.5882],
[-1.5686, -1.5294, -1.4510, ..., -0.5098, -0.6275, -0.9804],
[-1.3725, -1.4902, -1.3333, ..., -1.3725, -1.1373, -0.9020],
...,
[ 2.6667, 1.6471, 1.4902, ..., 0.7843, 0.6275, 1.9216],
[ 1.6863, 1.8039, 2.0392, ..., 0.2353, 0.4706, 0.1569],
[ 1.9608, 1.8431, 1.4510, ..., 1.8431, 0.6275, 0.4314]],
[[-1.9608, -1.8824, -1.8039, ..., -1.0980, -1.0588, -1.0980],
[-1.9216, -1.8431, -1.8039, ..., -0.9412, -1.1373, -1.4118],
[-1.7647, -1.8431, -1.7255, ..., -1.6471, -1.6078, -1.2941],
...,
[ 2.2353, 1.4510, 1.6471, ..., 0.6275, 0.7451, 1.7255],
[ 1.5686, 1.5686, 1.8039, ..., 0.3922, 0.6667, 0.3922],
[ 2.0392, 1.8431, 1.6078, ..., 1.5686, 0.5098, 0.5490]]])
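The first value of each printed tensor can be checked by hand. The sketch below takes the first red value of the source tensor (0.0784, i.e. 20 / 255) and applies the three mean/std settings used in this section; the dictionary keys are labels chosen for this example only.

```python
import numpy as np

pixel = float(np.float32(20) / np.float32(255))   # 0.0784..., the first red value of the source tensor

# (mean, std) of the first channel for each of the three settings
settings = {
    "official": (0.485, 0.229),
    "common": (0.5, 0.2),
    "custom": (0.2, 0.1),
}
results = {name: (pixel - m) / s for name, (m, s) in settings.items()}
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

The three results agree with the first entries of the three printed tensors (-1.7754, -2.1078 and -1.2157), which confirms that Normalize does nothing more than subtract and divide per channel.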
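Chapter 7 Restoring a standardized image
7.1 Reversing the scaling
Scaling formula: Yi = (Xi - MinValue) / (MaxValue - MinValue)
Inverse formula: Xi = (MaxValue - MinValue) * Yi + MinValue
Restoring the data therefore requires knowing the minimum and maximum values that were used.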
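A short round trip illustrates the two formulas. It reuses the sample pixel from Chapter 4 and assumes the fixed 8-bit range of 0 to 255; with a different range the same constants would have to be supplied to both directions.

```python
import numpy as np

x = np.array([98.0, 101.0, 103.0])     # the sample pixel from Chapter 4
min_value, max_value = 0.0, 255.0      # fixed range assumed for 8-bit images

y = (x - min_value) / (max_value - min_value)          # scale to [0, 1]
x_restored = (max_value - min_value) * y + min_value   # invert with the same constants

print(y)
print(x_restored)
```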
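7.2 Reversing the standardization
Standardization formula: Yi = (Xi - mean) / std
Inverse formula: Xi = Yi * std + mean
mean = np.array([0.485, 0.456, 0.406]) # must be the same values that were passed to Normalize
std = np.array([0.229, 0.224, 0.225])
# image_numpy has shape (H, W, 3), so the three per-channel values broadcast over the last axis
image_numpy = mean + image_numpy * std
plt.imshow(image_numpy)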
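The inverse formula can be verified without an image file. The following sketch uses random made-up data in place of the ToTensor output and a channel-last layout, as in the restoration code above; the final clip is an optional precaution against floating-point overshoot before display.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# Made-up HWC image in [0, 1] standing in for the scaled photo
rng = np.random.default_rng(0)
scaled = rng.random((4, 4, 3))

standardized = (scaled - mean) / std           # forward transform, channel-last layout
restored = standardized * std + mean           # inverse transform
displayable = np.clip(restored, 0.0, 1.0)      # guard against tiny rounding overshoot

print(np.allclose(restored, scaled))
print(displayable.min() >= 0.0, displayable.max() <= 1.0)
```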