PyTorch Learning Notes 3: Deep Learning Basics

Posted by 贪钱算法还我头发


Compiled from 龙良曲's PyTorch course videos. Video links:
【计算机-AI】PyTorch学这个就够了!
(好课推荐)深度学习与PyTorch入门实战——主讲人龙良曲

13. Gradients

  • Derivative
  • Partial derivative
  • Gradient (a vector)

How to search for minima?

  • $\theta_{t+1} = \theta_t - \alpha_t \nabla f(\theta_t)$
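
As a quick illustration (not from the original notes), the update rule above can be written directly with autograd; the objective f(θ) = (θ − 3)² below is a made-up example:

import torch

# minimal sketch of theta_{t+1} = theta_t - alpha * grad f(theta_t),
# using a made-up objective f(theta) = (theta - 3)^2
theta = torch.tensor(0., requires_grad=True)
alpha = 0.1

for _ in range(100):
    f = (theta - 3) ** 2
    f.backward()                      # compute grad f(theta_t)
    with torch.no_grad():
        theta -= alpha * theta.grad   # gradient descent step
    theta.grad.zero_()                # clear the accumulated gradient

print(theta)    # approaches tensor(3., requires_grad=True)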

Optimizer performance (comparison figure in the original slides)

14. Activation Functions

  • Continuous but not everywhere differentiable
  • Sigmoid / Logistic: $\sigma' = \sigma(1-\sigma)$
    torch.sigmoid()
    F.sigmoid() (import torch.nn.functional as F)
  • Tanh
    torch.tanh()
  • ReLU
    torch.relu()
    F.relu() (import torch.nn.functional as F)
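
For reference (not from the original notes), a small sketch applying these activations to a sample tensor:

import torch
import torch.nn.functional as F

z = torch.linspace(-3., 3., steps=5)    # tensor([-3.0, -1.5, 0.0, 1.5, 3.0])
print(torch.sigmoid(z))   # values squashed into (0, 1)
print(torch.tanh(z))      # values squashed into (-1, 1)
print(torch.relu(z))      # negative values clamped to 0
print(F.relu(z))          # same result via torch.nn.functional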

Typical Loss

  • Mean Squared Error
    MSE: $loss = \sum [y-(xw+b)]^2$
    L2-norm: $\|y-(xw+b)\|_2$
  • Cross Entropy Loss
    binary
    multi-class
    +softmax
    Leave it to Logistic Regression Part
  • Softmax
    soft version of max
    $S(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$
    $\frac{\partial p_i}{\partial y_j} = \begin{cases} p_i(1-p_i) & i = j \\ -p_j \cdot p_i & i \neq j \end{cases}$, where $p_i = S(y_i)$
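
As an illustrative sketch (not from the original notes, which defer the details to the logistic regression part), cross entropy and softmax are available through torch.nn.functional; the batch size of 4 and the 10 classes below are arbitrary:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)             # 4 samples, 10 classes (arbitrary sizes)
target = torch.tensor([1, 0, 3, 9])     # ground-truth class indices

# F.cross_entropy applies log-softmax internally, so it expects raw logits
loss = F.cross_entropy(logits, target)
print(loss)     # scalar loss tensor

# softmax turns logits into probabilities that sum to 1 along dim=1
p = F.softmax(logits, dim=1)
print(p.sum(dim=1))     # tensor([1., 1., 1., 1.])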

Gradient API

  • torch.autograd.grad(loss, [w1, w2,...])
  • loss.backward()
import torch
import torch.nn.functional as F

x = torch.ones(1)
w = torch.full([1], 2.)
mse = F.mse_loss(torch.ones(1), x*w)
print(mse)  # tensor(1.)  (no grad_fn yet, since w does not require grad at this point)

# torch.autograd.grad(mse, [w]) # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

print(w.requires_grad_())   # tensor([2.], requires_grad=True)

# print(torch.autograd.grad(mse, [w]))    # still fails: mse was built before requires_grad_ was set, so RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

mse = F.mse_loss(torch.ones(1), x*w)
# print(torch.autograd.grad(mse, [w]))    # (tensor([2.]),

mse.backward()
print(w.grad)   # tensor([2.])


a = torch.rand(3, requires_grad=True)
print(a)    # tensor([0.0377, 0.4542, 0.1386], requires_grad=True)
p = F.softmax(a, dim=0)
# p.backward()    # raises RuntimeError: grad can be implicitly created only for scalar outputs
# retain_graph=True keeps the computation graph so it can be backpropagated through again
print(torch.autograd.grad(p[0], [a], retain_graph=True))    # (tensor([ 0.1998, -0.1156, -0.0843]),)
print(torch.autograd.grad(p[1], [a], retain_graph=True))    # (tensor([-0.1156,  0.2434, -0.1278]),)
print(torch.autograd.grad(p[2], [a], retain_graph=True))    # (tensor([-0.0843, -0.1278,  0.2121]),)

15. Perceptron

Gradient of a single-output perceptron

$\frac{\partial E}{\partial w_{j0}} = (O_0 - t)O_0(1-O_0)x_j^0$

Multi-output perceptron

$\frac{\partial E}{\partial w_{jk}} = (O_k - t_k)O_k(1-O_k)x_j^0$

import torch
import torch.nn.functional as F

x = torch.randn(1, 10)
# w = torch.randn(1, 10, requires_grad=True)  # single-output perceptron
w = torch.randn(2, 10, requires_grad=True)  # multi-output perceptron
o = torch.sigmoid(x@w.t())
print(o.shape)  # torch.Size([1, 2])

loss = F.mse_loss(torch.ones(1, 1), o)  # broadcasting
print(loss.shape)   # torch.Size([])
print(loss)   # tensor(0.2094, grad_fn=<MseLossBackward>)

loss.backward()
print(w.grad)
"""
tensor([[-2.0498e-01,  2.4619e-02, -8.0208e-04, -1.3723e-01, -1.3014e-01,
         -1.4648e-01, -7.5119e-02,  4.9381e-02,  2.7161e-01,  4.8075e-02],
        [-4.8705e-03,  5.8495e-04, -1.9058e-05, -3.2607e-03, -3.0922e-03,
         -3.4804e-03, -1.7849e-03,  1.1733e-03,  6.4536e-03,  1.1423e-03]])
"""

16. Chain Rule

import torch
import torch.nn.functional as F

x = torch.tensor(1.)
w1 = torch.tensor(2., requires_grad=True)
b1 = torch.tensor(1.)
w2 = torch.tensor(2., requires_grad=True)
b2 = torch.tensor(1.)
y1 = x * w1 + b1
y2 = y1 * w2 + b2

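# check the chain rule: dy2/dw1 should equal (dy2/dy1) * (dy1/dw1)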
dy2_dy1 = torch.autograd.grad(y2, [y1], retain_graph=True)[0]
dy1_dw1 = torch.autograd.grad(y1, [w1], retain_graph=True)[0]
dy2_dw1 = torch.autograd.grad(y2, [w1], retain_graph=True)[0]

print(dy2_dy1 * dy1_dw1)    # tensor(2.)
print(dy2_dw1)  # tensor(2.)

17. Backpropagation

For an output-layer node $k \in K$: $\frac{\partial E}{\partial W_{jk}} = O_j\delta_k$

where $\delta_k = O_k(1-O_k)(O_k-t_k)$
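
The formula above can be checked against autograd with a small sketch (not from the original notes); it assumes a sigmoid output layer and the error $E = \frac{1}{2}\sum_k (O_k - t_k)^2$, which is the form the $\delta_k$ expression implies:

import torch

torch.manual_seed(0)
x = torch.randn(10)                          # previous-layer outputs O_j
W = torch.randn(10, 2, requires_grad=True)   # output-layer weights W_jk
t = torch.tensor([0., 1.])                   # targets t_k

O = torch.sigmoid(x @ W)                     # output-layer activations O_k
E = 0.5 * ((O - t) ** 2).sum()               # E = 1/2 * sum_k (O_k - t_k)^2
E.backward()

delta = O * (1 - O) * (O - t)                # delta_k = O_k(1-O_k)(O_k - t_k)
manual = x.unsqueeze(1) * delta.unsqueeze(0) # dE/dW_jk = O_j * delta_k
print(torch.allclose(W.grad, manual))        # True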
