一份整理 | PyTorch是什么,为何选择它
Posted CSU迦叶
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了一份整理 | PyTorch是什么,为何选择它相关的知识,希望对你有一定的参考价值。
PyTorch是什么
PyTorch是一个基于Python的科学计算包,主要提供以下两种用途:
- 在GPU算力或其他加速器上作为NumPy的替代
- 一个用于实现神经网络的自动求导库
PyTorch的特性
PyTorch的祖先是Chainer,HIPS autograd,twitter-autograd.
这些库有以下两个鲜明的特征:
- 动态的,运行时定义:一个动态框架仅仅通过运行期望的操作来对函数求导,这与实现一个预先符号求导的静态的图结构再多次运行的方式恰恰相反。这样一来就允许用户使用任何想要的宿主语言特性(如各种控制流结构),以每次迭代都要求执行的求导为代价。动态执行将PyTorch和TensorFlow,Caffe这样的静态架构区分开来了。
A dynamic framework defines the function to be differentinated simply by running the desired computation,as opposed to specifying a static graph structure which is differentiated symbolically ahead of time and then run many times.This permits users to use any host-language features they want(e.g.,arbirary control flow constructs),at the cost of requiring differentiation to be carried out every iteration.Dynamic execution distinguishes PyTorch from static frameworks like TensorFlow,Caffe,etc.
简言之,动态计算图不容易优化,当不同输入的网路结构不一致时,难以并行计算,但是灵活性比较高。
- 立即的,迫切的执行:一个迫切的框架在遇到张量运算时才会运行;它甚至避免去实现一个“前向的图”,仅仅记录下对于求导运算必要的内容。这就站在DyNet的对立面上了,DyNet采用懒评估的方式,在每一次训练迭代中都会真实地重建前向和反向图。立即执行允许CPU和GPU运算被管道化,但是放弃了整个网络进行优化和批处理。
An eager framework runs tensor computations as it encounters them;it avoids ever materializing a "forward graph", recording only what is necessary to differentiate the computation.This stands in contrast to DyNet,which employs lazy evaluation,literally rebuilding the forwards and backwards graph every training iteration.Immediate execution allows CPU and GPU computation to be pipelined,but gives up the opportunity for whole-network optimization and batching.
但是让PyTorch从所有动态迫切执行地自动求导库中成了实现最快的一个,还要依赖于它地以下设计和实现选择:
- 原地运算:原地运算会给自动求导带来危害,因为原地运算会使得在微分阶段将要用到的数据失效。此外,它们要求执行特殊的tape转换操作。针对这些问题,PyTorch实现了简单但有效的机制。
In-place operations pose a hazard for automatic differentiation,because an in-place operation can invalidate data that would be needed in the differentiation phase.Additionally,they require nontrivial tape transformations to be performed.PyTorch implements simple but effective mechanisms that address both of these problems.
- 没有tape:传统反转模式求导记录一个tape(也叫Wengert列表),其作用是描述最初的执行操作的顺序,这个优化允许避免一个拓扑分类的实现。PyTorch(以及Chainer)避免了这个tape,取而代之的是,每一个中间结果只记录和它们计算有关子集。这也就意味着PyTorch的用户可以随心所欲地混合与匹配独立的图,以任何想要的方式(没有直接的同步)。以这种方式结构化图的另一个好处是,当图的一部分死去,就会被自动释放,这对于我们想要尽快释放大的内存块来说是一个重要的考虑。
Traditional reverse-mode differentiation records a tape(also known as a Wengert list) describing the order in which operations were originally executed;this optimization allows implementations to avoid a topological sort.PyTorch (and Chainer) eschew this tape;instead,every intermediate result records only the subset of the computation graph that was relevant to their computaion.This means PyTorch users can mix and match indepedent graphs however they like,in whatever threads they like(without explicit synchronization).An added benefit of structuring graphs this way is that when a portion of the graph becomes dead,it is automatically freed;an important consideration when we want to free large memory chunks as quickly as possible.
- 用C++写核心逻辑:PyTorch起源于Python的库;但是,高昂的转换器消耗对于核心AD逻辑来说很快凸显出来。今天,大多数的核心逻辑都是用C++写的,并且将核心操作定义转化成C++仍在进行中。小心调优的C++代码是PyTorch能够实现相比其他框架低得多的开销的首要愿意。
PyTorch started its life as a Python library;however,it quickly became clear that interpreter overhead is too high for core AD logic.Today,most of it is written in C++,and we are in the process of moving core operator definitions to C++.Carefully tuned C++ code is one of the primary reasons PyTorch can achieve much lower overhead compared to other frameworks.
References
[1] DEEP LEARNING WITH PYTORCH: A 60 MINUTE BLITZ
[2] Paszke A, Gross S, Chintala S, et al. Automatic differentiation in pytorch[J]. 2017.
以上是关于一份整理 | PyTorch是什么,为何选择它的主要内容,如果未能解决你的问题,请参考以下文章
19.初识Pytorch之完整的模型套路-整理后的代码 Complete model routine - compiled code