Deep RL Bootcamp Lecture 5: Natural Policy Gradients, TRPO, PPO

Posted by ecoflex


Background reading linked in the notes:

https://statweb.stanford.edu/~owen/mc/Ch-var-is.pdf (the importance sampling chapter of Owen's Monte Carlo book)

https://zhuanlan.zhihu.com/p/29934206
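As a quick refresher (my own toy example, not from the lecture): importance sampling estimates an expectation under a target distribution p using samples from a proposal q, by reweighting each sample with the density ratio p(x)/q(x). This is the same reweighting that shows up as the probability ratio between new and old policies in the surrogate objective below.

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Target p = N(0, 1), but we can only sample from the proposal q = N(0, 2).
x = rng.normal(0.0, 2.0, size=100_000)
w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)  # importance weights p/q

# E_p[x^2] = 1 exactly; the weighted estimate recovers it from q-samples.
print(np.mean(w * x ** 2))  # ~1.0
```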


The blue curve is the lower-bound surrogate: it coincides with the true objective at the current policy and lies below it elsewhere, so a policy that improves the blue curve is guaranteed not to make the true objective worse (the minorize-maximize argument).
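For reference, the bound this picture illustrates is Theorem 1 of the TRPO paper (Schulman et al., 2015):

```latex
\eta(\tilde{\pi}) \;\ge\; L_{\pi}(\tilde{\pi}) \;-\; C\, D_{\mathrm{KL}}^{\max}(\pi, \tilde{\pi}),
\qquad C = \frac{4\,\epsilon\,\gamma}{(1-\gamma)^2}, \quad
\epsilon = \max_{s,a} \lvert A_{\pi}(s, a) \rvert,
```

where \eta is the true expected return and L_\pi is the local (surrogate) approximation around the current policy \pi.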


TRPO uses conjugate gradient to solve the optimization problem: rather than forming and inverting the Fisher matrix explicitly, it solves F x = g using only Fisher-vector products (see the sketch below).
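A minimal sketch of that step, with made-up names (`fvp` standing in for a Fisher-vector product routine): conjugate gradient solves F x = g for the search direction x using only products F v, never the matrix F itself.

```python
import numpy as np

def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Solve F x = g given only fvp(v) = F @ v (F symmetric positive definite)."""
    x = np.zeros_like(g)
    r = g.copy()              # residual g - F x (x starts at 0)
    p = g.copy()              # search direction
    rdotr = r @ r
    for _ in range(iters):
        Fp = fvp(p)
        alpha = rdotr / (p @ Fp)
        x += alpha * p
        r -= alpha * Fp
        new_rdotr = r @ r
        if new_rdotr < tol:
            break
        p = r + (new_rdotr / rdotr) * p
        rdotr = new_rdotr
    return x

# Toy check with an explicit SPD matrix standing in for the Fisher matrix:
F = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
print(conjugate_gradient(lambda v: F @ v, g))  # matches np.linalg.solve(F, g)
```

In TRPO this matters because F would be (number of parameters) squared in size, while a Fisher-vector product can be computed with backprop passes alone.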


Fisher information matrix, natural policy gradient: precondition the vanilla gradient with the inverse Fisher matrix, so that step sizes are measured by how much the policy's action distribution changes rather than by raw parameter distance.
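In symbols (standard definitions, not transcribed from the slide):

```latex
F(\theta) \;=\; \mathbb{E}_{s,a \sim \pi_\theta}\!\left[\,\nabla_\theta \log \pi_\theta(a \mid s)\,\nabla_\theta \log \pi_\theta(a \mid s)^{\top}\right],
\qquad
\theta_{k+1} \;=\; \theta_k + \alpha\, F(\theta_k)^{-1}\, \nabla_\theta J(\theta_k).
```

Preconditioning by the inverse of F makes the update invariant to how the policy happens to be parameterized.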


Writing the update down as an explicit optimization problem lets us update the policy more robustly, and with better sample efficiency, than taking a single gradient step.

But the importance-sampled surrogate L^IS, like L^PG, is unconstrained: maximized freely, it would move the new policy arbitrarily far from the old one, where the surrogate stops being a good approximation. So we use a KL divergence between new and old policies to keep the update close, either as a hard constraint or as a penalty.
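The resulting trust-region problem, as in the TRPO paper:

```latex
\max_{\theta}\; \mathbb{E}_t\!\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}\,\hat{A}_t\right]
\quad \text{subject to} \quad
\mathbb{E}_t\!\left[D_{\mathrm{KL}}\!\big(\pi_{\theta_{\text{old}}}(\cdot \mid s_t)\,\big\|\,\pi_\theta(\cdot \mid s_t)\big)\right] \;\le\; \delta .
```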


In the KL-penalized form, maximizing L(theta) - beta * KL, it is hard to choose beta: one fixed coefficient does not work well across different problems, or even across the course of a single training run. This motivates TRPO's hard constraint, and PPO's clipping (sketched below).
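PPO's clipped surrogate sidesteps the choice of beta entirely. A minimal NumPy sketch, assuming the per-sample probability ratios and advantage estimates have already been computed (the numbers below are placeholders):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """L^CLIP = E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)]."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.mean(np.minimum(unclipped, clipped))

# ratio = pi_theta(a|s) / pi_theta_old(a|s); the third sample moved too far.
ratio = np.array([0.9, 1.1, 1.5])
advantage = np.array([1.0, 1.0, 1.0])
print(ppo_clip_objective(ratio, advantage))  # third term is clipped at 1.2
```

Once the ratio leaves [1 - eps, 1 + eps], the objective stops rewarding further movement, so there is no incentive to push the new policy far from the old one.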


TRPO does much worse than A3C on Atari games learned from images, whereas PPO does better there.

See the slide on the limitations of TRPO: the conjugate-gradient machinery makes the implementation more complicated, and the method is hard to combine with architectures that share parameters between policy and value function or that include noise such as dropout.

