Reinforcement Learning

Posted 2020-07-26 come_on

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了Reinforcement Learning相关的知识，希望对你有一定的参考价值。

the differences are between the three types of learning（supervised, unsupervised and reinforcement）

监督学*、无监督学*和强化学*的区别

supervised learning sort of takes the form of function approximation where you\'re given a bunch of x, y pairs And your goal is to finda function f that will map some new x to a proper y

监督学*是通过对有标签数据进行学*，找到一个能很好拟合函数，对新样本x能得到一个最准确的y（以尽可能正确地对训练集以外的示例标签进行预测）

Unsupervised learning is very similar to supervised learning except that it turns out that you\'re given a bunch of x\'s and your goal is to find some f. That gives you a compact description of the set of x\'s that you\'ve seen. So we call this clustering, or description as opposed to function approximation

无监督学*和监督学*类似，根据大量的无标签训练样本找到最佳拟合函数

reinforcement learning looks a lot like Supervised learning, in that we\'re going to be given a string of pairs of data, and we\'re going to try to learn some functions. But in the function approximation case, a supervized learning case, we were given a bunch of X and Y pairs. We were asked to learn F, but in reinforcement learning, we were given something totally different.Were instead going to be given x\'s and z\'s, and reinforcement learning is one mechanism for doing decision making.

强化学*看起来和监督学*类似，我们试图从一些数据对中学*一些函数。但监督学*的逼*函数是对x,y对而言，而强化学*是一些决策机制。

监督学*(supervised learning)和RL的区别在于，监督学*必须提供十分精确的例子。比如说学*下棋的时候，必须给出每一步的例子，进行训练。或者在训练一个声带系统发声的时候，需要给出每块声带肌肉震动收缩的例子。但是实际上，人们有时候很难得到完整精确的例子（比如说打球的时候，身体每块肌肉的运动的例子），却只能给出每次尝试以后的结果，比如说，这次击球的误差，声带系统发声的相似程度，或者告诉你这盘棋的最后结果。

而且RL学*的系统，给出的反馈往往不是实时的，而是有延时的，也就是你下棋，下了N步之后，在最后的一步才能得到评价输或赢的反馈。而你必须使用这些反馈去指导你之前做决策的过程。这种有延时的反馈信息，很难被监督学*所利用。监督学*更多的是去学*，同一个时间内，两个事情的对应关系。

强化学*模型的建立是通过程序的不断尝试和交互进行改进的

参考资料：

http://blog.csdn.net/ppn029012/article/details/8666328

以上是关于Reinforcement Learning的主要内容，如果未能解决你的问题，请参考以下文章