93 Fuzzy Qlearning and Dynamic Fuzzy Qlearning
Posted by jtailong
Introduction
In the reinforcement learning paradigm, an agent receives from its environment a scalar reward value called the reinforcement. This feedback is rather poor: it can be boolean (true, false) or fuzzy (bad, fair, very good, ...), and, moreover, it may be delayed. A sequence of control actions is often executed before any information on the quality of the whole sequence is received. Therefore, it is difficult to evaluate the contribution of an individual action.
Q-learning
Q-learning is a form of competitive learning which provides agents with the capability of learning to act optimally by evaluating the consequences of actions. Q-learning keeps a Q-function which attempts to estimate the discounted future reinforcement for taking actions from given states. A Q-function is a mapping from state-action pairs to predicted reinforcement. In order to explain the method, we adopt the implementation proposed by Bersini.
- The state space, \(U \subset R^{n}\), is partitioned into hypercubes or cells. Among these cells we can distinguish: (a) one particular cell, called the target cell, to which the quality value +1 is assigned; (b) a subset of cells, called the viability zone, that the process must not leave, with quality value 0 (this notion of viability zone comes from Aubin and eliminates strong constraints on a reference trajectory for the process); (c) the remaining cells, called the failure zone, with the quality value -1.
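The cell partition above can be sketched in code. The following is a minimal, hypothetical 1-D example (the text works in \(R^{n}\)); the choice of \(M = 10\) cells, the target index, and the viability set are assumptions for illustration only:

```python
# Hypothetical 1-D example: partition U = [0, 1) into M = 10 cells and
# label each cell with its quality value: +1 target, 0 viability, -1 failure.
M = 10
TARGET_CELL = 4            # assumed index of the target cell
VIABILITY = {3, 4, 5, 6}   # assumed viability zone (contains the target)

def cell_of(x, m=M):
    """Map a state x in [0, 1) to its cell index."""
    return min(int(x * m), m - 1)

def quality(c):
    """Quality value of cell c as defined in the text."""
    if c == TARGET_CELL:
        return +1
    if c in VIABILITY:
        return 0
    return -1
```

For instance, `quality(cell_of(0.45))` lands in the target cell, while a state near the boundary of `U` falls in the failure zone.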
- In each cell, a set of \(J\) agents compete to control a process. With \(M\) cells, the agent \(j\), \(j \in \{1,\ldots,J\}\), acting in cell \(c\), \(c \in \{1,\ldots,M\}\), is characterized by its quality value \(Q[c,j]\). The probability that agent \(j\) in cell \(c\) will be selected is given by a Boltzmann distribution.
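The Boltzmann selection rule can be sketched as follows. The temperature parameter and the toy values of \(M\), \(J\), and \(Q\) are assumptions for illustration; the text does not fix them:

```python
import numpy as np

rng = np.random.default_rng(0)

def boltzmann_select(Q, c, temperature=1.0):
    """Select an agent j in cell c with probability proportional to
    exp(Q[c, j] / T) (Boltzmann / softmax distribution)."""
    prefs = Q[c] / temperature
    prefs = prefs - prefs.max()       # shift for numerical stability
    probs = np.exp(prefs)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

M, J = 5, 3                           # assumed: M cells, J agents per cell
Q = np.zeros((M, J))
Q[0] = [0.0, 1.0, 2.0]
j, probs = boltzmann_select(Q, c=0)
# agents with higher Q[c, j] receive higher selection probability
```

A low temperature makes the selection nearly greedy; a high temperature makes it nearly uniform, which controls the exploration/exploitation trade-off.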
- The selected agent controls the process as long as the process stays in the cell. When the process leaves the cell \(c\) to enter a cell \(c'\) at time step \(t\), another agent is selected for cell \(c'\) and the Q-function of the previous agent is incremented by:
\[\Delta Q[c,j] = \alpha \left\{ r(t) + \gamma \max_{k} Q[c',k] - Q[c,j] \right\}\]
where \(\alpha\) is the learning rate (\(\alpha < 1\)), \(\gamma\) the discount rate (\(\gamma < 1\)), and \(r(t)\) the reinforcement.
\[r(t) = \begin{cases} +1 & \text{if } c' \text{ is the target cell (reward)} \\ 0 & \text{if } c' \text{ is in the viability zone} \\ -1 & \text{if } c' \text{ is in the failure zone (punishment)} \end{cases}\]
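Putting the pieces together, the update applied when the process changes cell can be sketched as below. The values of \(\alpha\), \(\gamma\), and the table sizes are illustrative assumptions:

```python
import numpy as np

def update_Q(Q, c, j, c_next, r, alpha=0.1, gamma=0.9):
    """Apply Q[c, j] += alpha * (r + gamma * max_k Q[c_next, k] - Q[c, j]),
    the increment used when the process leaves cell c (controlled by
    agent j) and enters cell c_next with reinforcement r."""
    delta = r + gamma * Q[c_next].max() - Q[c, j]
    Q[c, j] += alpha * delta
    return Q

M, J = 5, 3                 # assumed table size: M cells, J agents per cell
Q = np.zeros((M, J))
# Transition from cell 0 (agent 1) into the target cell, reward +1:
update_Q(Q, c=0, j=1, c_next=2, r=+1.0)
# Q[0, 1] moves from 0 toward the reward: 0.1 * (1 + 0.9 * 0 - 0) = 0.1
```

Repeated transitions propagate the target reward backwards through the cells, which is how the delayed reinforcement problem from the introduction is handled.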