Building an RL environment with gym
Posted by tolshao
Calling gym
The gym API is used in the following order; a concrete runnable version follows the skeleton below.
env = gym.make('x')
observation = env.reset()
for i in range(time_steps):
    env.render()
    action = policy(observation)
    observation, reward, done, info = env.step(action)
    if done:
        ...
        break
env.close()
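For reference, here is a minimal concrete version of this loop, using CartPole-v0 and drawing random actions from env.action_space.sample() as a stand-in policy. It is a sketch that assumes the classic gym API used throughout this post, where reset returns only the observation and step returns four values.

import gym

env = gym.make('CartPole-v0')
observation = env.reset()
for i in range(200):
    env.render()
    action = env.action_space.sample()  # random placeholder policy
    observation, reward, done, info = env.step(action)
    if done:
        break
env.close()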
Example
The example implements a simple policy: when the pole leans left, the cart moves left; when it leans right, the cart moves right (and when the pole is nearly upright, the action alternates with the previous one).
import gym
import numpy as np
env = gym.make('CartPole-v0')
t_all = []
action_bef = 0
for i_episode in range(5):
    observation = env.reset()
    for t in range(100):
        env.render()
        # cart position, cart velocity, pole angle, pole angular velocity
        cp, cv, pa, pv = observation
        if abs(pa) <= 0.1:
            # pole is nearly upright: alternate with the previous action
            action = 1 - action_bef
        elif pa >= 0:
            # pole leans right: push the cart to the right
            action = 1
        elif pa <= 0:
            # pole leans left: push the cart to the left
            action = 0
        observation, reward, done, info = env.step(action)
        action_bef = action
        if done:
            # print("Episode finished after {} timesteps".format(t+1))
            t_all.append(t)
            break
        if t == 99:
            # episode survived the full 100 steps; recorded as 0 here
            t_all.append(0)
env.close()
print(t_all)
print(np.mean(t_all))
Building a gym environment
The functions that make up a gym environment
A complete gym environment defines the following: the class itself (subclassing gym.Env), initialization, reset, seeding, step, render, and close. A minimal skeleton is sketched after the list.
- class CartPoleEnv(gym.Env):
- def __init__(self):
- def reset(self):
- def seed(self, seed=None): return [seed]
- def step(self, action): return self.state, reward, done, {}
- def render(self, mode='human'): return self.viewer.render()
- def close(self):
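Putting these pieces together, the sketch below shows one way such an environment can be laid out. It is not the real CartPole implementation: the two-dimensional state, the toy dynamics in step, the bounds, and the reward of 1.0 per step are placeholder assumptions chosen only to keep the example self-contained.

import numpy as np
import gym
from gym import spaces
from gym.utils import seeding

class CartPoleEnv(gym.Env):
    def __init__(self):
        # Placeholder bounds; a real environment would use its physical limits
        high = np.array([2.4, 10.0], dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(-high, high, dtype=np.float32)
        self.state = None
        self.viewer = None
        self.seed()

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def reset(self):
        # start near the origin with a small random perturbation
        self.state = self.np_random.uniform(low=-0.05, high=0.05, size=(2,))
        return np.array(self.state, dtype=np.float32)

    def step(self, action):
        assert self.action_space.contains(action)  # validate the incoming action
        pos, vel = self.state
        vel += 0.1 if action == 1 else -0.1        # toy dynamics, purely illustrative
        pos += vel
        self.state = (pos, vel)
        done = bool(abs(pos) > 2.4)
        reward = 1.0
        return np.array(self.state, dtype=np.float32), reward, done, {}

    def render(self, mode='human'):
        pass  # rendering is omitted in this sketch

    def close(self):
        if self.viewer is not None:
            self.viewer.close()
            self.viewer = None

An instance of this class can then be driven with the same reset/step/render/close loop shown at the top of the post.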
Utility functions
- Parameter clamping
vel = np.clip(vel, vel_min, vel_max)
- Action input validation
self.action_space.contains(action)
- Action and observation space definitions
A Discrete(3) action space takes the integer values 0, 1, 2; a Box observation space is bounded by low and high arrays. A combined example of these utilities follows the list.
low = np.array([min_0, min_1], dtype=np.float32)
high = np.array([max_0, max_1], dtype=np.float32)
self.action_space = spaces.Discrete(3)
self.observation_space = spaces.Box(low, high, dtype=np.float32)
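For reference, these utility calls can be exercised on their own; the concrete numbers below are made up purely for illustration.

import numpy as np
from gym import spaces

# Parameter clamping: keep a velocity inside [vel_min, vel_max]
vel_min, vel_max = -2.0, 2.0
vel = np.clip(3.5, vel_min, vel_max)      # clipped to 2.0

# Space definitions
low = np.array([-1.0, -1.0], dtype=np.float32)
high = np.array([1.0, 1.0], dtype=np.float32)
action_space = spaces.Discrete(3)         # valid actions: 0, 1, 2
observation_space = spaces.Box(low, high, dtype=np.float32)

# Action input validation
print(action_space.contains(2))           # True
print(action_space.contains(5))           # False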