How to use the CartPole-v0 function (OpenAI gym)





Release date: 2023/7/28



■Description of CartPole-v0 function

CartPole is an inverted-pendulum task: an episode starts with a pole standing upright on a cart, and you move the cart left and right to keep the pole balanced.



■Concrete example using CartPole-v0 function

A minimal example is shown below, using this environment as material for reinforcement learning. OpenAI gym and numpy must be installed first (e.g. via pip).

import gym

env = gym.make('CartPole-v0')    # create the CartPole environment
env.reset()                      # initialize the state

for i in range(100):
    env.render()                 # draw the animation
    # take a random action and receive the result
    observation, reward, done, info = env.step(env.action_space.sample())
    print("Step:", i, "Done:", done, "Reward:", reward, "Obs:", observation)

env.close()


<Operate CartPole: env.step>
Passing 0 to "env.step()" pushes the cart to the left, and passing 1 pushes it to the right. "env.action_space.sample()" is a function that picks one of these two actions at random.
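As a quick sketch (assuming a classic Gym version whose step() returns the 4-tuple used in this article), the two actions look like this:

```python
import gym

env = gym.make('CartPole-v0')
env.reset()

# 0 pushes the cart to the left, 1 pushes it to the right.
observation, reward, done, info = env.step(0)   # push left
observation, reward, done, info = env.step(1)   # push right

# sample() draws one of the two actions (0 or 1) uniformly at random.
action = env.action_space.sample()
print("Sampled action:", action)
env.close()
```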

<CartPole state: observation>
Each call to env.step() returns the resulting state of the cart and pole in observation. The value ranges below are the officially defined bounds.

 array[0]: Cart position          -4.8 to 4.8
 array[1]: Cart velocity          -Inf to Inf
 array[2]: Pole angle             -0.418 to 0.418 [rad] (about ±24°)
 array[3]: Pole angular velocity  -Inf to Inf


(Figures omitted: diagrams showing how each observation value maps onto the cart and pole, and the definition of the pole angle.)


<Reward acquisition conditions: reward=1>
A "reward = 1" is returned for each step on which both of the following conditions hold.

 ① Pole angle within ±0.209 [rad] (about ±12°)
 ② Cart position within ±2.4
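The two conditions can be checked directly from the observation array; `is_rewarded` below is a hypothetical helper using the thresholds above:

```python
def is_rewarded(observation):
    """Return True while both reward conditions still hold."""
    cart_position, _, pole_angle, _ = observation
    return abs(pole_angle) <= 0.209 and abs(cart_position) <= 2.4

print(is_rewarded([0.0, 0.0, 0.0, 0.0]))   # upright at the center -> True
print(is_rewarded([0.0, 0.0, 0.3, 0.0]))   # pole angle past the limit -> False
```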

step() can still be called after these conditions are violated, but for reinforcement learning the episode should be ended at that point. Otherwise the following warning appears, prompting you to call reset():

You are calling 'step()' even though this environment has already returned done = True. You should always call 'reset()' once you receive 'done = True' -- any further steps are undefined behavior.


<Game end condition: done=True>
The game ends (done=True) when the reward conditions above are no longer met. CartPole-v0 also ends an episode automatically after 200 steps.
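Putting this together, a sketch of an episode loop that calls reset() whenever done=True (assuming the classic 4-tuple Gym API; render() is omitted so it runs headless):

```python
import gym

env = gym.make('CartPole-v0')

for episode in range(3):
    observation = env.reset()      # start a fresh episode
    total_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()                  # random policy
        observation, reward, done, info = env.step(action)  # advance one step
        total_reward += reward
    print("Episode:", episode, "total reward:", total_reward)

env.close()
```

With a random policy the pole typically falls within a few dozen steps, so each episode's total reward is small; a trained agent would keep the episode alive longer.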








