How to use the CartPole-v0 function (OpenAI gym)





Release date: 2023/7/28



■Description of CartPole-v0 function

CartPole is an inverted-pendulum task: an episode starts with a pole standing upright on a cart, and you move the cart left and right to keep the pole balanced.



■Concrete example using CartPole-v0 function

A minimal example is shown below, using this environment as material for reinforcement learning. OpenAI gym and numpy must be installed first (e.g. via pip).

import gym

env = gym.make('CartPole-v0')    # create the CartPole environment
env.reset()                      # initialize the state

for i in range(100):
    env.render()                 # draw the animation
    # take a random action and receive the result
    observation, reward, done, info = env.step(env.action_space.sample())
    print("Step:", i, "Done:", done, "Reward:", reward, "Obs:", observation)

env.close()


<Operate CartPole: env.step>
Passing 0 to "env.step()" pushes the cart to the left, and passing 1 pushes it to the right. "env.action_space.sample()" is a function that picks one of these two actions at random.
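As a quick sketch (assuming a classic Gym version whose step() returns the 4-tuple used in this article), the two actions look like this:

```python
import gym

env = gym.make('CartPole-v0')
env.reset()

# 0 pushes the cart to the left, 1 pushes it to the right.
observation, reward, done, info = env.step(0)   # push left
observation, reward, done, info = env.step(1)   # push right

# sample() draws one of the two actions (0 or 1) uniformly at random.
action = env.action_space.sample()
print("Sampled action:", action)
env.close()
```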

<CartPole state: observation>
Each call to env.step() returns the resulting state of the cart and pole in observation. The value ranges below are the officially defined bounds.

 array[0]: Cart position          -4.8 to 4.8
 array[1]: Cart velocity          -Inf to Inf
 array[2]: Pole angle             -0.418 to 0.418 [rad] (about ±24°)
 array[3]: Pole angular velocity  -Inf to Inf


(Figures omitted: diagrams showing how each observation value maps onto the cart and pole, and the definition of the pole angle.)


<Reward acquisition conditions: reward=1>
A "reward = 1" is returned for each step on which both of the following conditions hold.

 ① Pole angle within ±0.209 [rad] (about ±12°)
 ② Cart position within ±2.4
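The two conditions can be checked directly from the observation array; `is_rewarded` below is a hypothetical helper using the thresholds above:

```python
def is_rewarded(observation):
    """Return True while both reward conditions still hold."""
    cart_position, _, pole_angle, _ = observation
    return abs(pole_angle) <= 0.209 and abs(cart_position) <= 2.4

print(is_rewarded([0.0, 0.0, 0.0, 0.0]))   # upright at the center -> True
print(is_rewarded([0.0, 0.0, 0.3, 0.0]))   # pole angle past the limit -> False
```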

step() can still be called after these conditions are violated, but for reinforcement learning the episode should be ended at that point. Otherwise the following warning appears, prompting you to call reset():

You are calling 'step()' even though this environment has already returned done = True. You should always call 'reset()' once you receive 'done = True' -- any further steps are undefined behavior.


<Game end condition: done=True>
The game ends (done=True) when the reward conditions above are no longer met. CartPole-v0 also ends an episode automatically after 200 steps.
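Putting this together, a sketch of an episode loop that calls reset() whenever done=True (assuming the classic 4-tuple Gym API; render() is omitted so it runs headless):

```python
import gym

env = gym.make('CartPole-v0')

for episode in range(3):
    observation = env.reset()      # start a fresh episode
    total_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()                  # random policy
        observation, reward, done, info = env.step(action)  # advance one step
        total_reward += reward
    print("Episode:", episode, "total reward:", total_reward)

env.close()
```

With a random policy the pole typically falls within a few dozen steps, so each episode's total reward is small; a trained agent would keep the episode alive longer.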








