Reinforcement Learning for the Space Turtle

WHAT IS IT?

This model demonstrates the movement of a turtle to a target patch in a 3D environment according to a policy learned by reinforcement learning.

HOW IT WORKS

The turtle moves within a 3 x 3 x 3 cube. The target patch is the one with maximal coordinates (i.e. 3 3 3). The coordinates of the patch the turtle starts from are set randomly by pressing the respective button. The trajectory follows the policy computed with the ReinforcementLearning package in R.
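The policy lookup can be sketched as a pair of reporters. This is only an illustration, not the model's own code: the names `state-coords` and `lookup-action` are hypothetical, and the state name "state.1.2.3" used below is an assumed example. The only thing taken from the model is that characters 6, 8 and 10 of the first column are read as x, y and z, and that the second column holds the move.

```netlogo
;; Hypothetical helpers, not part of the original model.

;; Reports [x y z] read from characters 6, 8 and 10 of a state name,
;; the same positions the model's play-model procedure reads.
to-report state-coords [ state-name ]
  report (list read-from-string item 6 state-name
               read-from-string item 8 state-name
               read-from-string item 10 state-name)
end

;; Reports the move stored in column 1 of the first policy row whose
;; state matches x y z, or "" when no row matches.
to-report lookup-action [ policy x y z ]
  foreach policy [ row ->
    if (state-coords item 0 row) = (list x y z) [ report item 1 row ]
  ]
  report ""
end
```

With an assumed row such as ["state.1.2.3" "up"], `lookup-action (list ["state.1.2.3" "up"]) 1 2 3` would report "up". Reporting "" for a missing state is a design choice of this sketch; the model itself assumes every position the turtle can reach has a policy entry.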

HOW TO USE IT

The 'Setup' button generates the world (i.e. the cube and the turtle). The turtle's initial coordinates are ( 1 1 1 ), and it can start moving from there. Other starting coordinates can be set randomly by pressing the 'Set Start Patch' button (after pressing 'Setup'). The 'Go' button/procedure moves the turtle to the target patch.

The 'Move' monitor shows the direction of the current move. When the turtle reaches the target patch, the 'Status' monitor displays "Win". The number of moves is shown on the respective monitor.

THINGS TO NOTICE

Both the NetLogo model file and the .csv file with the policy should be located in the same directory. The path to the .csv file in the NetLogo code (the csv:from-file call in the play-model procedure) should be changed to match the actual location of the file on your computer.
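One possible way to avoid editing an absolute path is to refer to the file by name only. The sketch below assumes that a bare file name is resolved against the directory of the saved model (this is how NetLogo's file primitives behave; it is assumed here to hold for the csv extension as well). The procedure name `load-policy` is hypothetical and does not exist in the model.

```netlogo
;; The model already declares these two lines; they are repeated here
;; only so the sketch is complete on its own.
extensions [csv]
globals [ model-policy ]

;; Hypothetical procedure: load the policy once from the model's folder.
to load-policy
  ifelse file-exists? "Policy_m.csv"
    [ set model-policy csv:from-file "Policy_m.csv" ]
    [ user-message "Policy_m.csv was not found next to the model file." ]
end
```

Calling such a procedure once (for example from 'Setup') would also avoid re-reading the file on every move.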

THINGS TO TRY

For a better view of the turtle's position and trajectory, the 3D View can be orbited or moved.
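Besides dragging with the mouse, the view can be orbited from code or from the Command Center with the same 3D primitives the 'Setup' procedure uses. A minimal sketch follows; the procedure name `tour-view` and the step and delay values are arbitrary assumptions.

```netlogo
;; Hypothetical observer procedure: orbit the 3D view once around the cube.
to tour-view
  repeat 36 [
    orbit-right 10   ;; rotate the camera 10 degrees around the scene
    display          ;; force a view update
    wait 0.05        ;; short pause so the rotation is visible
  ]
end
```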

NETLOGO FEATURES

This is a 3D model driven by a policy generated by reinforcement learning with the ReinforcementLearning R package. The '.csv' file with the policy should be located in the same directory as the downloaded NetLogo model.

CREDITS AND REFERENCES

This model was developed by Victor Iapascurta, MD. At the time of development he was in the Department of Anesthesia and Intensive Care at the University of Medicine and Pharmacy in Chisinau, Moldova / ICU at the City Emergency Hospital in Chisinau. Please email any questions or comments to viapascurta@yahoo.com

The model was created in NetLogo 6.2.0, Wilensky, U. (1999). NetLogo. http://ccl.northwestern.edu/netlogo/. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.

extensions [csv]

globals [
  curr-patch
  start-patch
  status
  move-dir

  px
  py
  pz

  model-policy ;; reinforcement learning model policy
  selected-move ;; a particular move selected from the policy
]

to setup
  clear-all
  file-close-all

  crt-world
  draw-cube
  draw-axes
  crt-turtle

  set status "Playing"

  set-start-patch
  set selected-move []
  set model-policy []
  set move-dir 0

  set px 0
  set py 0
  set pz 0

  orbit-right 5
  orbit-down 90

  reset-ticks
end 

to draw-axes
  create-turtles 1 [ set shape "line"
          set heading 90
          set color red
          set size world-width
          stamp
          die ]
  create-turtles 1 [ set shape "line"
          set color orange
          set heading 0
          set size world-height
          stamp
          die ]
  create-turtles 1 [ set shape "line"
          set pitch 90
          set color blue
          set size world-depth
          stamp
          die ]
  ask patch max-pxcor 0 0 [ set plabel "x-axis"
                            set plabel-color red]

  ask patch 0 max-pycor 0 [ set plabel "y-axis"
                            set plabel-color orange]

  ask patch 0 0 max-pzcor [ set plabel "z-axis"
                            set plabel-color blue]
end 

to draw-cube
  ask patches
    [
      if (pxcor >= 1 and pxcor <= max-pxcor and
          pycor >= 1 and pycor <= max-pycor and
          pzcor >= 1 and pzcor <= max-pzcor)
        [ set pcolor [0 255 0 20] ]
  ]
end 

to crt-turtle
  crt 1 [
      set color yellow
      setxyz 1 1 1
      set heading 90
      set pitch 0
      set roll 0
  ]
end 

to crt-world

  resize-world 0 3 0 3 0 3
end 

to go
  ;; stop once the turtle has reached the target patch
  if status = "Win" [ stop ]
  play-model
  tick
end

to set-start-patch ;; the patch to start with; coordinates are random
  set px  one-of [1 2 3]
  set py  one-of [1 2 3]
  set pz  one-of [1 2 3]
  set start-patch patch px  py  pz

  set px 0
  set py 0
  set pz 0
end 

to move-to-start
  ask turtles [
  move-to start-patch
  ]
end 

to play-model ;; move the turtle one step according to the RL policy
  ask turtles [
    set curr-patch patch-here

    ;; Load the policy from a separate .csv file. Edit the path below to match
    ;; the actual location of Policy_m.csv on your computer.
    set model-policy csv:from-file "C:/Users/Victor/Desktop/RL/NetLogo_models/Final_RL_S_turtle/Policy_m.csv" ;; "~/path to .csv file/Policy_m.csv"

    ;; Scan the policy for rows whose state (column 0) matches the turtle's
    ;; position; characters 6, 8 and 10 of the state name are read as x, y and z.
    foreach model-policy [ i ->
      if xcor = read-from-string item 6 item 0 i and
         ycor = read-from-string item 8 item 0 i and
         zcor = read-from-string item 10 item 0 i
        [ set selected-move lput i selected-move ] ;; rows matching the current state
    ]
    pen-down

    ;; The move is stored in column 1 of the first matching row.
    let action item 1 item 0 selected-move

    if action = "forward"
      [ set move-dir "forward"
        fd 1 ]

    if action = "backward"
      [ set move-dir "backward"
        bk 1 ]

    if action = "right"
      [ set move-dir "right"
        set heading 180
        fd 1
        set heading 90 ]

    if action = "left"
      [ set move-dir "left"
        set heading 0
        fd 1
        set heading 90 ]

    if action = "up"
      [ set move-dir "up"
        set pitch 90
        fd 1
        set pitch 0 ]

    if action = "down"
      [ set move-dir "down"
        set pitch -90
        fd 1
        set pitch 0 ]

    ;; clear the selection for the next move
    set selected-move []

    if (patch-here = patch 3 3 3)
      [ set status "Win"
        stop ]
  ]
end

There is only one version of this model, created over 4 years ago by Victor Iapascurta.

Attached files

File	Type	Description	Last updated
Policy_m.csv	data	Model Policy	over 4 years ago, by Victor Iapascurta
Reinforcement Learning for the Space Turtle.png	preview	Preview for 'Reinforcement Learning for the Space Turtle'	over 4 years ago, by Victor Iapascurta

This model does not have any ancestors.

This model does not have any descendants.
