Slime-RL

No preview image

This model is seeking new collaborators — would you please help?

1 collaborator

Default-person Stefano Mariani (Author)

Tags

emergence 

Tagged by Stefano Mariani almost 2 years ago

reinforcement learning 

Tagged by Stefano Mariani almost 2 years ago

self-organisation 

Tagged by Stefano Mariani almost 2 years ago

Visible to everyone | Changeable by the author
Model was written in NetLogo 6.3.0 • Viewed 325 times • Downloaded 22 times • Run 0 times
Download the 'Slime-RL' modelDownload this modelEmbed this model

Do you have questions or comments about this model? Ask them here! (You'll first need to log in.)


## GOALS

Teach learning slimes (= red turtles, aka "learning-turtles") to aggregate in clusters using **basic Q-learning**.

## TL;RD: quick info about RL here

Roughly, the Netlogo screen is split in two:

* on left side of the simulation arena are the parameters of the basic slime mold aggregation model (actually there are more than the original model, as I extended the number of parameters directly configurable at run-time from the GUI)

* on the right side there are RL-related parameters

**Important**: as the left side parameters are related to basic slime behaviours, they obviously also affect learning (e.g. decreasing the evaporation rate makes learning more difficult). Hence, **you are strongly advised to keep them fixed** once you find suitable behaviour with `setup` & `go` (no RL involved there).

To **keep track of experiments** remember to:

1. Explictly modify experiment name in procedure `setup-learning`

2. Configure RL stuff within `ask Learners [...` in procedure `setup-learning

3. Configure actions reported by logging procedure accordingly

4. Explicitly modify lines `"e-greedy", "ACION SPACE", "OBSERVATION SPACE"`, and `"REWARD"` in procedure `log-params` at the very end of file

### NON-RL parameters

As already said, these parameters describe slimes behaviour in the original model, but also indirectly affect learning, making it more difficult or easier (e.g. decreasing the evaporation rate makes learning more difficult).

* `population` controls the number of non-learning slimes (= blue turtles)

* `wiggle-angle` controls how much slimes steer around---no effect on learning

* `look-ahead` controls how far slimes can smell pheromone (higher values enable forming elongated clusters)---no effect on learning

* `sniff-threshold` controls how sensitive slimes are to pheromone (higher values make slimes less sensitive to pheromone)---unclear effect on learning, could be negligible

* `sniff-angle` controls how wide is the cone within which slimes can smell pheromone in nearby patches (higher values make slimes able to smell pheromone in a wider cone)---unclear effect on learning, could be negligible

* `chemical-drop` controls how much pheromone slimes deposit on their patch---unclear effect on learning, could be negligible

* `diffuse-share` controls how much pheromone diffuses in nearby patches (higher values mean more pheromone is diffused)---unclear effect on learning, but **likely lower values make learning more difficult**

* `evaporation-rate` controls how much pheromone is retained over time (higher values mean less pheromone evaporates)---unclear effect on learning, but **likely lower values make learning more difficult**

### RL parameters

All the following parameters have a direct effect on Q-learning of learning slimes.

* `cluster-threshold` controls the minimum number of slimes needed to consider an aggregate within `cluster-radius` a cluster (the higher the more difficult to consider an aggregate a cluster)---**the higher the more difficult to obtain a positive reward** for being within a cluster for learning slimes

* `cluster-radius` controls the range considered by slimes to count other slimes within a cluster (the higher the easier to form clusters, as turtles far apart are still counted together)---**the higher the easier it is to obtain a positive reward** for being within a cluster for learning slimes

* `learning-turtles` controls the number of learning slimes (= red turtles)

* `ticks-per-episode` controls how long a learning episode last (on episode end, slimes position are randomly reset and pheromone is cleared)---slimes should be given enough time to form clusters, hence it is strongly advisable to set this parameter at the very least **2x as low as allowed by non learning slimes forming clusters**

* `episodes` controls how many learning episodes are automatically run

* `learning-rate` is the classical Q-learning param, controlling "how fast" slimes learn---higher values cause bigger adjustements to Q-values

* `discount-factor` is the classical Q-learning param, controlling how much future rewards are given value over immediate ones---higher values cause bigger value given to future rewards

* `reward` is the raw reward value considered by the reward function (check code to see how it is used)

* `penalty` is the raw penalty (= negative reward) value considered by the reward function (check code to see how it is used)

### PLOTS

The top plots tracks the average "size" of clusters (in terms of number of turtles therein) based on two parameters:

* `cluster-threshold` is the minimum number of turtles needed to consider the aggregate a cluster (the higher the more difficult to form legit clusters, hence the more difficult to obtain a positive reward for learning turtles)

* `cluster-radius` is the range considered to count turtles in a cluster (the higher the easier to form clusters, as turtles far apart are still counted together)

This plot is better suited to monitor non-learning turtles behaviour during a `setup` & `go`: the higher the value the less-and-bigger clusters are produced

The bottom plot is meant to monitor learning, as it plots the average reward per episode (average of the individual rewards of each learning turtle).

### Other params

TBD

-----

##

ORIGINAL INFO BELOW

## WHAT IS IT?

This model is inspired by the aggregation behavior of slime-mold cells.

The slime mold spends much of its life as thousands of distinct single-celled units, each moving separately. Under the right conditions, those many cells will coalesce into a single, larger organism. When the environment is less hospitable, the slime mold acts as a single organism; when the weather turns cooler and the mold enjoys a large food supply, "it" becomes a "they." The slime mold oscillates between being a single creature and a swarm.

This model shows how creatures can aggregate into clusters without the control of a "leader" or "pacemaker" cell. This finding was first described by Evelyn Fox Keller and Lee Segel in a paper in 1970.

Before Keller began her investigations, the conventional belief had been that slime mold swarms formed at the command of "pacemaker" cells that ordered the other cells to begin aggregating. In 1962, Shafer showed how the pacemakers could use cyclic AMP as a signal of sorts to rally the troops; the slime mold generals would release the compounds at the appropriate moments, triggering waves of cyclic AMP that washed through the entire community, as each isolated cell relayed the signal to its neighbors. Slime mold aggregation, in effect, was a giant game of Telephone — but only a few elite cells placed the original call.

For the twenty years that followed the publication of Shafer's original essay, mycologists assumed that the missing pacemaker cells were a sign of insufficient data, or poorly designed experiments. But Keller and Segel took another, more radical approach. They shows that Shafer had it wrong -- that the community of slime mold cells were organizing themselves without any need for pacemakers. This was one of the first examples of emergence and self-organization in biology.

Initially, biologists did not accept this explanation. Indeed, the pacemaker hypothesis would continue as the reigning model for another decade. Now, slime mold aggregation is recognized as a classic case study in bottom-up self-organizing behavior.

In this model, each turtle drops a chemical pheromone (shown in green). The turtles also "sniff" ahead, trying to follow the gradient of other turtles' chemicals. Meanwhile, the patches diffuse and evaporate the pheromone. Following these simple, decentralized rules, the turtles aggregate into clusters.

## HOW TO USE IT

Click the SETUP button to set up a collection of slime-mold cells. Click the GO button to start the simulation.

The POPULATION slider controls the number of slime mold cells in the simulation. Changes in the POPULATION slider do not have any effect until the next SETUP command.

The other sliders affect the way turtles move. Changes to them will immediately affect the model run.

SNIFF-THRESHHOLD -- The minimum amount of chemical that must be present in a turtle's patch before the turtle will look for a chemical gradient to follow. This parameter causes the turtles to aggregate only when there are enough other cells nearby. The default value is 1.0.

SNIFF-ANGLE -- The amount, in degrees, that a turtle turns to the left and right to check for greater chemical concentrations. The default value is 45.

WIGGLE-ANGLE -- The maximum amount, in degrees, that a turtle will turn left or right in its random movements. When WIGGLE-ANGLE is set to zero, the turtle will remain at the same heading until it finds a chemical gradient to follow. The default value is 40.

WIGGLE-BIAS -- The bias of a turtle's average wiggle. When WIGGLE-BIAS = 0, the turtle's average movement is straight ahead. When WIGGLE-BIAS > 0, the turtle will tend to move more right than left. When BIAS < 0, the turtle will tend to move more left than right. The default value is 0.

There are several other critical parameters in the model that are not accessible by sliders. They can be changed by modifying the code in the procedures window. They are:

- the evaporation rate of the chemical -- set to 0.9

- the diffusion rate of the chemical -- set to 1

- the amount of chemical deposited at each step -- set to 2

## THINGS TO NOTICE

With 100 turtles, not much happens. The turtles wander around dropping chemical, but the chemical evaporates and diffuses too quickly for the turtles to aggregate.

With 400 turtles, the result is quite different. When a few turtles happen (by chance) to wander near one another, they create a small "puddle" of chemical that can attract any number of other turtles in the vicinity. The puddle then becomes larger and more attractive as more turtles enter it and deposit their own chemicals. This process is a good example of positive feedback: the more turtles, the larger the puddle; and the larger the puddle, the more likely it is to attract more turtles.

## THINGS TO TRY

Try different values for the SNIFF-THRESHOLD, SNIFF-ANGLE, WIGGLE-ANGLE, and WIGGLE-BIAS sliders. How do they affect the turtles' movement and the formation of clumps?

Change the SNIFF-ANGLE and WIGGLE-ANGLE sliders after some clumps have formed. What happens to the clumps? Try the same with SNIFF-THRESHOLD and WIGGLE-BIAS.

## EXTENDING THE MODEL

Modify the program so that the turtles aggregate into a single large cluster.

How do the results change if there is more (or less) randomness in the turtles' motion?

Notice that the turtles only sniff for chemical in three places: forward, SNIFF-ANGLE to the left, and SNIFF-ANGLE to the right. Modify the model so that the turtles sniff all around. How does their clustering behavior change? Modify the model so that the turtles sniff in even fewer places. How does their clustering behavior change?

What "critical number" of turtles is needed for the clusters to form? How does the critical number change if you modify the evaporation or diffusion rate?

Can you find an algorithm that will let you plot the number of distinct clusters over time?

## NETLOGO FEATURES

Note the use of the `patch-ahead`, `patch-left-and-ahead`, and `patch-right-and-ahead` primitives to do the "sniffing".

## RELATED MODELS

Ants uses a similar idea of creatures that both drop chemical and follow the gradient of the chemical.

## CREDITS AND REFERENCES

Keller, E & Segel, L. (1970). Initiation of slime mold aggregation viewed as an instability. Journal of Theoretical Biology,

Volume 26, Issue 3, March 1970, Pages 399–415.

Wilensky, U., & Resnick, M. (1999). Thinking in levels: A dynamic systems approach to making sense of the world. Journal of Science Education and Technology, 8(1), 3-19.

Johnson, S. (2001). Emergence: The Connected Lives of Ants, Brains, Cities, and Software. New York: Scribner.

Resnick, M. (1996). Beyond the centralized mindset. Journal of the Learning Sciences, 5(1), 1-22.

See also http://www.creepinggarden.com for video of slime mold.

## HOW TO CITE

If you mention this model or the NetLogo software in a publication, we ask that you include the citations below.

For the model itself:

* Wilensky, U. (1997). NetLogo Slime model. http://ccl.northwestern.edu/netlogo/models/Slime. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.

Please cite the NetLogo software as:

* Wilensky, U. (1999). NetLogo. http://ccl.northwestern.edu/netlogo/. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.

## COPYRIGHT AND LICENSE

Stefano Mariani (stefano.mariani@unimore.it)

Original copyright info below.

Copyright 1997 Uri Wilensky.

![CC BY-NC-SA 3.0](http://ccl.northwestern.edu/images/creativecommons/byncsa.png)

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

Commercial licenses are also available. To inquire about commercial licenses, please contact Uri Wilensky at uri@northwestern.edu.

This model was created as part of the project: CONNECTED MATHEMATICS: MAKING SENSE OF COMPLEX PHENOMENA THROUGH BUILDING OBJECT-BASED PARALLEL MODELS (OBPML). The project gratefully acknowledges the support of the National Science Foundation (Applications of Advanced Technologies Program) -- grant numbers RED #9552950 and REC #9632612.

This model was developed at the MIT Media Lab using CM StarLogo. See Resnick, M. (1994) "Turtles, Termites and Traffic Jams: Explorations in Massively Parallel Microworlds." Cambridge, MA: MIT Press. Adapted to StarLogoT, 1997, as part of the Connected Mathematics Project.

This model was converted to NetLogo as part of the projects: PARTICIPATORY SIMULATIONS: NETWORK-BASED DESIGN FOR SYSTEMS LEARNING IN CLASSROOMS and/or INTEGRATED SIMULATION AND MODELING ENVIRONMENT. The project gratefully acknowledges the support of the National Science Foundation (REPP & ROLE programs) -- grant numbers REC #9814682 and REC-0126227. Converted from StarLogoT to NetLogo, 2000.

Comments and Questions

Please start the discussion about this model! (You'll first need to log in.)

Click to Run Model

;; CHECK ESPECIALLY CAREFULLY COMMENTS WITH "NB" OR "WARNING"
;; 1) Explictly modify experiment name in procedure setup-learning
;; 2) Configure 'actions' global variable
;; 3) Configure RL stuff within "ask Learners [..." in procedure setup-learning accordingly
;; 4) Explicitly modify lines "e-greedy", "OBSERVATION SPACE", and "REWARD" in procedure log-params at the very end of file

extensions[qlearningextension table]

globals [
  actions
  action-distribution     ;; table "action -> number of turtles choosing that action"
  turtle-distribution     ;; table "turtle -> [table action -> number of times action chosen]"
  filename                ;; the file where to report simulation results (automatically appended with a timestamp)
  g-reward-list           ;; list with one entry for each turtle, that is the average reward got so far by such turtle
  g-std-reward-list       ;; list with one entry for each turtle, that is the standard deviation of the average reward got so far by such turtle
  g-max-reward-list       ;; list with one entry for each turtle, that is the maximum reward got so far by such turtle
  g-min-reward-list       ;; list with one entry for each turtle, that is the minimum reward got so far by such turtle
  g-mean-distance-vector  ;; list with one entry for each turtle, that is the average distance from that turtle to any other turtle
  g-std-distance-vector   ;; list with one entry for each turtle, that is the standard deviation of the average distance from that turtle to any other turtle
  g-min-distance-vector   ;; list with one entry for each turtle, that is the minimum distance from that turtle to any other turtle
  g-max-distance-vector   ;; list with one entry for each turtle, that is the maximum distance from that turtle to any other turtle
  episode                 ;; progressive number of the currently running episode (hence number of episodes run)
  is-there-cluster        ;; is there at least one cluser in the whole environment? (boolean)
  first-cluster           ;; whether the cluster now formed is the first one of the episode
  cluster-tick            ;; tick number relative to an episode when the first cluster of that episode is formed
]

patches-own [chemical]    ;; amount of pheromone in the patch

Breed[Learners Learner]   ;; turtles that are learning (shown in red)

Learners-own [
  chemical-here           ;; whether there is pheromone on the patch-here (boolean)
  chemical-gradient       ;; direction where gradient is stronger
  cluster-gradient        ;; direction where cluster is stronger
  p-chemical              ;; amount of pheromone on the patch-here
  reward-list             ;; list of rewards got so far
]

turtles-own [             ;; these variables are also inherited by learners
  ticks-in-cluster        ;; how many ticks the turtle has stayed within a cluster
  cluster                 ;; number of turtles within cluster-radius
  in-cluster              ;; whether the turtle is within a cluster (boolean = cluster > cluster-threshold)
  last-action             ;; name of the action taken by the turtle in last tick
  distance-vector         ;; list with one entry for each turtle, that is the distance from this turtle to that turtle
]

;;;;;;;;;;;;;;;;;;;;;;
;; SETUP procedures ;;
;;;;;;;;;;;;;;;;;;;;;;

to setup                           ;; NO RL here (some RL variables are initialised anyway to avoid errors)
  clear-all

  create-turtles population
  [ set color blue
    set size 2
    setxy random-xcor random-ycor
    set ticks-in-cluster 0
    set cluster 0
    set in-cluster false
    set distance-vector []
    if label?
      [ set label who ] ]

  ask patches [ set chemical 0 ]
  set episode 1
  set is-there-cluster false
  set first-cluster true
  set cluster-tick ticks-per-episode

  reset-ticks
  setup-global-plot "Average cluster size in # of turtles within cluster-radius" "# of turtles" 0

  ;set actions ["move-and-drop" "walk-and-drop"]
  ;set actions ["away-and-drop" "stand-still"]
  set actions ["away-and-drop" "drop-chemical"]
  ;set actions ["away-and-drop" "walk-and-drop"]
  ;set actions ["random-walk" "move-toward-chemical" "drop-chemical"]  ;; NB MODIFY ACTIONS LIST HERE
  setup-action-distribution-table actions
  type "Actions distribution: " print action-distribution

  setup-turtle-distribution-table turtles
  type "Turtles distribution: " print turtle-distribution

  if log-data?
    [ set filename (word "BS-scatter01-" date-and-time ".txt")  ;; NB MODIFY HERE EXPERIMENT NAME
      print filename
      file-open filename
      log-params-nolearn ]

  ask turtles [
    if not (breed = Learners) [
      foreach [self] of turtles [
        t -> if t != self [ set distance-vector lput precision distance t 2 distance-vector ]
      ]
    set g-mean-distance-vector []
    set g-std-distance-vector []
    set g-min-distance-vector []
    set g-max-distance-vector []
    set g-mean-distance-vector lput precision mean distance-vector 2 g-mean-distance-vector
    set g-std-distance-vector lput precision standard-deviation distance-vector 2 g-std-distance-vector
    set g-min-distance-vector lput precision min distance-vector 2 g-min-distance-vector
    set g-max-distance-vector lput precision max distance-vector 2 g-max-distance-vector
    ]
  ]
end 

to setup-learning                  ;; RL
  clear-all

  create-turtles population
  [ set color blue
    set size 2
    setxy random-xcor random-ycor
    set ticks-in-cluster 0
    set cluster 0
    set in-cluster false
    set distance-vector []
    if label?
      [ set label who ] ]

  ask patches [ set chemical 0 ]
  set episode 1
  set is-there-cluster false
  set first-cluster true
  set cluster-tick ticks-per-episode

  reset-ticks
  setup-global-plot "Average cluster size in # of turtles within cluster-radius" "# of turtles" 0

  ;set actions ["random-walk" "stand-still"]
  ;set actions ["random-walk" "move-toward-cluster"]
  ;set actions ["random-walk" "stand-still" "move-toward-cluster"]
  ;set actions ["move-away-chemical" "random-walk" "drop-chemical" "move-toward-chemical"]
  set actions ["random-walk" "drop-chemical" "move-toward-chemical"]
  ;set actions ["move-and-drop" "walk-and-drop"]
  ;set actions ["move-toward-chemical" "random-walk" "move-and-drop" "walk-and-drop" "drop-chemical"]  ;; NB MODIFY ACTIONS LIST HERE
  setup-action-distribution-table actions
  type "Actions distribution: " print action-distribution

  create-Learners learning-turtles
  [ set color red
    set size 2
    setxy random-xcor random-ycor
    set chemical-here false
    set chemical-gradient max-one-of neighbors [chemical]
    set cluster-gradient max-one-of neighbors [count turtles-on neighbors]
    set p-chemical 0
    set reward-list []
    set distance-vector []
    if label?
      [ set label who ] ]

  setup-turtle-distribution-table Learners
  type "Turtles distribution: " print turtle-distribution

  if log-data?
    [ set filename (word "manual-cluster-rew8-mixed35-randomdroponly-" date-and-time ".txt")  ;; NB MODIFY HERE EXPERIMENT NAME
      print filename
      file-open filename
      log-params ]
  set g-reward-list []
  set g-std-reward-list []
  set g-max-reward-list []
  set g-min-reward-list []

  ask Learners [
    ;qlearningextension:state-def ["cluster-gradient" "in-cluster"]
    ;qlearningextension:state-def ["chemical-gradient" "in-cluster"]
    qlearningextension:state-def ["chemical-gradient"] ;; reporter                    ;; reporter could report variables that the agent does not own
    ;qlearningextension:state-def ["chemical-here" "in-cluster"]                        ;; WARNING non-boolean state variables make the Q-table explode in size, hence Netlogo crashes 'cause out of memory!
    ;(qlearningextension:actions [random-walk] [stand-still])
    ;(qlearningextension:actions [move-away-chemical] [random-walk] [drop-chemical] [move-toward-chemical]) ;; admissible actions to be learned in policy WARNING: be sure to not use explicitly these actions in learners!
    (qlearningextension:actions [random-walk] [drop-chemical] [move-toward-chemical])
    ;(qlearningextension:actions [move-toward-chemical] [random-walk] [move-and-drop] [walk-and-drop] [drop-chemical]) ;; NB MODIFY ACTIONS LIST ACCORDING TO "actions" GLOBAL VARIABLE
    ;(qlearningextension:actions [move-and-drop] [walk-and-drop])
    qlearningextension:reward [rewardFunc8]                                            ;; the reward function used
    qlearningextension:end-episode [isEndState] resetEpisode                           ;; the termination condition for an episode and the procedure to call to reset the environment for the next episode
    ; 10000 -> .9 .999 / .9993, 5000, 3000 episodes -> .9 .9985, 1500 ep -> .9 .9965, 500 ep -> .9 .985
    qlearningextension:action-selection "e-greedy" [0.9 0.9985]                          ;; 1st param is chance of random action, 2nd parameter is decay factor applied (after each episode the 1st parameter is updated, the new value corresponding to the current value multiplied by the 2nd param)
    qlearningextension:learning-rate learning-rate
    qlearningextension:discount-factor discount-factor
    foreach [self] of turtles [
      t -> if t != self [ set distance-vector lput precision distance t 2 distance-vector ]
    ]
    set g-mean-distance-vector []
    set g-std-distance-vector []
    set g-min-distance-vector []
    set g-max-distance-vector []
    set g-mean-distance-vector lput precision mean distance-vector 2 g-mean-distance-vector
    set g-std-distance-vector lput precision standard-deviation distance-vector 2 g-std-distance-vector
    set g-min-distance-vector lput precision min distance-vector 2 g-min-distance-vector
    set g-max-distance-vector lput precision max distance-vector 2 g-max-distance-vector
  ]

  setup-global-plot "Average reward per episode" "average reward" 0
end 

;;;;;;;;;;;;;;;;;;;
;; GO procedures ;;
;;;;;;;;;;;;;;;;;;;

to go                                              ;; NO RL
  if episode <= episodes                        ;; = learning episodes not finished
  [ ask turtles
    [ check-cluster

    ifelse scatter?
      [ ifelse chemical > sniff-threshold              ;; ignore pheromone unless there's enough here
          [ away-and-drop ]
          [ drop-chemical ] ]
      [ ifelse chemical > sniff-threshold              ;; ignore pheromone unless there's enough here
          [ move-and-drop ]
          [ walk-and-drop ] ]
    ;drop-chemical                                  ;; drop chemical onto patch

    if table:has-key? action-distribution last-action
      [ let n table:get action-distribution last-action
        table:put action-distribution last-action n + 1 ]
      ;[ type "WARNING: " type who type " choose action " type last-action print " that is NOT in  table!" ]
    let turtles-table table:get turtle-distribution who
    if table:has-key? turtles-table last-action
      [ let n table:get turtles-table last-action
        table:put turtles-table last-action n + 1 ]
      ;[ type "WARNING: " type who type " choose action " type last-action print " that is NOT in  table!" ]
  ]

  diffuse chemical diffuse-share                   ;; diffuse chemical to neighboring patches
  ask patches
  [ set chemical chemical * evaporation-rate       ;; evaporate chemical
    set pcolor scale-color green chemical 0.1 3 ]  ;; update display of chemical concentration

  let c-avg avg-cluster?
  plot-global "Average cluster size in # of turtles within cluster-radius" "# of turtles" c-avg
  log-ticks "average cluster size in # of turtles: " c-avg

  if is-there-cluster
      [ if first-cluster
        [ set cluster-tick ticks - (ticks-per-episode * (episode - 1))
          set first-cluster false
          type "t" type cluster-tick print ") FIRST CLUSTER!" ] ]

  if log-data?
      [ if (((ticks + 1) mod print-every) = 0)                       ;; log experiment data
        [
          ask turtles [
              foreach [self] of turtles [
                t -> if t != self [ set distance-vector lput precision distance t 2 distance-vector ]
              ]
              set g-mean-distance-vector lput precision mean distance-vector 2 g-mean-distance-vector
              set g-std-distance-vector lput precision standard-deviation distance-vector 2 g-std-distance-vector
              set g-min-distance-vector lput precision min distance-vector 2 g-min-distance-vector
              set g-max-distance-vector lput precision max distance-vector 2 g-max-distance-vector
          ]
          let g-mean-distance precision mean g-mean-distance-vector 2
          let g-std-distance precision standard-deviation g-std-distance-vector 2
          let g-min-distance precision min g-min-distance-vector 2
          let g-max-distance precision max g-max-distance-vector 2
          file-open filename
          ;;        Episode,                         Tick,                          Avg cluster size X tick,       Avg reward X episode,     Actions distribution until tick (how many turtles choose each available action)
          file-type episode file-type ", " file-type ticks file-type ", " file-type cluster-tick file-type ", " file-type c-avg file-type ", " file-type g-mean-distance file-type ", " file-type g-std-distance file-type ", " file-type g-min-distance file-type ", " file-type g-max-distance file-type ", "
          print-table action-distribution ", "
          print-table-table turtle-distribution ", "
        ]
      ]

    if (((ticks + 1) mod ticks-per-episode) = 0) [                         ;; an episode has just ended
    ;if (ticks > 1) and (is-there-cluster = true) [
      clear-patches                                                        ;; clear chemical
      set is-there-cluster false                                           ;; reset state variables
      set first-cluster true
      set cluster-tick ticks-per-episode
      ;plot-global "Average reward per episode" "average reward" g-avg-rew
      log-episodes "average reward per episode: " 0
      type "Actions distribution: " print action-distribution
      type "Turtles distribution: " print turtle-distribution
      setup-action-distribution-table actions
      setup-turtle-distribution-table turtles
      set g-reward-list []
      set episode episode + 1

      ask turtles [                                                        ;; reset non learners too
        if not (breed = Learners)
          [
            setxy random-xcor random-ycor
            set ticks-in-cluster 0
            set cluster 0
            set in-cluster false
          ]
      ]
    ]

  tick
  ]
  file-close-all
end 

to learn                                       ;; RL
  if episode <= episodes                        ;; = learning episodes not finished
  [ ask turtles
    [ if not (breed = Learners)                ;; handle non learning slimes as for 'go' procedure
        [ check-cluster
        ifelse chemical > sniff-threshold
          [ move-toward-chemical ]
            ;drop-chemical ]
          [ random-walk ;]
            drop-chemical ]
        ;drop-chemical ]
      ]
    ]

    ask Learners                               ;; handle learning slimes
    [ check-cluster
      set p-chemical [chemical] of patch-here
      ifelse chemical > sniff-threshold
      [ set chemical-here true ]               ;; set state variables
        ;move-toward-chemical ]
      [ set chemical-here false ]
        ;random-walk ]
      set chemical-gradient face-chem-gradient
      set cluster-gradient face-cluster-gradient
      qlearningextension:learning              ;; select an action for the current state, perform the action, get the reward, update the Q-table, verify if the new state is an end state and if so will run the procedure passed to the extension in the end-episode primitive
      ;if (ticks > 0) and ((ticks mod ticks-per-episode) = 0) [
        ;type "Q-table: " print(qlearningextension:get-qtable) ]

      ifelse table:has-key? action-distribution last-action
        [ let n table:get action-distribution last-action
          table:put action-distribution last-action n + 1 ]
        [ type "WARNING: " type who type " choose action " type last-action print " that is NOT in  table!" ]
      let learner-table table:get turtle-distribution who
      ifelse table:has-key? learner-table last-action
        [ let n table:get learner-table last-action
          table:put learner-table last-action n + 1 ]
        [ type "WARNING: " type who type " choose action " type last-action print " that is NOT in  learner table!" ]

    ]

    diffuse chemical diffuse-share
    ask patches
    [ set chemical chemical * evaporation-rate
      set pcolor scale-color green chemical 0.1 3 ]

    let c-avg avg-cluster?
    plot-global "Average cluster size in # of turtles within cluster-radius" "# of turtles" c-avg
    log-ticks "average cluster size in # of turtles: " c-avg

    let g-avg-rew 0
    let g-std-rew 0
    let g-min-rew 0
    let g-max-rew 0

    if is-there-cluster
      [ if first-cluster
        [ set cluster-tick ticks - (ticks-per-episode * (episode - 1))
          set first-cluster false
          type "t" type cluster-tick print ") FIRST CLUSTER!" ] ]


    if log-data?
      [ if (((ticks + 1) mod print-every) = 0)                       ;; log experiment data
        [
          ask Learners [
            foreach [self] of turtles [
              t -> if t != self [ set distance-vector lput precision distance t 2 distance-vector ]
            ]
            set g-mean-distance-vector lput precision mean distance-vector 2 g-mean-distance-vector
            set g-std-distance-vector lput precision standard-deviation distance-vector 2 g-std-distance-vector
            set g-min-distance-vector lput precision min distance-vector 2 g-min-distance-vector
            set g-max-distance-vector lput precision max distance-vector 2 g-max-distance-vector
          ]
          set g-avg-rew avg? g-reward-list
          set g-std-rew avg? g-std-reward-list
          set g-min-rew avg? g-min-reward-list
          set g-max-rew avg? g-max-reward-list
          let g-mean-distance precision mean g-mean-distance-vector 2
          let g-std-distance precision standard-deviation g-std-distance-vector 2
          let g-min-distance precision min g-min-distance-vector 2
          let g-max-distance precision max g-max-distance-vector 2
          file-open filename
          ;;        Episode,                         Tick,                          First cluster tick                    Avg cluster size X tick,       Avg reward X episode,
          file-type episode file-type ", " file-type ticks file-type ", " file-type cluster-tick file-type ", " file-type c-avg file-type ", " file-type g-avg-rew file-type ", " file-type g-std-rew file-type ", " file-type g-min-rew file-type ", " file-type g-max-rew file-type ", " file-type g-mean-distance file-type ", " file-type g-std-distance file-type ", " file-type g-min-distance file-type ", " file-type g-max-distance file-type ", "
          ;; Actions distribution until tick (how many turtles choose each available action)
          print-table action-distribution ", "
          print-table-table turtle-distribution ", "
        ]
      ]

    if (((ticks + 1) mod ticks-per-episode) = 0) [                         ;; an episode has just ended
    ;if (ticks > 1) and (is-there-cluster = true) [
      clear-patches                                                        ;; clear chemical
      set is-there-cluster false                                           ;; reset state variables
      set first-cluster true
      set cluster-tick ticks-per-episode
      set g-avg-rew avg? g-reward-list
      set g-std-rew avg? g-std-reward-list
      set g-min-rew avg? g-min-reward-list
      set g-max-rew avg? g-max-reward-list
      plot-global "Average reward per episode" "average reward" g-avg-rew
      log-episodes "average reward per episode: " g-avg-rew
      type "Actions distribution: " print action-distribution
      type "Turtles distribution: " print turtle-distribution
      setup-action-distribution-table actions
      setup-turtle-distribution-table Learners
      set g-reward-list []
      set g-std-reward-list []
      set g-max-reward-list []
      set g-min-reward-list []
      set episode episode + 1

      ask turtles [                                                        ;; reset non learners too
        if not (breed = Learners)
          [
            setxy random-xcor random-ycor
            set ticks-in-cluster 0
            set cluster 0
            set in-cluster false
          ]
      ]
    ]

    tick
  ]
  file-close-all
end 

;;;;;;;;;;;;;;;;;;;;;;;;;
;; LEARNING procedures ;;
;;;;;;;;;;;;;;;;;;;;;;;;;

to-report rewardFunc1  ;; fixed reward if in cluster, otherwise penalty
  let r penalty
  if in-cluster = true
    [ set r reward ]
  set reward-list lput r reward-list
  report r
end 

to-report rewardFunc2  ;; monotonic reward based on ticks-in-cluster
  set reward-list lput ticks-in-cluster reward-list
  report ticks-in-cluster
end 

to-report rewardFunc3  ;; reward and penalty based on ticks-in-cluster
  let rew 0
  if (ticks > 0)
    [ set rew
        ((ticks-in-cluster / ticks-per-episode) * reward)
        +
        (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * penalty)
      set reward-list lput rew reward-list ]
  report rew
end 

to-report rewardFunc4  ;; reward based on both ticks-in-cluster and cluster size, penalty based on ticks-in-cluster
  let rew cluster
  if (ticks > 0)
    [ set rew
        ((ticks-in-cluster / ticks-per-episode) * (cluster / cluster-threshold) * reward)
        +
        (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * penalty)
      set reward-list lput rew reward-list ]
  report rew
end 

to-report rewardFunc5  ;; additive variation of rewardFunc4
  let rew cluster
  if (ticks > 0)
    [ set rew
        ((ticks-in-cluster / ticks-per-episode) * reward)
        +
        ((cluster / cluster-threshold) * reward)
        +
        (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * penalty)
      set reward-list lput rew reward-list ]
  report rew
end 

to-report rewardFunc6  ;; variation of rewardFunc5: more 'weight' to cluster size, less 'weight' to penalty
  let rew cluster
  if (ticks > 0)
    [ set rew
        ((ticks-in-cluster / ticks-per-episode) * reward)
        +
        ((cluster / cluster-threshold) * reward ^ 2)
        +
        ((ticks-per-episode - ticks-in-cluster) * (penalty / 10))
      set reward-list lput rew reward-list ]
  report rew
end 

to-report rewardFunc7  ;; no ticks-in-cluster
  let rew cluster
  if (ticks > 0)
    [ set rew
        ((cluster ^ 2 / cluster-threshold) * reward)
        +
        (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * penalty)
      set reward-list lput rew reward-list ]
  report rew
end 

to-report rewardFunc8  ;; variation of rewardFunc6: ratio of ticks not in cluster, instead of absolute difference
  let rew cluster
  ;if (ticks > 0)
    ;[
    set rew
        ((ticks-in-cluster / ticks-per-episode) * reward)
        +
        ((cluster / cluster-threshold) * (reward ^ 2))
        +
        (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * penalty)
      set reward-list lput rew reward-list
   ;]
  report rew
end 

to-report rewardFunc9  ;; variation of rewardFunc8: ratio of ticks in cluster give reward only if cluster of correct size
  let rew cluster
  ;if (ticks > 0)
    ;[
    set rew
        ((ticks-in-cluster / ticks-per-episode) * (cluster / cluster-threshold) * reward)
        +
        ((cluster / cluster-threshold) * (reward ^ 2))
        +
        (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * penalty)
      set reward-list lput rew reward-list
   ;]
  report rew
end 

to-report scatter01  ;; incentivise scattering, not clustering! (essentially, the contrary of rewardFunc8)
  let rew cluster
  ;if (ticks > 0)
    ;[
    set rew
        ((ticks-in-cluster / ticks-per-episode) * penalty)
        +
        ((cluster / cluster-threshold) * (0 - (penalty ^ 2)))
        +
        (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * reward)
      set reward-list lput rew reward-list
   ;]
  report rew
end 

to-report scatter02  ;; added distance, std dev
  let rew 0
  ;if (ticks > 0)
    ;[
    set rew
        ((ticks-in-cluster / ticks-per-episode) * penalty)
        +
        ((cluster / cluster-threshold) * penalty)
        +
        (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * (reward))
        +
        (reward) / precision standard-deviation distance-vector 2
      set reward-list lput rew reward-list
   ;]
  report rew
end 

to-report scatter03  ;; added distance, min
  let rew 0
  ;if (ticks > 0)
    ;[
    set rew
        ((ticks-in-cluster / ticks-per-episode) * penalty)
        +
        ((cluster / cluster-threshold) * penalty)
        +
        (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * (reward))
        +
        (reward) * precision min distance-vector 2
      set reward-list lput rew reward-list
   ;]
  report rew
end 

to-report adaptive01
  ifelse episode < (episodes / 2)
    [ report rewardFunc8 ]
    [ report scatter01 ]
end 

to-report isEndState
  ;if is-there-cluster = true [
  if (((ticks + 1) mod ticks-per-episode) = 0) [
    ;set is-there-cluster false
    if switch-reward and episode = round (episodes / 2)
    [
      qlearningextension:action-selection "e-greedy" [0.9 0.999]
      print "Switched reward function!"
    ]
    report true
  ]
  report false
end 

to resetEpisode
  let avg-rew avg? reward-list
  set g-reward-list lput avg-rew g-reward-list
  if length reward-list > 0
    [ set g-std-reward-list lput precision standard-deviation reward-list 2 g-std-reward-list
      set g-min-reward-list lput precision min reward-list 2 g-min-reward-list
      set g-max-reward-list lput precision max reward-list 2 g-max-reward-list ]

  ;set-current-plot-pen (word who)
  ;plot avg-rew

  set reward-list []
  set ticks-in-cluster 0
  set distance-vector []
  ;ask patch-here [ set chemical 0 ]
  ;ask [neighbors] of patch-here [ set chemical 0 ]

  setxy random-xcor random-ycor
end 

;;;;;;;;;;;;;;;;
;; RL actions ;;
;;;;;;;;;;;;;;;;

to move-toward-cluster  ;; turtle procedure
  ;if breed = Learners
    set last-action "move-toward-cluster"
  let ahead count-from-me look-ahead 0
  let myright count-from-me look-ahead 1
  let myleft count-from-me look-ahead -1
  ifelse (myright >= ahead) and (myright >= myleft)
  [ rt sniff-angle ]
  [ if myleft >= ahead
    [ lt sniff-angle ] ]
  fd 1
end 

to move-toward-chemical  ;; turtle procedure
  ;if breed = Learners
    set last-action "move-toward-chemical"
  ;; examine the patch ahead of you and two nearby patches;
  ;; turn in the direction of greatest chemical
  let ahead [chemical] of patch-ahead look-ahead
  let myright [chemical] of patch-right-and-ahead sniff-angle look-ahead
  let myleft [chemical] of patch-left-and-ahead sniff-angle look-ahead
  ifelse (myright >= ahead) and (myright >= myleft)
  [ rt sniff-angle ]
  [ if myleft >= ahead
    [ lt sniff-angle ] ]
  fd 1                    ;; default don't turn
end 

to move-away-chemical  ;; turtle procedure
  ;if breed = Learners
    set last-action "move-away-chemical"
  ;; examine the patch ahead of you and two nearby patches;
  ;; turn in the direction of greatest chemical
  let ahead [chemical] of patch-ahead look-ahead
  let myright [chemical] of patch-right-and-ahead sniff-angle look-ahead
  let myleft [chemical] of patch-left-and-ahead sniff-angle look-ahead
  ifelse (myright >= ahead) and (myright >= myleft)
  [ lt sniff-angle ]
  [ ifelse myleft >= ahead
    [ rt sniff-angle ]
    [ lt 180 ] ]
  fd 1                    ;; default don't turn
end 

to random-walk  ;; turtle procedure
  ;if breed = Learners
    set last-action "random-walk"
  ifelse (random-float 1) > 0.5
    [ rt random-float wiggle-angle ]
    [ lt random-float wiggle-angle ]
  fd 1
end 

to drop-chemical  ;; turtle procedure
  ;if breed = Learners
    set last-action "drop-chemical"
  set chemical chemical + chemical-drop
end 

to dont-drop-chemical  ;; turtle procedure
  ;if breed = Learners
    set last-action "dont-drop-chemical"
end 

to move-and-drop  ;; turtle procedure (can't reuse code due to last-action saving (would compromise tracking of last actions performed!))
  ;if breed = Learners
     set last-action "move-and-drop"
  let ahead [chemical] of patch-ahead look-ahead
  let myright [chemical] of patch-right-and-ahead sniff-angle look-ahead
  let myleft [chemical] of patch-left-and-ahead sniff-angle look-ahead
  ifelse (myright >= ahead) and (myright >= myleft)
  [ rt sniff-angle ]
  [ if myleft >= ahead
    [ lt sniff-angle ] ]
  fd 1                    ;; default don't turn
  set chemical chemical + chemical-drop
end 

to away-and-drop  ;; turtle procedure (can't reuse code due to last-action saving (would compromise tracking of last actions performed!))
  ;if breed = Learners
     set last-action "away-and-drop"
  let ahead [chemical] of patch-ahead look-ahead
  let myright [chemical] of patch-right-and-ahead sniff-angle look-ahead
  let myleft [chemical] of patch-left-and-ahead sniff-angle look-ahead
  ifelse (myright >= ahead) and (myright >= myleft)
  [ lt sniff-angle ]
  [ ifelse myleft >= ahead
    [ rt sniff-angle ]
    [ lt 180 ] ]
  fd 1                    ;; default don't turn
  set chemical chemical + chemical-drop
end 

to walk-and-drop  ;; turtle procedure (can't reuse code due to last-action saving (would compromise tracking of last actions performed!))
  ;if breed = Learners
     set last-action "walk-and-drop"
  ifelse (random-float 1) > 0.5
    [ rt random-float wiggle-angle ]
    [ lt random-float wiggle-angle ]
  fd 1
  set chemical chemical + chemical-drop
end 

to stand-still
  ;if breed = Learners
    set last-action "stand-still"
  ;ifelse (random-float 1) > 0.5
    ;[ rt random-float wiggle-angle ]
    ;[ lt random-float wiggle-angle ]
end 

;;;;;;;;;;;;;;;;;;;;;
;; RL observations ;;
;;;;;;;;;;;;;;;;;;;;;

to-report count-from-me [howfar direction]
  let counter 0
  let candidates turtles-here
  while [counter < howfar]
  [ set counter counter + 1
    if direction = 0
      [ set candidates (turtle-set candidates turtles-on patch-ahead counter) ]
    if direction = 1
      [ set candidates (turtle-set candidates turtles-on patch-right-and-ahead sniff-angle counter) ]
    if direction = -1
      [ set candidates (turtle-set candidates turtles-on patch-left-and-ahead sniff-angle counter) ]
  ]
  report count candidates
end 

to-report face-cluster-gradient  ;; turtle procedure
  ;; examine the patch ahead of you and two nearby patches;
  ;; turn in the direction of greatest chemical
  let ahead count-from-me look-ahead 0
  let myright count-from-me look-ahead 1
  let myleft count-from-me look-ahead -1
  ifelse (myright >= ahead) and (myright >= myleft)
  [ rt sniff-angle ]
  [ if myleft >= ahead
    [ lt sniff-angle ] ]
  report patch-ahead look-ahead
end 

to-report face-chem-gradient  ;; turtle procedure
  ;; examine the patch ahead of you and two nearby patches;
  ;; turn in the direction of greatest chemical
  let ahead [chemical] of patch-ahead look-ahead
  let myright [chemical] of patch-right-and-ahead sniff-angle look-ahead
  let myleft [chemical] of patch-left-and-ahead sniff-angle look-ahead
  ifelse (myright >= ahead) and (myright >= myleft)
  [ rt sniff-angle ]
  [ if myleft >= ahead
    [ lt sniff-angle ] ]
  report patch-ahead look-ahead
end 

to check-cluster  ;; turtle procedure
  set cluster count turtles in-radius cluster-radius
  ifelse cluster >= cluster-threshold
    [ set in-cluster true
      set is-there-cluster true
      set ticks-in-cluster ticks-in-cluster + 1 ]
    [ set in-cluster false ]
end 

;;;;;;;;;;;;;;;;;;;;;
;; SHOW procedures ;;
;;;;;;;;;;;;;;;;;;;;;

;to setup-individual-plot
;  set-current-plot "Average cluster size in # of turtles within cluster-radius"
;  create-temporary-plot-pen (word who)
;  let p-color scale-color one-of base-colors who 0 count turtles
;  set-plot-pen-color p-color
;end

to setup-global-plot [p-name pen-name pen-color]
  set-current-plot p-name
  create-temporary-plot-pen pen-name
  set-plot-pen-color pen-color
end 

;to plot-individual
;  set-current-plot "Average cluster size in # of turtles within cluster-radius"
;  set-current-plot-pen (word who)
;  plot cluster
;end

to plot-global [p-name pen-name what]
  set-current-plot p-name
  set-current-plot-pen pen-name
  plot what
end 

to log-ticks [msg what]
  if (((ticks + 1) mod print-every) = 0)
    [ type "t" type ticks type ") " type msg print what ]
end 

to log-episodes [msg what]
  type "E" type episode type ") " type msg print what
end 

;;;;;;;;;;;;;;;;;;;;;
;; HELP procedures ;;
;;;;;;;;;;;;;;;;;;;;;

;to l-check-cluster
;  set l-cluster count turtles in-radius cluster-radius
;  set l-cluster l-cluster + count Learners in-radius cluster-radius
;  ifelse l-cluster > cluster-threshold
;    [ set l-in-cluster true
;      set is-there-cluster true
;      set l-ticks-in-cluster l-ticks-in-cluster + 1 ]
;    [ set l-in-cluster false ]
;end

to-report avg? [collection]
  let summ 0
  let lengthh 0
  foreach collection [ i ->
    set summ summ + i
    set lengthh lengthh + 1
  ]
  if lengthh > 0
    [ report summ / lengthh ]
  report 0
end 

to-report avg-cluster?
  let c-sum 0
  let c-length 0
  foreach sort turtles [ t ->
    ;if ([cluster] of t) > cluster-threshold
      ;[
        set c-sum c-sum + ([cluster] of t)
        set c-length c-length + 1
      ;]
  ]
  let c-avg 0
  if not (c-length = 0)
    [ set c-avg c-sum / c-length ]
  report c-avg
end 

to setup-action-distribution-table [collection]
  set action-distribution table:make
  foreach collection [ c ->
    table:put action-distribution c 0
  ]
end 

to setup-turtle-distribution-table [agentset]
  set turtle-distribution table:make
  foreach sort agentset [ c ->
    ;type [who] of c type " "
    let turtle-action-distribution table:make
    foreach actions [ a ->
      table:put turtle-action-distribution a 0
    ]
    table:put turtle-distribution [who] of c turtle-action-distribution
  ]
end 

to print-actions [collection sep]
  ;foreach but-last collection [ c ->
  foreach collection [ c ->
    file-type c file-type sep
  ]
  ;file-type last collection
  ;file-print ""
end 

to print-turtle-actions [turtleL actionsL sep]
  foreach but-last turtleL [ t ->
    foreach actionsL [ a ->
      file-type t file-type "-" file-type a file-type sep
    ]
  ]
  foreach but-last actionsL [ a ->
    file-type last turtleL file-type "-" file-type a file-type sep
  ]
  file-type last turtleL file-type "-" file-type last actionsL
  file-print ""
end 

to print-table [tab sep]
  ;foreach but-last table:keys tab [ k ->
  foreach table:keys tab [ k ->
    file-type table:get tab k file-type sep
  ]
  ;file-type table:get tab last table:keys tab
  ;file-print ""
end 

to print-table-table [tabtab sep]
  foreach but-last table:keys tabtab [ t-who ->
    foreach table:keys table:get tabtab t-who [ t-a ->
      file-type table:get table:get tabtab t-who t-a file-type sep
    ]
  ]
  foreach but-last table:keys table:get tabtab last table:keys tabtab [ t-a ->
    file-type table:get table:get tabtab last table:keys tabtab t-a file-type sep
  ]
  file-type table:get table:get tabtab last table:keys tabtab last table:keys table:get tabtab last table:keys tabtab
  file-print ""
end 

;;;;;;;;;;;;;;;;;;;;;;;;
;; LOGGING procedures ;;
;;;;;;;;;;;;;;;;;;;;;;;;

to log-params  ;; NB explicitly modify lines "e-greedy", "OBSERVATION SPACE", and "REWARD" (everything else is logged automatically)
  file-print "--------------------------------------------------------------------------------"
  file-type "TIMESTAMP: " file-print date-and-time
  file-print "PARAMS:"
  file-type "  population " file-print population
  file-type "  wiggle-angle " file-print wiggle-angle
  file-type "  look-ahead " file-print look-ahead
  file-type "  sniff-threshold " file-print sniff-threshold
  file-type "  sniff-angle " file-print sniff-angle
  file-type "  chemical-drop " file-print chemical-drop
  file-type "  diffuse-share " file-print diffuse-share
  file-type "  evaporation-rate " file-print evaporation-rate
  file-type "  cluster-threshold " file-print cluster-threshold
  file-type "  cluster-radius " file-print cluster-radius
  file-type "  learning-turtles " file-print learning-turtles
  file-type "  ticks-per-episode " file-print ticks-per-episode
  file-type "  episodes " file-print episodes
  file-type "  learning-rate " file-print learning-rate
  file-type "  discount-factor " file-print discount-factor
  file-type "  reward " file-print reward
  file-type "  penalty " file-print penalty
  file-type "  e-greedy " file-type 0.9 file-type " " file-type 0.9985 file-print ""                                     ;; NB: CHANGE ACCORDING TO ACTUAL CODE!
  file-type "ACTION SPACE: "
  print-actions actions " " file-print ""
  ;file-type "OBSERVATION SPACE: " file-type "cluster-gradient " file-print "in-cluster"
  ;file-type "OBSERVATION SPACE: " file-type "chemical-gradient " file-print "in-cluster"                                  ;; NB: CHANGE ACCORDING TO ACTUAL CODE!
  file-type "OBSERVATION SPACE: " file-print "chemical-gradient "
  file-type "REWARD: " file-print "rewardFunc8"                                                                       ;; NB: CHANGE ACCORDING TO ACTUAL CODE!
  file-print "--------------------------------------------------------------------------------"
  ;;        Episode,                         Tick,                          Avg cluster size X tick,       Avg reward X episode,
  file-type "Episode, " file-type "Tick, " file-type "First cluster tick, " file-type "Avg cluster size X tick, " file-type "Avg reward X episode, " file-type "Std dev reward X episode, " file-type "Min reward X episode, " file-type "Max reward X episode, " file-type "Avg distance, " file-type "Std dev distance, " file-type "Min distance, " file-type "Max distance, "
  ;; Actions distribution until tick (how many turtles choose each available action)
  print-actions actions ", "
  print-turtle-actions sort Learners actions ", "
end 

to log-params-nolearn  ;; NB explicitly modify lines "e-greedy", "OBSERVATION SPACE", and "REWARD" (everything else is logged automatically)
  file-print "--------------------------------------------------------------------------------"
  file-type "TIMESTAMP: " file-print date-and-time
  file-print "PARAMS:"
  file-type "  population " file-print population
  file-type "  wiggle-angle " file-print wiggle-angle
  file-type "  look-ahead " file-print look-ahead
  file-type "  sniff-threshold " file-print sniff-threshold
  file-type "  sniff-angle " file-print sniff-angle
  file-type "  chemical-drop " file-print chemical-drop
  file-type "  diffuse-share " file-print diffuse-share
  file-type "  evaporation-rate " file-print evaporation-rate
  file-type "  cluster-threshold " file-print cluster-threshold
  file-type "  cluster-radius " file-print cluster-radius
  file-type "  learning-turtles " file-print learning-turtles
  file-type "  ticks-per-episode " file-print ticks-per-episode
  file-type "  episodes " file-print episodes
  ;file-type "  learning-rate " file-print learning-rate
  ;file-type "  discount-factor " file-print discount-factor
  ;file-type "  reward " file-print reward
  ;file-type "  penalty " file-print penalty
  ;file-type "  e-greedy " file-type 0.9 file-type " " file-type 0.999 file-print ""                                     ;; NB: CHANGE ACCORDING TO ACTUAL CODE!
  file-type "ACTION SPACE: "
  print-actions actions " " file-print ""
  ;file-type "OBSERVATION SPACE: " file-type "cluster-gradient " file-print "in-cluster"
  ;file-type "OBSERVATION SPACE: " file-type "chemical-gradient " file-print "in-cluster"                                  ;; NB: CHANGE ACCORDING TO ACTUAL CODE!
  ;file-type "OBSERVATION SPACE: " file-print "chemical-gradient "
  ;file-type "REWARD: " file-print "rewardFunc8"                                                                       ;; NB: CHANGE ACCORDING TO ACTUAL CODE!
  file-print "--------------------------------------------------------------------------------"
  ;;        Episode,                         Tick,                          Avg cluster size X tick,       Avg reward X episode,     Actions distribution until tick (how many turtles choose each available action)
  file-type "Episode, " file-type "Tick, " file-type "First cluster tick, " file-type "Avg cluster size X tick, " file-type "Avg distance, " file-type "Std dev distance, " file-type "Min distance, " file-type "Max distance, "
  print-actions actions ", "
  print-turtle-actions sort turtles actions ", "
end 

; Copyright 2022 Stefano Mariani

There are 4 versions of this model.

Uploaded by When Description Download
Stefano Mariani about 1 month ago Updates for iccci 2024 submission Download this version
Stefano Mariani 6 months ago Scattering and adaptive experiments added Download this version
Stefano Mariani 11 months ago Several improvements, amongst which logging, new reward structures Download this version
Stefano Mariani almost 2 years ago Initial upload Download this version

Attached files

No files

This model does not have any ancestors.

This model does not have any descendants.