Bar Experiment

We define G(S) as the sum, over all time, of the utilities that the agents received for their actions. That is,
- G(S) = SUM_t SUM_k=1..7 l_k(x_k(S,t))
where x_k(S,t) is the number of agents that attended on night k of week t, and l_k(y) = a_k*y*exp(-y/c) is the utility derived from that attendance. For a_k > 0 this function is maximized at y = c, since its derivative a_k*(1 - y/c)*exp(-y/c) vanishes there.
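The utility and its sum over weeks and nights can be sketched in Python as follows. This is a minimal illustration, not the code used in the experiment: the function names, the value c = 6, and the layout of attendance_history are assumptions made for the example.

```python
import math

def utility(y, a=1.0, c=6.0):
    """l_k(y) = a_k * y * exp(-y / c) for attendance y on one night."""
    return a * y * math.exp(-y / c)

def world_utility(attendance_history, weights, c=6.0):
    """G(S): sum over weeks t and nights k of l_k(x_k(S, t)).

    attendance_history[t][k] is the attendance on night k in week t
    (hypothetical data layout); weights[k] is a_k.
    """
    return sum(
        utility(x, a, c)
        for week in attendance_history
        for x, a in zip(week, weights)
    )

# Among integer attendances, the utility peaks at y = c.
c = 6.0
best = max(range(0, 50), key=lambda y: utility(y, c=c))
print(best)  # prints 6
```

With equal weights and every night at attendance c, one week contributes 7*c*exp(-1) to G(S).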
Two different choices for a_k were explored: one where attendance on all nights is equally weighted, and one where we are concerned only with attendance on one specific night.
Three different reward functions were tested (where d_w is the night selected by agent w):
- Uniform Division reward UD = l_{d_w}(x_{d_w}(S,t))/x_{d_w}(S,t)
- Global reward GR = SUM_k=1..7 l_k(x_k(S,t))
- Wonderful Life reward WL = l_{d_w}(x_{d_w}(S,t)) - l_{d_w}(x_{d_w}(CL_w(S),t))
Each agent is in its own subworld, so CL_w(S) clamps the action of agent w alone.
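The three rewards can be sketched in Python for a single week. This is an illustration under stated assumptions, not the experiment's code: the function names and sample numbers are hypothetical, and CL_w(S) is assumed to mean that agent w is removed from the night it chose, so attendance on night d_w drops by one.

```python
import math

def utility(y, a, c):
    """l_k(y) = a_k * y * exp(-y / c)."""
    return a * y * math.exp(-y / c)

def uniform_division(x, d, a, c):
    """UD: the chosen night's utility split evenly among its attendees."""
    return utility(x[d], a[d], c) / x[d]

def global_reward(x, a, c):
    """GR: total utility over all seven nights of the week."""
    return sum(utility(xk, ak, c) for xk, ak in zip(x, a))

def wonderful_life(x, d, a, c):
    """WL: the chosen night's utility minus its utility without agent w.

    Assumes clamping agent w lowers attendance on night d by one.
    """
    return utility(x[d], a[d], c) - utility(x[d] - 1, a[d], c)

# Hypothetical week: night 0 is overcrowded, night 1 is under-attended.
x = [10, 3, 6, 6, 6, 6, 6]
a = [1.0] * 7
c = 6.0
print(wonderful_life(x, 0, a, c) < 0)  # agent on the crowded night is penalized
print(wonderful_life(x, 1, a, c) > 0)  # agent on the quiet night is rewarded
```

Here x[d] is at least 1, because agent w itself attends night d. Under the same assumption, WL equals the global reward with agent w present minus the global reward with agent w removed, since the terms for the other nights cancel.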
The microlearning algorithm used is a basic reinforcement learning algorithm that makes stochastic decisions using a Boltzmann distribution.
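One way such a learner could look is sketched below. The notes do not specify the update rule, temperature, or step size, so the running-average update, the Agent class, and all parameter values here are assumptions for illustration only.

```python
import math
import random

def boltzmann_probs(values, temperature):
    """Probability of each night, proportional to exp(value / temperature)."""
    m = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

class Agent:
    """Hypothetical learner: one reward estimate per night."""

    def __init__(self, n_nights=7, temperature=1.0, step=0.1, rng=None):
        self.values = [0.0] * n_nights
        self.temperature = temperature
        self.step = step
        self.rng = rng or random.Random()

    def choose_night(self):
        """Sample a night from the Boltzmann distribution over estimates."""
        probs = boltzmann_probs(self.values, self.temperature)
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, night, reward):
        """Move the estimate for the chosen night toward the observed reward."""
        self.values[night] += self.step * (reward - self.values[night])

agent = Agent(rng=random.Random(0))
night = agent.choose_night()
agent.update(night, reward=1.0)
```

A lower temperature makes the agent favor its best-estimated night more strongly, while a higher temperature makes the choice closer to uniform.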