Our algorithm starts with the partially expanded root situation, which consists only of the payoff matrix for the agent. ZK strategies are used for the situations of the other agents, forming the initial leaves. The algorithm proceeds by expanding, at each step, the leaf node with the highest expected gain, as long as this gain is greater than K. For testing purposes we set K=0, but setting K to a small negative number would allow the expansion of leaves that show no immediate gain but might lead to gains later on. The function Expected_Strategy(l) takes as input a leaf node situation l and determines the strategy that its expansion, to some number of levels, is expected to return. It does this by calculating the expected value of the corresponding f(x). The strategy is propagated up by the function Propagate_Strategy(t,l), which returns the expected strategy and its expected payoff. These are then used to calculate the expected gain for leaf l. The leaf with the maximum expected gain, if that gain is greater than K, is expanded, a new strategy is propagated, and the whole process is repeated. Otherwise, we stop expanding and return the current strategy.
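The expansion loop above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the leaf representation, the gain field, and the expand_leaf callback (standing in for Expected_Strategy, Propagate_Strategy, and the actual expansion) are all hypothetical placeholders.

```python
def expand_situation(leaves, expand_leaf, initial_strategy, K=0.0):
    """Greedy best-first expansion: repeatedly expand the leaf with the
    highest expected gain until no leaf's gain exceeds the threshold K.

    leaves: list of dicts with a precomputed expected "gain" (hypothetical
            representation; in the paper this comes from Expected_Strategy
            and Propagate_Strategy).
    expand_leaf: callback returning (new_strategy, new_leaves) for a leaf.
    """
    strategy = initial_strategy
    while leaves:
        # Pick the leaf whose expansion promises the largest gain.
        best = max(leaves, key=lambda leaf: leaf["gain"])
        if best["gain"] <= K:
            break  # no leaf is worth expanding; return the current strategy
        leaves.remove(best)
        # Expanding a leaf yields an updated strategy and possibly new leaves.
        strategy, new_leaves = expand_leaf(best)
        leaves.extend(new_leaves)
    return strategy
```

With K=0 (the setting used for testing), every leaf with a strictly positive expected gain is eventually expanded; a negative K would also admit leaves with small expected losses.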
Figure 4: Limited Rationality RMM (LR-RMM) Algorithm
The algorithm, shown in Figure 4, is started by calling Expand_Situation(t). Before the call, t must be set to the root situation, and the initial strategy to the one that t evaluates to. The strategy returned by Expand_Situation could then be stored away so that it can be used, at a later time, by Expected_Strategy. The decision to do this would depend on the particular implementation. One might wish to remember only those strategies that result from expanding a certain number of nodes or more, that is, the ``better'' strategies. In other cases, it could be desirable to remember all previous strategies that correspond to situations the agent has never experienced before, on the theory that it is better to have some information rather than none. This is the classic time versus memory tradeoff.
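The caching policy discussed above can be sketched as a small lookup table. This is an illustrative assumption, not the paper's design: the class name, the situation key, and the min_nodes threshold (implementing the ``remember only the better strategies'' option) are all hypothetical.

```python
class StrategyCache:
    """Remember strategies returned by Expand_Situation so that
    Expected_Strategy can reuse them later (a time/memory tradeoff)."""

    def __init__(self, min_nodes=0):
        # Only strategies produced by expanding at least min_nodes
        # nodes are kept -- i.e., only the "better" strategies.
        self.min_nodes = min_nodes
        self.table = {}

    def store(self, situation_key, strategy, nodes_expanded):
        if nodes_expanded >= self.min_nodes:
            self.table[situation_key] = strategy

    def lookup(self, situation_key):
        # Some information is better than none: return the stored
        # strategy if this situation has been seen before, else None.
        return self.table.get(situation_key)
```

Setting min_nodes=0 corresponds to the other policy mentioned: remembering every strategy for a previously unseen situation.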