Agents with no models must learn everything they know from observations they make about the environment, and from any rewards they get. In our economic society this means that buyers see the bids they receive and the good received after striking a contract, while sellers see the request for bids and the profit they made (if any). In general, agents get some input, take an action, then receive some reward. This is the same basic framework under which most learning mechanism are presented. We decided to use a form of reinforcement learning [11] [15] for implementing this kind of learning in our agents, since it is a simple method and the domain is simple enough for it to do a reasonable job.
Both buyers and sellers will use the equations in the next sections for determining what actions to take. However, with a small probability they will choose to explore, instead of exploit, and will pick their action at random (except for the fact that sellers never bid below cost). The value of is initially 1 but decreases with time to some empirically chosen, fixed minimum value . That is,
where is some annealing factor.