The sample complexities we have been talking about give us an idea of the time it would take for the learning algorithm to learn a fixed function. However, in a lot of MASs the function that the agents are trying to learn is constantly changing, mostly because other agents are also learning and changing their behaviors. Our results still apply because an agent that takes longer to learn a fixed function will also take longer to learn a changing function, since this problem is broken down to the problem of learning different fixed functions over time. Still, we wish to know more about the relative effectiveness of learning algorithms with different sample complexities, when learning target functions that change at different rates.
The first thing we need to do is to characterize the rate at which the learning algorithm learns, and the rate at which the target function changes. We do this using a differential equation that tells us what the error (as defined in Equation 1) of the model will be at time t+1 given the error at time t.
The learning algorithm is trying to learn the function f, where is merely one of the possible models that the agent can have. If this function has an error of 1 it means that none of its mappings is correct (i.e. it takes the wrong action in every state). An agent with such a function at time t, will observe the next action and will definitely learn something since all its mappings are incorrect. Its error at time t+1 will drop accordingly. But, as it learns more and more, the probability that the next action it observes is not one that it already knows keeps decreasing and, therefore, its error at time t+1 keeps getting closer and closer to its error at time t. We model this behavior by assuming that the error el of a learning algorithm l at time t+1 is given by
where is the slope of the learning curve. can be determined from the sample complexity and the learning algorithm used. In general, it is safe to say that bigger sample complexities will lead to bigger values. If then the algorithm never learns anything, if it learns everything from just one example. We have plotted such a function in Figure 1 for .
In a similar way, the moving target will introduce an error to the
current model at each time. If the current error is 1 then no
amount of moving will make it bigger (by definition). While, as the
error gets closer to 0, the moving target function is expected to
introduce larger errors. We can characterize the error function
em for the moving target M with the differential equation:
v is where the line intersects the x-axis and the slope is (1-v). We have also shown this curve in Figure 1, for v = .1. It is easy to see that if v=0 then the target function is always fixed, while if v=1 it is moving in the worst possible way. We use v as a measure of the velocity at which the target function moves away from the learned model.
In Figure 1 we trace the successive errors for a
model given the two curves, assuming that the curves have been defined
using the same time scales. We see that the error will converge to
some point or, rather, a pair of points. These points can be determined
by combining Equation 3 and 4 into one error
The solution such that e(t+1) = e(t) is
Figure: Plot of the errors for the Learning function el and the Moving Target function em for , and v=.1.
Equation 6 gives us the minimum error that we expect given
el and em. It corresponds to the error right after the agent
has tried to learn the target function. Equation 7 gives the
error right after the target moves. We now have the necessary math to
examine the differences in the learning and the moving target error
functions. Lets say we have two agents with different sample
complexities and, therefore, different learning rates and
. Using Equation 7 we can
calculate the difference in their expected maximum errors.
This is a rather complicated equation so we have plotted it in Figure 2 for different values of and . We notice that, for small values of v, increases linearly as a function of v, but it quickly reaches a maximum and starts to decrease. This means that the difference in the maximum expected error between the agent with the lower sample complexity (i.e. smaller ), and the one with higher sample complexity, will increase as a function of v for small values of v, and will decrease for bigger values. Still, we notice that is always a positive value, and is only 0 for v=0 and v=1, an unlikely situation.
Figure: Difference in the error of an agent with high sample complexity () minus one with low sample complexity (), as given by Equation 8, plotted as a function of v.
Proof Equation 8 shows that for all legal values of , given our assumptions about the learning and moving target error functions. Note that, even thought the lower sample complexities always lead to smaller errors, only for the smaller values of v do we find a correlation between increased v and increased difference in error .
This theorem leads us to look for metrics (e.g. price volatility in a market system) that can be used to determine how fast the system changes (i.e. the v) and, in turn, how much better the models of the agents with smaller sample complexity will be. For example, the designer of a 1-level agent in a market system might, after some time has passed, notice that the price volatility in the system is close to zero. He should then, using Theorem 3, consider making his agent a 0-level modeler.