A Behavioral Approach to learning in Economics

Operationalization of Learning

In order to (experimentally) test for the effects of the proposed determinants of learning on behavior, the underlying concept of learning must be operationalized. Recall that, by definition, learning is an enduring change in behavior, or in the capacity to behave in a given fashion, which results from practice or other forms of experience.

Thus, if learning occurs, its effects must be detectable as changes in behavior or in the capacity to behave. This raises the question of how these changes can be conceptually captured or “measured”. It seems natural, and follows from the model of a learning loop, to assume that in most cases learning tends to improve behavior in terms of outcomes, at least in the long run. In this view, “better” conditions for learning are assumed to lead to improved actions, that is, to “better” outcomes.[70]

The latter may be measured in various ways depending on the specific situation. For example, changes in outcomes may be gauged by the sum of received gains, incomes or payoffs over time (similar to the concept used to measure the “winner’s curse”; see KAGEL, 1995), or in terms of quantities (consumed, produced, exchanged, etc.) in an experiment.

Another question concerns the goal and the content of learning itself. In accord with the above operationalization, and with economic theory more generally, it appears sound to assume that the implicit goal of learning is to maximize individual rewards (or payoffs). The content of learning, therefore, is to behave (or play) optimally with respect to (i) the given situation (or game) and (ii) the behavior (or play) of others involved in that situation (or game) in order to achieve that goal. Note that in an interactive setting, where the outcome for a single individual (economic actor, player) results not only from his or her own behavior but also from the behavior of other individuals, this individual must learn to behave optimally with respect not only to the given situation but also to the (dynamic) behavior of others.

Therefore, learning to behave optimally does not imply that the learning process results in one fixed or specific behavior in a given situation. Hence, one cannot conclude from the observation of a certain type of behavior (say, cooperation) whether learning has taken place. On the contrary, it is a characteristic of learning that the behavioral repertoire is enlarged, so that behavior becomes more flexible and more varied. For example, in the repeated Prisoner’s Dilemma game, learning may induce playing a cooperative strategy with some opponents, but a defective strategy with other opponents in order to maximize individual payoffs.

Thus, when we observe a player who has continuously cooperated in the repeated Prisoner’s Dilemma, thereby collecting a higher overall payoff than some other player who failed to cooperate with his opponent in the same type of game, we cannot conclude from the difference in payoffs that one player has learned and the other has not. The defecting player may simply have been paired with an opponent who defected or who tried to exploit him, so that he himself learned to defect (see HAUK, 1997, for supporting experimental evidence).
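The point can be illustrated with a small numerical sketch. The payoff values (5, 3, 1, 0) and the ten-round horizon are illustrative assumptions and are not taken from the experiments cited above; the names `PAYOFF` and `total_payoff` are likewise hypothetical.

```python
# Stage-game payoffs to the row player, indexed by (own move, opponent's move).
# "C" = cooperate, "D" = defect. The numbers are assumed for illustration only.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def total_payoff(own_moves, opponent_moves):
    """Accumulated payoff over a sequence of rounds."""
    return sum(PAYOFF[(own, other)] for own, other in zip(own_moves, opponent_moves))


ROUNDS = 10

# Player A is paired with a cooperative partner and cooperates throughout.
payoff_a = total_payoff("C" * ROUNDS, "C" * ROUNDS)          # 30

# Player B is paired with an opponent who defects in every round.
payoff_b_defecting = total_payoff("D" * ROUNDS, "D" * ROUNDS)    # 10
payoff_b_cooperating = total_payoff("C" * ROUNDS, "D" * ROUNDS)  # 0
```

Under these assumed payoffs, player B earns less than player A even though defecting is B's best reply to an opponent who always defects. The payoff difference between A and B therefore reflects the pairing and says nothing about who has learned.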

We can, however, observe many players under different learning conditions and look for systematic differences in outcomes or payoffs. If we find such systematic differences by comparing behavioral outcomes under varying learning conditions, we have evidence for or against the contingent learning hypotheses as described in the next section. This comparative-static method allows the effects of learning conditions to be analyzed in a systematic way, but it is a priori not clear which conditions yield which outcomes in which situations or games.
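One possible way to make "systematic differences" operational is to compare mean accumulated payoffs between two learning conditions and to ask how often a difference of that size would arise if the condition labels were assigned at random. The following is a minimal sketch under that assumption. The permutation test is one choice among many and is not prescribed by the text, and the data and function names are hypothetical.

```python
import random
from statistics import mean


def mean_difference(condition_a, condition_b):
    """Difference in mean accumulated payoffs between two learning conditions."""
    return mean(condition_a) - mean(condition_b)


def permutation_p_value(condition_a, condition_b, n_perm=2000, seed=0):
    """Share of random relabelings whose absolute mean difference is at least
    as large as the observed one (two-sided permutation test)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean_difference(condition_a, condition_b))
    pooled = list(condition_a) + list(condition_b)
    k = len(condition_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical accumulated payoffs of subjects under two learning conditions.
favorable = [30, 28, 31, 29, 32, 30]
unfavorable = [20, 22, 19, 21, 23, 20]

difference = mean_difference(favorable, unfavorable)
p_value = permutation_p_value(favorable, unfavorable)
```

A small value of `p_value` would indicate that the payoff difference is unlikely to be due to chance alone. It does not by itself show which learning determinant is responsible.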

To establish this evidence, controlled experiments are needed, and it may very well be that some hypotheses are falsified in some cases since they are deduced mainly from behavior observed in individual decision-making tasks, not in interactive tasks or games. Despite the difficulty of defining the content of learning and operationalizing its effect on observable outcomes in general terms, learning may generally be assumed to yield best-response behavior that maximizes individual outcomes (e.g., payoffs) with respect to the situation and the interaction with others.[71] Moreover, the operationalization of learning, as proposed above, may require different experimental methods depending on the type of situation or game under investigation.

Whereas the content of learning is relatively straightforward in games of pure cooperation or coordination (namely, learning to coordinate or cooperate), in some games it is more difficult to operationalize the effects of learning on outcomes. A typical example involves zero-sum games where not all of the players can increase their payoffs simultaneously, even if they all learn. Generally, there exist two main methods to capture the effects of learning – and the effects of changes in learning determinants.

The traditional way is to look at the convergence of behavior to some equilibrium or stable behavioral pattern over time. The effects of learning processes (on outcomes) and learning determinants (on learning processes) may then be assessed in terms of the characteristics and robustness of the equilibrium selected by the learning process,[72] and by the speed of convergence. A possible assumption for the operationalization would be that behavior converges faster and is more stable under favorable learning conditions than under unfavorable conditions. The advantage of this method is that it is conceptually lucid, though no straightforward criteria exist to determine when behavior can be assumed to have settled. Also, normative implications are not easy to deduce.

A second way to test learning effects is to vary learning determinants asymmetrically, so that some players in a zero-sum game face learning conditions that are assumed to be favorable for learning while other players face conditions that are thought to hinder learning. The general prediction is that players facing unfavorable learning conditions tend to perform relatively worse than their opponents in terms of accumulated payoffs in a comparative-static analysis.

The effects of most contingent learning determinants (e.g., structural information, uncertainty, feedback) can be tested in zero-sum games if introduced asymmetrically, whereas some cannot (e.g., the number of actors). The advantage of this method is that it directly maps learning conditions to the outcomes of individual behavior.
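The asymmetric design can be sketched for a two-player zero-sum game as follows. In each pair, one player faces the condition assumed to be favorable and the opponent the unfavorable one. Because payoffs sum to zero in every round, the opponent's accumulated payoff is the negative of the favored player's. The data and the function names are hypothetical and serve only to illustrate the measurement.

```python
def accumulated_gap(favored_payoffs):
    """Accumulated payoff of the favored player minus that of the opponent.

    In a two-player zero-sum game the opponent receives the negative of the
    favored player's payoff in every round, so the gap is twice the sum.
    """
    total = sum(favored_payoffs)
    return total - (-total)


def share_favored_ahead(pairs):
    """Share of pairs in which the favored player leads after all rounds."""
    ahead = sum(1 for payoffs in pairs if accumulated_gap(payoffs) > 0)
    return ahead / len(pairs)


# Hypothetical per-round payoffs of the favored player in four pairs
# (+1 = round won, -1 = round lost).
pairs = [
    [1, -1, 1, 1],
    [1, 1, -1, 1],
    [-1, 1, -1, -1],
    [1, 1, 1, -1],
]

gaps = [accumulated_gap(p) for p in pairs]   # [4, 4, -4, 4]
share = share_favored_ahead(pairs)           # 0.75
```

If learning conditions had no effect, the share of pairs in which the favored player is ahead would be expected to lie around one half in a symmetric game. A share systematically above that would be in line with the general prediction stated above.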


70 Note that changes in the capacity to behave, although included in the above definition of learning, cannot be measured with this operationalization. This may not be a problem, though, because for economic theory learning and its determinants are not of interest per se but may matter only in terms of (aggregate) outcomes.

71 Note that this formulation omits any notion of intrinsic value of behavior itself, e.g., the fun of playing a game or being part of a situation/experiment, as can sometimes be observed in experiments.

72 Equilibria may be characterized by game theoretic concepts or by the Pareto-criterion (where possible).

Prof. Tilman Slembeck
