In the game theoretic literature (where z is qualified by some solution concept, e.g., the Nash equilibrium), several other mechanisms are introduced in addition to Bayesian learning to provide a better foundation for equilibrium theory, especially in settings with non-equilibrium expectations.[20] These models also involve various assumptions about the players’ rationality, knowledge and beliefs, and about the ways players interact.[21]
While one class of models captures the “learning” behavior of individual players, another focuses on the aggregate behavior of a population (or its subgroups). To illustrate the concepts of learning involved, a common adjustment mechanism is presented for each class below. a) In individual level approaches, a common way to model learning is a quasi-Bayesian updating mechanism called “fictitious play”, where players are assumed to behave as if they were facing an exogenous, stationary, but unknown distribution of opponents’ strategies.[22] Hence, learning is modeled as statistical updating of information about the frequency with which each strategy is played, starting from exogenous prior beliefs about the distribution of opponents’ strategies.
The process is backward oriented, i.e., players are anticipative only in a rather limited statistical sense, not in the sense that they think about the effects of their own actions on their opponents’ play; furthermore, players do not try to influence the future play of their opponents.[23] Since players only keep track of the frequency of opponents’ play, they ignore data on conditional probabilities and therefore may fail to recognize the emergence of cycles. Hence, the notion of human learning employed in this literature is relatively simple and myopic, and behavior is surprisingly unstrategic, if not naive. Proponents often justify this simplification by pointing to the “economically interesting” case of large populations of players, where considerations of strategic interaction can be omitted, or defend it with an “as if” argument where the theory is empirically successful.
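To make the adjustment rule concrete, the following minimal sketch shows a two-player fictitious play loop in Python. The payoff matrices, the uniform prior counts, and the function name are illustrative assumptions for this example only, not part of the models in the literature cited above.

```python
import numpy as np

def fictitious_play(payoff_A, payoff_B, rounds=1000, prior=1.0):
    """Minimal two-player fictitious play sketch (illustrative assumptions only).

    Each player keeps counts of the opponent's past actions, treats the
    empirical frequencies as a stationary distribution of the opponent's
    play, and best-responds to it -- the purely statistical, backward-looking
    notion of learning described in the text.
    """
    n_A, n_B = payoff_A.shape
    counts_A = np.full(n_B, prior)   # A's counts of B's actions (exogenous prior beliefs)
    counts_B = np.full(n_A, prior)   # B's counts of A's actions
    for _ in range(rounds):
        belief_A = counts_A / counts_A.sum()     # A's belief about B's mixed strategy
        belief_B = counts_B / counts_B.sum()     # B's belief about A's mixed strategy
        a = int(np.argmax(payoff_A @ belief_A))  # A best-responds to empirical frequencies
        b = int(np.argmax(payoff_B @ belief_B))  # B best-responds likewise
        counts_A[b] += 1                         # only frequencies are recorded,
        counts_B[a] += 1                         # not conditional probabilities
    return counts_A / counts_A.sum(), counts_B / counts_B.sum()

# Example: matching pennies. Empirical frequencies approach (1/2, 1/2)
# even though actual play may cycle -- a cycle the players themselves
# cannot detect because they ignore conditional probabilities.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(fictitious_play(A, -A.T))
```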
b) Aggregate level approaches are usually identified with evolutionary models that deal with the aggregate behavior of populations of players. The analysis typically focuses on the emergence and stability of behavioral strategies under most of the assumptions and conditions mentioned above.[24] Learning is commonly represented by a “replicator dynamic” that was originally motivated by models of biological evolution.[25] The replicator dynamic assumes that a population (or a fraction of it) playing a particular strategy grows in proportion to how well that strategy is doing relative to the mean population payoff.[26] (A discrete-time sketch of this dynamic follows the discussion below.) – Since the replicator is only a mechanism of reinforcement of certain strategies, the link to learning processes is established by one of the following stories: One is the idea of “asking around” or “social learning”, where players can only learn from other players in the population because they do not remember their own past experience, or because they are periodically replaced so that only new players make choices.
A second story is based on the assumption that players satisfice rather than optimize, i.e., they try to achieve only a certain aspiration level instead of a maximum. Hence, the payoff monotone dynamic[27] generated by the replicator is thought to imply that players choose only the best strategy “available in their neighborhood” (e.g., they compare an “inherited” strategy with one randomly observed), rather than screening all strategies for the one with maximum payoff.
Both of these stories – intended to motivate the reinforcement mechanism of replicator dynamics as a good representation of human learning – do not seem very satisfying: In the first, individuals are assumed to be able only to copy existing strategies in an automated manner (thereby lacking any capacity for cognition, creativity, or even learning from their own experience), and the second story fits better with “best response behavior under bounded rationality”[28] than with (sophisticated) human learning.
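As a concrete illustration of the mechanism being criticized, the following sketch implements a discrete-time version of the replicator dynamic. The payoff matrix (a simple hawk–dove style anti-coordination game), the step size, and the function name are assumptions made for this example only.

```python
import numpy as np

def replicator_step(x, payoff, dt=0.01):
    """One discrete-time step of the replicator dynamic (illustrative sketch).

    x      -- current population shares of each pure strategy (sums to 1)
    payoff -- payoff matrix of the symmetric population game
    A strategy's share grows in proportion to the difference between its
    expected payoff against the current population and the mean payoff:
    pure reinforcement of relatively successful strategies, with no
    individual cognition involved.
    """
    fitness = payoff @ x          # expected payoff of each pure strategy
    mean_fitness = x @ fitness    # mean population payoff
    return x + dt * x * (fitness - mean_fitness)

# Example: hawk-dove style payoffs; shares converge toward the mixed
# population state (here roughly 1/2, 1/2) from an arbitrary start.
payoff = np.array([[0.0, 3.0],
                   [1.0, 2.0]])
x = np.array([0.9, 0.1])          # initial population shares (assumed)
for _ in range(5000):
    x = replicator_step(x, payoff)
print(x)
```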
More generally, most models abstract from forms of sophisticated learning in that considerations of repeated play are excluded.[29] Whereas a first level of sophistication may be included in that players use statistical procedures to forecast opposing players’ behavior, the second level, where players consider the possibility that their current play may influence the future play of their opponents, is mostly omitted by assuming large numbers of (anonymous) players, or simply by assuming that the incentives to alter the future play of the opponents are small enough to be neglected.
– Other models that are not limited to large populations (e.g., KALAI & LEHRER, 1993) seem to hold only under very strong assumptions (e.g., about the agents’ initial beliefs).[30] As BLUME & EASLEY (1995) have pointed out, existing results for models in which the only intertemporal link is learning are delicate, and one may remain rather skeptical when asking what can be learned from the current literature on learning in economics (indeed, Blume and Easley are pessimistic[31]).
20 Although it seems to be widely acknowledged that learning does not necessarily converge to any equilibrium concept (beyond the very weak notion of rationalizability), learning models are thought to suggest useful ways to evaluate and modify the traditional equilibrium concepts in that they lead to refinements of Nash equilibrium and to descriptions of long-run behavior weaker than Nash equilibrium (FUDENBERG & LEVINE, 1997).
21 For a comprehensive introduction to learning in games and an overview of current research see FUDENBERG & LEVINE (1997). A short overview can be found in FUDENBERG & LEVINE (1998).
22 Note that there is no unique fictitious play rule; there are several variants of fictitious play (like smooth or stochastic fictitious play) that all share the same notion of learning as described below. – Another class of individual level approaches, which may be referred to as “stimulus-response” or “reinforcement” models of learning, does not use the opponents’ play as the object of updating, but the players’ own payoffs, in that strategies that yield higher payoffs become more likely to be played.
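For contrast with fictitious play, a minimal sketch of such a reinforcement rule (in the spirit of cumulative-payoff reinforcement; the initial propensities and the payoff draws are purely hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforcement_choice(propensities):
    """Choose a strategy with probability proportional to its propensity."""
    p = propensities / propensities.sum()
    return rng.choice(len(propensities), p=p)

# Illustrative loop: the player updates propensities with her OWN realized
# payoffs; the opponents' play is never represented explicitly.
propensities = np.ones(2)             # initial propensities (assumed, not estimated)
for _ in range(1000):
    s = reinforcement_choice(propensities)
    payoff = rng.uniform(0.0, 1.0) if s == 0 else rng.uniform(0.0, 0.5)  # hypothetical payoffs
    propensities[s] += payoff         # better-performing strategies become more likely
print(propensities / propensities.sum())
```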
23 This is due to the assumption that the distribution of opponents’ strategies is stationary, which makes sense only if the system converges (at least in its long-run behavior) and which presupposes limited learning abilities on the part of the opponents.
24 In addition to the mentioned assumptions about rationality, information, interaction, etc., populations may be homogeneous or heterogeneous. See FUDENBERG & LEVINE (1996) and WEIBULL (1998) for short introductions to learning and evolution in games.
25 A related concept that is also widely used is the notion of an evolutionarily stable strategy, i.e., a strategy that is robust against invasion by a small population playing a different strategy.
26 Note that the replicator is based on basically the same idea as the individual level stimulus-response or reinforcement models mentioned above.
27 Following FUDENBERG & LEVINE (1997), a model that generates a “payoff monotone dynamic” is a system in which the growth rates of strategies are ordered by their expected payoff against the current population, so that strategies that yield a payoff above the mean grow faster. There are several ways to formulate such a model, and the replicator is only one particular form of a payoff monotone dynamic in which the rates of growth are proportional to payoff differences with the mean.
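In symbols (notation assumed for this note: x_i is the population share of strategy i, u_i(x) its expected payoff against the current population x, and ū(x) the mean payoff), payoff monotonicity and the replicator as one special case can be written as:

```latex
% Payoff monotone dynamic: growth rates ordered by expected payoffs
\frac{\dot{x}_i}{x_i} \;\ge\; \frac{\dot{x}_j}{x_j}
  \quad\Longleftrightarrow\quad
  u_i(x) \;\ge\; u_j(x)

% Replicator dynamic: growth rates proportional to payoff differences
% with the mean payoff
\dot{x}_i \;=\; x_i\,\bigl(u_i(x) - \bar{u}(x)\bigr),
  \qquad \bar{u}(x) = \sum_k x_k\, u_k(x)
```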
28 "Best response” here means that players simply pick the strategy with the higher payoff when observing two (or more) strategies that exist in the current population. Note that this concept of “best response” as it is incorporated in the replicator dynamic itself refers more to a maximizing choice in a static situation than to learning.
29 For a summary of the literature on repeated-game play and some experimental evidence see CRAWFORD (1995a, 40), who suggests that existing results raise difficult theoretical issues because significant learning often seems to occur within a single play of the repeated game, whereas standard methods for analyzing learning would assume repeated play of the repeated game, so that theoretical analysis should be combined with plausible restrictions on behavior in order to understand these phenomena.
30 See KIRMAN & SALMON (1995, 2), and CRAWFORD (1995a, 3), who finds that “theoretical analyses (traditional or adaptive) usually yield definite predictions only under strong assumptions, which are reasonable for some applications but unrealistic and potentially misleading for many others.” [emphasis added]
31 See KIRMAN & SALMON (1995, 2).