Revisiting Log-Linear Learning:
Asynchrony, Completeness and a Payoff-based Implementation
Abstract:
Log-linear learning is a learning algorithm with equilibrium
selection properties. In potential games, log-linear learning
provides guarantees on the percentage of time that the joint
action profile will be at a potential maximizer. The traditional
analysis of log-linear learning has centered on explicitly
computing the stationary distribution. This analysis relies on a
highly structured setting: i) players' utility functions
constitute an exact potential game, ii) players update their
strategies one at a time, which we refer to as asynchrony, iii)
at any stage, a player can select any action in the action set,
which we refer to as completeness, and iv) each player is
endowed with the ability to assess the utility he would have
received for any alternative action provided that the actions of
all other players remain fixed. Since the appeal of log-linear
learning is not solely the explicit form of the stationary
distribution, we seek to address to what degree one can relax
the structural assumptions while maintaining that only potential
function maximizers are the stochastically stable action
profiles. In this talk, we introduce slight variants of
log-linear learning to include both synchronous updates and
incomplete action sets. In both settings, we prove that only
potential function maximizers are stochastically stable.
Furthermore, we introduce a payoff-based version of log-linear
learning, in which players are only aware of the utility they
received and the action that they played. Note that log-linear
learning in its original form is not a payoff-based learning
algorithm. In payoff-based log-linear learning, we also prove
that only potential maximizers are stochastically stable. The
key enabler for these results is to change the focus of the
analysis away from deriving the explicit form of the stationary
distribution of the learning process towards characterizing the
stochastically stable states. The resulting analysis uses the
theory of resistance trees for regular perturbed Markov
processes, thereby allowing a relaxation of the aforementioned
structural assumptions.
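For readers unfamiliar with the algorithm, the following is a minimal sketch of the traditional asynchronous, complete form of log-linear learning described above, assuming the usual Boltzmann (softmax) choice rule with a temperature parameter tau. The function names, the temperature value, and the example game are illustrative assumptions, not details taken from the talk or paper.

```python
import math
import random

def log_linear_learning(players, action_sets, utility, joint_action, temperature, num_steps):
    """Sketch of standard (asynchronous, complete) log-linear learning.

    players      : list of player indices
    action_sets  : dict mapping player -> list of available actions
    utility      : function utility(player, joint_action) -> float
    joint_action : dict mapping player -> current action (updated in place)
    temperature  : exploration parameter tau > 0 (smaller tau -> closer to best response)
    """
    for _ in range(num_steps):
        # Asynchrony: exactly one player revises its action at each stage.
        i = random.choice(players)

        # Completeness: the revising player may select any action in its set,
        # assessing the hypothetical utility with all other players' actions fixed.
        scores = []
        for a in action_sets[i]:
            trial = dict(joint_action)
            trial[i] = a
            scores.append(utility(i, trial) / temperature)

        # Boltzmann (softmax) rule: the probability of choosing action a is
        # proportional to exp(utility(a) / temperature).
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        r = random.uniform(0, sum(weights))
        cumulative = 0.0
        for a, w in zip(action_sets[i], weights):
            cumulative += w
            if r <= cumulative:
                joint_action[i] = a
                break
    return joint_action

# Illustrative usage: a 2-player coordination game, which is an exact potential game.
players = [0, 1]
action_sets = {0: ["A", "B"], 1: ["A", "B"]}
utility = lambda i, joint: 1.0 if joint[0] == joint[1] else 0.0
state = {0: "A", 1: "B"}
log_linear_learning(players, action_sets, utility, state, temperature=0.1, num_steps=1000)
```

As the temperature is driven to zero, the process spends an increasing fraction of time at potential function maximizers, which is the equilibrium selection property the talk builds on.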
Biography: Jason Marden's research interests center on
game-theoretic methods for feedback control of distributed
multi-agent systems. He received a BS in Mechanical Engineering
in 2001 from UCLA, and a PhD in Mechanical Engineering in 2007,
also from UCLA, under the supervision of Jeff S. Shamma. Since
2007, he has been a junior fellow in the Social and Information
Sciences Laboratory at the California Institute of Technology.