Revisiting Log-Linear Learning:
Asynchrony, Completeness and a Payoff-Based Implementation
Abstract:
Log-linear learning is a learning algorithm with equilibrium
selection properties. In potential games, log-linear learning
provides guarantees on the percentage of time that the joint
action profile will be at a potential maximizer. The traditional
analysis of log-linear learning has centered on explicitly
computing the stationary distribution. This analysis relied on a
highly structured setting: i) players' utility functions
constitute an exact potential game; ii) players update their
strategies one at a time, which we refer to as asynchrony; iii)
at any stage, a player can select any action in the action set,
which we refer to as completeness; and iv) each player is
endowed with the ability to assess the utility he would have
received for any alternative action, provided that the actions of
all other players remain fixed. Since the appeal of log-linear
learning is not solely the explicit form of the stationary
distribution, we seek to address to what degree one can relax
these structural assumptions while maintaining that only potential
function maximizers are the stochastically stable action
profiles. In this talk, we introduce slight variants of
log-linear learning that allow both synchronous updates and
incomplete action sets. In both settings, we prove that only
potential function maximizers are stochastically stable.
Furthermore, we introduce a payoff-based version of log-linear
learning, in which players are aware only of the utility they
received and the action they played. Note that log-linear
learning in its original form is not a payoff-based learning
algorithm. For payoff-based log-linear learning, we also prove
that only potential maximizers are stochastically stable. The
key enabler for these results is to shift the focus of the
analysis away from deriving the explicit form of the stationary
distribution of the learning process and towards characterizing
the stochastically stable states. The resulting analysis uses the
theory of resistance trees for regular perturbed Markov decision
processes, thereby allowing a relaxation of the aforementioned
structural assumptions.
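To make the baseline concrete, the following is a minimal sketch of asynchronous log-linear learning on a tiny two-player identical-interest coordination game (an exact potential game). The payoff matrix, temperature, and horizon are illustrative choices, not taken from the talk; in each period one uniformly chosen player revises, selecting each action with probability proportional to exp(utility / temperature), as in the structured setting described above.

```python
import math
import random

# Illustrative common-payoff game: both players prefer to coordinate.
# (1, 1) is the potential maximizer (payoff 2); (0, 0) is a
# suboptimal pure Nash equilibrium (payoff 1).
PAYOFF = {(0, 0): 1.0, (1, 1): 2.0, (0, 1): 0.0, (1, 0): 0.0}

def utility(player, action, other_action):
    # Identical-interest game: each player's utility coincides with
    # the potential function.
    joint = (action, other_action) if player == 0 else (other_action, action)
    return PAYOFF[joint]

def log_linear_step(actions, temperature):
    # Asynchrony: exactly one player, chosen uniformly, revises.
    i = random.randrange(2)
    other = actions[1 - i]
    # Completeness: the reviser weighs every action in the action set,
    # with probability proportional to exp(utility / temperature).
    weights = [math.exp(utility(i, a, other) / temperature) for a in (0, 1)]
    actions[i] = random.choices((0, 1), weights=weights)[0]
    return actions

random.seed(0)
actions = [0, 0]          # start at the suboptimal equilibrium
steps = 20000
at_maximizer = 0
for _ in range(steps):
    actions = log_linear_step(actions, temperature=0.25)
    at_maximizer += actions == [1, 1]

frac = at_maximizer / steps
print(f"fraction of time at potential maximizer: {frac:.3f}")
```

At low temperature the stationary distribution concentrates on the potential maximizer, so the empirical fraction of time spent at (1, 1) approaches one as the temperature is decreased.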
Biography: Jason Marden's research interests are in game-theoretic
methods for the feedback control of distributed multi-agent
systems. He received a BS in Mechanical Engineering in 2001 from
UCLA and a PhD in Mechanical Engineering in 2007, also from UCLA,
under the supervision of Jeff S. Shamma. Since 2007, he has been a
junior fellow in the Social and Information Sciences Laboratory at
the California Institute of Technology.
