Revisiting Log-Linear Learning: Asynchrony, Completeness and a Payoff-based Implementation

Abstract: Log-linear learning is a learning algorithm with equilibrium selection properties. In potential games, log-linear learning provides guarantees on the percentage of time that the joint action profile will be at a potential maximizer. The traditional analysis of log-linear learning has centered around explicitly computing the stationary distribution. This analysis relied on a highly structured setting: i) players' utility functions constitute an exact potential game, ii) players update their strategies one at a time, which we refer to as asynchrony, iii) at any stage, a player can select any action in the action set, which we refer to as completeness, and iv) each player is endowed with the ability to assess the utility he would have received for any alternative action provided that the actions of all other players remain fixed. Since the appeal of log-linear learning is not solely the explicit form of the stationary distribution, we seek to address to what degree one can relax the structural assumptions while maintaining that only potential function maximizers are the stochastically stable action profiles. In this talk, we introduce slight variants of log-linear learning to include both synchronous updates and incomplete action sets. In both settings, we prove that only potential function maximizers are stochastically stable. Furthermore, we introduce a payoff-based version of log-linear learning, in which players are only aware of the utility they received and the action that they played. Note that log-linear learning in its original form is not a payoff-based learning algorithm. In payoff-based log-linear learning, we also prove that only potential maximizers are stochastically stable. The key enabler for these results is to change the focus of the analysis away from deriving the explicit form of the stationary distribution of the learning process towards characterizing the stochastically stable states. The resulting analysis uses the theory of resistance trees for regular perturbed Markov decision processes, thereby allowing a relaxation of the aforementioned structural assumptions.

Paper: Click to view.

Biography: Jason Marden's research interest is game theoretic methods for feedback control of distributed multi-agent systems. He received a BS in Mechanical Engineering in 2001 from UCLA, and a PhD in Mechanical Engineering in 2007, also from UCLA, under the supervision of Jeff S. Shamma.  Since 2007, he has been a junior fellow in the Social and Information Sciences Laboratory at the California Institute of Technology.


Jason Marden