Section: New Results
Learning
This section describes three contributions to machine learning.

In [12], we examine the convergence of no-regret learning in games with continuous action sets. For concreteness, we focus on learning via "dual averaging", a widely used class of no-regret learning schemes where players take small steps along their individual payoff gradients and then "mirror" the output back to their action sets. In terms of feedback, we assume that players can only estimate their payoff gradients up to a zero-mean error with bounded variance. To study the convergence of the induced sequence of play, we introduce the notion of variational stability, and we show that stable equilibria are locally attracting with high probability whereas globally stable equilibria are globally attracting with probability 1. We also discuss some applications to mixed-strategy learning in finite games, and we provide explicit estimates of the method's convergence speed.
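To make the scheme concrete, the following is a minimal sketch of dual averaging on the probability simplex, using the entropic mirror map (softmax) as one standard choice; the oracle interface, step size, and function names are illustrative assumptions, not the setup of [12].

```python
import numpy as np

def softmax(y):
    """Entropic mirror map: sends a dual score vector to the simplex."""
    z = np.exp(y - y.max())
    return z / z.sum()

def dual_averaging(grad_oracle, dim, steps=1000, step_size=0.1):
    """Dual averaging on the simplex with (possibly noisy) gradient feedback.

    grad_oracle(x) should return an unbiased, bounded-variance estimate of
    the player's payoff gradient at x -- the feedback model assumed above.
    """
    y = np.zeros(dim)           # cumulative score vector in the dual space
    x = softmax(y)              # initial action on the simplex
    for _ in range(steps):
        v_hat = grad_oracle(x)  # noisy payoff gradient estimate
        y += step_size * v_hat  # take a small step along the gradient
        x = softmax(y)          # "mirror" back to the action set
    return x
```

For instance, with a payoff gradient that consistently favours the first action, the iterate concentrates on that vertex of the simplex, which is the single-player analogue of convergence to a (variationally) stable point.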

Resource allocation games such as the famous Colonel Blotto (CB) and Hide-and-Seek (HS) games are often used to model a large variety of practical problems, but only in their one-shot versions. Indeed, due to their extremely large strategy space, it remains an open question how one can efficiently learn in these games. In this work, we show that the online CB and HS games can be cast as path planning problems with side-observations (SOPPP): at each stage, a learner chooses a path on a directed acyclic graph and suffers the sum of losses that are adversarially assigned to the corresponding edges; she then receives semi-bandit feedback with side-observations (i.e., she observes the losses on the chosen edges plus some others). We propose a novel algorithm, EXP3-OE, the first of its kind with guaranteed efficient running time for SOPPP without requiring any auxiliary oracle. We provide an expected-regret bound for EXP3-OE in SOPPP matching the order of the best benchmark in the literature. Moreover, we introduce additional assumptions on the observability model under which we can further improve the regret bounds of EXP3-OE. We illustrate the benefit of using EXP3-OE in SOPPP by applying it to the online CB and HS games.
This contribution appeared in [29], [49]. In an earlier article [38], we also looked at the sequential Colonel Blotto game under bandit feedback and proposed a black-box optimization method to optimize the exploration distribution of the classical ComBand algorithm.
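To illustrate the side-observations feedback model, here is a toy exponential-weights learner over a small explicit set of K arms with a graph of side-observations: playing arm i reveals the losses of every arm in obs_graph[i], and each revealed loss is importance-weighted by the probability that it was observed. This is a deliberately simplified sketch of graph-feedback learning; it is not EXP3-OE itself, which additionally exploits the DAG/path structure to remain efficient over exponentially many paths.

```python
import numpy as np

def exp3_side_obs(losses, obs_graph, eta=0.1, seed=0):
    """Toy exponential weights with side-observations over K arms.

    losses: (T, K) array of adversarial losses in [0, 1].
    obs_graph[i]: set of arms whose losses are observed when arm i is
    played (each arm observes at least itself).
    Returns the learner's cumulative loss.
    """
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    w = np.zeros(K)                # cumulative estimated losses
    total_loss = 0.0
    for t in range(T):
        p = np.exp(-eta * (w - w.min()))
        p /= p.sum()
        i = rng.choice(K, p=p)     # play an arm at random
        total_loss += losses[t, i]
        for j in obs_graph[i]:     # losses revealed by side-observations
            # probability that arm j's loss is observed this round
            q_j = sum(p[k] for k in range(K) if j in obs_graph[k])
            w[j] += losses[t, j] / q_j  # importance-weighted estimate
    return total_loss
```

With a fully connected observation graph this reduces to the full-information (expert) setting; with obs_graph[i] = {i} it is the plain bandit setting, showing how richer observability interpolates between the two.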

In [32], we study non-zero-sum hypothesis testing games that arise in the context of adversarial classification, in both the Bayesian and the Neyman-Pearson frameworks. We first show that these games admit mixed-strategy Nash equilibria, and then we examine some interesting concentration phenomena of these equilibria. Our main results are on the exponential rates of convergence of classification errors at equilibrium, which are analogous to the well-known Chernoff-Stein lemma and Chernoff information that describe the error exponents in the classical binary hypothesis testing problem, but with parameters derived from the adversarial model. The results are validated through numerical experiments.
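For reference, the classical Chernoff information mentioned above can be computed for discrete distributions by a one-dimensional minimization; the snippet below is a simple grid-search sketch of that baseline quantity (not the adversarial exponents of [32], whose parameters come from the game model).

```python
import numpy as np

def chernoff_information(p, q, grid=1000):
    """Chernoff information between two discrete distributions P and Q:

        C(P, Q) = -min_{0 <= s <= 1} log sum_x p(x)^s q(x)^(1-s).

    It is the best achievable exponent of the Bayesian error probability
    in classical binary hypothesis testing.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    s_grid = np.linspace(0.0, 1.0, grid)
    # log-moment quantity evaluated on a grid over s in [0, 1]
    vals = [np.log(np.sum(p**s * q**(1.0 - s))) for s in s_grid]
    return -min(vals)
```

As a sanity check, C(P, P) = 0, and C(P, Q) > 0 for distinct distributions with full support, so a larger value means the two hypotheses are easier to distinguish, i.e., errors decay faster.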