Policy-based optimization: single-step policy gradient method seen as an evolution strategy
- Others:
- Centre de Mise en Forme des Matériaux (CEMEF) ; Mines Paris - PSL (École nationale supérieure des mines de Paris) ; Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)
- Analysis and Control of Unsteady Models for Engineering Sciences (ACUMES) ; Inria Sophia Antipolis - Méditerranée (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria)
- University of Cambridge [UK] (CAM)
Description
This research reports on the recent development of black-box optimization methods based on single-step deep reinforcement learning (DRL) and on their conceptual similarity to evolution strategy (ES) techniques. It formally introduces policy-based optimization (PBO), a policy-gradient method that relies on a policy network to describe the density function of its forthcoming evaluations and uses covariance estimation to steer the policy improvement process in the right direction. The specifics of the PBO algorithm are detailed, and its connection to evolution strategies, especially the covariance matrix adaptation evolution strategy (CMA-ES), is discussed. Relevance is assessed by benchmarking PBO against classical ES techniques on the minimization of analytic functions, and by optimizing various parametric control laws intended for the Lorenz attractor. Given the scarce existing literature on the topic, this contribution establishes PBO as a valid, versatile black-box optimization technique and opens the way to multiple future improvements building on the inherent flexibility of the neural network approach.
Abstract
International audience
Additional details
- URL
- https://hal.archives-ouvertes.fr/hal-03432655
- URN
- urn:oai:HAL:hal-03432655v1
- Origin repository
- UNICA