Published September 14, 2022 | Version v1
Journal article

Policy-based optimization: single-step policy gradient method seen as an evolution strategy

Description

This research reports on the recent development of black-box optimization methods based on single-step deep reinforcement learning (DRL), and on their conceptual similarity to evolution strategy (ES) techniques. It formally introduces policy-based optimization (PBO), a policy-gradient method that relies on a policy network to describe the density function of its forthcoming evaluations, and uses covariance estimation to steer the policy improvement process in the right direction. The specifics of the PBO algorithm are detailed, and the connection to evolution strategies (especially the covariance matrix adaptation evolution strategy, CMA-ES) is discussed. Relevance is assessed by benchmarking PBO against classical ES techniques on analytic function minimization problems, and by optimizing various parametric control laws intended for the Lorenz attractor. Given the scarce existing literature on the topic, this contribution establishes PBO as a valid, versatile black-box optimization technique, and opens the way to multiple future improvements building on the inherent flexibility of the neural network approach.
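To illustrate why a single-step policy gradient can be read as an evolution strategy, here is a minimal sketch under loudly stated assumptions. It is not the PBO algorithm described above: the policy network is replaced by a directly parameterized diagonal Gaussian (mean and log standard deviation), there is no full covariance estimation, and the helper names `sphere` and `single_step_pg` as well as all hyperparameter values are hypothetical. Each "episode" is a single action, namely one candidate point drawn from the policy, whose reward is the negated objective. The REINFORCE gradient is rescaled by the inverse Fisher information of the Gaussian (a natural gradient), which makes the update closely resemble a separable natural evolution strategy, with standardized rewards standing in for rank-based utilities.

```python
import math
import random


def sphere(x):
    """Hypothetical analytic test function: sum(x_i^2), minimum 0 at the origin."""
    return sum(xi * xi for xi in x)


def single_step_pg(f, dim, generations=300, pop=16, lr_mu=0.1, lr_sigma=0.05, seed=0):
    """Minimize f with a single-step policy gradient on a diagonal Gaussian policy.

    Sketch only (not PBO): each episode is one action x ~ N(mu, diag(sigma^2))
    with reward -f(x). Returns the final policy mean and standard deviations.
    """
    rng = random.Random(seed)
    mu = [1.0] * dim         # policy mean (arbitrary starting point)
    log_sigma = [0.0] * dim  # log standard deviations, so sigma = 1 initially
    for _ in range(generations):
        sigma = [math.exp(s) for s in log_sigma]
        # Draw standard-normal noise and map it to actions (candidate points).
        zs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop)]
        xs = [[m + s * z for m, s, z in zip(mu, sigma, zrow)] for zrow in zs]
        rewards = [-f(x) for x in xs]
        # Advantages: rewards centred on the batch mean and scaled to unit std.
        mean_r = sum(rewards) / pop
        std_r = math.sqrt(sum((r - mean_r) ** 2 for r in rewards) / pop) or 1.0
        adv = [(r - mean_r) / std_r for r in rewards]
        for i in range(dim):
            # Score-function (REINFORCE) estimates multiplied by the inverse
            # Fisher information (natural gradient):
            #   mean:      sigma^2 * (z / sigma) = sigma * z
            #   log-sigma: (z^2 - 1) / 2
            g_mu = sum(a * z[i] for a, z in zip(adv, zs)) / pop
            g_ls = sum(a * (z[i] ** 2 - 1.0) for a, z in zip(adv, zs)) / pop
            mu[i] += lr_mu * sigma[i] * g_mu
            log_sigma[i] += lr_sigma * 0.5 * g_ls
    return mu, [math.exp(s) for s in log_sigma]


mu, sigma = single_step_pg(sphere, dim=2)
print("final mean:", mu, "f(mean):", sphere(mu), "sigma:", sigma)
```

On this toy problem the mean drifts toward the minimizer while the standard deviations shrink, much as the step size and covariance adapt in an ES. A neural-network policy, as in PBO, would instead output these distribution parameters, which is where the flexibility mentioned above comes from.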

Audience:
International

Additional details

Created:
December 3, 2022
Modified:
November 28, 2023