How to compute Hessian-vector products?

Dagréou, Mathieu; Ablin, Pierre; Vaiter, Samuel; Moreau, Thomas

Published May 7, 2024 | Version v1

Publication Metadata-only

Contributors

Others:

Modèles et inférence pour les données de Neuroimagerie (MIND) ; IFR49 - Neurospin - CEA ; Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Inria Saclay - Ile de France ; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)
Apple Inc
Centre National de la Recherche Scientifique (CNRS)
Laboratoire Jean Alexandre Dieudonné (LJAD) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UniCA)
Modelling brain structure, function and variability based on high-field MRI data (PARIETAL) ; Service NEUROSPIN (NEUROSPIN) ; Université Paris-Saclay-Institut des Sciences du Vivant Frédéric JOLIOT (JOLIOT) ; Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Institut des Sciences du Vivant Frédéric JOLIOT (JOLIOT) ; Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Inria Saclay - Ile de France ; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)

The product of the Hessian of a function with a vector, the Hessian-vector product (HVP), is a fundamental quantity for studying how a function varies. It is ubiquitous in traditional optimization and machine learning. However, computing HVPs is often considered prohibitive in deep learning, which drives practitioners to use proxy quantities to evaluate the loss geometry. Standard automatic differentiation theory predicts that the computational complexity of an HVP is of the same order of magnitude as that of computing a gradient. The goal of this blog post is to provide a practical counterpart to this theoretical result, showing that modern automatic differentiation frameworks, JAX and PyTorch, allow for efficient computation of these HVPs for standard deep learning cost functions.
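To make the claim about gradient-order cost concrete, here is a minimal JAX sketch of two common ways to form an HVP without ever building the Hessian. It is an illustration under stated assumptions, not the code benchmarked in the post: the test function `f` (a log-sum-exp), the helper names `hvp_forward_over_reverse` and `hvp_reverse_over_reverse`, and the sample vectors are all hypothetical. The dense `jax.hessian` call is included only as a reference to check against on a tiny problem.

```python
import jax
import jax.numpy as jnp


def f(x):
    # Hypothetical smooth scalar function (log-sum-exp), chosen because
    # its Hessian is dense; it is not taken from the blog post.
    return jnp.log(jnp.sum(jnp.exp(x)))


def hvp_forward_over_reverse(fun, x, v):
    # Push the tangent v through the gradient map x -> grad fun(x).
    # jax.jvp returns (primal_out, tangent_out); the tangent is H(x) @ v.
    return jax.jvp(jax.grad(fun), (x,), (v,))[1]


def hvp_reverse_over_reverse(fun, x, v):
    # Differentiate the scalar map y -> <grad fun(y), v>; its gradient is H(y) @ v.
    return jax.grad(lambda y: jnp.vdot(jax.grad(fun)(y), v))(x)


x = jnp.array([0.5, -1.0, 2.0])
v = jnp.array([1.0, 0.0, -1.0])

hvp_a = hvp_forward_over_reverse(f, x, v)
hvp_b = hvp_reverse_over_reverse(f, x, v)

# Reference: materialize the full Hessian (only viable for small inputs).
hvp_dense = jax.hessian(f)(x) @ v
```

Both helpers cost a small constant number of gradient-like passes and use memory linear in the size of `x`, whereas the dense reference needs storage quadratic in the dimension. Which of the two variants is faster in practice depends on the model and framework.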

Abstract

https://iclr-blogposts.github.io/2024/blog/bench-hvp/

Additional details

URL: https://hal.science/hal-04869111
URN: urn:oai:HAL:hal-04869111v1

Origin repository: UNICA

