Experiments on Checkpointing Adjoint MPI Programs

Creators: Taftaf, Ala; Hascoët, Laurent

Other: Transformations et outils informatiques pour le calcul scientifique (Ecuador) ; Inria Sophia Antipolis - Méditerranée (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria)

Description

Checkpointing is a classical strategy to reduce the peak memory consumption of adjoint codes. It is vital for codes with long run times, which is the case for most MPI parallel applications. However, for MPI codes this question has always been addressed by ad hoc manual manipulation of the differentiated code, with no formal assurance of correctness. In previous work, we investigated the assumptions implicitly made during past experiments in order to clarify and generalize them. On the one hand, we proposed an adaptation of checkpointing to MPI parallel programs with point-to-point communications that preserves the semantics of the adjoint program for any choice of the checkpointed part. On the other hand, we proposed an alternative adaptation of checkpointing that is more efficient but imposes a number of restrictions on the choice of the checkpointed part. In this work, we consider checkpointing of MPI parallel programs from a practical point of view. We propose an implementation of the adapted techniques inside the AMPI library. We discuss practical questions about which technique to apply within a checkpointed part and how to choose the checkpointed part itself. Finally, we validate our theoretical results on representative CFD codes.
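As a rough illustration of the memory/recomputation trade-off that checkpointing exploits, here is a minimal sequential sketch in Python. It involves no MPI and is not the AMPI implementation. The model step (an explicit Euler step of dx/dt = sin(x)), the function names, and the single split point are hypothetical choices made for illustration only. The first `split` steps form the checkpointed part C: it runs once without recording, and during the backward sweep it is restarted from a snapshot and re-run with recording.

```python
import math

DT = 0.01  # hypothetical time-step size (illustrative model, not from the paper)

def step(x):
    # One explicit Euler step of dx/dt = sin(x)
    return x + DT * math.sin(x)

def step_adjoint(x, xbar):
    # Adjoint of step(): multiply the incoming sensitivity by d step / d x at x
    return xbar * (1.0 + DT * math.cos(x))

def run_taped(x, count, tape):
    # Run `count` steps forward, pushing each step's input state on the tape
    for _ in range(count):
        tape.append(x)
        x = step(x)
    return x

def reverse(tape, xbar):
    # Pop states in reverse order and apply the adjoint of each step
    while tape:
        xbar = step_adjoint(tape.pop(), xbar)
    return xbar

def adjoint_store_all(x0, n):
    # Plain adjoint: tape all n steps, then sweep backward
    tape = []
    y = run_taped(x0, n, tape)
    peak = len(tape)
    return y, reverse(tape, 1.0), peak

def adjoint_checkpointed(x0, n, split):
    # Checkpoint the first `split` steps (part C): keep only a snapshot of its input
    snapshot = x0
    x = x0
    for _ in range(split):
        x = step(x)                      # part C runs once, without taping
    tape = []
    y = run_taped(x, n - split, tape)    # remaining steps (part D) are taped
    peak = len(tape)
    xbar = reverse(tape, 1.0)            # adjoint of part D
    run_taped(snapshot, split, tape)     # restore the snapshot, re-run C with taping
    peak = max(peak, len(tape))
    xbar = reverse(tape, xbar)           # adjoint of part C
    return y, xbar, peak

if __name__ == "__main__":
    y1, g1, p1 = adjoint_store_all(0.3, 100)
    y2, g2, p2 = adjoint_checkpointed(0.3, 100, 50)
    print(p1, p2)    # 100 50
    print(g1 == g2)  # True
```

In this sketch the peak tape size drops from n to max(split, n - split), at the cost of running part C twice. The re-execution is deterministic, so the gradient is identical to the store-all one. In an MPI code, re-executing part C would also re-execute any communications it contains. Keeping the adjoint semantics correct in that situation is what the adapted checkpointing techniques described above address.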

Abstract

International audience
