Published December 7, 2010 | Version v1
Conference paper

Efficient resubmission strategies to design robust grid production environments

Description

Production grids exhibit high failure rates, hampering the development of many large-scale scientific applications. End users require robust experiment production environments that ensure efficient resubmission of failed tasks. Proper parameterization of resubmission strategies is a complex problem that depends on the non-stationary workload conditions experienced by the infrastructure. To determine optimal resubmission parameters, probabilistic models of the overhead experienced by grid jobs are defined, taking into account the distribution of faults as measured on the infrastructure. Two strategies that can be implemented on the client side are proposed. Their models are evaluated under variable workload conditions to assess their validity over time. Their results are compared, and a trade-off between usability and model accuracy is discussed.
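
As an illustration only, the following sketch shows one way a client-side resubmission parameter could be tuned from measured latencies. It is not the model from the paper. It assumes a simple strategy in which a job is cancelled and resubmitted once a timeout elapses, and it uses the standard restart identity E[min(X, t)] / P(X <= t) for the expected total latency. The function names, the sample values, and the use of `math.inf` to mark a job that never completed are all hypothetical.

```python
import math


def expected_latency(samples, timeout):
    """Estimate the expected total latency under timeout-and-resubmit.

    `samples` are observed job latencies; math.inf marks a faulty job
    that never completed (an assumption of this sketch).
    Uses the restart identity E[min(X, timeout)] / P(X <= timeout).
    """
    n = len(samples)
    successes = sum(1 for x in samples if x <= timeout)
    if successes == 0:
        # No job ever finishes before the timeout: resubmission never helps.
        return math.inf
    truncated_mean = sum(min(x, timeout) for x in samples) / n
    return truncated_mean / (successes / n)


def best_timeout(samples):
    """Pick the observed finite latency that minimises the estimate."""
    candidates = sorted({x for x in samples if math.isfinite(x)})
    return min(candidates, key=lambda t: expected_latency(samples, t))


# Hypothetical measurements (seconds); one job is lost to a fault.
latencies = [100, 120, 150, 200, 900, math.inf]
timeout = best_timeout(latencies)
estimate = expected_latency(latencies, timeout)
```

With these made-up samples the minimum falls at a timeout of 200. Because the abstract stresses that workload conditions are non-stationary, a timeout chosen this way would presumably need to be re-estimated as new measurements arrive.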

Abstract

International audience

Additional details

Created:
December 3, 2022
Modified:
November 20, 2023