Efficient resubmission strategies to design robust grid production environments
- Creators
- Lingrand, Diane
- Montagnat, Johan
Description
Production grids exhibit high failure rates, hampering the development of many large-scale scientific applications. End users require robust experiment production environments that ensure efficient resubmission of failed tasks. Proper parameterization of resubmission strategies is a complex problem that depends on the non-stationary workload conditions experienced by the infrastructure. To determine optimal resubmission parameters, probabilistic models of the overhead experienced by grid jobs are defined, taking into account the distribution of faults as measured on the infrastructure. Two strategies that can be implemented on the client side are proposed. Their models are evaluated under variable workload conditions to assess their validity over time. Their results are compared, and the trade-off between usability and model accuracy is discussed.
Abstract
International audience
Additional details
- URL
- https://hal.archives-ouvertes.fr/hal-00677824
- URN
- urn:oai:HAL:hal-00677824v1
- Origin repository
- UNICA